Reasoning Chain — Mode B (Solution Assessment of solution_draft01.md)
For each Mode B finding (M-1..M-15 in 02_fact_cards.md), trace the fact → comparison → conclusion path and pin the conclusion's confidence. Conclusions feed solution_draft02.md.
M-1 — ODOMETRY vs GPS_INPUT (Component 6)
Fact. ArduPilot dev docs (S41) say "ODOMETRY (the preferred method)" for sending external-nav to EKF3. ODOMETRY: quaternion + velocity NED + 21-element pos+att covariance + quality 0..100. GPS_INPUT: lat/lon/alt + 3-D velocity + scalar h_acc/v_acc + fix_type. Both supported; both targetable from pymavlink.
Reference comparison. AC-4.3 originally states "Replacement for GPS module … via MAVLink GPS_INPUT, GPS1_TYPE=14". That's GPS-substitute framing, which suggests GPS_INPUT is the right channel. But AC-NEW-4 (false-position safety budget P[err>500m]<0.1%) requires the FC to act on calibrated covariance — and GPS_INPUT collapses our 6-DoF covariance into one scalar, which is information loss.
Conclusion. Hybrid output. Keep GPS_INPUT as the primary "GPS-substitute" channel (matches AC-4.3 framing, plays cleanly with FC operator workflows that expect a GPS_RAW_INT-shaped status). Also emit ODOMETRY when the EKF emits a fix with a full 6-DoF covariance and a non-trivial yaw observability — let the FC's EKF3 fuse the richer signal. Configure FC source priorities so GPS_INPUT is the failover in case ODOMETRY trips a parameter gate (VISO_QUAL_MIN). This is a strict superset of the draft's choice; the only cost is the extra MAVLink emit and the source-switching SITL test scope (M-11).
Confidence. ✅ High. Two L1 sources (S41 dev docs + S42 PR #19563), one L1 confirming the failure path is real (S43 PR #30080).
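The information loss that motivates the hybrid choice can be made concrete. A minimal sketch, assuming only the documented MAVLink field layouts (ODOMETRY carries a 21-element row-major upper-triangular 6×6 pose covariance in x, y, z, roll, pitch, yaw order; GPS_INPUT carries scalar horiz/vert accuracies); the helper names are ours, not pymavlink API:

```python
import math

def pose_cov_diag(cov21):
    """Extract the 6 diagonal variances from ODOMETRY's 21-element
    row-major upper-triangular 6x6 pose covariance
    (order: x, y, z, roll, pitch, yaw)."""
    diag, idx = [], 0
    for row in range(6):
        diag.append(cov21[idx])
        idx += 6 - row  # skip the remainder of this row's upper triangle
    return diag

def gps_input_accuracies(cov21):
    """Collapse the 6-DoF covariance into the only two scalars GPS_INPUT
    can carry: horiz_accuracy and vert_accuracy (1-sigma, metres).
    Cross-terms and all attitude variances are discarded."""
    var_x, var_y, var_z = pose_cov_diag(cov21)[:3]
    h_acc = math.sqrt(max(var_x, var_y))  # conservative: worst horizontal axis
    v_acc = math.sqrt(var_z)
    return h_acc, v_acc
```

Everything the second function throws away is exactly what ODOMETRY would have delivered to EKF3 intact.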
M-2 — MASt3R off the primary matcher list
Fact. mast3r-runtime Jetson support = "Planned" (S57). Speedy MASt3R = 91 ms / pair on A40 GPU.
Reference comparison. A40 ≈ 38 TFLOPS FP16 (datacentre-class GPU); Jetson Orin Nano Super 25 W ≈ 1.7 TFLOPS FP16 (~67 TOPS sparse INT8). Throughput ratio ~22× to 30× depending on operator mix. 91 ms × 22 ≈ 2 s/pair; × 30 ≈ 2.7 s/pair. Even with INT8 quantisation closing the gap by ~2× (typical for ViT-class), MASt3R lands at >1 s/pair — outside the 400 ms p95 budget by a factor of ≥2.5×.
Conclusion. MASt3R drops from the "stretch candidate" row in the draft's bench-off table to a research-track-only label. Bench-off resources should focus on SP+LG / XFeat / GIM-LightGlue / RoMa-distilled.
Confidence. ✅ High. Numbers are conservative — MASt3R has additional overhead from the depth backbone that doesn't exist in pure 2D matchers.
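The extrapolation above is just a throughput-ratio scaling; a sketch that reproduces M-2's arithmetic (all numbers are taken from the finding itself, not new measurements):

```python
def extrapolate_pair_latency(ms_on_ref_gpu, throughput_ratio, quant_speedup=1.0):
    """Scale a per-pair latency measured on a reference GPU up by the raw
    throughput ratio to a weaker target, optionally crediting an
    INT8-quantisation speedup."""
    return ms_on_ref_gpu * throughput_ratio / quant_speedup

# Speedy MASt3R: 91 ms/pair on an A40; A40 : Orin Nano Super ~22-30x.
low_ms  = extrapolate_pair_latency(91, 22, quant_speedup=2.0)
high_ms = extrapolate_pair_latency(91, 30, quant_speedup=2.0)
```

Even the optimistic end of the range sits at ≥2.5× the 400 ms p95 budget.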
M-3 — Add GIM-LightGlue to the bench-off
Fact. GIM (S48): self-trained generalist matcher, 8.4–18.1 % zero-shot improvement over LightGlue/RoMa/DKM/LoFTR baselines. Pre-trained checkpoints public.
Reference comparison. Our domain (eastern-Ukraine 1 km AGL nadir vs. service satellite tiles) has zero training data publicly available; the bench-off therefore tests zero-shot transfer. GIM's training paradigm (50 h of internet videos covering every kind of scene including aerial) is precisely the regime that maximises zero-shot transfer.
Conclusion. Add GIM-LightGlue to the matcher bench-off shortlist as a peer of vanilla SP+LG. If the published 8–18 % zero-shot gain holds on AerialVL + Mavic, GIM-LightGlue dominates the cost/quality frontier (same TRT path as SP+LG, better accuracy out of the box).
Confidence. ✅ High. ICLR 2024 spotlight; benchmark numbers reproduced by independent users in the GitHub issue tracker.
M-4 — VPR shortlist expansion: + SALAD + BoQ
Fact. SALAD (S47, CVPR 2024): DINOv2 + Sinkhorn optimal-transport VLAD; R@1 = 75 % on MSLS Challenge / 92.2 % MSLS Val / 76 % NordLand; integrated in aero-vloc. BoQ (S46, CVPR 2024): bag of learnable queries, beats NetVLAD/MixVPR/EigenPlaces/Patch-NetVLAD/TransVPR/R2Former on 14 benchmarks; DINOv2 results Nov 2024.
Reference comparison. AnyLoc (draft primary) is unsupervised VLAD over DINOv2 features; SALAD is trained DINOv2-VLAD via Sinkhorn; BoQ is learnable queries over a backbone (DINOv2 or ViT). SALAD strictly beats AnyLoc on the same backbone in published benchmarks. BoQ beats both on standard VPR benchmarks; aerial-specific numbers TBD but well-positioned.
Conclusion. The bench-off table grows from {AnyLoc, MixVPR} to {AnyLoc, SALAD, BoQ, MixVPR}. AnyLoc remains the training-free fallback; SALAD and BoQ are likely primaries.
Confidence. ✅ High on M-4 (sources are CVPR 2024 papers + GitHub repos with published weights). Aerial-domain ranking is empirical — the bench-off resolves it.
M-5 — Latency budget has more headroom than the draft assumed
Fact. Jetson AI Lab (S40): DINOv2-base-patch14 = 126 inf/s on Orin Nano Super → ~8 ms/inf at 224×224, FP16 trtexec.
Reference comparison. Draft estimated 50–80 ms / 224×224 for DINOv2 ViT-B (Component 2 row 1). Real number is ~6–10× better. At 448×448 (more typical for AnyLoc descriptor extraction), expect ~32 ms/inf via near-quadratic scaling.
Conclusion. AC-4.1 (400 ms p95) is comfortably feasible with budget left over for SP+LG / GIM-LightGlue (target ~100 ms/pair) + EKF + MAVLink emit. R2 in the draft's risk table downgraded from High to Medium — empirical confirmation needed but no longer a make-or-break risk.
Confidence. ✅ High. NVIDIA L1 source.
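The 448-px figure comes from quadratic-in-resolution scaling (ViT token count grows with the square of the side length at fixed patch size). A sketch of that extrapolation plus the resulting budget check; the EKF and MAVLink-emit costs below are placeholder assumptions, not measurements:

```python
def scale_latency_quadratic(ms_at_base, base_px, target_px):
    """Extrapolate ViT inference latency by (resolution ratio)^2:
    token count grows quadratically with side length at fixed patch size."""
    return ms_at_base * (target_px / base_px) ** 2

vpr_ms = scale_latency_quadratic(8.0, 224, 448)  # ~32 ms for AnyLoc-style 448 px
budget = {
    "vpr": vpr_ms,
    "matcher": 100.0,  # SP+LG / GIM-LightGlue target from M-5
    "ekf": 5.0,        # placeholder assumption
    "mavlink": 2.0,    # placeholder assumption
}
headroom_ms = 400.0 - sum(budget.values())  # vs. the AC-4.1 p95 budget
```

With these numbers more than half the p95 budget remains unallocated, which is what justifies downgrading R2.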
M-6 — mavlink-router CVE-class issue
Fact. S45: stack-based buffer overflow in mavlink-router config parsing, fuzzing-discovered, public, no SECURITY.md.
Reference comparison. mavlink-router is a C++ daemon running with the same privileges as our companion process; if the config file is attacker-controlled (e.g., a tampered SD card on the airframe), this becomes RCE on the companion. Even if the config file is operator-controlled, a buggy config-file parser is one bug away from the next exploitable issue.
Conclusion. Three options, choose one:
- Pin a specific patched version + sandboxed systemd unit (NoNewPrivileges, ReadOnlyPaths=/etc/mavlink-router/, MemoryDenyWriteExecute, RestrictAddressFamilies=AF_UNIX AF_INET).
- Replace with an in-process MAVLink endpoint multiplexer (Python or Go, ~150 LOC) — eliminates the dependency entirely.
- Distinct system-IDs for MAVSDK + pymavlink sharing the same serial port via ArduPilot's native MAVLink routing, no router daemon at all.
Option 3 is the simplest. Option 2 gives us the most control. Option 1 is the lowest-effort quick fix. Recommend Option 3 for v1, with Option 2 as v1.1 if MAVLink message volume saturates a single endpoint.
Confidence. ✅ High that the issue is real; choice of mitigation is implementation preference.
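To show how small Option 2 really is, here is a hypothetical sketch of its core: the fan-out rule that a router daemon otherwise provides (every frame from one endpoint goes to all the others). Transport, framing, and error handling are deliberately omitted; all names are ours:

```python
class MavlinkMux:
    """Minimal in-process endpoint multiplexer. Each registered endpoint
    supplies a send callable; frames received from one endpoint are fanned
    out to every other endpoint, never echoed back to the sender."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, send_fn):
        self._endpoints[name] = send_fn

    def on_frame(self, source, frame):
        for name, send in self._endpoints.items():
            if name != source:  # routing rule: fan out, no echo
                send(frame)
```

The real ~150 LOC version would add the serial/UDP plumbing around `on_frame`; the routing logic itself does not grow.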
M-7 — MAVLink2 signing is v1-mandatory
Fact. S44: signing supported in ArduPilot 4.5+ on telemetry links; USB bypasses; keys in FRAM.
Reference comparison. Without signing, anyone with serial-line access (companion side OR an exposed telemetry radio) can inject a GPS_INPUT (or ODOMETRY) frame and crash the vehicle. Signing makes that injection require possession of the FRAM key. The cost is one operator key-provisioning step per airframe.
Conclusion. Promote signing from "Security note (deferred to a Phase-4 security pass)" to a v1 hard configuration item. Document the key-provisioning procedure in the deploy runbook. Verify signing-on at boot and refuse to inject GPS_INPUT/ODOMETRY if the signed-frame ack from the FC indicates signing-off.
Confidence. ✅ High.
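For the runbook's key-provisioning step: MAVLink2 signing uses a 32-byte secret key, and a common convention (followed by GCS tooling) is to derive it as the SHA-256 of an operator passphrase. A stdlib-only sketch of that derivation; installing the key on the link would then go through pymavlink's `setup_signing` on the connection, which this sketch does not cover:

```python
import hashlib

def signing_key_from_passphrase(passphrase: str) -> bytes:
    """Derive the 32-byte MAVLink2 signing key from an operator passphrase.
    SHA-256's output length matches the key length the protocol expects."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()
```

The derivation is deterministic, so the same passphrase provisions the airframe FRAM and the companion identically.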
M-8 — MBTiles operational recipe
Fact. S54: WAL + connection pool + transaction batching is the established recipe for MBTiles SQLite under concurrent reader+writer load. The default rollback-journal mode causes "database is locked" failures.
Reference comparison. Our workload: many concurrent readers (matcher cache lookup at ≤3 fps × ~30 candidate tiles) + occasional writer (Component 1b ortho-tile write at ≤1–2 Hz × ~30 tiles). Without WAL, every writer commit blocks all readers. With WAL, readers and one writer proceed concurrently.
Conclusion. Update Component 1's "Tile format" row in the architecture table to specify: MBTiles SQLite + WAL + connection pool + per-Component-1b-cycle transaction batching. Add to AC-4.1 latency-budget validation: the tile-cache lookup must hit p95 ≤5 ms.
Confidence. ✅ High.
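A minimal sketch of the recipe the architecture row should encode, using only stdlib `sqlite3` against the standard MBTiles `tiles` schema (the `synchronous=NORMAL` setting is our assumption, a common pairing with WAL, not something S54 mandates):

```python
import sqlite3

def open_mbtiles(path):
    """Open an MBTiles SQLite file with WAL journaling so readers do not
    block behind the writer, plus a busy timeout for contended moments."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")  # durable enough under WAL, fewer fsyncs
    return conn

def write_tile_batch(conn, tiles):
    """One transaction per Component-1b cycle instead of one per tile:
    the `with` block wraps the whole batch in a single BEGIN ... COMMIT."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO tiles "
            "(zoom_level, tile_column, tile_row, tile_data) VALUES (?,?,?,?)",
            tiles,
        )
```

In production each reader thread gets its own pooled connection; WAL lets all of them proceed while one batch commit is in flight.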
M-9 — Cache-poisoning safety hazard
Fact (analytical, not a single source). Draft's dedup rule allows onboard tiles to overwrite stale service tiles when "our quality > existing". Quality = inlier count + sharpness; does not include parent-pose covariance as a hard gate. Combined with EKF over-confidence (a known failure mode — see W4.b), this lets a confidently-bad pose write a misaligned tile that becomes the next flight's anchor.
Reference comparison. Cartography literature consistently treats authoritative basemap as immutable and crowdsourced/UAV updates as voting input that requires consensus before promotion. SfM bundle-adjustment treats over-confident poses as the dominant error source.
Conclusion. Three layered mitigations:
- Service-source tiles are immutable within freshness budget. Onboard tiles overwrite only stale or other-onboard tiles.
- The Suite Service ingest applies a voting layer: an onboard tile gets promoted to "trusted basemap" only after N≥2 independent flights confirm consistent geo-alignment within X m.
- Parent-pose covariance is a hard gate in the local quality score: σ_xy must be tighter than the generation-eligibility gate (e.g., σ_xy ≤ 5 m vs. 10 m generation gate), and a tile written above the hard gate is marked "soft" in its sidecar.
Add AC-NEW-7 — Cache-poisoning safety budget: P(onboard tile mis-aligned > 30 m) per flight < 1 %; P(misaligned > 100 m) per flight < 0.1 %. Validation: replay AerialVL with synthetic over-confidence injection.
Confidence. ⚠️ Medium. Hazard is real and qualitatively well-known; specific numeric thresholds need empirical calibration during implementation.
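The local two of the three layers can be expressed as a single gate function. A hypothetical sketch; the 5 m / 10 m thresholds come from the finding's own example and the N≥2 cross-flight voting layer lives service-side, so it is omitted here:

```python
from dataclasses import dataclass

HARD_SIGMA_XY_M = 5.0     # M-9 hard gate (tighter than the generation gate)
GENERATION_GATE_M = 10.0  # generation-eligibility gate from the finding

@dataclass
class Tile:
    source: str        # "service" or "onboard"
    age_s: float
    sigma_xy_m: float  # parent-pose 1-sigma horizontal uncertainty

def dedup_verdict(existing: Tile, new: Tile, freshness_budget_s: float) -> str:
    """Local layers of the M-9 dedup gate: returns 'deny', 'write', or
    'write_soft' (written but flagged soft in the sidecar)."""
    if new.sigma_xy_m > GENERATION_GATE_M:
        return "deny"        # parent pose not eligible to generate at all
    if existing.source == "service" and existing.age_s <= freshness_budget_s:
        return "deny"        # fresh authoritative basemap is immutable
    if new.sigma_xy_m > HARD_SIGMA_XY_M:
        return "write_soft"  # above the hard gate: usable, but never an anchor
    return "write"
```

The key property versus the draft's rule: a confidently-bad pose can no longer beat a fresh service tile on quality score alone, because source and covariance are checked before quality is even consulted.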
M-10 — Free-threaded Python 3.13 not v1-ready
Fact. S55: experimental, single-threaded perf hit, GIL re-enables on non-FT-aware C extension import.
Reference comparison. Our hot-path includes: numba JIT kernels, TensorRT Python bindings, pymavlink (C extension), numpy/scipy, possibly cv2. Any one of these silently re-enabling the GIL nullifies the benefit. And the non-trivial single-threaded penalty (~10–15 % per various benchmarks) directly hits AC-NEW-1 (cold-start TTFF <30 s).
Conclusion. v1 stays on standard CPython 3.11 or 3.12 (newest stable, well-supported by JetPack / numba / TRT). Sharpen the rationale in the architecture: the choice is not "GIL is fine" but "asyncio + TRT subprocess workers + numba JIT is the production-ready combination today; revisit free-threading in v1.1."
Confidence. ✅ High.
M-11 — ODOMETRY known production gotchas → SITL coverage required
Fact. S41/S42/S43: companion-derived velocity errors, position-estimate resets on external-nav reference loss, and source-switching conflicts when running alongside GPS.
Reference comparison. AC-NEW-2 (3 s spoofing-promotion latency) is the source-switching path. Whatever output channel we pick (GPS_INPUT, ODOMETRY, or hybrid), the source switch is the high-risk transition.
Conclusion. Add an explicit testing requirement: F-T9 (SITL: full MAVLink loop) must include source-switching scenarios (jam onset → our channel → spoofed real-GPS recovery → operator-confirmed source restore). Include the EK3_SRC1_* parameter combinations being benchmarked in the test plan.
Confidence. ✅ High.
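To pin down what "source-switching scenarios" means for F-T9, here is a hypothetical scenario model of the sequence named above, written as a tiny transition table. The state and event names are ours, not ArduPilot semantics; the point is that the spoofed-GPS and recovery events must not flip the source without operator confirmation:

```python
# (event, expected active nav source after the event)
SCENARIO = [
    ("jam_onset",            "visual_nav"),  # GPS degrades -> our channel takes over
    ("spoofed_gps_detected", "visual_nav"),  # a spoofed fix must NOT win priority back
    ("gps_recovered",        "visual_nav"),  # recovery alone is not sufficient...
    ("operator_confirm",     "gps"),         # ...the operator confirms the restore
]

TRANSITIONS = {
    ("gps", "jam_onset"):                   "visual_nav",
    ("visual_nav", "spoofed_gps_detected"): "visual_nav",
    ("visual_nav", "gps_recovered"):        "visual_nav",
    ("visual_nav", "operator_confirm"):     "gps",
}

def run_scenario(scenario, start="gps"):
    """Replay the event sequence and check each expected source."""
    state = start
    for event, expected in scenario:
        state = TRANSITIONS[(state, event)]
        assert state == expected, (event, state, expected)
    return state
```

The SITL test would replay the same sequence against real EK3_SRC1_* parameter combinations and assert the EKF's active source instead of this table.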
M-12 — Eastern-Ukraine relief amplitude affects flat-Earth assumption
Fact. S56: ~24 m peak-to-trough relief in Kharkiv-region UAV survey areas, with creek/gully systems.
Reference comparison. At 1 km AGL with the frame edge roughly 35° off-nadir, a 24 m elevation offset ortho-projects to ~24 m × tan 35° ≈ 17 m of horizontal misalignment under the flat-Earth assumption. AC-1.1 budget = 50 m@80 % (comfortable); AC-1.2 = 20 m@50 % (tight).
Conclusion. Add a per-sector DEM lookup to the pre-flight tile-sync pass. Classify sectors:
- flat (≤5 m amplitude) — full ortho-tile generation, full anchor weight.
- moderate (5–15 m) — ortho-tile generation, anchor weight × 0.7.
- rugged (>15 m) — skip ortho-tile generation, anchor weight × 0.3 with explicit "rugged-sector" flag in confidence telemetry.

This is a small one-time pre-flight step (SRTM 30 m DEM is free, ~15 GB global, ~30 MB for 400 km²).
Confidence. ⚠️ Medium. Single regional sample; refine numbers when more terrain data lands.
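The classification above is a one-liner per sector once the DEM amplitude is known. A sketch using the thresholds from the list, plus the flat-Earth edge-error computation for the Kharkiv sample (assuming, as in the comparison above, a frame edge ~35° off-nadir):

```python
import math

def classify_sector(relief_amplitude_m):
    """M-12 sector classes: (label, anchor_weight, generate_ortho_tiles)."""
    if relief_amplitude_m <= 5.0:
        return ("flat", 1.0, True)
    if relief_amplitude_m <= 15.0:
        return ("moderate", 0.7, True)
    return ("rugged", 0.3, False)

# Flat-Earth ortho misalignment for the S56 sample: elevation offset
# times the tangent of the off-nadir view angle at the frame edge.
edge_error_m = 24.0 * math.tan(math.radians(35.0))  # ~16.8 m
```

The ~24 m Kharkiv-region sample lands in the rugged class, which is consistent with its ~17 m edge error exceeding the 15 m moderate ceiling.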
M-13 — TartanAir V2 reconsideration (open question)
Fact. S51: photo-realistic synthetic, native IMU + 12-cam + season variation + custom camera models.
Reference comparison. User's last-message reasoning was "Mavic-class dynamics ≠ fixed-wing dynamics → synthetic IMU is unlikely to produce a useful signal". TartanAir V2 lets us configure motion patterns, so the dynamics-mismatch argument is weaker than for MidAir-class quadcopter-only sims.
Conclusion. Open question for the user: include TartanAir V2 in the bench-off as an early-stage synthetic baseline (good for sweeping seasons / lighting / pitches), or hold to "real-data-only purism" with AerialVL + Mavic + planned-fixed-wing-flights as the only V&V?
Confidence. ⚠️ Medium. Technical viability is high; the call is product-side.
M-14 — Add AerialExtreMatch + 2chADCNN to V&V plan
Fact. AerialExtreMatch (S49) — 1.5 M synthetic image pairs, 32 difficulty levels (overlap × scale × pitch), real-world UAV localization subset. 2chADCNN (S50) — season-aware UAV↔satellite template-matching.
Reference comparison. Draft's bench-off targets are AerialVL + UAV-VisLoc + internal Mavic. None of those grade against extreme-pitch / extreme-scale / extreme-overlap separately. Without a benchmark that crosses these axes, the bench-off can pick a winner that fails silently in cornered conditions.
Conclusion. Add to the V&V plan:
- AerialExtreMatch as a primary structured-difficulty regression bench.
- 2chADCNN as a season-aware baseline either (a) included in the bench-off, or (b) used as an explicit season-robustness ceiling reference.
Confidence. ✅ High.
M-15 — Real fixed-wing VO is harder than draft implies
Fact. S52 (AFIT thesis): SVO/DSO/ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights. S53: high-altitude (300–1000 m AGL) VIO drift in the same band as our AC-1.3.
Reference comparison. Draft's choice ("custom 2-frame homography VO via Component-3 matcher") is correct framing — VO between satellite anchors is a much easier problem than standalone metric SLAM. But AC-1.3's drift budget (<100 m without IMU, <50 m with IMU between two satellite-anchored fixes) requires empirical confirmation against a real fixed-wing baseline.
Conclusion. Add to risks: R8 — fixed-wing VO drift under our AC-1.3 budget is unconfirmed. Mitigations:
- Borrow AerialVL's fixed-wing trajectories (70 km of real fixed-wing flight) for AC-1.3 regression in F-T1b (new).
- Plan the first internal fixed-wing flight before AC lock — not as a stretch goal.
Confidence. ✅ High.
Summary table
| Finding | Severity | Affects | Resolution |
|---|---|---|---|
| M-1 | High | C-6, AC-4.3, AC-NEW-4 | Hybrid GPS_INPUT + ODOMETRY |
| M-2 | High | C-3 bench-off | Drop MASt3R from primary list |
| M-3 | Med | C-3 bench-off | Add GIM-LightGlue |
| M-4 | High | C-2 bench-off | Add SALAD + BoQ |
| M-5 | High (positive) | AC-4.1 | Downgrade R2 risk |
| M-6 | High (security) | C-6 | Replace mavlink-router OR sandbox & pin |
| M-7 | High (security) | C-6 | MAVLink2 signing v1-mandatory |
| M-8 | Med | C-1 | MBTiles WAL + pool + batching |
| M-9 | High (safety) | C-1b, AC-NEW | New AC-NEW-7 + dedup-rule changes |
| M-10 | Med | C-9 | Stay on CPython 3.11/3.12; sharpen rationale |
| M-11 | Med | C-5/C-6, AC-NEW-2 | Add SITL source-switching tests |
| M-12 | Med | C-1b, AC-1.2 | Per-sector DEM lookup + anchor weight |
| M-13 | Open question | datasets | Surface to user |
| M-14 | Med | V&V plan | Add AerialExtreMatch + 2chADCNN |
| M-15 | Med | C-4, AC-1.3 | Risk R8 + AerialVL F-T1b |