Mode B Decomposition — Adversarial Assessment of solution_draft01.md
Mode: B (Solution Assessment).
Question type: Problem Diagnosis + Decision Support.
Novelty sensitivity: High. Embedded CV/SLAM, ArduPilot MAVLink2 signing maturity, JetPack version, and matcher SOTA all churn fast — prefer 2024-Q4 → 2026-Q2 sources.
Goal: per Mode B template, find weak points (functional / security / performance) per draft component and propose either a stronger alternative or an explicit mitigation. Output is solution_draft02.md with an "Assessment Findings" table at the top.
Boundary
- Population: a single fixed-wing UAV running the GPS-denied onboard pipeline, 1 km AGL, 60 km/h cruise, 8 h endurance, eastern/southern Ukraine.
- Geography: deployed in active-conflict / contested EW environment.
- Timeframe: deployment v1 within ~4–6 months from now (mid-2026).
- Level: companion-computer code + integration. The Suite Satellite Service, the AI-camera detector, the FC firmware, and the airframe are out of scope as components but appear as interfaces under attack.
Perspectives chosen (≥3 mandatory)
- Implementer / engineer — what published Jetson Orin Nano Super numbers say about the actual latency budget, what the GIL-on-hot-path failure modes are, what is hard about TRT-deploying DINOv2-VLAD.
- Contrarian / devil's advocate — every committed choice in the draft has a "why not X" answer; surface them.
- Domain practitioner — what people running ArduPilot + companion CV in production have written about MAVLink2 signing, mavlink-router, GPS_INPUT injection, cross-view matchers in active service.
- Security / red-team — `GPS_INPUT` is a high-trust local channel; the tile cache is operationally sensitive. Realistic attack surface and mitigations.
Weak-point sub-questions (drives Mode B web search)
W1. Cross-view matcher commitment (Component 3)
The draft pins SuperPoint+LightGlue / XFeat / MASt3R as the bench-off candidates, with 1024×768 as the working downsample.
- W1.a. Is the bench-off shortlist still current as of 2026-Q2? Did GIM (2024), BoQ (2024), MASt3R-SfM (2025), RoMa-DC (2025), or the Map-Free-Reloc 2025 leaderboard winners change the picture?
- W1.b. Is "1024×768 starting point" empirically defensible on Orin Nano Super 25 W? Published TRT FPS / latency for SP+LG and XFeat at this resolution on the Orin Nano class.
- W1.c. Cross-view-specific failure modes at 1 km AGL that the bench-off won't catch — illumination, season, recent-conflict landscape change. Are any matchers explicitly evaluated on temporal change?
- W1.d. Why not training-free 3D-grounded matching (MASt3R / MASt3R-SfM) as the primary instead of as a stretch goal? What's the realistic Orin Nano latency budget for these?
Query variants: "LightGlue Jetson Orin Nano benchmark 2025 2026", "SuperPoint TensorRT FP16 Orin Nano latency", "MASt3R embedded GPU benchmark", "GIM image matching cross-view 2024", "BoQ visual place recognition", "RoMa DKM aerial cross-view 2025", "image matcher seasonal change benchmark".
W2. VPR backbone commitment (Component 2)
Draft picks AnyLoc (DINOv2-VLAD) primary + MixVPR fast-lane.
- W2.a. DINOv2 ViT-B/14 latency on Orin Nano Super 25 W — is the draft's "~50–80 ms / 224×224" empirically backed?
- W2.b. 2025 SOTA: SALAD, BoQ (Bag-of-Queries), CricaVPR — do any beat AnyLoc on aerial cross-domain at meaningful latency?
- W2.c. AnyLoc unsupervised VLAD is training-free, but is the VLAD codebook quality stable across operational areas (Ukraine specifically)? Any published failure cases?
Query variants: "AnyLoc Jetson benchmark", "DINOv2 ViT-B TensorRT FP16 latency Orin", "SALAD visual place recognition aerial 2024", "BoQ visual place recognition", "CricaVPR aerial benchmark", "VPR aerial Ukraine seasonal".
W3. Process topology — "single Python process + asyncio + TRT subprocess workers via CUDA IPC"
Draft commits to this for v1 (Component 9).
- W3.a. GIL on the hot path — do asyncio plus subprocess workers actually keep the GIL off the hot path at 3 fps and 1 km AGL with all the I/O (MAVLink, FDR, tile-cache lookups, EKF math)? Real-world failure stories from ArduPilot/PX4 companion-computer projects. See the topology sketch after the query variants.
- W3.b. CUDA IPC for tensor handoff — known issues on Jetson (unified memory model: is CUDA IPC even meaningful when CPU and GPU share the LPDDR5 pool)?
- W3.c. Subinterpreters / free-threaded Python (3.13+) — is the project using a Python old enough that subinterpreters aren't an option?
- W3.d. Alternatives: ROS 2 Humble (rejected in draft), C++ core (rejected), single-process with multiprocessing (not discussed).
Query variants: "Jetson CUDA IPC unified memory", "Python asyncio CUDA real-time deadline", "Python GIL drone companion computer", "PX4 ArduPilot companion computer python production", "ROS2 vs Python single-process VIO embedded", "free-threaded Python 3.13 GPU".
W4. Loosely-coupled EKF in Python + numba (Component 5)
The draft writes its own loosely-coupled EKF: it fuses IMU @ 100 Hz from the FC, satellite anchors at irregular intervals, and VO @ 3 Hz, and emits `GPS_INPUT`.
- W4.a. Why not just feed `VISION_POSITION_ESTIMATE` to ArduPilot EKF3 and let the FC fuse? The draft mentions this as an "alternative" — what does the practitioner literature say about the actual cost of the dual-fusion choice?
- W4.b. EKF covariance calibration is famously fragile (the AC-NEW-4 false-position budget rides on it). Are there published gotchas for loosely-coupled aerial EKFs? What's the right Mahalanobis gate value? See the gate sketch after the query variants.
- W4.c. numba JIT on Jetson — JIT warmup time hurts AC-NEW-1 (cold-start TTFF <30 s). Real numbers on Jetson Orin Nano JIT compile time.
- W4.d. Heading observability — at 1 km AGL nadir, satellite anchoring gives `(lat, lon, h)`, but heading is weakly observable from a single anchor unless the matcher emits oriented features. Does the draft's matcher choice cleanly produce yaw with covariance?
Query variants: "ArduPilot VISION_POSITION_ESTIMATE vs GPS_INPUT", "loose coupled EKF aerial gotcha", "EKF Mahalanobis gate visual anchor", "numba Jetson cold start", "monocular yaw observability satellite reference".
W5. ArduPilot MAVLink2 signing + GPS_INPUT injection security (Component 6)
Draft says "MAVLink2 signing recommended", treats GPS_INPUT as high-trust local channel.
- W5.a. Production maturity of MAVLink2 signing in ArduPilot 4.5+ as of 2026-Q2 — is it default-on or default-off, and what is the key-distribution story?
- W5.b. Real attack surface: what does an attacker with serial access to the FC actually need in order to spoof a `GPS_INPUT`? Does `mavlink-router` itself widen the attack surface?
- W5.c. Companion-side defenses — a health gate before injecting, `fix_type` sanity checks, jam detection from the other direction. See the gate sketch after the query variants.
- W5.d. Failsafe fallback: if our `GPS_INPUT` is rejected by the FC (signing failure), what does ArduPilot do — does AC-NEW-2 (3 s spoof-promotion latency) survive that?
Query variants: "ArduPilot MAVLink2 signing 4.5 production", "MAVLink2 signing key distribution UAV", "ArduPilot GPS_INPUT signing", "mavlink-router security audit", "GPS_INPUT spoof companion computer attack".
W6. In-flight ortho-tile generation residual error (Component 1b)
Draft: pinhole projection → flat-Earth ground plane → resample to z=20 XYZ tiles. Eligibility gates: σ_xy ≤ 10 m, |bank| and |pitch| ≤ 10°.
- W6.a. Flat-Earth residual error in eastern/southern Ukraine — actual relief amplitude. Steppes are not flat at 30 cm/px tile precision; agricultural fields, river valleys, and ravines (yary) are common. See the relief-displacement check after the query variants.
- W6.b. What's the per-tile geo-alignment error budget that still keeps cross-view anchors valid against the same tile two flights later?
- W6.c. MBTiles SQLite at 10 GB scale on NVMe: known issues with concurrent reader+writer (tile-cache miss path is concurrent with tile-write path)? Sharding strategy?
- W6.d. Dedup by (z, x, y) only — but the onboard tile carries a parent_pose covariance. If we overwrite a service-source tile with an onboard tile that was written from a 3-σ-bad pose, we've poisoned the next flight's cache. Should the dedup rule include a "trust-only" lock from the Service?
Query variants: "MBTiles concurrent writer reader SQLite", "orthorectification flat earth residual error UAV", "Ukraine eastern terrain relief amplitude", "geotagged tile alignment budget cross-view localization".
W7. Tile dedup poisoning — onboard tile overwrites service tile
This is a sharper version of W6.d.
- W7.a. The "highest quality wins" rule treats `match_inliers` as a proxy for geo-alignment confidence. But a confidently-bad anchor (over-confident covariance from the EKF — see W4.b) writes a "high-quality" tile that is actually misaligned by 50 m. On the next flight, that misaligned tile becomes the satellite anchor for another anchor, and the error compounds.
- W7.b. Best practice from cartography / SfM for trusting onboard imagery as basemap input.
- W7.c. Mitigation: lock tiles whose source is `service` against onboard overwrite for some grace period; require onboard tiles to be "voted in" by N independent flights before promotion. See the dedup sketch after the query variants.
Query variants: "satellite tile pose error compounding", "uav generated tile basemap update sfm trust", "drone-ortho photo dedup quality score".
W8. Mavic-class footage as deployment-domain proxy
Draft uses internal Mavic flight footage as the deployment-domain V&V proxy. Mavic is a small quadcopter; the deployment platform is a fixed-wing at 1 km AGL.
- W8.a. What does the literature say about transferring CV/VO/VPR results from quadcopter footage to fixed-wing? Camera dynamics differ (rolling shutter, vibration spectrum, frame rate, motion-blur profile, AGL band).
- W8.b. Synthetic IMU from Mavic video — the user already rejected this. But is there a non-synthetic alternative the draft missed? E.g., MidAir (synthetic but matched dynamics), TartanAir, public ArduPilot SITL logs.
- W8.c. Risk of false confidence — ground truth is in the absolute satellite anchor, not the Mavic IMU. So how does the Mavic V&V actually validate AC-NEW-4 (false-position safety) when no fixed-wing IMU is in the loop?
Query variants: "fixed wing vs quadcopter visual SLAM transfer", "drone vibration spectrum fixed-wing quad", "TartanAir aerial dataset fixed-wing".
W9. Latency budget — is 400 ms p95 actually realistic?
AC-4.1 budget. Draft acknowledges R2 ("latency budget on Orin Nano Super at 1024×768 input is tight").
- W9.a. Real published Jetson Orin Nano Super 25 W numbers for: DINOv2 ViT-B forward (224×224), SuperPoint+LightGlue at 1024×768, FAISS top-K over ~10⁴ vectors, EKF update at 100 Hz IMU.
- W9.b. Steady-state vs transient latency — does the budget include EKF-output-to-MAVLink-emit overhead, MAVLink serialization, and the FC's own gating?
- W9.c. Failure mode if the budget blows — frame drops are allowed (AC-4.1 says ~10%), but if the matcher's latency tail is 600 ms, the EKF rides on VO+IMU for >2 frames and the AC-3.4 reloc trigger hits. See the back-of-envelope sum after the query variants.
Query variants: "DINOv2 Jetson Orin Nano TensorRT FP16 ms", "LightGlue Jetson benchmark FPS 1024", "FAISS Jetson IVF latency".
W10. AC-NEW-4 false-position safety — Monte Carlo validation realism
P(error >500 m) <0.1%, P(error >1 km) <0.01%.
- W10.a. What's the standard practice for validating these probabilities at this magnitude? You need >10⁴ frames of independent failure modes — does the AerialVL + Mavic dataset cover that? See the rule-of-three check after the query variants.
- W10.b. What does the literature say about cross-view matcher tail behavior — do failures cluster on specific scene types (forest, repetitive cropland, water, glare)? If yes, dataset bias is the killer.
- W10.c. EKF-side gating — Mahalanobis gate is the right tool, but the gate threshold itself is a per-environment tuning parameter. Is there a published recipe?
Query variants: "visual localization tail probability >1km", "cross-view matcher failure clustering forest cropland water", "aerial visual SLAM Monte Carlo safety budget".
W11. Cold-start TTFF <30 s feasibility
AC-NEW-1.
- W11.a. TRT engine warm-up cost on the Jetson Orin Nano Super for SP+LG + DINOv2 + the EKF's numba JIT. Real numbers. See the warm-up sketch after the query variants.
- W11.b. FAISS index load + mmap warm: 10 GB tile cache, IVF over ~10⁵ tile vectors — load time on NVMe.
- W11.c. First valid GPS_INPUT path includes: IMU-extrap-from-FC, first frame, VPR retrieve, matcher run, PnP, EKF init, GPS_INPUT emit. Anyone published an end-to-end cold-boot number for this kind of stack on Orin?
Query variants: "TensorRT engine load time Jetson", "FAISS mmap warm 10GB", "Jetson companion computer cold boot time GPS substitute".
W12. Imagery freshness reality check — Suite Satellite Service refresh cadence
AC-8.2 + AC-NEW-6: <6 months for active sectors, <12 months for stable.
- W12.a. Is a 6-month refresh actually achievable for Maxar Vivid / Pléiades Neo / Pléiades over Ukraine in 2026-Q2? Tasking lead time + cloud-cover acceptance + delivery channel.
- W12.b. Practitioner reports on what 30 cm Ukraine 2024–2025 imagery actually looks like (smoke, glare, seasonal mismatch, cratering).
- W12.c. In-flight tile generation is meant to backfill — but the Service still needs ground-truth tasking to seed the cache for any new operational area before the first flight. Is there a chicken-and-egg problem for first deployment to a new sector?
Query variants: "Maxar Vivid Ukraine 2025 refresh tasking", "Pleiades Neo Ukraine cloud cover lead time", "30cm satellite imagery refresh cadence active conflict".
W13. Resource contention — 8 GB shared LPDDR5 budget
AC-4.2 = <8 GB shared. Draft loads:
- DINOv2 ViT-B TRT engine (~600 MB GPU)
- SP+LG TRT engine (~hundreds of MB)
- FAISS index over 10⁵ tile descriptors
- Tile cache mmap (10 GB on disk, mapped into RAM via the OS page cache)
- EKF state + IMU ring buffer
- Python interpreter + asyncio loop + JIT'd numba kernels
- MAVSDK + pymavlink
- W13.a. Realistic peak RSS for this stack — is the 8 GB budget headroom or a tight squeeze? See the back-of-envelope sum after the query variants.
- W13.b. JetPack 6.2 / Ubuntu 22.04 baseline RAM consumed before our process even starts.
- W13.c. Mitigation: page out the FAISS index, enable swap, or pin everything?
Query variants: "Jetson Orin Nano 8GB shared budget DINOv2 LightGlue", "JetPack 6.2 base RAM usage", "FAISS pinned memory Jetson".
Completeness audit
Probes (per references/comparison-frameworks.md decomposition probes):
| Probe | Covered by | Notes |
|---|---|---|
| Cost of failure / blast radius | W5 (signing), W7 (tile poisoning), W10 (false-position) | three-way coverage of safety budget |
| Time-to-first-result | W11 | dedicated to TTFF |
| Operating envelope | W6 (terrain), W12 (freshness), W13 (memory), W9 (latency) | thermal already in AC-NEW-5 |
| Maintenance cost | W3 (Python topology), W4 (EKF code we own) | both addressed |
| Substitutability of components | W1 (matcher), W2 (VPR), W3 (process topology), W4 (EKF) | each component has ≥1 alternative-path question |
| Adversarial / red-team | W5, W7, W10 | covered |
| Data-distribution bias | W8, W10.b, W12 | covered |
| Hardware-supply-chain risk | not covered | Orin Nano Super availability is a project-management risk, not a design risk; deferred to Plan |
Output plan
- Source registry → append Mode B sources to `01_source_registry.md` as IDs S40+.
- Fact cards → append Mode B facts to `02_fact_cards.md` under "Mode B Findings".
- Mode B reasoning chain → write `04_reasoning_chain_mode_b.md`.
- Validation log → write `05_validation_log_mode_b.md`.
- Final deliverable → write `_docs/01_solution/solution_draft02.md` using `templates/solution_draft_mode_b.md`.