Mode B Decomposition — Adversarial Assessment of solution_draft01.md
Mode: B (Solution Assessment).
Question type: Problem Diagnosis + Decision Support.
Novelty sensitivity: High. Embedded CV/SLAM, ArduPilot MAVLink2 signing maturity, JetPack version, and matcher SOTA all churn fast — prefer 2024-Q4 → 2026-Q2 sources.
Goal: per Mode B template, find weak points (functional / security / performance) per draft component and propose either a stronger alternative or an explicit mitigation. Output is solution_draft02.md with an "Assessment Findings" table at the top.
Boundary
- Population: a single fixed-wing UAV running the GPS-denied onboard pipeline, 1 km AGL, 60 km/h cruise, 8 h endurance, eastern/southern Ukraine.
- Geography: deployed in active-conflict / contested EW environment.
- Timeframe: deployment v1 within ~4–6 months from now (mid-2026).
- Level: companion-computer code + integration. The Suite Satellite Service, the AI-camera detector, the FC firmware, and the airframe are out of scope as components but appear as interfaces under attack.
Perspectives chosen (≥3 mandatory)
- Implementer / engineer — what published Jetson Orin Nano Super numbers say about the actual latency budget, what the GIL-on-hot-path failure modes are, what is hard about TRT-deploying DINOv2-VLAD.
- Contrarian / devil's advocate — every committed choice in the draft has a "why not X" answer; surface them.
- Domain practitioner — what people running ArduPilot + companion CV in production have written about MAVLink2 signing, mavlink-router, GPS_INPUT injection, cross-view matchers in active service.
- Security / red-team — `GPS_INPUT` is a high-trust local channel; the tile cache is operationally sensitive. Realistic attack surface and mitigations.
Weak-point sub-questions (drives Mode B web search)
W1. Cross-view matcher commitment (Component 3)
The draft pins SuperPoint+LightGlue / XFeat / MASt3R as the bench-off candidates, with 1024×768 as the working downsample.
- W1.a. Is the bench-off shortlist still current as of 2026-Q2? Did GIM (2024), BoQ (2024), MASt3R-SfM (2025), RoMa-DC (2025), or the Map-Free-Reloc 2025 leaderboard winners change the picture?
- W1.b. Is "1024×768 starting point" empirically defensible on Orin Nano Super 25 W? Published TRT FPS / latency for SP+LG and XFeat at this resolution on the Orin Nano class.
- W1.c. Cross-view-specific failure modes at 1 km AGL that the bench-off won't catch — illumination, season, recent-conflict landscape change. Are any matchers explicitly evaluated on temporal change?
- W1.d. Why not training-free 3D-grounded matching (MASt3R / MASt3R-SfM) as the primary instead of as a stretch goal? What's the realistic Orin Nano latency budget for these?
Query variants: "LightGlue Jetson Orin Nano benchmark 2025 2026", "SuperPoint TensorRT FP16 Orin Nano latency", "MASt3R embedded GPU benchmark", "GIM image matching cross-view 2024", "BoQ visual place recognition", "RoMa DKM aerial cross-view 2025", "image matcher seasonal change benchmark".
W2. VPR backbone commitment (Component 2)
Draft picks AnyLoc (DINOv2-VLAD) primary + MixVPR fast-lane.
- W2.a. DINOv2 ViT-B/14 latency on Orin Nano Super 25 W — is the draft's "~50–80 ms / 224×224" empirically backed?
- W2.b. 2025 SOTA: SALAD, BoQ (Bag-of-Queries), CricaVPR — do any beat AnyLoc on aerial cross-domain at meaningful latency?
- W2.c. AnyLoc unsupervised VLAD is training-free, but is the VLAD codebook quality stable across operational areas (Ukraine specifically)? Any published failure cases?
Query variants: "AnyLoc Jetson benchmark", "DINOv2 ViT-B TensorRT FP16 latency Orin", "SALAD visual place recognition aerial 2024", "BoQ visual place recognition", "CricaVPR aerial benchmark", "VPR aerial Ukraine seasonal".
W3. Process topology — "single Python process + asyncio + TRT subprocess workers via CUDA IPC"
Draft commits to this for v1 (Component 9).
- W3.a. GIL on the hot path — do asyncio plus subprocess workers actually keep the GIL off the hot path at 3 fps and 1 km AGL with all the I/O (MAVLink, FDR, tile-cache lookups, EKF math)? Real-world failure stories from ArduPilot/PX4 companion-computer projects. See the topology sketch after the query variants.
- W3.b. CUDA IPC for tensor handoff — known issues on Jetson (unified memory model: is CUDA IPC even meaningful when CPU and GPU share the LPDDR5 pool)?
- W3.c. Subinterpreters / free-threaded Python (3.13+) — is the project using a Python old enough that subinterpreters aren't an option?
- W3.d. Alternatives: ROS 2 Humble (rejected in draft), C++ core (rejected), single-process with multiprocessing (not discussed).
Query variants: "Jetson CUDA IPC unified memory", "Python asyncio CUDA real-time deadline", "Python GIL drone companion computer", "PX4 ArduPilot companion computer python production", "ROS2 vs Python single-process VIO embedded", "free-threaded Python 3.13 GPU".
W4. Loosely-coupled EKF in Python + numba (Component 5)
The draft writes its own loosely-coupled EKF: it fuses IMU @ 100 Hz from the FC, satellite anchors at irregular intervals, and VO @ 3 Hz, and emits `GPS_INPUT`.
- W4.a. Why not just feed `VISION_POSITION_ESTIMATE` to ArduPilot EKF3 and let the FC fuse? The draft mentions this as an "alternative" — what does the practitioner literature say about the actual cost of the dual-fusion choice?
- W4.b. EKF covariance calibration is famously fragile (the AC-NEW-4 false-position budget rides on it). Are there published gotchas for loosely-coupled aerial EKFs? What's the right Mahalanobis gate value? See the gate sketch after the query variants.
- W4.c. numba JIT on Jetson — JIT warmup time hurts AC-NEW-1 (cold-start TTFF <30 s). Real numbers on Jetson Orin Nano JIT compile time.
- W4.d. Heading observability — at 1 km AGL nadir, satellite anchoring gives `(lat, lon, h)`, but heading is weakly observable from a single anchor unless the matcher emits oriented features. Does the draft's matcher choice cleanly produce yaw with covariance?
Query variants: "ArduPilot VISION_POSITION_ESTIMATE vs GPS_INPUT", "loose coupled EKF aerial gotcha", "EKF Mahalanobis gate visual anchor", "numba Jetson cold start", "monocular yaw observability satellite reference".
W5. ArduPilot MAVLink2 signing + GPS_INPUT injection security (Component 6)
Draft says "MAVLink2 signing recommended", treats GPS_INPUT as high-trust local channel.
- W5.a. Production maturity of MAVLink2 signing in ArduPilot 4.5+ as of 2026-Q2 — is it default-on or default-off, and what is the key-distribution story?
- W5.b. Real attack surface: what does an attacker with serial access to the FC actually need in order to spoof a `GPS_INPUT`? Does `mavlink-router` itself widen the attack surface?
- W5.c. Companion-side defenses — a health gate before injecting, `fix_type` sanity checks, jam detection from the other direction. See the gate sketch after the query variants.
- W5.d. Failsafe fallback: if our `GPS_INPUT` is rejected by the FC (signing failure), what does ArduPilot do — does AC-NEW-2 (3 s spoof-promotion latency) survive that?
Query variants: "ArduPilot MAVLink2 signing 4.5 production", "MAVLink2 signing key distribution UAV", "ArduPilot GPS_INPUT signing", "mavlink-router security audit", "GPS_INPUT spoof companion computer attack".
W6. In-flight ortho-tile generation residual error (Component 1b)
Draft: pinhole projection → flat-Earth ground plane → resample to z=20 XYZ tiles. Eligibility gates: σ_xy ≤ 10 m, |bank| and |pitch| ≤ 10°.
- W6.a. Flat-Earth residual error in eastern/southern Ukraine — actual relief amplitude. Steppes are not flat at 30 cm/px tile precision; agricultural fields, river valleys, and ravines (yary) are common. See the relief-displacement check after the query variants.
- W6.b. What's the per-tile geo-alignment error budget that still keeps cross-view anchors valid against the same tile two flights later?
- W6.c. MBTiles SQLite at 10 GB scale on NVMe: known issues with concurrent reader+writer (tile-cache miss path is concurrent with tile-write path)? Sharding strategy?
- W6.d. Dedup by (z, x, y) only — but the onboard tile carries a parent_pose covariance. If we overwrite a service-source tile with an onboard tile that was written from a 3-σ-bad pose, we've poisoned the next flight's cache. Should the dedup rule include a "trust-only" lock from the Service?
Query variants: "MBTiles concurrent writer reader SQLite", "orthorectification flat earth residual error UAV", "Ukraine eastern terrain relief amplitude", "geotagged tile alignment budget cross-view localization".
W7. Tile dedup poisoning — onboard tile overwrites service tile
This is a sharper version of W6.d.
- W7.a. The "highest quality wins" rule treats `match_inliers` as a proxy for geo-alignment confidence. But a confidently-bad anchor (over-confident covariance from the EKF — see W4.b) writes a "high-quality" tile that is actually misaligned by 50 m. On the next flight, that misaligned tile becomes the satellite anchor for another anchor, and the error compounds.
- W7.b. Best practice from cartography / SfM for trusting onboard imagery as basemap input.
- W7.c. Mitigation: lock tiles whose source is `service` against onboard overwrite for some grace period; require onboard tiles to be "voted in" by N independent flights before promotion. See the dedup sketch after the query variants.
Query variants: "satellite tile pose error compounding", "uav generated tile basemap update sfm trust", "drone-ortho photo dedup quality score".
W8. Mavic-class footage as deployment-domain proxy
Draft uses internal Mavic flight footage as the deployment-domain V&V proxy. Mavic is a small quadcopter; the deployment platform is a fixed-wing at 1 km AGL.
- W8.a. What does the literature say about transferring CV/VO/VPR results from quadcopter footage to fixed-wing? Camera dynamics differ (rolling shutter, vibration spectrum, frame rate, motion-blur profile, AGL band).
- W8.b. Synthetic IMU from Mavic video — the user already rejected this. But is there a non-synthetic alternative the draft missed? E.g., MidAir (synthetic but matched dynamics), TartanAir, public ArduPilot SITL logs.
- W8.c. Risk of false confidence — ground truth is in the absolute satellite anchor, not the Mavic IMU. So how does the Mavic V&V actually validate AC-NEW-4 (false-position safety) when no fixed-wing IMU is in the loop?
Query variants: "fixed wing vs quadcopter visual SLAM transfer", "drone vibration spectrum fixed-wing quad", "TartanAir aerial dataset fixed-wing".
W9. Latency budget — is 400 ms p95 actually realistic?
AC-4.1 budget. Draft acknowledges R2 ("latency budget on Orin Nano Super at 1024×768 input is tight").
- W9.a. Real published Jetson Orin Nano Super 25 W numbers for: DINOv2 ViT-B forward (224×224), SuperPoint+LightGlue at 1024×768, FAISS top-K over ~10⁴ vectors, EKF update at 100 Hz IMU.
- W9.b. Steady-state vs transient latency — does the budget include EKF-output-to-MAVLink-emit overhead, MAVLink serialization, and the FC's own gating?
- W9.c. Failure mode if the budget blows — frame drops are allowed (AC-4.1 says ~10%), but if the matcher's latency tail is 600 ms, the EKF rides on VO+IMU for >2 frames and the AC-3.4 reloc trigger hits. See the back-of-envelope sum after the query variants.
Query variants: "DINOv2 Jetson Orin Nano TensorRT FP16 ms", "LightGlue Jetson benchmark FPS 1024", "FAISS Jetson IVF latency".
W10. AC-NEW-4 false-position safety — Monte Carlo validation realism
P(error >500 m) <0.1%, P(error >1 km) <0.01%.
- W10.a. What's the standard practice for validating these probabilities at this magnitude? You need >10⁴ frames of independent failure modes — does the AerialVL + Mavic dataset cover that? See the rule-of-three check after the query variants.
- W10.b. What does the literature say about cross-view matcher tail behavior — do failures cluster on specific scene types (forest, repetitive cropland, water, glare)? If yes, dataset bias is the killer.
- W10.c. EKF-side gating — Mahalanobis gate is the right tool, but the gate threshold itself is a per-environment tuning parameter. Is there a published recipe?
Query variants: "visual localization tail probability >1km", "cross-view matcher failure clustering forest cropland water", "aerial visual SLAM Monte Carlo safety budget".
W11. Cold-start TTFF <30 s feasibility
AC-NEW-1.
- W11.a. TRT engine warm-up cost on the Jetson Orin Nano Super for SP+LG + DINOv2 + the EKF's numba JIT. Real numbers. See the warm-up sketch after the query variants.
- W11.b. FAISS index load + mmap warm: 10 GB tile cache, IVF over ~10⁵ tile vectors — load time on NVMe.
- W11.c. First valid GPS_INPUT path includes: IMU-extrap-from-FC, first frame, VPR retrieve, matcher run, PnP, EKF init, GPS_INPUT emit. Anyone published an end-to-end cold-boot number for this kind of stack on Orin?
Query variants: "TensorRT engine load time Jetson", "FAISS mmap warm 10GB", "Jetson companion computer cold boot time GPS substitute".
W12. Imagery freshness reality check — Suite Satellite Service refresh cadence
AC-8.2 + AC-NEW-6: <6 months for active sectors, <12 months for stable.
- W12.a. Is a 6-month refresh actually achievable for Maxar Vivid / Pléiades Neo / Pléiades over Ukraine in 2026-Q2? Tasking lead time + cloud-cover acceptance + delivery channel.
- W12.b. Practitioner reports on what 30 cm Ukraine 2024–2025 imagery actually looks like (smoke, glare, seasonal mismatch, cratering).
- W12.c. In-flight tile generation is meant to backfill — but the Service still needs ground-truth tasking to seed the cache for any new operational area before the first flight. Is there a chicken-and-egg problem for first deployment to a new sector?
Query variants: "Maxar Vivid Ukraine 2025 refresh tasking", "Pleiades Neo Ukraine cloud cover lead time", "30cm satellite imagery refresh cadence active conflict".
W13. Resource contention — 8 GB shared LPDDR5 budget
AC-4.2 = <8 GB shared. Draft loads:
- DINOv2 ViT-B TRT engine (~600 MB GPU)
- SP+LG TRT engine (~hundreds of MB)
- FAISS index over 10⁵ tile descriptors
- Tile cache mmap (10 GB on disk, mapped into RAM via the OS page cache)
- EKF state + IMU ring buffer
- Python interpreter + asyncio loop + JIT'd numba kernels
- MAVSDK + pymavlink
- W13.a. Realistic peak RSS for this stack — is the 8 GB budget headroom or a tight squeeze? See the back-of-envelope sum after the query variants.
- W13.b. JetPack 6.2 / Ubuntu 22.04 baseline RAM consumed before our process even starts.
- W13.c. Mitigation: page out the FAISS index, enable swap, or pin everything?
Query variants: "Jetson Orin Nano 8GB shared budget DINOv2 LightGlue", "JetPack 6.2 base RAM usage", "FAISS pinned memory Jetson".
Completeness audit
Probes (per references/comparison-frameworks.md decomposition probes):
| Probe | Covered by | Notes |
|---|---|---|
| Cost of failure / blast radius | W5 (signing), W7 (tile poisoning), W10 (false-position) | three-way coverage of safety budget |
| Time-to-first-result | W11 | dedicated to TTFF |
| Operating envelope | W6 (terrain), W12 (freshness), W13 (memory), W9 (latency) | thermal already in AC-NEW-5 |
| Maintenance cost | W3 (Python topology), W4 (EKF code we own) | both addressed |
| Substitutability of components | W1 (matcher), W2 (VPR), W3 (process topology), W4 (EKF) | each component has ≥1 alternative-path question |
| Adversarial / red-team | W5, W7, W10 | covered |
| Data-distribution bias | W8, W10.b, W12 | covered |
| Hardware-supply-chain risk | not covered | Orin Nano Super availability is a project-management risk, not a design risk; deferred to Plan |
Output plan
- Source registry → append Mode B sources to `01_source_registry.md` as IDs S40+.
- Fact cards → append Mode B facts to `02_fact_cards.md` under "Mode B Findings".
- Mode B reasoning chain → write `04_reasoning_chain_mode_b.md`.
- Validation log → write `05_validation_log_mode_b.md`.
- Final deliverable → write `_docs/01_solution/solution_draft02.md` using `templates/solution_draft_mode_b.md`.