# Mode B Decomposition — Adversarial Assessment of `solution_draft01.md`

**Mode**: B (Solution Assessment). **Question type**: Problem Diagnosis + Decision Support. **Novelty sensitivity**: **High** — embedded CV/SLAM, ArduPilot MAVLink2 signing maturity, JetPack versions, and matcher SOTA all churn fast; prefer 2024-Q4 → 2026-Q2 sources.

**Goal**: per the Mode B template, find the weak points (functional / security / performance) of each draft component and propose either a stronger alternative or an explicit mitigation. Output is `solution_draft02.md` with an "Assessment Findings" table at the top.

## Boundary

- **Population**: a single fixed-wing UAV running the GPS-denied onboard pipeline, 1 km AGL, 60 km/h cruise, 8 h endurance, eastern/southern Ukraine.
- **Geography**: deployed in an active-conflict / contested-EW environment.
- **Timeframe**: deployment v1 within ~4–6 months from now (mid-2026).
- **Level**: companion-computer code + integration. The Suite Satellite Service, the AI-camera detector, the FC firmware, and the airframe are out of scope as components but appear as interfaces under attack.

## Perspectives chosen (≥3 mandatory)

1. **Implementer / engineer** — what published Jetson Orin Nano Super numbers say about the actual latency budget, what the GIL-on-hot-path failure modes are, and what is hard about deploying DINOv2-VLAD through TensorRT.
2. **Contrarian / devil's advocate** — every committed choice in the draft has a "why not X" answer; surface them.
3. **Domain practitioner** — what people running ArduPilot + companion CV in production have written about MAVLink2 signing, mavlink-router, GPS_INPUT injection, and cross-view matchers in active service.
4. **Security / red-team** — `GPS_INPUT` is a high-trust local channel and the tile cache is operationally sensitive; map the realistic attack surface and mitigations.

## Weak-point sub-questions (these drive the Mode B web search)

### W1. Cross-view matcher commitment (Component 3)

The draft pins SuperPoint+LightGlue / XFeat / MASt3R as the bench-off candidates, with 1024×768 as the working downsample.

- W1.a. **Is the bench-off shortlist still current as of 2026-Q2?** Did GIM (2024), BoQ (2024), MASt3R-SfM (2025), RoMa-DC (2025), or the Map-Free-Reloc 2025 leaderboard winners change the picture?
- W1.b. **Is "1024×768 starting point" empirically defensible on the Orin Nano Super at 25 W?** Published TRT FPS / latency for SP+LG and XFeat at this resolution on the Orin Nano class.
- W1.c. **Cross-view-specific failure modes at 1 km AGL** that the bench-off won't catch — illumination, season, recent-conflict landscape change. Are any matchers explicitly evaluated on temporal change?
- W1.d. **Why not training-free 3D-grounded matching (MASt3R / MASt3R-SfM) as primary** instead of as stretch? What is the realistic Orin Nano latency budget for these?

Query variants: "LightGlue Jetson Orin Nano benchmark 2025 2026", "SuperPoint TensorRT FP16 Orin Nano latency", "MASt3R embedded GPU benchmark", "GIM image matching cross-view 2024", "BoQ visual place recognition", "RoMa DKM aerial cross-view 2025", "image matcher seasonal change benchmark".

### W2. VPR backbone commitment (Component 2)

Draft picks AnyLoc (DINOv2-VLAD) primary + MixVPR fast-lane.

- W2.a. **DINOv2 ViT-B/14 latency on the Orin Nano Super at 25 W** — is the draft's "~50–80 ms / 224×224" empirically backed?
- W2.b. **2025 SOTA**: SALAD, BoQ (Bag-of-Queries), CricaVPR — do any beat AnyLoc on aerial cross-domain at meaningful latency?
- W2.c. **AnyLoc unsupervised VLAD** is training-free, but is the VLAD codebook quality stable across operational areas (Ukraine specifically)? Any published failure cases?

Query variants: "AnyLoc Jetson benchmark", "DINOv2 ViT-B TensorRT FP16 latency Orin", "SALAD visual place recognition aerial 2024", "BoQ visual place recognition", "CricaVPR aerial benchmark", "VPR aerial Ukraine seasonal".
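Whichever backbone survives W2, the retrieval half it feeds is fixed enough to sketch now, and it is the same piece whose top-K latency W9.a and whose load time W11.b interrogate. A minimal FAISS IVF sketch (the descriptor dimension, `NLIST`, and `nprobe` values are placeholder assumptions for the bench-off, not committed choices):

```python
# Hypothetical retrieval half of the VPR stage: whichever backbone wins
# (AnyLoc VLAD, MixVPR, SALAD, BoQ) reduces a frame to one global
# descriptor; the tile cache answers with top-K candidate tiles.
import numpy as np
import faiss

D = 512            # descriptor dim after PCA/whitening (placeholder)
N_TILES = 100_000  # ~10^5 tile descriptors per W11.b
NLIST = 256        # IVF cells; tune against recall@K on the bench-off set

tile_vecs = np.random.rand(N_TILES, D).astype("float32")  # stand-in for the cache
faiss.normalize_L2(tile_vecs)

quantizer = faiss.IndexFlatIP(D)
index = faiss.IndexIVFFlat(quantizer, D, NLIST, faiss.METRIC_INNER_PRODUCT)
index.train(tile_vecs)
index.add(tile_vecs)
index.nprobe = 8   # latency/recall knob; part of the W9.a measurement

query = np.random.rand(1, D).astype("float32")
faiss.normalize_L2(query)
scores, tile_ids = index.search(query, 5)  # top-5 candidate tiles for the matcher
```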
### W3. Process topology — "single Python process + asyncio + TRT subprocess workers via CUDA IPC"

Draft commits to this for v1 (Component 9).

- W3.a. **GIL on the hot path** — does asyncio + subprocess workers actually hold the 3 fps deadline once MAVLink I/O, FDR writes, and tile-cache lookups plus EKF math all share one interpreter? Real-world failure stories from ArduPilot/PX4 companion-computer projects.
- W3.b. **CUDA IPC for tensor handoff** — known issues on Jetson? Under the unified memory model, is CUDA IPC even meaningful when CPU and GPU share the LPDDR5 pool? (A CPU-shared-memory alternative is sketched below.)
- W3.c. **Subinterpreters / free-threaded Python (3.13+)** — is the project pinned to a Python old enough that subinterpreters aren't an option?
- W3.d. **Alternatives**: ROS 2 Humble (rejected in draft), C++ core (rejected), single process with multiprocessing (not discussed).

Query variants: "Jetson CUDA IPC unified memory", "Python asyncio CUDA real-time deadline", "Python GIL drone companion computer", "PX4 ArduPilot companion computer python production", "ROS2 vs Python single-process VIO embedded", "free-threaded Python 3.13 GPU".
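Because W3.b questions whether CUDA IPC buys anything on a unified-memory Jetson, the alternative worth benching against it is plain CPU-side shared memory between the asyncio parent and a TRT worker process. A minimal sketch on stdlib `multiprocessing.shared_memory`; the ring depth is an assumption (the 1024×768 downsample is the draft's own figure):

```python
# One candidate answer to W3.b: on Jetson's unified LPDDR5 pool, a frame can
# cross the process boundary as plain shared CPU memory and be re-uploaded
# by the worker; bench this against CUDA IPC handles before committing.
import numpy as np
from multiprocessing import shared_memory

FRAME_SHAPE = (768, 1024, 3)   # the draft's working downsample, BGR8
RING_DEPTH = 4                 # frames in flight before the producer blocks

def create_ring(name="frame_ring"):
    """Producer side: allocate one shared block holding the whole ring."""
    nbytes = RING_DEPTH * int(np.prod(FRAME_SHAPE))
    shm = shared_memory.SharedMemory(create=True, size=nbytes, name=name)
    ring = np.ndarray((RING_DEPTH, *FRAME_SHAPE), dtype=np.uint8, buffer=shm.buf)
    return shm, ring

def attach_ring(name="frame_ring"):
    """Worker side: attach to the existing block by name (unlink on shutdown
    omitted here)."""
    shm = shared_memory.SharedMemory(name=name)
    ring = np.ndarray((RING_DEPTH, *FRAME_SHAPE), dtype=np.uint8, buffer=shm.buf)
    return shm, ring

# Slot hand-off (which ring index is ready/free) still needs a queue or
# semaphore; that control channel, not the pixel copy, is where GIL stalls
# would show up under the W3.a load.
```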
### W4. Loosely-coupled EKF in Python + numba (Component 5)

Draft writes its own loosely-coupled EKF — fusing FC IMU @ 100 Hz, irregular satellite anchors, and VO @ 3 Hz — and emits GPS_INPUT.

- W4.a. **Why not just feed `VISION_POSITION_ESTIMATE` to ArduPilot EKF3 and let the FC fuse?** The draft mentions this as an "alternative" — what does the practitioner literature say about the actual cost of the dual-fusion choice?
- W4.b. **EKF covariance calibration is famously fragile** (the AC-NEW-4 false-position budget rides on it). Are there published gotchas for loosely-coupled aerial EKFs? What is the right Mahalanobis gate value? (A minimal gate sketch follows the W5 material below.)
- W4.c. **numba JIT on Jetson** — JIT warm-up time hurts AC-NEW-1 (cold-start TTFF < 30 s). Real numbers on Jetson Orin Nano JIT compile time.
- W4.d. **Heading observability** — at 1 km AGL nadir, satellite anchoring gives `(lat, lon, h)`, but heading is weakly observable from a single anchor unless the matcher emits oriented features. Does the draft's matcher choice cleanly produce yaw with covariance?

Query variants: "ArduPilot VISION_POSITION_ESTIMATE vs GPS_INPUT", "loose coupled EKF aerial gotcha", "EKF Mahalanobis gate visual anchor", "numba Jetson cold start", "monocular yaw observability satellite reference".

### W5. ArduPilot MAVLink2 signing + GPS_INPUT injection security (Component 6)

Draft says "MAVLink2 signing recommended" and treats GPS_INPUT as a high-trust local channel.

- W5.a. **Production maturity of MAVLink2 signing in ArduPilot 4.5+** as of 2026-Q2 — default-on or default-off, and what is the key-distribution story?
- W5.b. **Real attack surface**: what does an attacker with serial access to the FC actually need to spoof a GPS_INPUT? Does `mavlink-router` itself widen the attack surface?
- W5.c. **Companion-side defenses** — health-gate before injecting, fix_type sanity, jam detection from the other direction. (Sketched below.)
- W5.d. **Failsafe fallback**: if our GPS_INPUT is rejected by the FC (signing failure), what does ArduPilot do — does AC-NEW-2 (3 s spoof-promotion latency) survive that?

Query variants: "ArduPilot MAVLink2 signing 4.5 production", "MAVLink2 signing key distribution UAV", "ArduPilot GPS_INPUT signing", "mavlink-router security audit", "GPS_INPUT spoof companion computer attack".
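W5.c's list is concrete enough to sketch now. A minimal emit-side health gate, assuming a pymavlink connection (pymavlink is already in the draft's stack); the thresholds are hypothetical placeholders, and the `gps_input_send` field order follows the MAVLink common-dialect GPS_INPUT definition, so verify it against the dialect actually flashed on the FC:

```python
# Companion-side health gate (W5.c): never let a GPS_INPUT reach the FC
# unless the solution passes covariance and continuity sanity checks.
import time
from pymavlink import mavutil

MAX_SIGMA_XY_M = 10.0   # mirrors the tile-eligibility gate (placeholder)
MAX_JUMP_M = 150.0      # reject > this jump between consecutive solutions

IGNORE_VEL = (mavutil.mavlink.GPS_INPUT_IGNORE_FLAG_VEL_HORIZ
              | mavutil.mavlink.GPS_INPUT_IGNORE_FLAG_VEL_VERT
              | mavutil.mavlink.GPS_INPUT_IGNORE_FLAG_SPEED_ACCURACY)

def emit_if_healthy(master, lat_deg, lon_deg, alt_m,
                    sigma_xy_m, sigma_z_m, jump_m):
    """Gate, then emit. Silence (return False) is the safe failure mode:
    the FC keeps dead-reckoning on EKF3 instead of eating a bad fix."""
    if sigma_xy_m > MAX_SIGMA_XY_M or jump_m > MAX_JUMP_M:
        return False
    master.mav.gps_input_send(
        int(time.time() * 1e6),          # time_usec
        0, IGNORE_VEL, 0, 0,             # gps_id, ignore_flags, week_ms, week
        3,                               # fix_type: never claim better than 3D
        int(lat_deg * 1e7), int(lon_deg * 1e7), alt_m,
        1.0, 1.0,                        # hdop / vdop (dimensionless in spec)
        0.0, 0.0, 0.0, 0.0,              # vn, ve, vd, speed_accuracy (ignored)
        sigma_xy_m, sigma_z_m,           # horiz / vert accuracy, metres
        12)                              # satellites_visible: nominal constant
    return True
```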
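And the gate W4.b points at is small enough to pin down as code too. A minimal sketch in plain numpy/scipy; the 0.99 χ² quantile on 3 DOF is an assumption standing in for exactly the per-environment tuning parameter W10.c flags, not a recommendation:

```python
# Innovation (Mahalanobis) gate for the loosely-coupled EKF update (W4.b).
# Assumes a 3-DOF position anchor mapped to local ENU coordinates.
import numpy as np
from scipy.stats import chi2

GATE = chi2.ppf(0.99, df=3)   # ~11.34 for 3 degrees of freedom

def gate_anchor(z, z_pred, H, P, R):
    """Return (accept, d2) for measurement z against prediction z_pred;
    d2 is the squared Mahalanobis distance of the innovation."""
    y = z - z_pred                        # innovation, shape (3,)
    S = H @ P @ H.T + R                   # innovation covariance, (3, 3)
    d2 = float(y @ np.linalg.solve(S, y))
    return d2 <= GATE, d2
```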
### W6. In-flight ortho-tile generation residual error (Component 1b)

Draft: pinhole projection → flat-Earth ground plane → resample to z=20 XYZ tiles. Eligibility gates: σ_xy ≤ 10 m, |bank|, |pitch| ≤ 10°.

- W6.a. **Flat-Earth residual error in eastern/southern Ukraine** — actual relief amplitude. Steppes are not flat at 30 cm/px tile precision; agricultural fields, river valleys, and ravines (yary) are common. For a ground point Δh above the assumed plane viewed at off-nadir angle θ, the horizontal georeferencing error is roughly Δh·tan θ; 40 m of relief at 20° off-nadir is already ~15 m, outside the σ_xy ≤ 10 m gate.
- W6.b. **What per-tile geo-alignment error budget** still keeps cross-view anchors valid against the same tile two flights later?
- W6.c. **MBTiles SQLite at 10 GB scale on NVMe**: known issues with a concurrent reader + writer (the tile-cache-miss path is concurrent with the tile-write path)? Sharding strategy?
- W6.d. **Dedup by (z, x, y) only** — but the onboard tile carries a parent_pose covariance. If an "onboard" tile written from a 3-σ-bad pose overwrites a service-source tile, the next flight's cache is poisoned. Should the dedup rule include a "trust-only" lock from the Service?

Query variants: "MBTiles concurrent writer reader SQLite", "orthorectification flat earth residual error UAV", "Ukraine eastern terrain relief amplitude", "geotagged tile alignment budget cross-view localization".

### W7. Tile dedup poisoning — onboard tile overwrites service tile

This is a sharper version of W6.d.

- W7.a. The "highest quality wins" rule treats `match_inliers` as a proxy for geo-alignment confidence. But a confidently-bad anchor (over-confident covariance from the EKF — see W4.b) writes a "high-quality" tile that is actually misaligned by 50 m. Next flight, that misaligned tile becomes the satellite anchor for *another* anchor, and the error compounds.
- W7.b. **Best practice from cartography / SfM** for trusting onboard imagery as basemap input.
- W7.c. **Mitigation**: lock tiles whose source is `service` against onboard overwrite for some grace period; require onboard tiles to be "voted" in by N independent flights before promotion.

Query variants: "satellite tile pose error compounding", "uav generated tile basemap update sfm trust", "drone-ortho photo dedup quality score".

### W8. Mavic-class footage as deployment-domain proxy

Draft uses internal Mavic flight footage as the deployment-domain V&V proxy. The Mavic is a small quadcopter; the deployment platform is a fixed-wing at 1 km AGL.

- W8.a. **What does the literature say** about transferring CV/VO/VPR results from quadcopter footage to fixed-wing? Camera dynamics differ: rolling shutter, vibration spectrum, frame rate, motion-blur profile, AGL band.
- W8.b. **Synthetic IMU from Mavic video** — the user already rejected this. But is there a non-synthetic alternative the draft missed? E.g., MidAir (synthetic but matched dynamics), TartanAir, public ArduPilot SITL logs.
- W8.c. **Risk of false confidence** — ground truth is in the absolute satellite anchor, not the Mavic IMU. So how does the Mavic V&V actually validate AC-NEW-4 (false-position safety) when no fixed-wing IMU is in the loop?

Query variants: "fixed wing vs quadcopter visual SLAM transfer", "drone vibration spectrum fixed-wing quad", "TartanAir aerial dataset fixed-wing".

### W9. Latency budget — is 400 ms p95 actually realistic?

The AC-4.1 budget. Draft acknowledges R2 ("latency budget on Orin Nano Super at 1024×768 input is tight").

- W9.a. **Real published Jetson Orin Nano Super 25 W numbers** for: DINOv2 ViT-B forward (224×224), SuperPoint+LightGlue at 1024×768, FAISS top-K over ~10⁴ vectors, EKF update at 100 Hz IMU.
- W9.b. **Steady-state vs transient latency** — does the budget include EKF-output-to-MAVLink-emit overhead, MAVLink serialisation, and the FC's own gating?
- W9.c. **Failure mode if the budget blows** — frame-drop is allowed (AC-4.1 says ~10%), but if the matcher latency tail is 600 ms, the EKF rides on VO+IMU for >2 frames and the AC-3.4 reloc trigger fires.

Query variants: "DINOv2 Jetson Orin Nano TensorRT FP16 ms", "LightGlue Jetson benchmark FPS 1024", "FAISS Jetson IVF latency".

### W10. AC-NEW-4 false-position safety — Monte Carlo validation realism

P(error > 500 m) < 0.1%, P(error > 1 km) < 0.01%.

- W10.a. **What is the standard practice** for validating probabilities at this magnitude? You need >10⁴ frames with independent failure modes (by the rule of three, demonstrating P < 0.1% at ~95% confidence with zero observed failures already takes ≥ 3/0.001 ≈ 3,000 independent frames, and P < 0.01% takes ≥ 30,000; temporal correlation inflates both) — does the AerialVL + Mavic dataset cover that?
- W10.b. **What does the literature say** about cross-view matcher tail behavior — do failures cluster on specific scene types (forest, repetitive cropland, water, glare)? If yes, dataset bias is the killer.
- W10.c. **EKF-side gating** — the Mahalanobis gate is the right tool, but the gate threshold itself is a per-environment tuning parameter. Is there a published recipe?

Query variants: "visual localization tail probability >1km", "cross-view matcher failure clustering forest cropland water", "aerial visual SLAM Monte Carlo safety budget".

### W11. Cold-start TTFF < 30 s feasibility

AC-NEW-1.

- W11.a. **TRT engine warm-up cost** on the Jetson Orin Nano Super for SP+LG + DINOv2 + the EKF's JIT. Real numbers.
- W11.b. **FAISS index load + mmap warm**: 10 GB tile cache, IVF over ~10⁵ tile vectors — load time on NVMe.
- W11.c. **First valid GPS_INPUT** path includes: IMU extrapolation from the FC, first frame, VPR retrieve, matcher run, PnP, EKF init, GPS_INPUT emit. Has anyone published an end-to-end cold-boot number for this kind of stack on an Orin?

Query variants: "TensorRT engine load time Jetson", "FAISS mmap warm 10GB", "Jetson companion computer cold boot time GPS substitute".

### W12. Imagery freshness reality check — Suite Satellite Service refresh cadence

AC-8.2 + AC-NEW-6: < 6 months for active sectors, < 12 months for stable.

- W12.a. **Is a 6-month refresh actually achievable** for Maxar Vivid / Pléiades Neo / Pléiades over Ukraine in 2026-Q2? Tasking lead time + cloud-cover acceptance + delivery channel.
- W12.b. **Practitioner reports** on what 30 cm Ukraine 2024–2025 imagery actually looks like (smoke, glare, seasonal mismatch, cratering).
- W12.c. **In-flight tile generation** is meant to backfill — but the Service still needs ground-truth tasking to seed the cache for any new operational area before the *first* flight. Is there a chicken-and-egg problem for first deployment to a new sector?

Query variants: "Maxar Vivid Ukraine 2025 refresh tasking", "Pleiades Neo Ukraine cloud cover lead time", "30cm satellite imagery refresh cadence active conflict".

### W13. Resource contention — 8 GB shared LPDDR5 budget

AC-4.2 = < 8 GB shared. Draft loads:

- DINOv2 ViT-B TRT engine (~600 MB GPU)
- SP+LG TRT engine (~hundreds of MB)
- FAISS index over 10⁵ tile descriptors
- Tile cache mmap (10 GB on disk, mapped to RAM via the OS page cache)
- EKF state + IMU ring buffer
- Python interpreter + asyncio loop + JIT'd numba kernels
- MAVSDK + pymavlink

Sub-questions:

- W13.a. **Realistic peak RSS** for this stack — is the 8 GB budget headroom or a tight squeeze? (A strawman ledger follows the query variants below.)
- W13.b. **JetPack 6.2 / Ubuntu 22 baseline RAM** consumed before our process even starts.
- W13.c. **Mitigation**: page out the FAISS index, swap, or pin everything?

Query variants: "Jetson Orin Nano 8GB shared budget DINOv2 LightGlue", "JetPack 6.2 base RAM usage", "FAISS pinned memory Jetson".
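W13.a gets easier to argue once the budget exists as a checked artifact instead of prose. A strawman ledger; every figure below is a placeholder assumption, to be overwritten with measured RSS (tegrastats / smem) once the stack boots on the flashed image:

```python
# Strawman RAM ledger for W13.a -- all numbers are placeholder assumptions.
# On the Orin Nano the 8 GB LPDDR5 pool is shared: GPU allocations, OS page
# cache for the tile mmap, and the Python heap all bill the same budget.
BUDGET_MB = 8192

ledger_mb = {
    "jetpack_ubuntu_base":  1800,   # W13.b: measure on the flashed image
    "dinov2_vitb_trt":       600,   # draft's own figure
    "sp_lg_trt":             400,   # "hundreds of MB": assumed midpoint
    "faiss_ivf_1e5":         300,   # 1e5 x 512-d fp32 + IVF overhead, assumed
    "tile_mmap_resident":   1024,   # page-cache working set, not the 10 GB file
    "python_numba_mavlink":  700,   # interpreter + JIT kernels + MAVSDK/pymavlink
    "ekf_imu_ring_fdr":      100,   # state, 100 Hz ring buffer, FDR buffers
}

used = sum(ledger_mb.values())
print(f"{used} MB of {BUDGET_MB} MB -> headroom {BUDGET_MB - used} MB")
# If headroom lands under ~1.5 GB, W13.c's mitigation question is not optional.
```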
## Completeness audit

Probes (per `references/comparison-frameworks.md` decomposition probes):

| Probe | Covered by | Notes |
|---|---|---|
| **Cost of failure / blast radius** | W5 (signing), W7 (tile poisoning), W10 (false position) | three-way coverage of the safety budget |
| **Time-to-first-result** | W11 | dedicated to TTFF |
| **Operating envelope** | W6 (terrain), W12 (freshness), W13 (memory), W9 (latency) | thermal already in AC-NEW-5 |
| **Maintenance cost** | W3 (Python topology), W4 (EKF code we own) | both addressed |
| **Substitutability of components** | W1 (matcher), W2 (VPR), W3 (process topology), W4 (EKF) | each component has ≥1 alternative-path question |
| **Adversarial / red-team** | W5, W7, W10 | covered |
| **Data-distribution bias** | W8, W10.b, W12 | covered |
| **Hardware-supply-chain risk** | not covered | Orin Nano Super availability is a project-management risk, not a design risk; deferred to Plan |

## Output plan

1. Source registry → append Mode B sources to `01_source_registry.md` as IDs `S40+`.
2. Fact cards → append Mode B facts to `02_fact_cards.md` under "Mode B Findings".
3. Mode B reasoning chain → write `04_reasoning_chain_mode_b.md`.
4. Validation log → write `05_validation_log_mode_b.md`.
5. Final deliverable → write `_docs/01_solution/solution_draft02.md` using `templates/solution_draft_mode_b.md`.