gps-denied-onboard/_docs/00_research/05_validation_log.md

# Validation Log

> Mode A Phase 2 — engine Step 7 (Use-Case Validation / Sanity Check). Validates the recommended primary stack from `04_reasoning_chain.md` against a typical UAV mission scenario, surfaces counterexamples where they exist, runs the engine's review checklist, and lists conclusions that need revision.
>
> Backing artifacts: source registry [`01_source_registry/00_summary.md`](01_source_registry/00_summary.md) (#1–#121); fact cards [`02_fact_cards/00_summary.md`](02_fact_cards/00_summary.md) (#1–#101); component fit matrix [`06_component_fit_matrix/00_summary.md`](06_component_fit_matrix/00_summary.md); cross-component gates [`06_component_fit_matrix/99_cross_component_gates.md`](06_component_fit_matrix/99_cross_component_gates.md); comparison framework [`03_comparison_framework.md`](03_comparison_framework.md); reasoning chain [`04_reasoning_chain.md`](04_reasoning_chain.md).

---

## Validation Scenarios

The recommended primary stack must hold up across the full envelope of normal-flight and edge-case scenarios called out in the Project Constraint Matrix. This log walks through five representative scenarios: one nominal cruise, two edge cases, and two adversarial cases.

### Scenario 1 — Nominal cruise (steady-state visual anchoring)

A fixed-wing UAV at 1 km AGL cruises at 60 km/h over rolling-steppe agricultural terrain east of Dnipro. GPS is jammed. Nav camera produces 3 frames/s (~333 ms cadence). FC delivers 100-200 Hz IMU + attitude over MAVLink. C2 (MixVPR per recommended primary on the BSD/permissive track) retrieves K=3-5 candidate satellite tiles per frame; C3 (DISK+LightGlue + adaptive depth per D-C3-3 mitigation) registers UAV frame against best candidate; C4 (OpenCV `cv::solvePnPRansac` wrapped in GTSAM `Marginals` per D-C4-2 = (b)) emits 6-DoF pose + 6×6 covariance; C5 (GTSAM iSAM2 per D-C5-5 = (c)) fuses with C1 (OKVIS2 frame-to-frame VIO) + IMU; C8 (pymavlink → MAVLink `GPS_INPUT` for ArduPilot Plane / MSP2_SENSOR_GPS for iNav) emits WGS84 + per-FC `horiz_accuracy`/`hPosAccuracy` at 5 Hz per D-C8-5.

### Scenario 2 — Sharp turn with <5% inter-frame overlap (AC-3.2)

UAV banks ±20° to enter a search pattern. Two consecutive frames share <5% overlap. C1 frame-to-frame VIO loses tracking; C5 propagates dead-reckoned via IMU + last-good-anchor. C2/C3 next-frame retrieval recovers a valid satellite-anchor within 1-2 frames per AC-3.2 ("recovery via satellite-reference re-localization"). This stays within the AC-3.4 budget, which requests operator re-loc only after ≥3 consecutive frames AND ≥2 s without a position.

### Scenario 3 — Stale tile in active-conflict sector (AC-NEW-6)

Cache contains a tile from 8 months ago for a sector flagged as active-conflict. AC-8.2 freshness threshold is <6 mo for active-conflict. C6 manifest carries `capture_date` per restrictions.md mandate. The retrieval path must reject (or downgrade label to non-`satellite_anchored`) per AC-NEW-6.

### Scenario 4 — Cache file corruption (AC-NEW-7 cache-poisoning safety)

Pre-flight: a malicious actor swaps `/var/lib/onboard/cache/faiss/v_2048_M32.index` with a tampered file containing crafted descriptors that would point to wrong tiles for given UAV-frame queries. Takeoff load via `faiss.read_index` would silently load this file (Source #114 explicit warning: "no internal integrity check, expects validated input").

### Scenario 5 — GPS spoofing + visual blackout (AC-3.5 + AC-NEW-2 + AC-NEW-8)

UAV enters a cloud bank (visual blackout) while FC simultaneously reports GPS signal-quality anomaly indicating spoofing. C1 + C2 + C3 + C4 all fail (no usable visual input); C5 must propagate from last trusted state via IMU only, label every estimate `{dead_reckoned}`, degrade MAVLink fix-quality to "2D fix or worse" when 95% covariance semi-major axis >100 m, escalate to "no fix" when >500 m or blackout >30 s. C8 must NOT promote spoofed real-GPS back into the estimator unless FC GPS health stable + non-spoofed for ≥10 s AND a visual/satellite consistency check has succeeded. AC-NEW-2 spoofing-promotion latency <3 s p95 from spoof onset to companion estimate becoming primary FC source.

---

## Expected behavior under recommended primary stack

### Scenario 1 — Nominal cruise

If using **MixVPR + DISK+LightGlue + OpenCV+GTSAM-Marginals + GTSAM iSAM2 + pymavlink/MSP2** at the recommended primary stack:
- C2 MixVPR query ~10-20 ms at FP16 (or ~5-10 ms at INT8) per frame; K=3-5 retrieval list returned.
- C3 DISK+LightGlue FP16 (per D-C7-6 matchers→FP16-only per-family precision policy) ~30-60 ms per pair × K=3-5 pairs = 90-300 ms (within AC-4.1 400 ms p95 if K=3 + adaptive depth applied per D-C3-3).
- C4 `cv::solvePnPRansac` ~5-15 ms inlier filter + GTSAM `Marginals` recovery ~30-90 ms (Plan-phase Jetson MVE confirms).
- C5 GTSAM iSAM2 with D-C5-5 = (c) PriorFactorPose3-only + IncrementalFixedLagSmoother K=10-20 keyframes per D-C5-3 ~2-5 ms per update.
- C8 pymavlink GPS_INPUT or MSP2_SENSOR_GPS encode + send ~1-5 ms.
- Total end-to-end: ~140-420 ms p95. Within AC-4.1 budget at K=3 + adaptive depth.
- Memory: ~1.5-2.5 GB peak. Well within AC-4.2 8 GB budget.
- AC-NEW-4 satisfied NATIVELY via GTSAM `Marginals.marginalCovariance` per D-C8-8 per-FC unit conversion.
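
The per-stage figures in the list above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the stages run strictly sequentially, counts the C4 inlier filter and `Marginals` recovery as separate stages, does not model adaptive depth, and sums range endpoints (a coarse bound, not a p95 estimate). The `Range` helper and the constant names are hypothetical.

```python
from dataclasses import dataclass

AC_4_1_BUDGET_MS = 400  # AC-4.1 p95 budget quoted in this log


@dataclass(frozen=True)
class Range:
    """Closed latency interval in milliseconds (hypothetical helper)."""
    lo: float
    hi: float

    def __add__(self, other: "Range") -> "Range":
        return Range(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k: int) -> "Range":
        return Range(self.lo * k, self.hi * k)


# Per-stage estimates copied from the Scenario 1 list (ms).
C2_MIXVPR_FP16 = Range(10, 20)
C3_PER_PAIR = Range(30, 60)      # DISK+LightGlue FP16, one image pair
C4_PNP = Range(5, 15)            # solvePnPRansac inlier filter
C4_MARGINALS = Range(30, 90)     # GTSAM Marginals recovery
C5_ISAM2 = Range(2, 5)
C8_ENCODE = Range(1, 5)


def end_to_end(k: int) -> Range:
    """Sequential sum of stage bounds for K retrieval pairs (assumption)."""
    return (C2_MIXVPR_FP16 + C3_PER_PAIR.scale(k) + C4_PNP
            + C4_MARGINALS + C5_ISAM2 + C8_ENCODE)


if __name__ == "__main__":
    for k in (3, 5):
        total = end_to_end(k)
        verdict = "within" if total.hi <= AC_4_1_BUDGET_MS else "can exceed"
        print(f"K={k}: {total.lo:.0f}-{total.hi:.0f} ms ({verdict} budget)")
```

Under these assumptions K=3 gives 138-315 ms, which stays under the 400 ms budget, while K=5 gives 198-435 ms and can overrun it. That is consistent with pinning K=3 per D-C3-3; the Jetson MVE remains the authority on the real tail.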

### Scenario 2 — Sharp turn

C1 VIO loses frame-to-frame tracking on the consecutive frames with <5% overlap per AC-3.2 ("Sharp-turn frames may fail frame-to-frame registration"). C5 ESKF/iSAM2 propagates from last-good-anchor via IMU per D-C5-2 long-cruise-observability strategy (covariance growth alert if covariance > threshold); IMU bias-stationarity prior (D-C5-2 = (a) accept + monitor) keeps drift bounded. The next 1-2 frames trigger C2+C3 satellite-anchor re-localization per the AC-3.2 recovery clause. This stays within the AC-3.4 budget as long as a position is recovered before both thresholds (3 consecutive frames and 2 s) are crossed. Per AC-3.3 the system handles ≥3 disconnected segments per flight via satellite-reference re-localization as a core capability.
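
The AC-3.4 condition (≥3 consecutive frames AND ≥2 s without a position) can be expressed as a small gate. This is a minimal sketch, not the project's implementation: the `RelocGate` class, its field names, and the millisecond timestamps are all hypothetical.

```python
from dataclasses import dataclass

MIN_LOST_FRAMES = 3   # AC-3.4: >= 3 consecutive frames without a position
MIN_LOST_MS = 2000    # AC-3.4: AND >= 2 s without a position


@dataclass
class RelocGate:
    """Decides when to request operator re-localization (hypothetical)."""
    last_fix_ms: int = 0
    lost_frames: int = 0

    def update(self, now_ms: int, has_position: bool) -> bool:
        """Feed one processed frame; return True if operator re-loc is due."""
        if has_position:
            # Any valid position (for example a satellite anchor) resets both counters.
            self.last_fix_ms = now_ms
            self.lost_frames = 0
            return False
        self.lost_frames += 1
        # Both conditions must hold, per the AND in AC-3.4.
        return (self.lost_frames >= MIN_LOST_FRAMES
                and now_ms - self.last_fix_ms >= MIN_LOST_MS)


if __name__ == "__main__":
    gate = RelocGate()
    gate.update(0, True)
    # Sharp turn: two frames lost at the 333 ms cadence, then re-anchored.
    print([gate.update(t, ok) for t, ok in [(333, False), (666, False), (999, True)]])
```

At the 3 frames/s cadence the 2 s clause is the binding one: three lost frames accumulate after about 1 s, so a 1-2 frame recovery never reaches the gate.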

### Scenario 3 — Stale tile

C6 cache entry carries `capture_date` per restrictions.md tile manifest schema mandate. Retrieval path must check `capture_date` against AC-8.2 threshold (<6 mo active-conflict, <12 mo stable rear). If stale, downgrade label to non-`satellite_anchored` per AC-NEW-6 ("verify stale-tile match never produces `satellite_anchored`"). Sector classification (active-conflict vs stable rear) is deferred to Plan-phase per the C10 scope restructure 2026-05-08.
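
The freshness rule can be sketched as a pure function over the manifest `capture_date`. This is illustrative only: the sector keys, the `stale_tile` label, and the whole-calendar-month age convention are assumptions, and the sector value is taken as an input because its source is not pinned at research close.

```python
from datetime import date

# AC-8.2 thresholds from this log: <6 mo active-conflict, <12 mo stable rear.
MAX_AGE_MONTHS = {"active_conflict": 6, "stable_rear": 12}

ANCHORED = "satellite_anchored"
DOWNGRADED = "stale_tile"  # hypothetical name for the non-anchored label


def age_in_months(capture_date: date, today: date) -> int:
    """Whole calendar months elapsed since capture (assumed convention)."""
    months = (today.year - capture_date.year) * 12 + (today.month - capture_date.month)
    if today.day < capture_date.day:
        months -= 1  # the current month is not yet complete
    return months


def anchor_label(capture_date: date, sector: str, today: date) -> str:
    """Return the label a match against this tile is allowed to carry."""
    is_fresh = age_in_months(capture_date, today) < MAX_AGE_MONTHS[sector]
    return ANCHORED if is_fresh else DOWNGRADED


if __name__ == "__main__":
    # Scenario 3: an 8-month-old tile in an active-conflict sector.
    print(anchor_label(date(2025, 9, 1), "active_conflict", date(2026, 5, 8)))
```

The same 8-month-old tile would still pass in a stable-rear sector, which is why the sector classification source matters for this check.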

### Scenario 4 — Cache file corruption

D-C10-3 content-hash verification gate at takeoff load: compute `SHA-256(faiss_index_file)`, compare it against the manifest-recorded hash, and on mismatch reject the load, emit `STATUSTEXT` to FC, and refuse takeoff. The one-time check is I/O-bound at roughly 0.9 s per the Source #115 size formula (~430 MB at 2048-D halfvec × 100K tiles, read from a SATA SSD at ~500 MB/s, i.e. 430 ÷ 500 ≈ 0.86 s), excluding hash compute time. This satisfies AC-NEW-7 directly at the descriptor-cache load layer.
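
A minimal sketch of such a gate, using only the Python standard library, is shown below. The function names are hypothetical, and how the expected hash is stored in the manifest (and how the manifest itself is protected) is assumed rather than taken from the source.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path, chunk_bytes: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so a large index never sits fully in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_is_trusted(path, expected_hex: str) -> bool:
    """Compare the on-disk hash with the manifest-recorded hash (hypothetical gate)."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"descriptor bytes")
    good = hashlib.sha256(b"descriptor bytes").hexdigest()
    print(index_is_trusted(tmp.name, good))       # matches the manifest
    Path(tmp.name).write_bytes(b"tampered bytes")
    print(index_is_trusted(tmp.name, good))       # mismatch: refuse the load
```

In this arrangement the caller would invoke `faiss.read_index` only after `index_is_trusted` returns true, and would emit the `STATUSTEXT` and refuse takeoff otherwise.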

### Scenario 5 — GPS spoofing + visual blackout

C1+C2+C3+C4 all fail; C5 propagates dead-reckoned via IMU only. Per AC-3.5: switch label to `{dead_reckoned}` within ≤1 processed frame OR ≤400 ms; reject spoofed GPS as estimator input. Per AC-NEW-8: continue emitting external-position MAVLink frames from IMU-only propagation for ≤30 s after the last trusted anchor, label every estimate `{dead_reckoned}`, degrade MAVLink fix-quality to "2D fix or worse" when 95% covariance semi-major axis >100 m, escalate to "no fix" + `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT when >500 m OR blackout >30 s. C8 D-C8-2 = (b) companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` switch ownership pattern: companion publishes to source-set 2 + auto-switches FC + switches back to set 1 when companion is unavailable. AC-NEW-2 spoofing-promotion latency <3 s p95 satisfied via the companion-driven switch (no GCS round-trip required).
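
The AC-NEW-8 escalation thresholds can be sketched as a function of the horizontal covariance and the blackout duration. This is illustrative only: it assumes the 95% ellipse is derived from the 2×2 north/east covariance block with the chi-square 2-DoF quantile, uses the MAVLink `GPS_FIX_TYPE` numeric values, and assumes a 3D fix is reported below the 100 m threshold (the source only specifies the degraded cases). The function names are hypothetical, and the MSP2 path would need its own mapping.

```python
import math

CHI2_95_2DOF = 5.991  # 95% quantile of chi-square with 2 degrees of freedom

# MAVLink GPS_FIX_TYPE values.
FIX_NONE, FIX_2D, FIX_3D = 1, 2, 3

DEGRADE_AXIS_M = 100.0   # AC-NEW-8: "2D fix or worse" above this
NO_FIX_AXIS_M = 500.0    # AC-NEW-8: "no fix" above this
MAX_BLACKOUT_S = 30.0    # AC-NEW-8: "no fix" beyond this blackout duration


def semi_major_95_m(var_n: float, var_e: float, cov_ne: float) -> float:
    """95% error-ellipse semi-major axis from a 2x2 covariance (m^2 in, m out)."""
    mean = 0.5 * (var_n + var_e)
    half_diff = 0.5 * (var_n - var_e)
    lam_max = mean + math.hypot(half_diff, cov_ne)  # largest eigenvalue
    return math.sqrt(CHI2_95_2DOF * lam_max)


def dead_reckoned_fix_type(var_n: float, var_e: float, cov_ne: float,
                           blackout_s: float) -> int:
    """Map covariance growth and blackout time to a fix-quality value."""
    axis = semi_major_95_m(var_n, var_e, cov_ne)
    if axis > NO_FIX_AXIS_M or blackout_s > MAX_BLACKOUT_S:
        return FIX_NONE
    if axis > DEGRADE_AXIS_M:
        return FIX_2D
    return FIX_3D  # assumption: nominal fix type while under both thresholds


if __name__ == "__main__":
    # 50 m 1-sigma north, 10 m 1-sigma east, 5 s into the blackout.
    print(dead_reckoned_fix_type(2500.0, 100.0, 0.0, blackout_s=5.0))
```

Because the ellipse scales with the largest eigenvalue, a 1-sigma error of about 41 m along the worst axis is already enough to cross the 100 m threshold.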

---

## Actual validation results

| Scenario | Recommended primary stack behavior | Outcome |
|---|---|---|
| 1 — Nominal cruise | Total end-to-end 140-420 ms p95; memory 1.5-2.5 GB peak; AC-NEW-4 NATIVELY satisfied | ✅ **PASS** with K=3 + adaptive depth applied (Plan-phase Jetson MVE confirms exact tail) |
| 2 — Sharp turn AC-3.2 | C5 dead-reckon + C2/C3 re-localize within 1-2 frames; AC-3.3 ≥3 disconnected segments handled | ✅ **PASS** per design |
| 3 — Stale tile AC-NEW-6 | C6 manifest `capture_date` check; downgrade label to non-`satellite_anchored` if stale | ✅ **PASS** at architectural level; sector-classification heuristic deferred to Plan-phase |
| 4 — Cache poisoning AC-NEW-7 | D-C10-3 SHA-256 content-hash gate at takeoff; D-C10-2 atomic-write covers truncation | ✅ **PASS** for descriptor-cache + TensorRT engine path; Suite Sat Service multi-flight ingest voting OUT OF onboard scope (per AC-NEW-7 external-dependency note) |
| 5 — GPS spoofing + visual blackout | C5 dead-reckon, C8 companion-driven source-set switch, AC-NEW-8 escalation thresholds enforced | ✅ **PASS** per AC-3.5 + AC-NEW-2 + AC-NEW-8 + D-C8-2 + D-C8-8 |

---

## Counterexamples

### Counterexample CE-1 — K=10 retrieval pairs in Scenario 1 violates AC-4.1

If C3 K=10 retrieval pairs per frame (canonical default per LightGlue paper §5.4 evaluation methodology) is naively applied without D-C3-3 mitigation, total end-to-end at DISK+LightGlue ~30-60 ms × 10 = 300-600 ms standard / 150-300 ms adaptive — **exceeds AC-4.1 400 ms p95 budget without K reduction**. Mitigation pathway documented in D-C3-3 Choose block: reduce K from 10 to 3-5 / reduce keypoints from 1024 to 512 / accept TIGHT margin and validate at Jetson MVE / parallelize across multiple Jetson GPU streams / elevate ONNX Runtime + TensorRT EP + adaptive depth.

**Address**: this counterexample is already known and gated as D-C3-3; recommendation is K=3 + adaptive depth which satisfies the AC-4.1 budget at the cost of ~5-10% Recall@K loss vs K=10.

### Counterexample CE-2 — D-C5-5 = (a) per-correspondence factor density violates AC-4.1

If C5 GTSAM iSAM2 is configured with D-C5-5 = (a), the highest-fidelity per-correspondence `GenericProjectionFactorCal3DS2` option (1000+ factors per keyframe at K=10 image pairs × 100 inliers per pair), per-update latency is ~50-150 ms on Jetson Orin Nano Super CPU. Combined with C3 ~150-300 ms + C4 ~30-90 ms + C2 ~15-30 ms + C8 ~1-5 ms, the worst-case sum (~575 ms) exceeds the AC-4.1 400 ms p95 budget.

**Address**: this counterexample is already known and gated as D-C5-5; recommendation is D-C5-5 = (c) `PriorFactorPose3` only with C4 GTSAM Marginals satellite-anchor 6×6 covariance — couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate. ~2-5 ms per update on Jetson Orin Nano Super CPU. CLEANEST cross-component coupling.

### Counterexample CE-3 — Pure ESKF (Manual ESKF without GTSAM iSAM2) loses AC-4.5 look-back

If C5 = Manual ESKF only (no GTSAM iSAM2 secondary), AC-4.5 ("System may refine prior estimates and emit corrections") cannot be satisfied — the recursive forward-time-only Kalman update has no look-back facility per Solà §6 reference recipe. AC-4.5 is a "may" not a "must" but in the project's spoofing-aware AC-NEW-8 dead-reckoning failsafe context, the look-back capability is operationally valuable for retroactively correcting blackout-period estimates once a trusted anchor is recovered.

**Address**: this counterexample is partially mitigated by recommending the **hybrid** Manual ESKF + GTSAM iSAM2 path per the C5 batch 1 closure (Fact #88 + Fact #89 dual-candidate verdict). Manual ESKF is the mandatory simple-baseline (always-running fallback if GTSAM iSAM2 fails to converge); GTSAM iSAM2 is the primary path with NATIVE AC-4.5 look-back. Final lock at Plan-phase per D-C5-3 + D-C5-5.

### Counterexample CE-4 — Cand 3 UBX impersonation for iNav (AC-NEW-7 forgery posture)

If C8 iNav path = Cand 3 UBX impersonation via pyubx2 NAV-PVT (instead of the recommended primary Cand 2 MSP2_SENSOR_GPS), the project takes on an unambiguous forgery posture — companion impersonates a u-blox receiver. AC-NEW-7 ("no covert GPS spoofing without consent") requires an explicit FDR audit trail per D-C8-7 = (a). User chose Cand 2 (MSP2_SENSOR_GPS) as primary for iNav to avoid this posture entirely; Cand 3 remains a documented secondary path with the audit-trail mitigation in case of hard incompatibility.

**Address**: not a counterexample to the recommended primary stack; documents why the user-locked Cand 2 = primary verdict was the right architectural choice.

### Counterexample CE-5 — Sector classification heuristic NOT YET pinned

AC-8.2 freshness threshold (<6 mo active-conflict, <12 mo stable rear) requires a sector classification source. The `00_question_decomposition.md` C10 scope restructure 2026-05-08 deferred the sector classification heuristic to Plan-phase. **At research close, the project does not have a pinned source for "is this sector active-conflict or stable rear?"**. Operator-marked geofence vs Suite Service metadata vs other source is open.

**Address**: deferred to Plan-phase per the user's `c10_scope=C` choice (cross-coupling minimal). It surfaces as a Plan-phase BLOCKING gate, not a research-layer gap.

---

## Review Checklist

- [x] Draft conclusions consistent with Step 3 fact cards (cross-references across `02_fact_cards/Cx_*.md` files; every Fact # cited in `04_reasoning_chain.md` exists in the corresponding fact-card file).
- [x] No important dimensions missed — twelve dimensions (eight Decision Support + four project-mandatory) cover the AC + restrictions surface comprehensively per the Decomposition Completeness Probe checklist in `references/comparison-frameworks.md`.
- [x] No over-extrapolation — every L3 inferential cell is labeled ⚠️ Medium or ⚠️ Medium-High and tied to a Plan-phase Jetson MVE confirmation gate.
- [x] Conclusions are actionable/verifiable — every recommendation maps to a specific D-Cx-y decision in `99_cross_component_gates.md` with named owner + resolution path.
- [x] Every selected component/tool/pattern matches the Project Constraint Matrix — verified per row in `06_component_fit_matrix/Cx_*.md` Restrictions × Candidate-Modes sub-matrix sections.
- [x] Mismatches marked as disqualifiers instead of hidden as generic "limitations" — canonical SP+LightGlue (Magic Leap noncommercial) is the canonical example, called out explicitly as HARD DISQUALIFIER in D-C3-1.

### Issue found

- **One issue, partially resolved**: AC-8.2 sector-classification source is not pinned at research close (CE-5). Deferred to Plan-phase per `00_question_decomposition.md` C10 scope restructure user choice. Acknowledged as a Plan-phase BLOCKING gate, not a research-layer gap.

---

## Conclusions Requiring Revision

None at this stage. All five validation scenarios PASS under the recommended primary stack, with documented mitigation paths for the three counterexamples (CE-1 K=10 → D-C3-3; CE-2 D-C5-5 = (a) → D-C5-5 = (c); CE-3 pure ESKF → ESKF+iSAM2 hybrid). CE-4 (UBX impersonation) is not a counterexample to the recommended stack; it documents why the user-locked Cand 2 verdict was correct. CE-5 (sector classification) is a Plan-phase deferred gate, not a research-layer revision.

---

## Sanity check on Step 7.5 Component Applicability Gate

Per `04_engine-analysis.md` Step 7.5.3: a candidate may not be `Selected` while any sub-matrix cell is ❌ or ❓.

**Component Fit Matrix scan** ([`06_component_fit_matrix/`](06_component_fit_matrix/)):
- C1: lead candidates Selected with documented MVE evidence; no open ❌ or ❓ on sub-matrix.
- C2: 5/5 mandatory pre-screen Selected with MVE evidence; conditional pre-screen extensions (AnyLoc/BoQ/DINOv2-VLAD) gated as `Experimental only` per D-C2-5 ViT export prerequisite — correctly NOT marked Selected.
- C3: lead candidates Selected with MVE evidence; canonical SP+LightGlue marked `Rejected` per D-C3-1 hard disqualifier.
- C4: 3 candidates with verdicts; OpenGV `Selected with runtime gate` is valid per the Step 7.5.3 carve-out for runtime-quality gates (D-C4-3 + D-C4-4 are research-layer gates that are closed at the documentary level; license-clearance-counsel-review remains as a Plan-phase routine task, not a runtime-quality gate).
- C5: 2 candidates Selected per closure verdict.
- C6: Cand 1 Selected; Cand 2 Deferred secondary per comparative-improvement verdict.
- C7: 3 candidates Selected per per-family roles.
- C8: 3 candidates Selected per per-FC + per-fallback roles.
- C10: 2 sub-areas Selected per cross-coupling-minimal scope.

**Result**: zero ❌, zero ❓ across all Selected candidates. **Step 7.5 Component Applicability Gate PASSES**. Solution draft (Step 8) may proceed without further blocking gates.