
# Blackbox Tests

> Tier markers (per `environment.md`): `pipeline` (T1), `deferred-corpus` (T2), `deferred-sitl` (T3), `deferred-hil` (T4), `deferred-field` (T5).
> Every test pairs an input/observable with a quantifiable expected result from `_docs/00_problem/input_data/expected_results/results_report.md` or directly from an AC.
> All tests run through the public interfaces defined in `environment.md`. No SUT-internal access.

## Positive Scenarios

### FT-P-01: 60-frame sequential pipeline — ≥80 % within 50 m (pipeline-correctness only)

**Summary**: Sequentially feed the 60 nav-cam JPGs through the SUT and verify the position-error CDF on this corpus.
**Traces to**: AC-1.1 (pipeline-correctness only — see `test-data.md` caveat), results_report row 1.
**Category**: Position Accuracy. Tier: T1 (`pipeline`).

**Preconditions**:
- `nav_cam_60_slice` mounted; `nav_cam_60_slice_imu` synthesised; `satellite_tiles_AD0000xx_z20` placeholder fixture present.
- SUT booted; cuVSLAM warmed; ArduPilot SITL loaded with the corresponding IMU replay; first valid GPS_INPUT received (i.e., AC-NEW-1 cleared).

**Input data**: `nav_cam_60_slice` + `coordinates.csv` + `nav_cam_60_slice_imu`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stream the 60 JPGs at 3 fps via the camera-input shim into the SUT | SUT publishes `GPS_INPUT` for each frame |
| 2 | Capture each `GPS_INPUT.lat / lon` at the qgc-mock sniffer | Every nav-cam frame yields a `GPS_INPUT` message within the test window |
| 3 | Compute haversine error vs `coordinates.csv` ground truth per frame | Per-frame errors collected into a CDF |

**Expected outcome**: ≥80 % of frames have error < 50 m. Reported as **pipeline-functional**, not deployment-binding (per `test-data.md` caveat — deployment-binding number from FT-P-T2 / AerialVL).
**Max execution time**: 60 s per run.
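
A minimal sketch of the runner-side computation for steps 3 and the expected outcome, assuming the captured fixes and the `coordinates.csv` truth are already paired per frame as degrees. The function names and the mean Earth radius constant are illustrative assumptions, not part of the SUT or the test harness.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Share of per-frame errors strictly below the threshold (one point of the CDF)."""
    if not errors_m:
        raise ValueError("no frames captured")
    return sum(e < threshold_m for e in errors_m) / len(errors_m)
```

With the per-frame errors in hand, the pass condition for this test would read `fraction_within(errors, 50.0) >= 0.80`; FT-P-02 would reuse the same helper with a 20 m threshold and a 0.50 floor.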

---

### FT-P-02: 60-frame sequential pipeline — ≥50 % within 20 m (pipeline-correctness only)

**Summary**: Same corpus as FT-P-01; tighter tolerance.
**Traces to**: AC-1.2 (pipeline-correctness only), results_report row 2.
**Category**: Position Accuracy. Tier: T1.

**Preconditions / Input data**: same as FT-P-01.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the corpus end-to-end | per-frame `GPS_INPUT` |
| 2 | Compute haversine error CDF | — |

**Expected outcome**: ≥50 % within 20 m on the 60-image slice (functional check). Deployment-binding number comes from AerialVL S03 in FT-P-T2.
**Max execution time**: 60 s.

---

### FT-P-T2: AC-1.1 / AC-1.2 deployment-binding accuracy on AerialVL S03

**Summary**: Re-run AC-1.1 / AC-1.2 on the deployment-binding corpus.
**Traces to**: AC-1.1, AC-1.2. Tier: T2 (`deferred-corpus`). `data_status: deferred-corpus`.

**Preconditions**: `aerialvl_s03` mounted with synced IMU + nav-cam stream + GPS truth.

**Input data**: AerialVL S03.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay AerialVL S03 (70 km of fixed-wing flight at 1 km AGL) | per-frame `GPS_INPUT` |
| 2 | Compute error CDF vs S03 GPS truth | — |

**Expected outcome**: ≥80 % within 50 m AND ≥50 % within 20 m (deployment-binding).
**Max execution time**: 90 min (replay + analysis).

---

### FT-P-03: Per-frame error bound ≤100 m

**Summary**: No single frame exceeds 100 m error on the 60-image slice.
**Traces to**: AC-1.1 (negative-tail bound), results_report row 3. Tier: T1.

**Preconditions / Input**: same as FT-P-01.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay 60 frames | per-frame GPS_INPUT |
| 2 | Compute max(haversine_err) over all frames | — |

**Expected outcome**: max error ≤ 100 m. Pipeline-functional only.
**Max execution time**: 60 s.

---

### FT-P-04: VO drift bound between satellite anchors

**Summary**: VO drift between successive satellite-anchored fixes stays bounded.
**Traces to**: AC-1.3, AC-NEW-8, results_report row 4, F-T1b. Tier: T1 functional + T2 binding.

**Preconditions**: cuVSLAM in mono+IMU mode (T1) AND mono-only mode (T2 split test).
**Input data**: `nav_cam_60_slice` (T1) + AerialVL S03 (T2).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Identify successive `satellite_anchored` source-label transitions | series of anchor pairs |
| 2 | For each anchor pair, measure VO-extrapolated centre vs next-anchor centre | drift in metres |
| 3 | Compute 95th percentile across all pairs | — |

**Expected outcome**:
- mono+IMU: p95 drift ≤ 50 m (binding on T2 / AerialVL).
- mono-only: p95 drift ≤ 100 m (binding on T2 / AerialVL).
- T1 functional check: drift bounded (no monotonic growth) — exact numbers not deployment-binding.

**Max execution time**: 90 min (T2).

---

### FT-P-05: GPS_INPUT shape under normal tracking

**Summary**: GPS_INPUT messages emitted while tracking is healthy carry the correct schema and value ranges.
**Traces to**: AC-1.4, AC-4.3, AC-6.3, results_report row 5. Tier: T1.

**Preconditions**: SUT in steady-state tracking with recent satellite anchor (<30 s old).
**Input data**: any single frame from `nav_cam_60_slice`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Sniff GPS_INPUT at qgc-mock | one message per nav-cam frame |
| 2 | Decode fields: `fix_type`, `horiz_accuracy`, `satellites_visible`, `lat`, `lon`, `alt`, `speed_accuracy` | as per MAVLink GPS_INPUT spec |
| 3 | Inspect optional ODOMETRY: assert intentional absence in v1 (per AC-4.3 v1-scope clause) | no ODOMETRY frames present |

**Expected outcome**: `fix_type == 3`, `horiz_accuracy ∈ [1, 50] m`, `satellites_visible == 10`, `lat / lon` non-null, WGS84. ODOMETRY count == 0 across the run.
**Max execution time**: 30 s.
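
The field checks in the expected outcome can be sketched as a small validator. This assumes the sniffer hands over each decoded message as a plain dict keyed by MAVLink field name, with `lat` / `lon` already converted to degrees (on the wire they are scaled integers); the dict shape and the function name are hypothetical.

```python
def check_normal_tracking(msg):
    """Return a list of violations for one decoded GPS_INPUT message (empty list = pass)."""
    problems = []
    if msg.get("fix_type") != 3:
        problems.append("fix_type must be 3")
    hacc = msg.get("horiz_accuracy")
    if hacc is None or not (1.0 <= hacc <= 50.0):
        problems.append("horiz_accuracy outside [1, 50] m")
    if msg.get("satellites_visible") != 10:
        problems.append("satellites_visible must be 10")
    lat, lon = msg.get("lat"), msg.get("lon")
    # Range checks stand in for "non-null, WGS84"; they do not verify the datum itself.
    if lat is None or not (-90.0 <= lat <= 90.0):
        problems.append("lat missing or outside WGS84 range")
    if lon is None or not (-180.0 <= lon <= 180.0):
        problems.append("lon missing or outside WGS84 range")
    return problems
```

Returning the full list of violations rather than a boolean keeps a failing run diagnosable from the report alone.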

---

### FT-P-06: GPS_INPUT shape during VO-only fallback

**Summary**: Fields adapt when no satellite anchor is available for >30 s.
**Traces to**: AC-1.4, AC-4.3, results_report row 6. Tier: T1.

**Preconditions**: Force satellite-match failure for >30 s (cache poisoned with stale tiles).

**Input data**: `nav_cam_60_slice` with `stale_tile_scenarios` injected.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | After 30 s of failed matches, sniff GPS_INPUT | `fix_type == 3`, `horiz_accuracy ∈ [20, 100]` m, source-label `vo_extrapolated` |

**Expected outcome**: as above; horiz_accuracy grows monotonically until next successful match.
**Max execution time**: 60 s.

---

### FT-P-07: GPS_INPUT shape during dead-reckoning

**Summary**: VO lost AND no satellite → IMU-only dead reckoning.
**Traces to**: AC-1.4, AC-5.2, results_report row 7. Tier: T1.

**Preconditions**: Inject cuVSLAM tracking-loss + cache poisoned.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Sniff GPS_INPUT | `fix_type == 2`, `horiz_accuracy ≥ 50 m` and growing |
| 2 | Source label | `dead_reckoned` |

**Expected outcome**: `fix_type == 2`, monotonically growing horiz_accuracy, `source == dead_reckoned`.
**Max execution time**: 60 s.

---

### FT-P-08: GPS_INPUT shape on total failure

**Summary**: 3+ consecutive failures — system signals total failure.
**Traces to**: AC-3.4, results_report row 8. Tier: T1.

**Preconditions**: `cache_poisoning_scenarios` variant that causes 3 consecutive satellite-match failures plus cuVSLAM tracking loss.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Wait for 3 consecutive failures | GPS_INPUT continues at the configured rate |
| 2 | Inspect GPS_INPUT | `fix_type == 0`, `horiz_accuracy == 999.0` |
| 3 | Inspect STATUSTEXT | a message matching the RELOC_REQ regex (see FT-N-07) is emitted |

**Expected outcome**: as above.
**Max execution time**: 60 s.

---

### FT-P-09: Confidence tier transitions

**Summary**: Confidence tier label transitions match defined conditions.
**Traces to**: AC-1.4, results_report rows 10–13. Tier: T1.

**Preconditions**: scripted scenario that walks (HIGH → MEDIUM → LOW → FAILED).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At each scripted state, read the SSE stream confidence field AND the source-label field | matches expected tier |

**Expected outcome**:
- Sat anchor <30 s + cov <400 m² → tier `HIGH`, source `satellite_anchored`.
- cuVSLAM OK + no sat >30 s → tier `MEDIUM`, source `vo_extrapolated`.
- cuVSLAM lost + IMU only → tier `LOW`, source `dead_reckoned`.
- 3+ consecutive failures → tier `FAILED`, fix_type 0.

**Max execution time**: 5 min.
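
The four conditions above can be encoded as a test oracle, sketched below. The parameter names are hypothetical, the evaluation order (failure count first) is an assumption, and "IMU only" is approximated here as "no satellite anchor in the last 30 s". Combinations the list does not cover return `None` rather than a guessed tier.

```python
def expected_tier(anchor_age_s, cov_m2, vo_ok, consecutive_failures):
    """Map a scripted state to the (tier, source) pair listed above, or None if unlisted.

    anchor_age_s is None when no satellite anchor has been seen yet.
    """
    if consecutive_failures >= 3:
        return ("FAILED", None)  # the source label for FAILED is not specified above
    anchor_fresh = anchor_age_s is not None and anchor_age_s < 30
    anchor_stale = anchor_age_s is None or anchor_age_s > 30
    if anchor_fresh and cov_m2 is not None and cov_m2 < 400:
        return ("HIGH", "satellite_anchored")
    if vo_ok and anchor_stale:
        return ("MEDIUM", "vo_extrapolated")
    if not vo_ok and anchor_stale:
        return ("LOW", "dead_reckoned")
    return None  # e.g. fresh anchor with cov >= 400 m^2: not defined by this test
```

At each scripted state the runner would compare the SSE `confidence` and source-label fields with the returned pair, and separately assert `fix_type == 0` for `FAILED`.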

---

### FT-P-10: Image registration rate (functional)

**Summary**: Pipeline registers ≥95 % of normal-flight frames against the previous frame.
**Traces to**: AC-2.1 (pipeline-functional only), results_report row 14. Tier: T1 functional + T2 binding.

**Preconditions**: SUT exposes registration outcome via STATUSTEXT or NAMED_VALUE_FLOAT (`reg_pass_count`, `reg_total_count`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay `nav_cam_60_slice` (T1) or AerialVL S03 (T2) | registration metrics published |
| 2 | Compute `reg_pass_count / reg_total_count` | percentage |

**Expected outcome**: T1 ≥95 % (functional); T2 ≥95 % (deployment-binding) under normal-flight definition (nadir, ±10° bank/pitch, ≥40 % overlap, daytime, season-matched tile).
**Max execution time**: 60 s (T1) / 90 min (T2).

---

### FT-P-11: Mean Reprojection Error (MRE)

**Summary**: VO and cross-domain MRE under thresholds.
**Traces to**: AC-2.2, results_report row 15. Tier: T1 functional + T2 binding.

**Preconditions**: SUT publishes `mre_vo` (frame-to-frame) and `mre_cross` (cross-view) on the metrics endpoint.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Scrape MRE metrics over a replay | per-frame samples |
| 2 | Compute mean across the run | — |

**Expected outcome**: `mean(mre_vo) < 1.0 px`; `mean(mre_cross) < 2.5 px`. T1 numbers functional only.
**Max execution time**: 60 s (T1) / 90 min (T2).

---

### FT-P-12: Continuous output through turn area (frames 32–43)

**Summary**: SUT keeps producing position estimates through the turn segment of `coordinates.csv`.
**Traces to**: AC-3.2, AC-4.4, results_report row 16. Tier: T1.

**Preconditions**: standard pipeline replay.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay frames 32–43 | per-frame GPS_INPUT |
| 2 | Count outputs vs frames | — |

**Expected outcome**: ≥1 GPS_INPUT per nav-cam frame in the turn region.
**Max execution time**: 30 s.

---

### FT-P-13: 350 m outlier handled (AC-3.1)

**Summary**: Pipeline survives a synthetic 350 m gap between consecutive frames (caused by a ±20° tilt outlier).
**Traces to**: AC-3.1, results_report row 17. Tier: T1.

**Input data**: synthetic two-frame pair with 350 m gap injected into `nav_cam_60_slice` mid-replay.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject the outlier pair | SUT emits a `vo_extrapolated` or `dead_reckoned` frame, not corrupted output |
| 2 | Continue with the next valid frame | error returns to ≤100 m on the next valid frame |

**Expected outcome**: error ≤ 100 m on the next valid frame after the outlier.
**Max execution time**: 60 s.

---

### FT-P-14: Sharp-turn re-localization (AC-3.2)

**Summary**: Sharp turn (<5 % overlap, <70°, <200 m drift) — VO fails, satellite re-loc recovers.
**Traces to**: AC-3.2, F-T7, results_report row 18. Tier: T1.

**Input data**: synthetic sharp-turn pair injected into `nav_cam_60_slice`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject the sharp-turn pair | cuVSLAM tracking lost; VPR triggers; matcher re-localizes |
| 2 | Track error over next 3 frames | error ≤ 50 m within 3 frames |

**Expected outcome**: error ≤ 50 m within 3 frames of the turn.
**Max execution time**: 60 s.

---

### FT-P-15: VO loss → satellite recovery → tracking_state == NORMAL

**Summary**: After cuVSLAM tracking loss + sat match success, tracking_state returns to NORMAL.
**Traces to**: AC-3.2, AC-3.3, results_report row 19. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Force cuVSLAM tracking-loss; deliver a fresh tile that matches | matcher emits absolute pose; calibrator emits satellite-anchored fix |
| 2 | Observe FC EKF3 reconvergence via `EKF_STATUS_REPORT` | EKF3 reconverges |
| 3 | Read SUT-published `tracking_state` | == `NORMAL` |

**Expected outcome**: tracking_state == NORMAL within bounded time.
**Max execution time**: 60 s.

---

### FT-P-16: Cold-start TTFF ≤30 s p95

**Summary**: From companion-computer boot, first valid GPS_INPUT within 30 s.
**Traces to**: AC-NEW-1, results_report row 23, F-T11. Tier: T1 statistical (≤10 boots) + T4 binding (50 boots on real HW).

**Preconditions**: SUT image cold (no warmed engines); FC providing `GLOBAL_POSITION_INT` that simulates an IMU-extrapolated pose.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Boot SUT container | container start logged |
| 2 | Time from container start to first valid `fix_type==3` GPS_INPUT | t_ttff |
| 3 | Repeat N times (N=10 T1 / N=50 T4) | distribution |

**Expected outcome**: 95th percentile of t_ttff ≤ 30 s.
**Max execution time**: 10 min (T1) / 30 min (T4).
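
One way to compute the 95th percentile of `t_ttff` is the nearest-rank method, sketched below. The choice of nearest-rank over an interpolating estimator is an assumption of this sketch, as is the function name; the document does not prescribe a percentile definition.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct % of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Note that with N=10 the nearest-rank p95 is simply the slowest boot (rank `ceil(9.5) = 10`), so the T1 run passes only if every boot meets the 30 s budget; with N=50 the rank is 48, which tolerates two slower boots.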

---

### FT-P-17: Validate initial position via first satellite match

**Summary**: First satellite match after startup pulls position to ≤50 m.
**Traces to**: AC-5.1, AC-NEW-1, results_report row 24. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Provide `GLOBAL_POSITION_INT` with a deliberate 200 m offset from truth | SUT seeds pipeline with 200 m uncertainty |
| 2 | Replay first frame with valid satellite tile | matcher succeeds; calibrator emits anchored fix |
| 3 | Read GPS_INPUT lat/lon | error ≤ 50 m |

**Expected outcome**: position error ≤ 50 m after first match.
**Max execution time**: 90 s.

---

### FT-P-18: Mid-flight reboot recovery ≤30 s

**Summary**: Process kill mid-flight; SUT recovers within AC-NEW-1 budget.
**Traces to**: AC-5.3, AC-NEW-1, results_report row 25. Tier: T1.

**Preconditions**: SUT in steady-state tracking; FC continues to fly.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Send SIGKILL to SUT container | SUT restarts |
| 2 | Time from restart to first `fix_type==3` GPS_INPUT | t_recovery |

**Expected outcome**: t_recovery ≤ 30 s.
**Max execution time**: 60 s.

---

### FT-P-19: Post-reboot first-match accuracy

**Summary**: After reboot, first satellite match restores accuracy.
**Traces to**: AC-5.3, results_report row 26. Tier: T1.

**Steps**: same as FT-P-17 but starting from a reboot.

**Expected outcome**: error ≤ 50 m after first match.
**Max execution time**: 90 s.

---

### FT-P-20: Object localization (level flight)

**Summary**: `POST /objects/locate` returns lat/lon for an object pixel given known UAV pose.
**Traces to**: AC-7.1, AC-7.2, results_report row 27. Tier: T1.

**Preconditions**: SUT has a known anchored fix; AI camera gimbal pose injected via FC `ATTITUDE`.
**Input data**: pixel coordinates + gimbal angle + zoom + altitude in request body.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` with pixel_x, pixel_y, gimbal_pitch, gimbal_yaw, zoom, altitude | 200 + JSON `{lat, lon, alt, accuracy_m, confidence}` |
| 2 | Compare to ground truth | error ≤ accuracy_m |

**Expected outcome**: lat/lon within `accuracy_m` of ground truth; in level flight, `accuracy_m` consistent with frame-center accuracy of the GPS-Denied system. In maneuvering flight, response includes the `altitude × |sin(unknown_bank_or_pitch)|` bound (AC-7.1 second clause) when bank/pitch >5°.
**Max execution time**: 5 s.

---

### FT-P-21: Coordinate transform round-trip ≤0.1 m

**Summary**: GPS → NED → pixel → GPS round-trip preserves position.
**Traces to**: AC-6.3, AC-7.2, results_report row 29. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Submit a known WGS84 point through the round-trip via `/objects/locate` (or a debug endpoint if exposed) | round-trip lat/lon |
| 2 | Compare to original | ≤ 0.1 m |

**Expected outcome**: round-trip error ≤ 0.1 m.
**Max execution time**: 1 s.

---

### FT-P-22: `GET /health` schema and content

**Summary**: Health endpoint returns 200 with required fields.
**Traces to**: AC-6.1 (telemetry), results_report row 30. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /health` | HTTP 200, JSON body |
| 2 | Validate schema | contains `status`, `memory_mb`, `gpu_temp_c`, `tracking_state`, `last_anchor_age_s`, `confidence_tier` |

**Expected outcome**: as above; `status ∈ {ok, degraded, failed}`.
**Max execution time**: 1 s.

---

### FT-P-23: `POST /sessions` returns id

**Traces to**: AC-6.1, results_report row 31. Tier: T1.

**Steps**: `POST /sessions` (auth) → 200/201 with session id.

**Expected outcome**: status ∈ {200, 201}; body has `session_id` matching `^[a-f0-9-]{36}$`.
**Max execution time**: 1 s.
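
A sketch of the response check, assuming the runner has already parsed the HTTP status and JSON body; the function name is hypothetical. The pattern is the one stated above, which is looser than a strict UUID check (any 36 characters drawn from lowercase hex digits and hyphens pass).

```python
import re

SESSION_ID_RE = re.compile(r"^[a-f0-9-]{36}$")


def session_response_ok(status, body):
    """True if the status is 200/201 and body['session_id'] matches the expected pattern."""
    sid = body.get("session_id")
    # fullmatch avoids accepting a trailing newline, which a bare `$` would allow.
    return status in (200, 201) and isinstance(sid, str) and SESSION_ID_RE.fullmatch(sid) is not None
```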

---

### FT-P-24: SSE stream emits per-second events

**Traces to**: AC-6.1, results_report row 32. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /sessions/{id}/stream` | SSE connection; events emitted at ~1 Hz |
| 2 | Sample 30 s of stream | each event matches schema: `type`, `timestamp`, `lat`, `lon`, `alt`, `accuracy_h`, `confidence`, `vo_status` |

**Expected outcome**: rate 1 Hz ± 0.2 Hz; all events conform to schema.
**Max execution time**: 35 s.
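
The rate and schema checks can be sketched as follows, assuming each SSE event has been decoded into a dict and that `timestamp` is expressed in seconds (the unit is not stated in this document, so that is an assumption, as are the function names).

```python
REQUIRED_KEYS = {"type", "timestamp", "lat", "lon", "alt", "accuracy_h", "confidence", "vo_status"}


def mean_rate_hz(timestamps_s):
    """Average event rate over the sample: (n - 1) intervals divided by the elapsed time."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two events")
    span = timestamps_s[-1] - timestamps_s[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    return (len(timestamps_s) - 1) / span


def check_stream(events):
    """Summarise a captured stream: mean rate, 1 Hz +/- 0.2 Hz verdict, schema violations."""
    violations = [i for i, e in enumerate(events) if not REQUIRED_KEYS.issubset(e)]
    rate = mean_rate_hz([e["timestamp"] for e in events if "timestamp" in e])
    return {"rate_hz": rate, "rate_ok": abs(rate - 1.0) <= 0.2, "schema_violations": violations}
```

A mean rate hides bursts and gaps; a stricter variant could additionally bound each inter-event interval, but that goes beyond what this test requires.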

---

### FT-P-25: TRT engine load ≤10 s

**Traces to**: AC-NEW-1 (sub-budget), results_report row 39. Tier: T1 (synthetic timing) + T4 (real HW).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At SUT boot, time from container start to "all engines ready" STATUSTEXT | t_engines |

**Expected outcome**: t_engines ≤ 10 s.
**Max execution time**: 30 s.

---

### FT-P-26: Tile storage size for the operational area

**Traces to**: AC-8.3, restrictions §UAV/Satellite, results_report row 40. Tier: T1.

**Preconditions**: fixture loaded that covers a 200 km mission path with a ±2 km buffer at zoom levels z=18 and z=20.

**Steps**: read total bytes under `/probe/tiles/`.

**Expected outcome**: 300 MB ≤ size ≤ 1000 MB. (Aligned with restriction's ~10 GB persistent cap for full 400 km².)
**Max execution time**: 5 s.
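
A sketch of the size probe, assuming the runner can read the `/probe/tiles/` mount directly and that "MB" means 10^6 bytes (the document does not say whether MB or MiB is intended). Both helper names are hypothetical.

```python
import os


def total_bytes(root):
    """Sum the sizes of all regular files under root (follows the default os.walk rules)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def size_in_budget(n_bytes, lo_mb=300, hi_mb=1000, mb=1_000_000):
    """True if lo_mb <= size <= hi_mb, with mb bytes per megabyte."""
    return lo_mb * mb <= n_bytes <= hi_mb * mb
```

The test would then assert `size_in_budget(total_bytes("/probe/tiles/"))`.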

---

### FT-P-27: Tile mosaic coverage radius ≥500 m

**Traces to**: AC-8.3, results_report row 41. Tier: T1.

**Preconditions**: SUT given EKF position with σ_xy.

**Steps**: capture the assembled mosaic bbox via STATUSTEXT or a debug endpoint.

**Expected outcome**: mosaic radius ≥ 500 m around current position.
**Max execution time**: 5 s.

---

### FT-P-28: Tile dedup — ≤1 onboard tile per ground sector

**Traces to**: AC-8.4, F-T2. Tier: T1.

**Preconditions**: `tile_dedup_replay` (sectors visited ≥2×).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the flight | onboard tiles written |
| 2 | Inspect MBTiles + sidecar in `/probe/tiles/` | per-sector tile count |

**Expected outcome**: per-sector count ≤ 1; latest/highest-quality wins.
**Max execution time**: 10 min.

---

### FT-P-29: Post-flight upload to candidate pool

**Traces to**: AC-8.4, F-T3. Tier: T1.

**Preconditions**: `service-stub` running.

**Steps**: replay → on landing-event, SUT uploads tiles.

**Expected outcome**: `service-stub` records ≥1 tile with `trust_level=candidate`; promotion only after N≥2 voting flights (so a single flight does not promote).
**Max execution time**: 5 min.

---

### FT-P-30: NAMED_VALUE_FLOAT telemetry rate

**Traces to**: AC-6.1, results_report row 45. Tier: T1.

**Steps**: sniff `gps_conf`, `gps_drift`, `gps_hacc` NAMED_VALUE_FLOAT rates over 30 s.

**Expected outcome**: each at 1 Hz ± 0.2 Hz.
**Max execution time**: 35 s.

---

### FT-P-31: Disconnected segments — ≥3 connected via global retrieval

**Traces to**: AC-3.3, F-T8. Tier: T1.

**Preconditions**: `disconnected_segments_replay` with ≥3 segments.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay each segment with a synthetic gap | for each segment, VPR retrieves top-K candidates; matcher relocalizes |
| 2 | Verify segment-to-segment trajectory continuity | each segment connects to prior trajectory |

**Expected outcome**: 3/3 segments connect within 10 frames of segment start; `tracking_state == NORMAL` after each.
**Max execution time**: 5 min.

---

### FT-P-32: Position refinement / corrections (AC-4.5)

**Traces to**: AC-4.5. Tier: T1.

**Preconditions**: SUT in steady state; ability to refine prior fixes.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Capture sequence of GPS_INPUT for a 10-s window | per-frame fixes |
| 2 | After delayed loop closure / late satellite match, observe whether SUT emits a corrected fix or signals correction via STATUSTEXT | a follow-up GPS_INPUT for an earlier `time_usec` OR a STATUSTEXT correction record |

**Expected outcome**: at least one correction event in which the corrected fix carries a smaller `horiz_accuracy` than the fix it replaces (covariance shrinks). The system never silently rewrites past output without recording the correction.
**Max execution time**: 60 s.

---

### FT-P-33: Object-localize bank/pitch >5° publishes uncertainty bound

**Traces to**: AC-7.1 (second clause). Tier: T1.

**Preconditions**: FC `ATTITUDE` published with bank > 5°.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` while bank > 5° | response includes bound = `altitude × abs(sin(bank_or_pitch))` |

**Expected outcome**: response body includes `bank_pitch_bound_m` matching the formula within 1 m.
**Max execution time**: 5 s.
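
The bound formula and the 1 m tolerance can be sketched as below. The function names are hypothetical, and the angle is assumed to be supplied in degrees.

```python
import math


def bank_pitch_bound_m(altitude_m, angle_deg):
    """altitude x |sin(bank_or_pitch)|, with the angle given in degrees."""
    return altitude_m * abs(math.sin(math.radians(angle_deg)))


def bound_matches(reported_m, altitude_m, angle_deg, tol_m=1.0):
    """True if the reported bound agrees with the formula to within tol_m metres."""
    return abs(reported_m - bank_pitch_bound_m(altitude_m, angle_deg)) <= tol_m
```

For example, at 1000 m altitude and 10° of bank the formula gives roughly 173.6 m, so a reported `bank_pitch_bound_m` of 180 m would fail the 1 m tolerance.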

---

### FT-P-34: Mid-flight tile generation respects σ_xy ≤ 5 m hard gate

**Traces to**: AC-8.4, AC-NEW-7 hard gate. Tier: T1.

**Preconditions**: scripted scenarios with σ_xy ∈ {2, 4, 6, 8} m.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay σ_xy=2 m frames | tiles written |
| 2 | Replay σ_xy=8 m frames | NO tiles written |
| 3 | Inspect sidecar `trust_level` for σ_xy ∈ (3, 5] m | `trust_level == soft` |
| 4 | Inspect sidecar for σ_xy ≤ 3 m | `trust_level == candidate` |

**Expected outcome**: as above.
**Max execution time**: 5 min.
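
The gate described in the steps can be captured as an oracle for the runner, sketched below with a hypothetical function name. It returns the `trust_level` expected in the sidecar, or `None` when no tile should be written at all.

```python
def expected_trust_level(sigma_xy_m):
    """Expected sidecar trust_level for a given sigma_xy, or None if the hard gate blocks the tile."""
    if sigma_xy_m < 0:
        raise ValueError("sigma_xy must be non-negative")
    if sigma_xy_m <= 3.0:
        return "candidate"
    if sigma_xy_m <= 5.0:
        return "soft"
    return None  # sigma_xy > 5 m: hard gate, no tile written
```

Applied to the scripted scenarios, this yields `candidate` for 2 m, `soft` for 4 m, and no tile for 6 m and 8 m.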

---

### FT-P-35: NF-T4b cache-poisoning safety budget (Monte Carlo)

**Traces to**: AC-NEW-7. Tier: T2 (`deferred-corpus`). `data_status: deferred-corpus`.

**Preconditions**: ≥100 simulated flights' worth of frames from AerialVL + Mavic + AerialExtreMatch with synthetic over-confidence injection (1.5×–3×).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay all flights | per-tile geo-misalignment captured |
| 2 | Compute P(misalign > 30 m) and P(misalign > 100 m) | — |

**Expected outcome**: P(>30 m) < 1 %, P(>100 m) < 0.1 %.
**Max execution time**: 4 h.

---

### FT-P-36: AC-NEW-9 covariance calibration accuracy

**Traces to**: AC-NEW-9, F-T18. Tier: T2.

**Preconditions**: AerialVL S03 replay with ground truth.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each emitted GPS_INPUT, capture (`horiz_accuracy`, ground-truth error) | series of pairs |
| 2 | Compute fraction of frames where `error ≤ horiz_accuracy × k`, with `k` the 2-D 95 % Mahalanobis factor | fraction |

**Expected outcome**: fraction ≥ 95 % (calibration neither over- nor under-claims).
**Max execution time**: 90 min.
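
A sketch of the coverage computation in step 2. It assumes the reported horizontal accuracy is the per-axis 1σ of an isotropic 2-D Gaussian error, in which case the radial error follows a Rayleigh distribution and the 95 % factor is `sqrt(-2 ln 0.05) ≈ 2.448`. If the SUT defines its accuracy field differently, the factor changes accordingly; the names below are hypothetical.

```python
import math

# 95 % radius of an isotropic 2-D Gaussian, in units of the per-axis sigma.
K95_2D = math.sqrt(-2.0 * math.log(0.05))


def coverage_fraction(pairs):
    """pairs: iterable of (reported_acc_m, error_m). Returns the share of frames inside the 95 % radius."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no frames captured")
    covered = sum(err <= acc * K95_2D for acc, err in pairs)
    return covered / len(pairs)
```

The pass condition would then be `coverage_fraction(pairs) >= 0.95`.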

---

### FT-P-37: F-T18 calibrator regression (no state propagation)

**Traces to**: AC-NEW-9, F-T18. Tier: T2.

**Preconditions**: replay with logging hooks on Component 5 outputs (publicly exposed counters).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run replay | calibrator counters emitted |
| 2 | Assert `state_propagation_invocations_total == 0` | no propagation |
| 3 | Assert `mahalanobis_gate_rejections_total > 0` | gate active |

**Expected outcome**: as above.
**Max execution time**: 90 min.

---

## Negative Scenarios

### FT-N-01: Corrupted nav-cam frame — no crash, degraded mode

**Traces to**: AC-3.x (resilience), restriction "fixed downward camera". Tier: T1.

**Input data**: a 60-frame replay with frame N replaced by a 10-byte random blob.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stream the replay | SUT logs decode error; emits STATUSTEXT WARN |
| 2 | Inspect tracking_state | transitions to `DEGRADED` for 1 frame; recovers to `NORMAL` on next valid frame |
| 3 | SUT process | does NOT crash |

**Expected outcome**: process alive; no GPS_INPUT spike with bad data; tracking_state returns to NORMAL within 1 frame of recovery.
**Max execution time**: 30 s.

---

### FT-N-02: Object-localize invalid pixel

**Traces to**: AC-7.1, results_report row 28. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` with `pixel_x = -10` (out of frame) | HTTP 422 + error body |

**Expected outcome**: status == 422; body contains a structured error code.
**Max execution time**: 1 s.

---

### FT-N-03: Unauthenticated `POST /sessions`

**Traces to**: results_report row 33, security restrictions. Tier: T1.

**Steps**: `POST /sessions` without JWT → 401.

**Expected outcome**: status == 401.
**Max execution time**: 1 s.

---

### FT-N-04: Stale tile beyond grace — must NOT label `satellite_anchored`

**Traces to**: AC-8.2, AC-NEW-6. Tier: T1.

**Preconditions**: `stale_tile_scenarios` with 18-month-old active-conflict tile (well past 6 mo + 30-day grace).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay frames whose only candidate tile is the 18-month-old stale tile | matcher invocation skipped or scored 0 |
| 2 | Inspect source label on emitted GPS_INPUT | NEVER `satellite_anchored` |
| 3 | Inspect WARN STATUSTEXT | tile rejected event recorded |

**Expected outcome**: no `satellite_anchored` label across the run; rejection event recorded.
**Max execution time**: 60 s.

---

### FT-N-05: Stale tile in 30-day grace — confidence linearly decayed

**Traces to**: AC-NEW-6. Tier: T1.

**Preconditions**: tiles aged at +0, +15, +30 days past the 6-mo budget.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay each | confidence weight in sidecar metric: 1.0, 0.5, 0.0 |

**Expected outcome**: confidence weight decays linearly as specified.
**Max execution time**: 60 s.
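
The expected weights follow a straight line across the grace window, sketched below. The function name is hypothetical, and treating tiles still inside the freshness budget as weight 1.0 is an assumption; only the three sampled points (+0, +15, +30 days) come from this test.

```python
def grace_weight(days_past_budget, grace_days=30.0):
    """Linear decay from 1.0 at the end of the freshness budget to 0.0 at the end of the grace window."""
    if days_past_budget <= 0:
        return 1.0  # assumption: tiles within budget carry full weight
    if days_past_budget >= grace_days:
        return 0.0
    return 1.0 - days_past_budget / grace_days
```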

---

### FT-N-06: Sharp-turn negative — must trigger satellite re-loc, not silently fail

**Traces to**: AC-3.2 (negative case). Tier: T1.

**Steps**: same as FT-P-14 but assert that **before** re-loc the SUT emits a STATUSTEXT explaining VO loss; assert tracking_state transitions through DEGRADED.

**Expected outcome**: explicit STATUSTEXT log; recovery within 3 frames.
**Max execution time**: 60 s.

---

### FT-N-07: 3-consecutive-failure → RELOC_REQ on STATUSTEXT

**Traces to**: AC-3.4, results_report rows 20, 46. Tier: T1.

**Steps**: see FT-P-08; additionally verify the regex `RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m`.

**Expected outcome**: regex matches at least one STATUSTEXT after 3 failures; emitted within 2 s of the third failure (per AC-3.4 timing).
**Max execution time**: 60 s.
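
A sketch of the regex and timing check, assuming the sniffer provides `(timestamp_s, text)` tuples and the time of the third failure is known to the runner. The MAVLink STATUSTEXT `text` field holds 50 characters per message, so a longer RELOC_REQ string may arrive in chunks; this sketch assumes the text has already been reassembled. The function name and the sample values in the usage note are hypothetical.

```python
import re

RELOC_REQ_RE = re.compile(r"RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m")


def reloc_req_within(statustexts, t_third_failure_s, limit_s=2.0):
    """True if a matching STATUSTEXT was emitted no later than limit_s after the third failure."""
    for t, text in statustexts:
        if 0.0 <= t - t_third_failure_s <= limit_s and RELOC_REQ_RE.search(text):
            return True
    return False
```

For instance, a message such as `RELOC_REQ: last_lat=48.1 last_lon=37.5 uncertainty=250m` captured 1.2 s after the third failure would satisfy the check.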

---

### FT-N-08: Re-loc waiting state behaviour

**Traces to**: AC-3.4, results_report row 21. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | After RELOC_REQ, observe SUT for 10 s | `fix_type == 0` GPS_INPUT continues; IMU-prediction-only label; satellite-match attempts continue (counter increments) |

**Expected outcome**: as above; SUT does NOT stop emitting GPS_INPUT.
**Max execution time**: 30 s.

---

### FT-N-09: Operator hint — used as 500 m seed

**Traces to**: AC-6.2, AC-3.4, results_report row 22. Tier: T1.

**Preconditions**: SUT in re-loc-waiting; operator hint scenario active.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `qgc-mock` sends STATUSTEXT `RELOC_HINT: lat=… lon=… sigma=500m` | SUT consumes hint; uses as seed for VPR/cross-view |
| 2 | First fix after hint | error ≤ 500 m initially |
| 3 | After next satellite match | error ≤ 50 m |

**Expected outcome**: as above.
**Max execution time**: 60 s.

---

### FT-N-10: Operator hint — malformed value rejected

**Traces to**: AC-6.2 (negative). Tier: T1.

**Steps**: send `RELOC_HINT: lat=NaN lon=… sigma=-10`.

**Expected outcome**: SUT emits STATUSTEXT WARN; hint NOT applied; pipeline state unchanged.
**Max execution time**: 30 s.

---

### FT-N-11: AC-4.3 — ODOMETRY intentionally absent in v1

**Traces to**: AC-4.3 (v1-scope clause), F-T9 Option A. Tier: T1.

**Preconditions**: SUT configured for v1 (default).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run any of FT-P-01 / FT-P-04 / FT-P-T2 | GPS_INPUT emitted |
| 2 | At qgc-mock, count ODOMETRY frames over the run | == 0 |
| 3 | Inspect `EK3_SRC1_*` configuration via FC parameter readback | `POSXY=GPS, VELXY=GPS, YAW=GPS+Compass` |

**Expected outcome**: ODOMETRY count == 0; FC parameters as configured.
**Max execution time**: 60 s.

---

### FT-N-12: Spoofed GPS — SUT promotes within 3 s

**Traces to**: AC-NEW-2, F-T12. Tier: T3 (`deferred-sitl`). `data_status: deferred-sitl`.

**Preconditions**: SITL + `gps-spoof-injector` configured.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At t=0, inject a malicious `GPS_RAW_INT` with a 1 km offset | FC sees both spoof + SUT GPS_INPUT |
| 2 | Time from spoof onset to SUT promoting its `GPS_INPUT` to primary (raised `fix_type=3` AND STATUSTEXT promotion event) | t_promote |
| 3 | Repeat 50× | distribution |

**Expected outcome**: 95th percentile of t_promote < 3 s.
**Max execution time**: 30 min.

---

### FT-N-13: Failsafe at 3 s no-fix (AC-5.2)

**Traces to**: AC-5.2. Tier: T1+T3.

**Preconditions**: scripted scenario where SUT cannot produce ANY estimate for 3.5 s.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Force pipeline blackout | SUT logs failure |
| 2 | Verify FC behaviour | ArduPilot SITL logs fall-back to IMU-only dead reckoning |

**Expected outcome**: failsafe transition observable in `EKF_STATUS_REPORT` within 4 s of blackout.
**Max execution time**: 60 s.
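
The 4 s bound can be checked against a time-stamped series of `EKF_STATUS_REPORT.flags` values. The sketch below is illustrative only: `first_transition_delay` is a hypothetical helper, the sample data is invented, and treating a cleared `EKF_POS_HORIZ_ABS` bit (value 16) as the failsafe indicator is an assumption; the AC may define the transition differently.

```python
from typing import Callable, Iterable, Optional, Tuple

EKF_POS_HORIZ_ABS = 16  # MAVLink EKF_STATUS_FLAGS bit; its use as the indicator is assumed


def first_transition_delay(
    blackout_t: float,
    samples: Iterable[Tuple[float, int]],
    is_degraded: Callable[[int], bool],
) -> Optional[float]:
    """Seconds from blackout to the first degraded sample, or None if never seen."""
    for t, flags in sorted(samples):
        if t >= blackout_t and is_degraded(flags):
            return t - blackout_t
    return None


# Hypothetical (timestamp_s, flags) samples; blackout forced at t = 10.0 s.
samples = [(9.0, 0x1F), (11.0, 0x1F), (13.2, 0x0F)]
delay = first_transition_delay(10.0, samples, lambda f: not f & EKF_POS_HORIZ_ABS)
print(delay is not None and delay <= 4.0)
```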

---

### FT-N-14: Refusal of unsigned MAVLink (S-T1 boundary)

**Traces to**: restrictions §Sensors. Tier: T3.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Send a GPS_INPUT with invalid signing tag from the runner | FC rejects |
| 2 | Inspect FC log + STATUSTEXT to GCS | rejection event recorded |

**Expected outcome**: rejected; FC continues to fly on prior valid source.
**Max execution time**: 30 s.

---

### FT-N-15: SITL F-T9 Option A regression — EKF3 fuses GPS_INPUT only, no double-fusion

**Traces to**: AC-4.3, F-T9 Option A. Tier: T3.

**Preconditions**: SITL with `EK3_SRC1_*=GPS+Compass`, `EK3_SRC2_*=GPS`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run a representative AerialVL replay | EKF3 fuses GPS_INPUT |
| 2 | Inspect EKF3 logs for double-fusion symptoms (issues #30076, #32506) | none |
| 3 | Trigger backup-GPS failover via SITL parameter | EKF3 switches to `EK3_SRC2_*` cleanly |

**Expected outcome**: no double-fusion; clean failover.
**Max execution time**: 30 min.

---

### FT-N-16: SITL F-T9 Option B regression (v1.1 candidate)

**Traces to**: AC-4.3 Option B (v1.1+), F-T9. Tier: T3 (`deferred-sitl`). `data_status: deferred-sitl`.

**Preconditions**: SITL with PR #30080-class build; SUT switched to ODOMETRY-primary mode (build flag).

**Steps**: ODOMETRY primary; GPS_INPUT held in reserve; verify clean source-switching, no double-fusion.

**Expected outcome**: clean source-switching with ODOMETRY primary and GPS_INPUT in reserve; no double-fusion. **The test runs, but the build flag is OFF for the v1 release gate.**
**Max execution time**: 30 min.

---

### FT-N-17: AC-NEW-7 single-flight tile NOT promoted to trusted basemap

**Traces to**: AC-NEW-7 (Service-side voting). Tier: T1 (`service-stub`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run a single-flight upload to candidate pool | tile recorded as `trust_level=candidate` |
| 2 | Query `service-stub` for promoted basemap content | tile NOT in promoted basemap |

**Expected outcome**: the single-flight tile stays at `trust_level=candidate` and is absent from the promoted basemap; promotion occurs only after N≥2 voting flights confirm.
**Max execution time**: 5 min.
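
The voting rule the stub is expected to model can be sketched as below. Everything here is illustrative: `trust_level` as a function, the `"promoted"` label, counting repeated uploads from one flight as a single vote, and using 2 as the threshold (the lower bound of N≥2) are all assumptions; only the `candidate` label comes from the steps.

```python
from typing import Iterable

MIN_VOTING_FLIGHTS = 2  # lower bound of "N≥2" in the expected outcome


def trust_level(confirming_flight_ids: Iterable[str],
                min_votes: int = MIN_VOTING_FLIGHTS) -> str:
    """Classify a tile by the number of distinct flights that confirmed it."""
    distinct_flights = set(confirming_flight_ids)  # one vote per flight (assumption)
    return "promoted" if len(distinct_flights) >= min_votes else "candidate"


# A single flight, even uploading twice, must not promote the tile.
print(trust_level(["flight-A", "flight-A"]))
```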

---

### FT-N-18: AC-8.5 — raw frames are NOT retained in FDR

**Traces to**: AC-8.5, AC-NEW-3. Tier: T1.

**Preconditions**: replay 60-frame slice; `nav_cam_60_slice` written to camera input.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run the replay | FDR populates |
| 2 | Inspect `/probe/fdr/` for raw nav-cam frames | no JPEGs / no AI-cam frames |
| 3 | Inspect for thumbnail log of failed-tile-generation frames | present, ≤0.1 Hz, within FDR cap |

**Expected outcome**: no raw frames retained; only the failure thumbnail log within budget.
**Max execution time**: 60 s.

---

### FT-N-19: Free public Sentinel-2 tile rejected at cache boundary

**Traces to**: AC-8.1 (resolution floor), restrictions §Satellite. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject a synthetic tile at 10 m/px into the cache | cache index marks as below-resolution |
| 2 | Replay frames over that area | matcher does NOT use the tile; never emits `satellite_anchored` from it |

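**Expected outcome**: the 10 m/px tile is indexed as below-resolution, is never used by the matcher, and never produces a `satellite_anchored` output.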
**Max execution time**: 60 s.

---

### FT-N-20: Photo-count cap removed — system runs without arbitrary cap

**Traces to**: restrictions §UAV ("no photo-count cap"). Tier: T1 (smoke).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay `synthetic_8h_load` for 30 min | SUT continues operating |
| 2 | Inspect logs for any "photo count exceeded" condition | none |

**Expected outcome**: no cap-related condition; pipeline degrades only against FDR cap (AC-NEW-3) and tile-cache cap.
**Max execution time**: 35 min.

---

### FT-N-21: AC-7.1 in maneuvering flight — uncertainty bound published, not a fixed number

**Traces to**: AC-7.1 second clause. Tier: T1.

**Steps**: As in FT-P-33, but assert that for each bank ∈ {0°, 5°, 15°, 30°} the published `bank_pitch_bound_m` matches `altitude × |sin(bank)|` within 1 m.

**Expected outcome**: bound published correctly across the range.
**Max execution time**: 30 s.
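
The assertion in the steps can be written directly from the formula `altitude × |sin(bank)|`. This is a sketch: the helper names and the 1000 m altitude are hypothetical, and "within 1 m" is interpreted here as an inclusive tolerance.

```python
import math

BANK_ANGLES_DEG = (0.0, 5.0, 15.0, 30.0)  # from the test steps


def expected_bound_m(altitude_m: float, bank_deg: float) -> float:
    """Reference value: altitude × |sin(bank)|, with bank given in degrees."""
    return altitude_m * abs(math.sin(math.radians(bank_deg)))


def bound_matches(published_m: float, altitude_m: float, bank_deg: float,
                  tol_m: float = 1.0) -> bool:
    """True when the published bound is within tol_m of the reference value."""
    return abs(published_m - expected_bound_m(altitude_m, bank_deg)) <= tol_m


# Hypothetical altitude; print the reference bound for each bank angle.
for bank in BANK_ANGLES_DEG:
    print(bank, round(expected_bound_m(1000.0, bank), 1))
```

Because the bound scales linearly with altitude, a fixed 1 m tolerance is proportionally tighter at higher altitudes.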

---

## Coverage notes

- **Pipeline-correctness boundary**: T1 tests on the 60-image slice are NOT deployment-binding. AC-1.1, AC-1.2, AC-2.1, AC-2.2, AC-1.3, AC-NEW-8 deployment numbers come from T2 (FT-P-T2, FT-P-04 binding split, FT-P-10 binding, FT-P-11 binding).
- **Behavioural-shape tests**: FT-P-08, FT-P-15, FT-P-16, FT-P-18, FT-N-04, FT-N-11, FT-N-12, FT-N-13, FT-N-14, FT-N-17, FT-N-18, FT-N-19 use the behavioural shape (trigger + observable + quantifiable verdict), so no mapping from input data to expected output is required.
- **Untraced tests**: none. Every test traces to ≥1 AC or restriction.