# Blackbox Tests
> Tier markers (per `environment.md`): `pipeline` (T1), `deferred-corpus` (T2), `deferred-sitl` (T3), `deferred-hil` (T4), `deferred-field` (T5).
> Every test pairs an input/observable with a quantifiable expected result from `_docs/00_problem/input_data/expected_results/results_report.md` or directly from an AC.
> All tests run through the public interfaces defined in `environment.md`. No SUT-internal access.
## Positive Scenarios
### FT-P-01: 60-frame sequential pipeline — ≥80 % within 50 m (pipeline-correctness only)
**Summary**: Sequentially feed the 60 nav-cam JPGs through the SUT and verify the position-error CDF on this corpus.
**Traces to**: AC-1.1 (pipeline-correctness only — see `test-data.md` caveat), results_report row 1.
**Category**: Position Accuracy. Tier: T1 (`pipeline`).
**Preconditions**:
- `nav_cam_60_slice` mounted; `nav_cam_60_slice_imu` synthesised; `satellite_tiles_AD0000xx_z20` placeholder fixture present.
- SUT booted; cuVSLAM warmed; ArduPilot SITL loaded with the corresponding IMU replay; first valid GPS_INPUT received (i.e., AC-NEW-1 cleared).
**Input data**: `nav_cam_60_slice` + `coordinates.csv` + `nav_cam_60_slice_imu`.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stream the 60 JPGs at 3 fps via the camera-input shim into the SUT | SUT publishes `GPS_INPUT` for each frame |
| 2 | Capture each `GPS_INPUT.lat / lon` at the qgc-mock sniffer | Every nav-cam frame yields a `GPS_INPUT` message within the test window |
| 3 | Compute haversine error vs `coordinates.csv` ground truth per frame | Per-frame errors collected into a CDF |
**Expected outcome**: ≥80 % of frames have error < 50 m. Reported as **pipeline-functional**, not deployment-binding (per `test-data.md` caveat — deployment-binding number from FT-P-T2 / AerialVL).
**Max execution time**: 60 s per run.
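
Step 3 and the pass criterion can be sketched as below. This is a minimal illustration, not the test runner's actual code: the function names and the mean-Earth-radius constant are assumptions, and the per-frame errors are expected to come from pairing each sniffed `GPS_INPUT` with its `coordinates.csv` row.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Fraction of per-frame errors strictly below threshold_m."""
    if not errors_m:
        raise ValueError("no frames captured")
    return sum(e < threshold_m for e in errors_m) / len(errors_m)


def ft_p_01_passes(errors_m):
    """FT-P-01 criterion: at least 80 % of frames have error < 50 m."""
    return fraction_within(errors_m, 50.0) >= 0.80
```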
---
### FT-P-02: 60-frame sequential pipeline — ≥50 % within 20 m (pipeline-correctness only)
**Summary**: Same corpus as FT-P-01; tighter tolerance.
**Traces to**: AC-1.2 (pipeline-correctness only), results_report row 2.
**Category**: Position Accuracy. Tier: T1.
**Preconditions / Input data**: same as FT-P-01.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the corpus end-to-end | per-frame `GPS_INPUT` |
| 2 | Compute haversine error CDF | — |
**Expected outcome**: ≥50 % within 20 m on the 60-image slice (functional check). Deployment-binding number comes from AerialVL S03 in FT-P-T2.
**Max execution time**: 60 s.
---
### FT-P-T2: AC-1.1 / AC-1.2 deployment-binding accuracy on AerialVL S03
**Summary**: Re-run AC-1.1 / AC-1.2 on the deployment-binding corpus.
**Traces to**: AC-1.1, AC-1.2. Tier: T2 (`deferred-corpus`). `data_status: deferred-corpus`.
**Preconditions**: `aerialvl_s03` mounted with synced IMU + nav-cam stream + GPS truth.
**Input data**: AerialVL S03.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay AerialVL S03 (70 km of fixed-wing flight at 1 km AGL) | per-frame `GPS_INPUT` |
| 2 | Compute error CDF vs S03 GPS truth | — |
**Expected outcome**: ≥80 % within 50 m AND ≥50 % within 20 m (deployment-binding).
**Max execution time**: 90 min (replay + analysis).
---
### FT-P-03: Per-frame error bound ≤100 m
**Summary**: No single frame exceeds 100 m error on the 60-image slice.
**Traces to**: AC-1.1 (negative-tail bound), results_report row 3. Tier: T1.
**Preconditions / Input**: same as FT-P-01.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay 60 frames | per-frame GPS_INPUT |
| 2 | Compute max(haversine_err) over all frames | — |
**Expected outcome**: max error ≤ 100 m. Pipeline-functional only.
**Max execution time**: 60 s.
---
### FT-P-04: VO drift bound between satellite anchors
**Summary**: VO drift between successive satellite-anchored fixes stays bounded.
**Traces to**: AC-1.3, AC-NEW-8, results_report row 4, F-T1b. Tier: T1 functional + T2 binding.
**Preconditions**: cuVSLAM in mono+IMU mode (T1) AND mono-only mode (T2 split test).
**Input data**: `nav_cam_60_slice` (T1) + AerialVL S03 (T2).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Identify successive `satellite_anchored` source-label transitions | series of anchor pairs |
| 2 | For each anchor pair, measure VO-extrapolated centre vs next-anchor centre | drift in metres |
| 3 | Compute 95th percentile across all pairs | — |
**Expected outcome**:
- mono+IMU: p95 drift ≤ 50 m (binding on T2 / AerialVL).
- mono-only: p95 drift ≤ 100 m (binding on T2 / AerialVL).
- T1 functional check: drift bounded (no monotonic growth) — exact numbers not deployment-binding.
**Max execution time**: 90 min (T2).
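
A minimal sketch of step 3 and the two binding limits, assuming the per-pair drifts have already been measured in metres. The nearest-rank percentile definition is an assumption (the spec does not name one), and the function and dictionary names are hypothetical.

```python
import math

# Binding p95 limits from the expected outcome, in metres.
P95_LIMIT_M = {"mono+imu": 50.0, "mono-only": 100.0}


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p % of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def drift_within_limit(drifts_m, mode):
    """True when the p95 drift across all anchor pairs is within the limit for `mode`."""
    return percentile_nearest_rank(drifts_m, 95) <= P95_LIMIT_M[mode]
```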
---
### FT-P-05: GPS_INPUT shape under normal tracking
**Summary**: GPS_INPUT messages emitted while tracking is healthy carry the correct schema and value ranges.
**Traces to**: AC-1.4, AC-4.3, AC-6.3, results_report row 5. Tier: T1.
**Preconditions**: SUT in steady-state tracking with recent satellite anchor (<30 s old).
**Input data**: any single frame from `nav_cam_60_slice`.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Sniff GPS_INPUT at qgc-mock | one frame per nav-cam frame |
| 2 | Decode fields: `fix_type`, `horiz_accuracy`, `satellites_visible`, `lat`, `lon`, `alt`, `vel_acc` | as per MAVLink GPS_INPUT spec |
| 3 | Inspect optional ODOMETRY: assert intentional absence in v1 (per AC-4.3 v1-scope clause) | no ODOMETRY frames present |
**Expected outcome**: `fix_type == 3`, `horiz_accuracy ∈ [1, 50] m`, `satellites_visible == 10`, `lat / lon` non-null, WGS84. ODOMETRY count == 0 across the run.
**Max execution time**: 30 s.
---
### FT-P-06: GPS_INPUT shape during VO-only fallback
**Summary**: Fields adapt when no satellite anchor is available for >30 s.
**Traces to**: AC-1.4, AC-4.3, results_report row 6. Tier: T1.
**Preconditions**: Force satellite-match failure for >30 s (cache poisoned with stale tiles).
**Input data**: `nav_cam_60_slice` with `stale_tile_scenarios` injected.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | After 30 s of failed matches, sniff GPS_INPUT | `fix_type == 3`, `horiz_accuracy ∈ [20, 100]` m, source-label `vo_extrapolated` |
**Expected outcome**: as above; horiz_accuracy grows monotonically until next successful match.
**Max execution time**: 60 s.
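
The "grows monotonically" check can be sketched as follows. Whether equal consecutive values count as growth is not stated in this test, so the sketch exposes it as a `strict` flag; both helper names are hypothetical.

```python
def grows_monotonically(samples, strict=False):
    """True when each sample is >= (or >, if strict) the one before it."""
    pairs = list(zip(samples, samples[1:]))
    if strict:
        return all(b > a for a, b in pairs)
    return all(b >= a for a, b in pairs)


def all_in_range(samples, low, high):
    """True when every sample lies in the closed interval [low, high]."""
    return all(low <= s <= high for s in samples)
```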
---
### FT-P-07: GPS_INPUT shape during dead-reckoning
**Summary**: VO lost AND no satellite → IMU-only dead reckoning.
**Traces to**: AC-1.4, AC-5.2, results_report row 7. Tier: T1.
**Preconditions**: Inject cuVSLAM tracking-loss + cache poisoned.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Sniff GPS_INPUT | `fix_type == 2`, `horiz_accuracy ≥ 50 m` and growing |
| 2 | Source label | `dead_reckoned` |
**Expected outcome**: `fix_type == 2`, monotonically growing horiz_accuracy, `source == dead_reckoned`.
**Max execution time**: 60 s.
---
### FT-P-08: GPS_INPUT shape on total failure
**Summary**: 3+ consecutive failures — system signals total failure.
**Traces to**: AC-3.4, results_report row 8. Tier: T1.
**Preconditions**: `cache_poisoning_scenarios` flavour that causes 3 sat failures + cuVSLAM lost.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Wait for 3 consecutive failures | GPS_INPUT continues at the configured rate |
| 2 | Inspect GPS_INPUT | `fix_type == 0`, `horiz_accuracy == 999.0` |
| 3 | Inspect STATUSTEXT | RELOC_REQ regex emitted |
**Expected outcome**: as above.
**Max execution time**: 60 s.
---
### FT-P-09: Confidence tier transitions
**Summary**: Confidence tier label transitions match defined conditions.
**Traces to**: AC-1.4, results_report rows 10–13. Tier: T1.
**Preconditions**: scripted scenario that walks (HIGH → MEDIUM → LOW → FAILED).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At each scripted state, read the SSE stream confidence field AND the source-label field | matches expected tier |
**Expected outcome**:
- Sat anchor <30 s + cov <400 m² → tier `HIGH`, source `satellite_anchored`.
- cuVSLAM OK + no sat >30 s → tier `MEDIUM`, source `vo_extrapolated`.
- cuVSLAM lost + IMU only → tier `LOW`, source `dead_reckoned`.
- 3+ consecutive failures → tier `FAILED`, fix_type 0.
**Max execution time**: 5 min.
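
A test-side oracle for the four conditions above might look like the sketch below. The precedence order (failure first, then anchor freshness, then VO health) and the handling of cases the list does not cover, such as a fresh anchor with covariance ≥ 400 m², are assumptions of this sketch. The source label for `FAILED` is not specified here, so it is returned as `None`.

```python
def expected_tier(anchor_age_s, cov_m2, vslam_ok, consecutive_failures):
    """Return (tier, source_label) expected for a scripted state.

    anchor_age_s may be None when no satellite anchor has been obtained yet.
    """
    if consecutive_failures >= 3:
        return ("FAILED", None)
    if anchor_age_s is not None and anchor_age_s < 30 and cov_m2 < 400:
        return ("HIGH", "satellite_anchored")
    if vslam_ok:
        return ("MEDIUM", "vo_extrapolated")
    return ("LOW", "dead_reckoned")
```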
---
### FT-P-10: Image registration rate (functional)
**Summary**: Pipeline registers ≥95 % of normal-flight frames against the previous frame.
**Traces to**: AC-2.1 (pipeline-functional only), results_report row 14. Tier: T1 functional + T2 binding.
**Preconditions**: SUT exposes registration outcome via STATUSTEXT or NAMED_VALUE_FLOAT (`reg_pass_count`, `reg_total_count`).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay `nav_cam_60_slice` (T1) or AerialVL S03 (T2) | registration metrics published |
| 2 | Compute `reg_pass_count / reg_total_count` | percentage |
**Expected outcome**: T1 ≥95 % (functional); T2 ≥95 % (deployment-binding) under normal-flight definition (nadir, ±10° bank/pitch, ≥40 % overlap, daytime, season-matched tile).
**Max execution time**: 60 s (T1) / 90 min (T2).
---
### FT-P-11: Mean Reprojection Error (MRE)
**Summary**: VO and cross-domain MRE under thresholds.
**Traces to**: AC-2.2, results_report row 15. Tier: T1 functional + T2 binding.
**Preconditions**: SUT publishes `mre_vo` (frame-to-frame) and `mre_cross` (cross-view) on the metrics endpoint.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Scrape MRE metrics over a replay | per-frame samples |
| 2 | Compute mean across the run | — |
**Expected outcome**: `mean(mre_vo) < 1.0 px`; `mean(mre_cross) < 2.5 px`. T1 numbers functional only.
**Max execution time**: 60 s (T1) / 90 min (T2).
---
### FT-P-12: Continuous output through turn area (frames 32–43)
**Summary**: SUT keeps producing position estimates through the turn segment of `coordinates.csv`.
**Traces to**: AC-3.2, AC-4.4, results_report row 16. Tier: T1.
**Preconditions**: standard pipeline replay.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay frames 32–43 | per-frame GPS_INPUT |
| 2 | Count outputs vs frames | — |
**Expected outcome**: ≥1 GPS_INPUT per nav-cam frame in the turn region.
**Max execution time**: 30 s.
---
### FT-P-13: 350 m outlier handled (AC-3.1)
**Summary**: Pipeline survives a synthetic 350 m gap between consecutive frames (caused by ±20° tilt outlier).
**Traces to**: AC-3.1, results_report row 17. Tier: T1.
**Input data**: synthetic two-frame pair with 350 m gap injected into `nav_cam_60_slice` mid-replay.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject the outlier pair | SUT emits a `vo_extrapolated` or `dead_reckoned` frame, not corrupted output |
| 2 | Continue with next valid frame | error returns to ≤100 m on the next valid frame |
**Expected outcome**: error ≤ 100 m on the next valid frame after the outlier.
**Max execution time**: 60 s.
---
### FT-P-14: Sharp-turn re-localization (AC-3.2)
**Summary**: Sharp turn (<5 % overlap, <70°, <200 m drift) — VO fails, satellite re-loc recovers.
**Traces to**: AC-3.2, F-T7, results_report row 18. Tier: T1.
**Input data**: synthetic sharp-turn pair injected into `nav_cam_60_slice`.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject the sharp-turn pair | cuVSLAM tracking lost; VPR triggers; matcher re-localizes |
| 2 | Track error over next 3 frames | error ≤ 50 m within 3 frames |
**Expected outcome**: error ≤ 50 m within 3 frames of the turn.
**Max execution time**: 60 s.
---
### FT-P-15: VO loss → satellite recovery → tracking_state == NORMAL
**Summary**: After cuVSLAM tracking loss + sat match success, tracking_state returns to NORMAL.
**Traces to**: AC-3.2, AC-3.3, results_report row 19. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Force cuVSLAM tracking-loss; deliver a fresh tile that matches | matcher emits absolute pose; calibrator emits satellite-anchored fix |
| 2 | Observe FC EKF3 reconvergence via `EKF_STATUS_REPORT` | EKF3 reconverges |
| 3 | Read SUT-published `tracking_state` | == `NORMAL` |
**Expected outcome**: tracking_state == NORMAL within bounded time.
**Max execution time**: 60 s.
---
### FT-P-16: Cold-start TTFF ≤30 s p95
**Summary**: From companion-computer boot, first valid GPS_INPUT within 30 s.
**Traces to**: AC-NEW-1, results_report row 23, F-T11. Tier: T1 statistical (≤10 boots) + T4 binding (50 boots on real HW).
**Preconditions**: SUT image cold (no warmed engines); FC providing `GLOBAL_POSITION_INT` simulating IMU-extrapolated pose.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Boot SUT container | container start logged |
| 2 | Time from container start to first valid `fix_type==3` GPS_INPUT | t_ttff |
| 3 | Repeat N times (N=10 T1 / N=50 T4) | distribution |
**Expected outcome**: 95th percentile of t_ttff ≤ 30 s.
**Max execution time**: 10 min (T1) / 30 min (T4).
---
### FT-P-17: Validate initial position via first satellite match
**Summary**: First satellite match after startup pulls position to ≤50 m.
**Traces to**: AC-5.1, AC-NEW-1, results_report row 24. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Provide `GLOBAL_POSITION_INT` with a deliberate 200 m offset from truth | SUT seeds pipeline with 200 m uncertainty |
| 2 | Replay first frame with valid satellite tile | matcher succeeds; calibrator emits anchored fix |
| 3 | Read GPS_INPUT lat/lon | error ≤ 50 m |
**Expected outcome**: position error ≤ 50 m after first match.
**Max execution time**: 90 s.
---
### FT-P-18: Mid-flight reboot recovery ≤30 s
**Summary**: Process kill mid-flight; SUT recovers within AC-NEW-1 budget.
**Traces to**: AC-5.3, AC-NEW-1, results_report row 25. Tier: T1.
**Preconditions**: SUT in steady-state tracking; FC continues to fly.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Send SIGKILL to SUT container | SUT restarts |
| 2 | Time from restart to first `fix_type==3` GPS_INPUT | t_recovery |
**Expected outcome**: t_recovery ≤ 30 s.
**Max execution time**: 60 s.
---
### FT-P-19: Post-reboot first-match accuracy
**Summary**: After reboot, first satellite match restores accuracy.
**Traces to**: AC-5.3, results_report row 26. Tier: T1.
**Steps**: same as FT-P-17 but starting from a reboot.
**Expected outcome**: error ≤ 50 m after first match.
**Max execution time**: 90 s.
---
### FT-P-20: Object localization (level flight)
**Summary**: `POST /objects/locate` returns lat/lon for an object pixel given known UAV pose.
**Traces to**: AC-7.1, AC-7.2, results_report row 27. Tier: T1.
**Preconditions**: SUT has a known anchored fix; AI camera gimbal pose injected via FC `ATTITUDE`.
**Input data**: pixel coordinates + gimbal angle + zoom + altitude in request body.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` with pixel_x, pixel_y, gimbal_pitch, gimbal_yaw, zoom, altitude | 200 + JSON `{lat, lon, alt, accuracy_m, confidence}` |
| 2 | Compare to ground truth | error ≤ accuracy_m |
**Expected outcome**: lat/lon within `accuracy_m` of ground truth; in level flight, `accuracy_m` consistent with frame-center accuracy of the GPS-Denied system. In maneuvering flight, response includes the `altitude × |sin(unknown_bank_or_pitch)|` bound (AC-7.1 second clause) when bank/pitch >5°.
**Max execution time**: 5 s.
---
### FT-P-21: Coordinate transform round-trip ≤0.1 m
**Summary**: GPS → NED → pixel → GPS round-trip preserves position.
**Traces to**: AC-6.3, AC-7.2, results_report row 29. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Submit a known WGS84 point through the round-trip via `/objects/locate` (or a debug endpoint if exposed) | round-trip lat/lon |
| 2 | Compare to original | ≤ 0.1 m |
**Expected outcome**: round-trip error ≤ 0.1 m.
**Max execution time**: 1 s.
---
### FT-P-22: `GET /health` schema and content
**Summary**: Health endpoint returns 200 with required fields.
**Traces to**: AC-6.1 (telemetry), results_report row 30. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /health` | HTTP 200, JSON body |
| 2 | Validate schema | contains `status`, `memory_mb`, `gpu_temp_c`, `tracking_state`, `last_anchor_age_s`, `confidence_tier` |
**Expected outcome**: as above; `status ∈ {ok, degraded, failed}`.
**Max execution time**: 1 s.
---
### FT-P-23: `POST /sessions` returns id
**Traces to**: AC-6.1, results_report row 31. Tier: T1.
**Steps**: `POST /sessions` (auth) → 200/201 with session id.
**Expected outcome**: status ∈ {200, 201}; body has `session_id` matching `^[a-f0-9-]{36}$`.
**Max execution time**: 1 s.
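
The response check can be sketched as below, using the pattern quoted in the expected outcome. Note that the pattern accepts any 36-character string of lowercase hex digits and hyphens, so it is looser than a strict UUID check. The function name and the dict-shaped response body are assumptions.

```python
import re

SESSION_ID_RE = re.compile(r"^[a-f0-9-]{36}$")


def session_response_ok(status_code, body):
    """True when status is 200 or 201 and body['session_id'] matches the pattern."""
    session_id = body.get("session_id")
    return (
        status_code in (200, 201)
        and isinstance(session_id, str)
        and SESSION_ID_RE.fullmatch(session_id) is not None
    )
```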
---
### FT-P-24: SSE stream emits per-second events
**Traces to**: AC-6.1, results_report row 32. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /sessions/{id}/stream` | SSE connection; events emitted at ~1 Hz |
| 2 | Sample 30 s of stream | each event matches schema: `type`, `timestamp`, `lat`, `lon`, `alt`, `accuracy_h`, `confidence`, `vo_status` |
**Expected outcome**: rate 1 Hz ± 0.2 Hz; all events conform to schema.
**Max execution time**: 35 s.
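
One way to evaluate the rate criterion from the sampled events is sketched below. It assumes receive timestamps in seconds and defines the rate as intervals over elapsed time; the spec does not say how the rate is averaged, so that definition and the function names are assumptions.

```python
def mean_rate_hz(timestamps_s):
    """Mean event rate: number of intervals divided by elapsed time."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two events")
    span = timestamps_s[-1] - timestamps_s[0]
    if span <= 0:
        raise ValueError("timestamps must increase")
    return (len(timestamps_s) - 1) / span


def rate_ok(timestamps_s, nominal_hz=1.0, tol_hz=0.2):
    """True when the mean rate is within nominal_hz +/- tol_hz."""
    return abs(mean_rate_hz(timestamps_s) - nominal_hz) <= tol_hz
```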
---
### FT-P-25: TRT engine load ≤10 s
**Traces to**: AC-NEW-1 (sub-budget), results_report row 39. Tier: T1 (synthetic timing) + T4 (real HW).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At SUT boot, time from container start to "all engines ready" STATUSTEXT | t_engines |
**Expected outcome**: t_engines ≤ 10 s.
**Max execution time**: 30 s.
---
### FT-P-26: Tile storage size for the operational area
**Traces to**: AC-8.3, restrictions §UAV/Satellite, results_report row 40. Tier: T1.
**Preconditions**: a 200 km mission path × ±2 km buffer × z=18 + z=20 fixture loaded.
**Steps**: read total bytes under `/probe/tiles/`.
**Expected outcome**: 300 MB ≤ size ≤ 1000 MB. (Aligned with restriction's ~10 GB persistent cap for full 400 km².)
**Max execution time**: 5 s.
---
### FT-P-27: Tile mosaic coverage radius ≥500 m
**Traces to**: AC-8.3, results_report row 41. Tier: T1.
**Preconditions**: SUT given EKF position with σ_xy.
**Steps**: capture the assembled mosaic bbox via STATUSTEXT or a debug endpoint.
**Expected outcome**: mosaic radius ≥ 500 m around current position.
**Max execution time**: 5 s.
---
### FT-P-28: Tile dedup — ≤1 onboard tile per ground sector
**Traces to**: AC-8.4, F-T2. Tier: T1.
**Preconditions**: `tile_dedup_replay` (sectors visited ≥2×).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the flight | onboard tiles written |
| 2 | Inspect MBTiles + sidecar in `/probe/tiles/` | per-sector tile count |
**Expected outcome**: per-sector count ≤ 1; latest/highest-quality wins.
**Max execution time**: 10 min.
---
### FT-P-29: Post-flight upload to candidate pool
**Traces to**: AC-8.4, F-T3. Tier: T1.
**Preconditions**: `service-stub` running.
**Steps**: replay → on landing-event, SUT uploads tiles.
**Expected outcome**: `service-stub` records ≥1 tile with `trust_level=candidate`; promotion only after N≥2 voting flights (so a single flight does not promote).
**Max execution time**: 5 min.
---
### FT-P-30: NAMED_VALUE_FLOAT telemetry rate
**Traces to**: AC-6.1, results_report row 45. Tier: T1.
**Steps**: sniff `gps_conf`, `gps_drift`, `gps_hacc` NAMED_VALUE_FLOAT rates over 30 s.
**Expected outcome**: each at 1 Hz ± 0.2 Hz.
**Max execution time**: 35 s.
---
### FT-P-31: Disconnected segments — ≥3 connected via global retrieval
**Traces to**: AC-3.3, F-T8. Tier: T1.
**Preconditions**: `disconnected_segments_replay` with ≥3 segments.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay each segment with a synthetic gap | for each segment, VPR retrieves top-K candidates; matcher relocalizes |
| 2 | Verify segment-to-segment trajectory continuity | each segment connects to prior trajectory |
**Expected outcome**: 3/3 segments connect within 10 frames of segment start; `tracking_state == NORMAL` after each.
**Max execution time**: 5 min.
---
### FT-P-32: Position refinement / corrections (AC-4.5)
**Traces to**: AC-4.5. Tier: T1.
**Preconditions**: SUT in steady state; ability to refine prior fixes.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Capture sequence of GPS_INPUT for a 10-s window | per-frame fixes |
| 2 | After delayed loop closure / late satellite match, observe whether SUT emits a corrected fix or signals correction via STATUSTEXT | a follow-up GPS_INPUT for an earlier `time_usec` OR a STATUSTEXT correction record |
**Expected outcome**: at least one correction event where the corrected fix replaces the prior fix's `h_acc` (covariance shrinks). System never silently rewrites past output without recording the correction.
**Max execution time**: 60 s.
---
### FT-P-33: Object-localize bank/pitch >5° publishes uncertainty bound
**Traces to**: AC-7.1 (second clause). Tier: T1.
**Preconditions**: FC `ATTITUDE` published with bank > 5°.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` while bank > 5° | response includes bound = `altitude × abs(sin(bank_or_pitch))` |
**Expected outcome**: response body includes `bank_pitch_bound_m` matching the formula within 1 m.
**Max execution time**: 5 s.
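
The bound and the 1 m comparison can be sketched as follows. The formula is the one quoted in step 1; taking the angle in degrees and the function names are assumptions of this sketch.

```python
import math


def bank_pitch_bound_m(altitude_m, angle_deg):
    """Uncertainty bound: altitude x |sin(bank_or_pitch)|, angle in degrees."""
    return altitude_m * abs(math.sin(math.radians(angle_deg)))


def bound_matches(reported_m, altitude_m, angle_deg, tol_m=1.0):
    """True when the reported bank_pitch_bound_m is within tol_m of the formula."""
    return abs(reported_m - bank_pitch_bound_m(altitude_m, angle_deg)) <= tol_m
```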
---
### FT-P-34: Mid-flight tile generation respects σ_xy ≤ 5 m hard gate
**Traces to**: AC-8.4, AC-NEW-7 hard gate. Tier: T1.
**Preconditions**: scripted scenarios with σ_xy ∈ {2, 4, 6, 8} m.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay σ_xy=2 m frames | tiles written |
| 2 | Replay σ_xy=8 m frames | NO tiles written |
| 3 | Inspect sidecar `trust_level` for σ_xy ∈ (3, 5] m | `trust_level == soft` |
| 4 | Inspect sidecar for σ_xy ≤ 3 m | `trust_level == candidate` |
**Expected outcome**: as above.
**Max execution time**: 5 min.
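
The gate described by steps 1 to 4 reduces to a small decision function. The sketch below is a test-side oracle under that reading; the function name is hypothetical, and `None` stands for "no tile written".

```python
def expected_trust_level(sigma_xy_m):
    """Expected sidecar trust_level for a tile, or None when no tile may be written."""
    if sigma_xy_m <= 3.0:
        return "candidate"
    if sigma_xy_m <= 5.0:
        return "soft"
    return None  # above the 5 m hard gate
```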
---
### FT-P-35: NF-T4b cache-poisoning safety budget (Monte Carlo)
**Traces to**: AC-NEW-7. Tier: T2 (`deferred-corpus`). `data_status: deferred-corpus`.
**Preconditions**: ≥100 simulated flights' worth of frames from AerialVL + Mavic + AerialExtreMatch with synthetic over-confidence injection (1.5×–3×).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay all flights | per-tile geo-misalignment captured |
| 2 | Compute P(misalign > 30 m) and P(misalign > 100 m) | — |
**Expected outcome**: P(>30 m) < 1 %, P(>100 m) < 0.1 %.
**Max execution time**: 4 h.
---
### FT-P-36: AC-NEW-9 covariance calibration accuracy
**Traces to**: AC-NEW-9, F-T18. Tier: T2.
**Preconditions**: AerialVL S03 replay with ground truth.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each emitted GPS_INPUT, capture (`h_acc`, ground-truth error) | series of pairs |
| 2 | Compute fraction of frames where `error ≤ h_acc * Mahalanobis-2D-95% factor` | fraction |
**Expected outcome**: fraction ≥ 95 % (calibration neither over- nor under-claims).
**Max execution time**: 90 min.
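
Step 2 can be sketched as below, under a loudly stated assumption: `h_acc` is treated as an isotropic per-axis 1-σ value, in which case the radial error is Rayleigh-distributed and the 95 % factor is `sqrt(-2 ln 0.05) ≈ 2.448` (the square root of the χ² 95 % quantile with 2 degrees of freedom). If the SUT defines `h_acc` differently, the factor changes accordingly. The names are hypothetical.

```python
import math

# 95 % radius factor for an isotropic 2-D Gaussian (assumes h_acc is per-axis 1-sigma).
K95_2D = math.sqrt(-2.0 * math.log(1.0 - 0.95))


def coverage_fraction(pairs, k=K95_2D):
    """Fraction of (h_acc_m, error_m) pairs with error_m <= h_acc_m * k."""
    if not pairs:
        raise ValueError("no samples")
    return sum(err <= h_acc * k for h_acc, err in pairs) / len(pairs)
```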
---
### FT-P-37: F-T18 calibrator regression (no state propagation)
**Traces to**: AC-NEW-9, F-T18. Tier: T2.
**Preconditions**: replay with logging hooks on Component 5 outputs (publicly exposed counters).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run replay | calibrator counters emitted |
| 2 | Assert `state_propagation_invocations_total == 0` | no propagation |
| 3 | Assert `mahalanobis_gate_rejections_total > 0` | gate active |
**Expected outcome**: as above.
**Max execution time**: 90 min.
---
## Negative Scenarios
### FT-N-01: Corrupted nav-cam frame — no crash, degraded mode
**Traces to**: AC-3.x (resilience), restriction "fixed downward camera". Tier: T1.
**Input data**: a 60-frame replay with frame N replaced by a 10-byte random blob.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stream the replay | SUT logs decode error; emits STATUSTEXT WARN |
| 2 | Inspect tracking_state | transitions to `DEGRADED` for 1 frame; recovers to `NORMAL` on next valid frame |
| 3 | SUT process | does NOT crash |
**Expected outcome**: process alive; no GPS_INPUT spike with bad data; tracking_state returns to NORMAL within 1 frame of recovery.
**Max execution time**: 30 s.
---
### FT-N-02: Object-localize invalid pixel
**Traces to**: AC-7.1, results_report row 28. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` with `pixel_x = -10` (out of frame) | HTTP 422 + error body |
**Expected outcome**: status == 422; body contains a structured error code.
**Max execution time**: 1 s.
---
### FT-N-03: Unauthenticated `POST /sessions`
**Traces to**: results_report row 33, security restrictions. Tier: T1.
**Steps**: `POST /sessions` without JWT → 401.
**Expected outcome**: status == 401.
**Max execution time**: 1 s.
---
### FT-N-04: Stale tile beyond grace — must NOT label `satellite_anchored`
**Traces to**: AC-8.2, AC-NEW-6. Tier: T1.
**Preconditions**: `stale_tile_scenarios` with 18-month-old active-conflict tile (well past 6 mo + 30-day grace).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay frames whose only candidate tile is the 18 mo stale one | matcher invocation skipped or scored 0 |
| 2 | Inspect source label on emitted GPS_INPUT | NEVER `satellite_anchored` |
| 3 | Inspect WARN STATUSTEXT | tile rejected event recorded |
**Expected outcome**: no `satellite_anchored` label across the run; rejection event recorded.
**Max execution time**: 60 s.
---
### FT-N-05: Stale tile in 30-day grace — confidence linearly decayed
**Traces to**: AC-NEW-6. Tier: T1.
**Preconditions**: tiles aged at +0, +15, +30 days past the 6-mo budget.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay each | confidence weight in sidecar metric: 1.0, 0.5, 0.0 |
**Expected outcome**: confidence weight decays linearly as specified.
**Max execution time**: 60 s.
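
The three expected weights are consistent with a linear ramp over the 30-day grace window. A minimal sketch, assuming the weight stays at 1.0 inside the budget and is clamped to 0.0 beyond the grace period (both clamps and the function name are assumptions):

```python
GRACE_DAYS = 30.0


def grace_confidence_weight(days_past_budget):
    """Linear decay from 1.0 at the budget boundary to 0.0 at the end of the grace period."""
    weight = 1.0 - days_past_budget / GRACE_DAYS
    return min(1.0, max(0.0, weight))
```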
---
### FT-N-06: Sharp-turn negative — must trigger satellite re-loc, not silently fail
**Traces to**: AC-3.2 (negative case). Tier: T1.
**Steps**: same as FT-P-14 but assert that **before** re-loc the SUT emits a STATUSTEXT explaining VO loss; assert tracking_state transitions through DEGRADED.
**Expected outcome**: explicit STATUSTEXT log; recovery within 3 frames.
**Max execution time**: 60 s.
---
### FT-N-07: 3-consecutive-failure → RELOC_REQ on STATUSTEXT
**Traces to**: AC-3.4, results_report rows 20, 46. Tier: T1.
**Steps**: see FT-P-08; additionally verify the regex `RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m`.
**Expected outcome**: regex matches at least one STATUSTEXT after 3 failures; emitted within 2 s of the third failure (per AC-3.4 timing).
**Max execution time**: 60 s.
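
The regex and the 2 s timing check can be sketched as follows. The regex is the one quoted in the steps; the event tuples, the function name, and the sample message text in the usage line are hypothetical.

```python
import re

RELOC_REQ_RE = re.compile(r"RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m")


def reloc_req_in_time(statustexts, t_third_failure_s, deadline_s=2.0):
    """True when some (time_s, text) STATUSTEXT at or after the third failure,
    and no later than deadline_s after it, matches the RELOC_REQ regex."""
    return any(
        0.0 <= t - t_third_failure_s <= deadline_s and RELOC_REQ_RE.search(text)
        for t, text in statustexts
    )
```

For example, `reloc_req_in_time([(12.5, "RELOC_REQ: last_lat=48.1 last_lon=37.5 uncertainty=350m")], 11.0)` evaluates to `True` because the message arrives 1.5 s after the third failure.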
---
### FT-N-08: Re-loc waiting state behaviour
**Traces to**: AC-3.4, results_report row 21. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | After RELOC_REQ, observe SUT for 10 s | `fix_type == 0` GPS_INPUT continues; IMU-prediction-only label; satellite-match attempts continue (counter increments) |
**Expected outcome**: as above; SUT does NOT stop emitting GPS_INPUT.
**Max execution time**: 30 s.
---
### FT-N-09: Operator hint — used as 500 m seed
**Traces to**: AC-6.2, AC-3.4, results_report row 22. Tier: T1.
**Preconditions**: SUT in re-loc-waiting; operator hint scenario active.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `qgc-mock` sends STATUSTEXT `RELOC_HINT: lat=… lon=… sigma=500m` | SUT consumes hint; uses as seed for VPR/cross-view |
| 2 | First fix after hint | error ≤ 500 m initially |
| 3 | After next satellite match | error ≤ 50 m |
**Expected outcome**: as above.
**Max execution time**: 60 s.
---
### FT-N-10: Operator hint — malformed value rejected
**Traces to**: AC-6.2 (negative). Tier: T1.
**Steps**: send `RELOC_HINT: lat=NaN lon=… sigma=-10`.
**Expected outcome**: SUT emits STATUSTEXT WARN; hint NOT applied; pipeline state unchanged.
**Max execution time**: 30 s.
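
To make the rejection criteria concrete, the sketch below parses a hint in the `RELOC_HINT: lat=… lon=… sigma=…m` shape shown in FT-N-09 and rejects non-finite, out-of-range, or non-positive values. The exact validation rules inside the SUT are not specified in this document; the rules and names here are assumptions for building negative test inputs.

```python
import math
import re

_HINT_RE = re.compile(r"RELOC_HINT:\s*lat=(\S+)\s+lon=(\S+)\s+sigma=(\S+?)m?")


def parse_reloc_hint(text):
    """Return (lat, lon, sigma_m) for a well-formed hint, or None if it must be rejected."""
    match = _HINT_RE.fullmatch(text.strip())
    if match is None:
        return None
    try:
        lat, lon, sigma = (float(g) for g in match.groups())
    except ValueError:
        return None
    # float("NaN") parses successfully, so finiteness is checked explicitly.
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    if sigma <= 0.0:
        return None
    return (lat, lon, sigma)
```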
---
### FT-N-11: AC-4.3 — ODOMETRY intentionally absent in v1
**Traces to**: AC-4.3 (v1-scope clause), F-T9 Option A. Tier: T1.
**Preconditions**: SUT configured for v1 (default).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run any of FT-P-01 / FT-P-04 / FT-P-T2 | GPS_INPUT emitted |
| 2 | At qgc-mock, count ODOMETRY frames over the run | == 0 |
| 3 | Inspect `EK3_SRC1_*` configuration via FC parameter readback | `POSXY=GPS, VELXY=GPS, YAW=GPS+Compass` |
**Expected outcome**: ODOMETRY count == 0; FC parameters as configured.
**Max execution time**: 60 s.
---
### FT-N-12: Spoofed GPS — SUT promotes within 3 s
**Traces to**: AC-NEW-2, F-T12. Tier: T3 (`deferred-sitl`). `data_status: deferred-sitl`.
**Preconditions**: SITL + `gps-spoof-injector` configured.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At t=0, inject a malicious `GPS_RAW_INT` with a 1 km offset | FC sees both spoof + SUT GPS_INPUT |
| 2 | Time from spoof onset to SUT promoting its `GPS_INPUT` to primary (raised `fix_type=3` AND STATUSTEXT promotion event) | t_promote |
| 3 | Repeat 50× | distribution |
**Expected outcome**: 95th percentile of t_promote < 3 s.
**Max execution time**: 30 min.
---
### FT-N-13: Failsafe at 3 s no-fix (AC-5.2)
**Traces to**: AC-5.2. Tier: T1+T3.
**Preconditions**: scripted scenario where SUT cannot produce ANY estimate for 3.5 s.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Force pipeline blackout | SUT logs failure |
| 2 | Verify FC behaviour | ArduPilot SITL logs fall-back to IMU-only dead reckoning |
**Expected outcome**: failsafe transition observable in `EKF_STATUS_REPORT` within 4 s of blackout.
**Max execution time**: 60 s.
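The timing verdict can be sketched as a scan over time-stamped observations. This assumes the `EKF_STATUS_REPORT` stream has already been reduced to `(timestamp_s, in_fallback)` pairs; how `in_fallback` is derived from the report fields is not specified here, and the function name and sample data are hypothetical.

```python
def failsafe_latency_s(blackout_t, samples):
    """Return seconds from blackout to the first fallback observation, or None.

    samples: iterable of (timestamp_s, in_fallback) tuples ordered by time.
    """
    for t, in_fallback in samples:
        if t >= blackout_t and in_fallback:
            return t - blackout_t
    return None  # no transition observed in the capture


# Hypothetical capture: blackout forced at t = 10.0 s.
observations = [(9.0, False), (11.0, False), (13.5, True), (14.0, True)]
latency = failsafe_latency_s(10.0, observations)
passed = latency is not None and latency <= 4.0
```

Returning `None` rather than a sentinel number keeps a missing transition from being mistaken for a fast one.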
---
### FT-N-14: Refusal of unsigned MAVLink (S-T1 boundary)
**Traces to**: restrictions §Sensors. Tier: T3.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Send a GPS_INPUT with invalid signing tag from the runner | FC rejects |
| 2 | Inspect FC log + STATUSTEXT to GCS | rejection event recorded |
**Expected outcome**: rejected; FC continues to fly on prior valid source.
**Max execution time**: 30 s.
---
### FT-N-15: SITL F-T9 Option A regression — EKF3 fuses GPS_INPUT only, no double-fusion
**Traces to**: AC-4.3, F-T9 Option A. Tier: T3.
**Preconditions**: SITL with `EK3_SRC1_*=GPS+Compass`, `EK3_SRC2_*=GPS`.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run a representative AerialVL replay | EKF3 fuses GPS_INPUT |
| 2 | Inspect EKF3 logs for double-fusion symptoms (issues #30076, #32506) | none |
| 3 | Trigger backup-GPS failover via SITL parameter | EKF3 switches to `EK3_SRC2_*` cleanly |
**Expected outcome**: no double-fusion; clean failover.
**Max execution time**: 30 min.
---
### FT-N-16: SITL F-T9 Option B regression (v1.1 candidate)
**Traces to**: AC-4.3 Option B (v1.1+), F-T9. Tier: T3 (`deferred-sitl`). `data_status: deferred-sitl`.
**Preconditions**: SITL with PR #30080-class build; SUT switched to ODOMETRY-primary mode (build flag).
**Steps**: ODOMETRY primary; GPS_INPUT held in reserve; verify clean source-switching, no double-fusion.
**Expected outcome**: clean source-switching and no double-fusion, as described in the steps. **The test runs, but the build flag is OFF for the v1 release gate.**
**Max execution time**: 30 min.
---
### FT-N-17: AC-NEW-7 single-flight tile NOT promoted to trusted basemap
**Traces to**: AC-NEW-7 (Service-side voting). Tier: T1 (`service-stub`).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run a single-flight upload to candidate pool | tile recorded as `trust_level=candidate` |
| 2 | Query `service-stub` for promoted basemap content | tile NOT in promoted basemap |
**Expected outcome**: as above; promotion only after N≥2 voting flights confirm.
**Max execution time**: 5 min.
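The promotion rule that `service-stub` is expected to mimic can be sketched as a vote count over distinct flights. This is illustrative only: `trust_level=candidate` and the N≥2 threshold come from this test, while the function name, the `"promoted"` label, and the flight identifiers are assumptions.

```python
def trust_level(confirming_flights, min_votes=2):
    """Return the tile's trust level given the IDs of flights that confirmed it."""
    # Repeated uploads from the same flight count as a single vote.
    distinct_votes = len(set(confirming_flights))
    return "promoted" if distinct_votes >= min_votes else "candidate"


single_flight = trust_level(["flight-001", "flight-001"])
two_flights = trust_level(["flight-001", "flight-002"])
```

Counting distinct flights rather than uploads is the design point under test: one flight re-submitting the same tile must not be able to promote it.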
---
### FT-N-18: AC-8.5 — raw frames are NOT retained in FDR
**Traces to**: AC-8.5, AC-NEW-3. Tier: T1.
**Preconditions**: replay 60-frame slice; `nav_cam_60_slice` written to camera input.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run the replay | FDR populates |
| 2 | Inspect `/probe/fdr/` for raw nav-cam frames | no JPEGs / no AI-cam frames |
| 3 | Inspect for thumbnail log of failed-tile-generation frames | present, ≤0.1 Hz, within FDR cap |
**Expected outcome**: no raw frames retained; only the failure thumbnail log within budget.
**Max execution time**: 60 s.
---
### FT-N-19: Free public Sentinel-2 tile rejected at cache boundary
**Traces to**: AC-8.1 (resolution floor), restrictions §Satellite. Tier: T1.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject a synthetic tile at 10 m/px into the cache | cache index marks as below-resolution |
| 2 | Replay frames over that area | matcher does NOT use the tile; never emits `satellite_anchored` from it |
**Expected outcome**: as above.
**Max execution time**: 60 s.
---
### FT-N-20: Photo-count cap removed — system runs without arbitrary cap
**Traces to**: restrictions §UAV ("no photo-count cap"). Tier: T1 (smoke).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay `synthetic_8h_load` for 30 min | SUT continues operating |
| 2 | Inspect logs for any "photo count exceeded" condition | none |
**Expected outcome**: no cap-related condition; pipeline degrades only against FDR cap (AC-NEW-3) and tile-cache cap.
**Max execution time**: 35 min.
---
### FT-N-21: AC-7.1 in maneuvering flight — uncertainty bound published, not a fixed number
**Traces to**: AC-7.1 second clause. Tier: T1.
**Steps**: as FT-P-33, but assert that for each bank ∈ {0°, 5°, 15°, 30°} the published `bank_pitch_bound_m` matches `altitude × |sin(bank)|` within 1 m.
**Expected outcome**: bound published correctly across the range.
**Max execution time**: 30 s.
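The reference value for the assertion follows directly from the formula in the steps. In the sketch below, `expected_bound_m`, `bound_matches`, and the 400 m altitude are illustrative assumptions; only the formula, the bank angles, and the 1 m tolerance come from this test.

```python
import math


def expected_bound_m(altitude_m, bank_deg):
    """Reference bound: altitude x |sin(bank)|, with bank given in degrees."""
    return altitude_m * abs(math.sin(math.radians(bank_deg)))


def bound_matches(published_m, altitude_m, bank_deg, tol_m=1.0):
    """True if the published bound is within tol_m of the reference bound."""
    return abs(published_m - expected_bound_m(altitude_m, bank_deg)) <= tol_m


# Hypothetical altitude; the bank angles are the ones listed in the test.
altitude_m = 400.0
reference = {bank: expected_bound_m(altitude_m, bank) for bank in (0, 5, 15, 30)}
```

Because the bound scales with altitude, the reference must be computed from the altitude the SUT actually used for that frame rather than a nominal value.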
---
## Coverage notes
- **Pipeline-correctness boundary**: T1 tests on the 60-image slice are NOT deployment-binding. AC-1.1, AC-1.2, AC-2.1, AC-2.2, AC-1.3, AC-NEW-8 deployment numbers come from T2 (FT-P-T2, FT-P-04 binding split, FT-P-10 binding, FT-P-11 binding).
- **Behavioural-shape tests**: FT-P-08, FT-P-15, FT-P-16, FT-P-18, FT-N-04, FT-N-11, FT-N-12, FT-N-13, FT-N-14, FT-N-17, FT-N-18, FT-N-19 use the behavioural shape (trigger + observable + quantifiable verdict); no input/output data mapping is required.
- **Untraced tests**: none. Every test traces to ≥1 AC or restriction.