# Blackbox Tests

> Tier markers (per `environment.md`): `pipeline` (T1), `deferred-corpus` (T2), `deferred-sitl` (T3), `deferred-hil` (T4), `deferred-field` (T5).
>
> Every test pairs an input/observable with a quantifiable expected result from `_docs/00_problem/input_data/expected_results/results_report.md` or directly from an AC.
>
> All tests run through the public interfaces defined in `environment.md`. No SUT-internal access.

## Positive Scenarios

### FT-P-01: 60-frame sequential pipeline — ≥80 % within 50 m (pipeline-correctness only)

**Summary**: Sequentially feed the 60 nav-cam JPGs through the SUT and verify the position-error CDF on this corpus.

**Traces to**: AC-1.1 (pipeline-correctness only — see `test-data.md` caveat), results_report row 1.

**Category**: Position Accuracy. Tier: T1 (`pipeline`).

**Preconditions**:

- `nav_cam_60_slice` mounted; `nav_cam_60_slice_imu` synthesised; `satellite_tiles_AD0000xx_z20` placeholder fixture present.
- SUT booted; cuVSLAM warmed; ArduPilot SITL loaded with the corresponding IMU replay; first valid GPS_INPUT received (i.e., AC-NEW-1 cleared).

**Input data**: `nav_cam_60_slice` + `coordinates.csv` + `nav_cam_60_slice_imu`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stream the 60 JPGs at 3 fps via the camera-input shim into the SUT | SUT publishes `GPS_INPUT` for each frame |
| 2 | Capture each `GPS_INPUT.lat / lon` at the qgc-mock sniffer | Every input frame yields a `GPS_INPUT` frame within the test window |
| 3 | Compute haversine error vs `coordinates.csv` ground truth per frame | Per-frame errors collected into a CDF |

**Expected outcome**: ≥80 % of frames have error < 50 m. Reported as **pipeline-functional**, not deployment-binding (per `test-data.md` caveat — deployment-binding number from FT-P-T2 / AerialVL).

**Max execution time**: 60 s per run.
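
The error metric shared by FT-P-01, FT-P-02, and FT-P-03 can be sketched in a few lines. This is a minimal Python sketch, not the project's harness: the function names, the mean Earth radius constant, and the assumption that estimates and ground truth arrive as frame-aligned `(lat, lon)` pairs in degrees are all illustrative. It uses a strict `<` comparison to mirror the FT-P-01 wording; the other tests would pass their own thresholds.

```python
import math

# Mean Earth radius in metres. An assumption for illustration; the spec does not fix a value.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(errors_m, threshold_m):
    """Fraction of per-frame errors strictly below threshold_m (one point on the CDF)."""
    if not errors_m:
        raise ValueError("no frames captured")
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


def pipeline_verdict(estimates, truth, threshold_m=50.0, min_fraction=0.80):
    """True if at least min_fraction of frames have error below threshold_m.

    estimates and truth are equal-length, frame-aligned lists of (lat, lon) pairs.
    """
    if len(estimates) != len(truth):
        raise ValueError("estimate/truth frame counts differ")
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truth)]
    return fraction_within(errors, threshold_m) >= min_fraction
```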

---

### FT-P-02: 60-frame sequential pipeline — ≥50 % within 20 m (pipeline-correctness only)

**Summary**: Same corpus as FT-P-01; tighter tolerance.

**Traces to**: AC-1.2 (pipeline-correctness only), results_report row 2.

**Category**: Position Accuracy. Tier: T1.

**Preconditions / Input data**: same as FT-P-01.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the corpus end-to-end | per-frame `GPS_INPUT` |
| 2 | Compute haversine error CDF | — |

**Expected outcome**: ≥50 % within 20 m on the 60-image slice (functional check). Deployment-binding number comes from AerialVL S03 in FT-P-T2.

**Max execution time**: 60 s.

---

### FT-P-T2: AC-1.1 / AC-1.2 deployment-binding accuracy on AerialVL S03

**Summary**: Re-run AC-1.1 / AC-1.2 on the deployment-binding corpus.

**Traces to**: AC-1.1, AC-1.2. Tier: T2 (`deferred-corpus`). `data_status: deferred-corpus`.

**Preconditions**: `aerialvl_s03` mounted with synced IMU + nav-cam stream + GPS truth.

**Input data**: AerialVL S03.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay AerialVL S03 (70 km of fixed-wing flight at 1 km AGL) | per-frame `GPS_INPUT` |
| 2 | Compute error CDF vs S03 GPS truth | — |

**Expected outcome**: ≥80 % within 50 m AND ≥50 % within 20 m (deployment-binding).

**Max execution time**: 90 min (replay + analysis).

---

### FT-P-03: Per-frame error bound ≤100 m

**Summary**: No single frame exceeds 100 m error on the 60-image slice.

**Traces to**: AC-1.1 (negative-tail bound), results_report row 3. Tier: T1.

**Preconditions / Input**: same as FT-P-01.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay 60 frames | per-frame GPS_INPUT |
| 2 | Compute max(haversine_err) over all frames | — |

**Expected outcome**: max error ≤ 100 m. Pipeline-functional only.

**Max execution time**: 60 s.

---

### FT-P-04: VO drift bound between satellite anchors

**Summary**: VO drift between successive satellite-anchored fixes stays bounded.

**Traces to**: AC-1.3, AC-NEW-8, results_report row 4, F-T1b. Tier: T1 functional + T2 binding.

**Preconditions**: cuVSLAM in mono+IMU mode (T1) AND mono-only mode (T2 split test).

**Input data**: `nav_cam_60_slice` (T1) + AerialVL S03 (T2).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Identify successive `satellite_anchored` source-label transitions | series of anchor pairs |
| 2 | For each anchor pair, measure VO-extrapolated centre vs next-anchor centre | drift in metres |
| 3 | Compute 95th percentile across all pairs | — |

**Expected outcome**:

- mono+IMU: p95 drift ≤ 50 m (binding on T2 / AerialVL).
- mono-only: p95 drift ≤ 100 m (binding on T2 / AerialVL).
- T1 functional check: drift bounded (no monotonic growth) — exact numbers not deployment-binding.

**Max execution time**: 90 min (T2).

---

### FT-P-05: GPS_INPUT shape under normal tracking

**Summary**: GPS_INPUT messages emitted while tracking is healthy carry the correct schema and value ranges.

**Traces to**: AC-1.4, AC-4.3, AC-6.3, results_report row 5. Tier: T1.

**Preconditions**: SUT in steady-state tracking with recent satellite anchor (<30 s old).

**Input data**: any single frame from `nav_cam_60_slice`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Sniff GPS_INPUT at qgc-mock | one frame per nav-cam frame |
| 2 | Decode fields: `fix_type`, `horiz_accuracy`, `satellites_visible`, `lat`, `lon`, `alt`, `vel_acc` | as per MAVLink GPS_INPUT spec |
| 3 | Inspect optional ODOMETRY: assert intentional absence in v1 (per AC-4.3 v1-scope clause) | no ODOMETRY frames present |

**Expected outcome**: `fix_type == 3`, `horiz_accuracy ∈ [1, 50] m`, `satellites_visible == 10`, `lat / lon` non-null, WGS84. ODOMETRY count == 0 across the run.

**Max execution time**: 30 s.

---

### FT-P-06: GPS_INPUT shape during VO-only fallback

**Summary**: Fields adapt when no satellite anchor is available for >30 s.

**Traces to**: AC-1.4, AC-4.3, results_report row 6. Tier: T1.

**Preconditions**: Force satellite-match failure for >30 s (cache poisoned with stale tiles).

**Input data**: `nav_cam_60_slice` with `stale_tile_scenarios` injected.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | After 30 s of failed matches, sniff GPS_INPUT | `fix_type == 3`, `horiz_accuracy ∈ [20, 100]` m, source-label `vo_extrapolated` |

**Expected outcome**: as above; horiz_accuracy grows monotonically until next successful match.

**Max execution time**: 60 s.

---

### FT-P-07: GPS_INPUT shape during dead-reckoning

**Summary**: VO lost AND no satellite → IMU-only dead reckoning.

**Traces to**: AC-1.4, AC-5.2, results_report row 7. Tier: T1.

**Preconditions**: Inject cuVSLAM tracking-loss + cache poisoned.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Sniff GPS_INPUT | `fix_type == 2`, `horiz_accuracy ≥ 50 m` and growing |
| 2 | Source label | `dead_reckoned` |

**Expected outcome**: `fix_type == 2`, monotonically growing horiz_accuracy, `source == dead_reckoned`.

**Max execution time**: 60 s.
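
The shape checks in FT-P-06 and FT-P-07 combine per-sample field assertions with a monotonic-growth check on `horiz_accuracy`. The sketch below shows one way a runner might express the FT-P-07 verdict in Python. The sample layout (a time-ordered list of dicts with `fix_type`, `horiz_accuracy`, and `source` keys) is a hypothetical decoded form of the sniffed data, and "growing" is read here as non-decreasing; a stricter reading would swap `>=` for `>`.

```python
def is_non_decreasing(values):
    """True if every value is >= its predecessor (an empty or single-item list passes)."""
    return all(b >= a for a, b in zip(values, values[1:]))


def check_dead_reckoning(samples, min_hacc_m=50.0):
    """Return a list of violations for the FT-P-07 expectations; empty means pass.

    samples: time-ordered dicts with hypothetical keys
    'fix_type', 'horiz_accuracy' (metres) and 'source'.
    """
    problems = []
    for i, s in enumerate(samples):
        if s["fix_type"] != 2:
            problems.append(f"sample {i}: fix_type {s['fix_type']} != 2")
        if s["horiz_accuracy"] < min_hacc_m:
            problems.append(f"sample {i}: horiz_accuracy {s['horiz_accuracy']} < {min_hacc_m}")
        if s["source"] != "dead_reckoned":
            problems.append(f"sample {i}: source {s['source']!r} != 'dead_reckoned'")
    if not is_non_decreasing([s["horiz_accuracy"] for s in samples]):
        problems.append("horiz_accuracy is not monotonically growing")
    return problems
```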

---

### FT-P-08: GPS_INPUT shape on total failure

**Summary**: 3+ consecutive failures — system signals total failure.

**Traces to**: AC-3.4, results_report row 8. Tier: T1.

**Preconditions**: `cache_poisoning_scenarios` flavour that causes 3 sat failures + cuVSLAM lost.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Wait for 3 consecutive failures | GPS_INPUT continues at the configured rate |
| 2 | Inspect GPS_INPUT | `fix_type == 0`, `horiz_accuracy == 999.0` |
| 3 | Inspect STATUSTEXT | RELOC_REQ regex emitted |

**Expected outcome**: as above.

**Max execution time**: 60 s.

---

### FT-P-09: Confidence tier transitions

**Summary**: Confidence tier label transitions match defined conditions.

**Traces to**: AC-1.4, results_report rows 10–13. Tier: T1.

**Preconditions**: scripted scenario that walks (HIGH → MEDIUM → LOW → FAILED).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At each scripted state, read the SSE stream confidence field AND the source-label field | matches expected tier |

**Expected outcome**:

- Sat anchor <30 s + cov <400 m² → tier `HIGH`, source `satellite_anchored`.
- cuVSLAM OK + no sat >30 s → tier `MEDIUM`, source `vo_extrapolated`.
- cuVSLAM lost + IMU only → tier `LOW`, source `dead_reckoned`.
- 3+ consecutive failures → tier `FAILED`, fix_type 0.

**Max execution time**: 5 min.

---

### FT-P-10: Image registration rate (functional)

**Summary**: Pipeline registers ≥95 % of normal-flight frames against the previous frame.

**Traces to**: AC-2.1 (pipeline-functional only), results_report row 14. Tier: T1 functional + T2 binding.

**Preconditions**: SUT exposes registration outcome via STATUSTEXT or NAMED_VALUE_FLOAT (`reg_pass_count`, `reg_total_count`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay `nav_cam_60_slice` (T1) or AerialVL S03 (T2) | registration metrics published |
| 2 | Compute `reg_pass_count / reg_total_count` | percentage |

**Expected outcome**: T1 ≥95 % (functional); T2 ≥95 % (deployment-binding) under normal-flight definition (nadir, ±10° bank/pitch, ≥40 % overlap, daytime, season-matched tile).

**Max execution time**: 60 s (T1) / 90 min (T2).

---

### FT-P-11: Mean Reprojection Error (MRE)

**Summary**: VO and cross-domain MRE under thresholds.

**Traces to**: AC-2.2, results_report row 15. Tier: T1 functional + T2 binding.

**Preconditions**: SUT publishes `mre_vo` (frame-to-frame) and `mre_cross` (cross-view) on the metrics endpoint.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Scrape MRE metrics over a replay | per-frame samples |
| 2 | Compute mean across the run | — |

**Expected outcome**: `mean(mre_vo) < 1.0 px`; `mean(mre_cross) < 2.5 px`. T1 numbers functional only.

**Max execution time**: 60 s (T1) / 90 min (T2).

---

### FT-P-12: Continuous output through turn area (frames 32–43)

**Summary**: SUT keeps producing position estimates through the turn segment of `coordinates.csv`.

**Traces to**: AC-3.2, AC-4.4, results_report row 16. Tier: T1.

**Preconditions**: standard pipeline replay.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay frames 32–43 | per-frame GPS_INPUT |
| 2 | Count outputs vs frames | — |

**Expected outcome**: ≥1 GPS_INPUT per nav-cam frame in the turn region.

**Max execution time**: 30 s.

---

### FT-P-13: 350 m outlier handled (AC-3.1)

**Summary**: Pipeline survives a synthetic 350 m gap between consecutive frames (caused by ±20° tilt outlier).

**Traces to**: AC-3.1, results_report row 17. Tier: T1.

**Input data**: synthetic two-frame pair with 350 m gap injected into `nav_cam_60_slice` mid-replay.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject the outlier pair | SUT emits a `vo_extrapolated` or `dead_reckoned` frame, not corrupted output |
| 2 | Continue with next valid frame | error returns to ≤100 m within next valid frame |

**Expected outcome**: error ≤ 100 m on the next valid frame after the outlier.

**Max execution time**: 60 s.

---

### FT-P-14: Sharp-turn re-localization (AC-3.2)

**Summary**: Sharp turn (<5 % overlap, <70°, <200 m drift) — VO fails, satellite re-loc recovers.

**Traces to**: AC-3.2, F-T7, results_report row 18. Tier: T1.

**Input data**: synthetic sharp-turn pair injected into `nav_cam_60_slice`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject the sharp-turn pair | cuVSLAM tracking lost; VPR triggers; matcher re-localizes |
| 2 | Track error over next 3 frames | error ≤ 50 m within 3 frames |

**Expected outcome**: error ≤ 50 m within 3 frames of the turn.

**Max execution time**: 60 s.

---

### FT-P-15: VO loss → satellite recovery → tracking_state == NORMAL

**Summary**: After cuVSLAM tracking loss + sat match success, tracking_state returns to NORMAL.

**Traces to**: AC-3.2, AC-3.3, results_report row 19. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Force cuVSLAM tracking-loss; deliver a fresh tile that matches | matcher emits absolute pose; calibrator emits satellite-anchored fix |
| 2 | Observe FC EKF3 reconvergence via `EKF_STATUS_REPORT` | EKF3 reconverges |
| 3 | Read SUT-published `tracking_state` | == `NORMAL` |

**Expected outcome**: tracking_state == NORMAL within bounded time.

**Max execution time**: 60 s.
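
The recovery bounds in FT-P-13 (error ≤ 100 m on the next valid frame) and FT-P-14 (error ≤ 50 m within 3 frames) reduce to the same question: how many frames after the disturbance does the error first fall under a threshold? A possible Python sketch follows. The function names are illustrative, and it assumes the runner has already computed per-frame errors in metres, with index 0 being the first valid frame after the injected pair.

```python
def frames_to_recover(errors_m, threshold_m):
    """1-based count of frames until error first drops to <= threshold_m.

    errors_m[0] is the first valid frame after the disturbance.
    Returns None if the error never recovers within the captured window.
    """
    for count, err in enumerate(errors_m, start=1):
        if err <= threshold_m:
            return count
    return None


def recovered_within(errors_m, threshold_m, max_frames):
    """True if recovery happens within max_frames frames of the disturbance."""
    n = frames_to_recover(errors_m, threshold_m)
    return n is not None and n <= max_frames


# FT-P-13 style verdict: recovered_within(errors, threshold_m=100.0, max_frames=1)
# FT-P-14 style verdict: recovered_within(errors, threshold_m=50.0, max_frames=3)
```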

---

### FT-P-16: Cold-start TTFF ≤30 s p95

**Summary**: From companion-computer boot, first valid GPS_INPUT within 30 s.

**Traces to**: AC-NEW-1, results_report row 23, F-T11. Tier: T1 statistical (≤10 boots) + T4 binding (50 boots on real HW).

**Preconditions**: SUT image cold (no warmed engines); FC providing `GLOBAL_POSITION_INT` simulating IMU-extrapolated pose.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Boot SUT container | container start logged |
| 2 | Time from container start to first valid `fix_type==3` GPS_INPUT | t_ttff |
| 3 | Repeat N times (N=10 T1 / N=50 T4) | distribution |

**Expected outcome**: 95th percentile of t_ttff ≤ 30 s.

**Max execution time**: 10 min (T1) / 30 min (T4).

---

### FT-P-17: Validate initial position via first satellite match

**Summary**: First satellite match after startup pulls position to ≤50 m.

**Traces to**: AC-5.1, AC-NEW-1, results_report row 24. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Provide `GLOBAL_POSITION_INT` with a deliberate 200 m offset from truth | SUT seeds pipeline with 200 m uncertainty |
| 2 | Replay first frame with valid satellite tile | matcher succeeds; calibrator emits anchored fix |
| 3 | Read GPS_INPUT lat/lon | error ≤ 50 m |

**Expected outcome**: position error ≤ 50 m after first match.

**Max execution time**: 90 s.

---

### FT-P-18: Mid-flight reboot recovery ≤30 s

**Summary**: Process kill mid-flight; SUT recovers within AC-NEW-1 budget.

**Traces to**: AC-5.3, AC-NEW-1, results_report row 25. Tier: T1.

**Preconditions**: SUT in steady-state tracking; FC continues to fly.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Send SIGKILL to SUT container | SUT restarts |
| 2 | Time from restart to first `fix_type==3` GPS_INPUT | t_recovery |

**Expected outcome**: t_recovery ≤ 30 s.

**Max execution time**: 60 s.

---

### FT-P-19: Post-reboot first-match accuracy

**Summary**: After reboot, first satellite match restores accuracy.

**Traces to**: AC-5.3, results_report row 26. Tier: T1.

**Steps**: same as FT-P-17 but starting from a reboot.

**Expected outcome**: error ≤ 50 m after first match.

**Max execution time**: 90 s.

---

### FT-P-20: Object localization (level flight)

**Summary**: `POST /objects/locate` returns lat/lon for an object pixel given known UAV pose.

**Traces to**: AC-7.1, AC-7.2, results_report row 27. Tier: T1.

**Preconditions**: SUT has a known anchored fix; AI camera gimbal pose injected via FC `ATTITUDE`.

**Input data**: pixel coordinates + gimbal angle + zoom + altitude in request body.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` with pixel_x, pixel_y, gimbal_pitch, gimbal_yaw, zoom, altitude | 200 + JSON `{lat, lon, alt, accuracy_m, confidence}` |
| 2 | Compare to ground truth | error ≤ accuracy_m |

**Expected outcome**: lat/lon within `accuracy_m` of ground truth; in level flight, `accuracy_m` consistent with frame-center accuracy of the GPS-Denied system. In maneuvering flight, response includes the `altitude × |sin(unknown_bank_or_pitch)|` bound (AC-7.1 second clause) when bank/pitch >5°.

**Max execution time**: 5 s.

---

### FT-P-21: Coordinate transform round-trip ≤0.1 m

**Summary**: GPS → NED → pixel → GPS round-trip preserves position.

**Traces to**: AC-6.3, AC-7.2, results_report row 29. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Submit a known WGS84 point through the round-trip via `/objects/locate` (or a debug endpoint if exposed) | round-trip lat/lon |
| 2 | Compare to original | ≤ 0.1 m |

**Expected outcome**: round-trip error ≤ 0.1 m.

**Max execution time**: 1 s.

---

### FT-P-22: `GET /health` schema and content

**Summary**: Health endpoint returns 200 with required fields.

**Traces to**: AC-6.1 (telemetry), results_report row 30. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /health` | HTTP 200, JSON body |
| 2 | Validate schema | contains `status`, `memory_mb`, `gpu_temp_c`, `tracking_state`, `last_anchor_age_s`, `confidence_tier` |

**Expected outcome**: as above; `status ∈ {ok, degraded, failed}`.

**Max execution time**: 1 s.

---

### FT-P-23: `POST /sessions` returns id

**Traces to**: AC-6.1, results_report row 31. Tier: T1.

**Steps**: `POST /sessions` (auth) → 200/201 with session id.

**Expected outcome**: status ∈ {200, 201}; body has `session_id` matching `^[a-f0-9-]{36}$`.

**Max execution time**: 1 s.

---

### FT-P-24: SSE stream emits per-second events

**Traces to**: AC-6.1, results_report row 32. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /sessions/{id}/stream` | SSE connection; events emitted at ~1 Hz |
| 2 | Sample 30 s of stream | each event matches schema: `type`, `timestamp`, `lat`, `lon`, `alt`, `accuracy_h`, `confidence`, `vo_status` |

**Expected outcome**: rate 1 Hz ± 0.2 Hz; all events conform to schema.

**Max execution time**: 35 s.

---

### FT-P-25: TRT engine load ≤10 s

**Traces to**: AC-NEW-1 (sub-budget), results_report row 39. Tier: T1 (synthetic timing) + T4 (real HW).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At SUT boot, time from container start to "all engines ready" STATUSTEXT | t_engines |

**Expected outcome**: t_engines ≤ 10 s.

**Max execution time**: 30 s.

---

### FT-P-26: Tile storage size for the operational area

**Traces to**: AC-8.3, restrictions §UAV/Satellite, results_report row 40. Tier: T1.

**Preconditions**: a 200 km mission path × ±2 km buffer × z=18 + z=20 fixture loaded.

**Steps**: read total bytes under `/probe/tiles/`.

**Expected outcome**: 300 MB ≤ size ≤ 1000 MB. (Aligned with restriction's ~10 GB persistent cap for full 400 km².)

**Max execution time**: 5 s.

---

### FT-P-27: Tile mosaic coverage radius ≥500 m

**Traces to**: AC-8.3, results_report row 41. Tier: T1.

**Preconditions**: SUT given EKF position with σ_xy.

**Steps**: capture the assembled mosaic bbox via STATUSTEXT or a debug endpoint.

**Expected outcome**: mosaic radius ≥ 500 m around current position.

**Max execution time**: 5 s.

---

### FT-P-28: Tile dedup — ≤1 onboard tile per ground sector

**Traces to**: AC-8.4, F-T2. Tier: T1.

**Preconditions**: `tile_dedup_replay` (sectors visited ≥2×).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the flight | onboard tiles written |
| 2 | Inspect MBTiles + sidecar in `/probe/tiles/` | per-sector tile count |

**Expected outcome**: per-sector count ≤ 1; latest/highest-quality wins.

**Max execution time**: 10 min.

---

### FT-P-29: Post-flight upload to candidate pool

**Traces to**: AC-8.4, F-T3. Tier: T1.

**Preconditions**: `service-stub` running.

**Steps**: replay → on landing-event, SUT uploads tiles.

**Expected outcome**: `service-stub` records ≥1 tile with `trust_level=candidate`; promotion only after N≥2 voting flights (so a single flight does not promote).

**Max execution time**: 5 min.
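
The FT-P-26 step ("read total bytes under `/probe/tiles/`") can be sketched with the Python standard library alone. This is an illustrative sketch under two stated assumptions: "MB" is read as the decimal megabyte (10^6 bytes), which the spec does not settle, and symbolic links are skipped so that linked tiles are not counted twice. The function names are hypothetical.

```python
import os

# Decimal megabyte. An assumption: the spec does not say whether MB means 10**6 or 2**20 bytes.
MB = 1_000_000


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def tile_storage_in_budget(root, low_mb=300, high_mb=1000):
    """True if the on-disk tile footprint lies in [low_mb, high_mb] megabytes."""
    size = directory_size_bytes(root)
    return low_mb * MB <= size <= high_mb * MB


# Example call for the FT-P-26 verdict: tile_storage_in_budget("/probe/tiles/")
```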

---

### FT-P-30: NAMED_VALUE_FLOAT telemetry rate

**Traces to**: AC-6.1, results_report row 45. Tier: T1.

**Steps**: sniff `gps_conf`, `gps_drift`, `gps_hacc` NAMED_VALUE_FLOAT rates over 30 s.

**Expected outcome**: each at 1 Hz ± 0.2 Hz.

**Max execution time**: 35 s.

---

### FT-P-31: Disconnected segments — ≥3 connected via global retrieval

**Traces to**: AC-3.3, F-T8. Tier: T1.

**Preconditions**: `disconnected_segments_replay` with ≥3 segments.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay each segment with a synthetic gap | for each segment, VPR retrieves top-K candidates; matcher relocalizes |
| 2 | Verify segment-to-segment trajectory continuity | each segment connects to prior trajectory |

**Expected outcome**: 3/3 segments connect within 10 frames of segment start; `tracking_state == NORMAL` after each.

**Max execution time**: 5 min.

---

### FT-P-32: Position refinement / corrections (AC-4.5)

**Traces to**: AC-4.5. Tier: T1.

**Preconditions**: SUT in steady state; ability to refine prior fixes.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Capture sequence of GPS_INPUT for a 10-s window | per-frame fixes |
| 2 | After delayed loop closure / late satellite match, observe whether SUT emits a corrected fix or signals correction via STATUSTEXT | a follow-up GPS_INPUT for an earlier `time_usec` OR a STATUSTEXT correction record |

**Expected outcome**: at least one correction event where the corrected fix replaces the prior fix's `h_acc` (covariance shrinks). System never silently rewrites past output without recording the correction.

**Max execution time**: 60 s.

---

### FT-P-33: Object-localize bank/pitch >5° publishes uncertainty bound

**Traces to**: AC-7.1 (second clause). Tier: T1.

**Preconditions**: FC `ATTITUDE` published with bank > 5°.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` while bank > 5° | response includes bound = `altitude × abs(sin(bank_or_pitch))` |

**Expected outcome**: response body includes `bank_pitch_bound_m` matching the formula within 1 m.

**Max execution time**: 5 s.

---

### FT-P-34: Mid-flight tile generation respects σ_xy ≤ 5 m hard gate

**Traces to**: AC-8.4, AC-NEW-7 hard gate. Tier: T1.

**Preconditions**: scripted scenarios with σ_xy ∈ {2, 4, 6, 8} m.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay σ_xy=2 m frames | tiles written |
| 2 | Replay σ_xy=8 m frames | NO tiles written |
| 3 | Inspect sidecar `trust_level` for σ_xy ∈ (3, 5] m | `trust_level == soft` |
| 4 | Inspect sidecar for σ_xy ≤ 3 m | `trust_level == candidate` |

**Expected outcome**: as above.

**Max execution time**: 5 min.

---

### FT-P-35: NF-T4b cache-poisoning safety budget (Monte Carlo)

**Traces to**: AC-NEW-7. Tier: T2 (`deferred-corpus`). `data_status: deferred-corpus`.

**Preconditions**: ≥100 simulated flights worth of frames from AerialVL + Mavic + AerialExtreMatch with synthetic over-confidence injection (1.5×–3×).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay all flights | per-tile geo-misalignment captured |
| 2 | Compute P(misalign > 30 m) and P(misalign > 100 m) | — |

**Expected outcome**: P(>30 m) < 1 %, P(>100 m) < 0.1 %.

**Max execution time**: 4 h.

---

### FT-P-36: AC-NEW-9 covariance calibration accuracy

**Traces to**: AC-NEW-9, F-T18. Tier: T2.

**Preconditions**: AerialVL S03 replay with ground truth.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each emitted GPS_INPUT, capture (`h_acc`, ground-truth error) | series of pairs |
| 2 | Compute fraction of frames where `error ≤ h_acc * Mahalanobis-2D-95% factor` | fraction |

**Expected outcome**: fraction ≥ 95 % (calibration neither over- nor under-claims).

**Max execution time**: 90 min.

---

### FT-P-37: F-T18 calibrator regression (no state propagation)

**Traces to**: AC-NEW-9, F-T18. Tier: T2.

**Preconditions**: replay with logging hooks on Component 5 outputs (publicly exposed counters).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run replay | calibrator counters emitted |
| 2 | Assert `state_propagation_invocations_total == 0` | no propagation |
| 3 | Assert `mahalanobis_gate_rejections_total > 0` | gate active |

**Expected outcome**: as above.

**Max execution time**: 90 min.

---

## Negative Scenarios

### FT-N-01: Corrupted nav-cam frame — no crash, degraded mode

**Traces to**: AC-3.x (resilience), restriction "fixed downward camera". Tier: T1.

**Input data**: a 60-frame replay with frame N replaced by a 10-byte random blob.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stream the replay | SUT logs decode error; emits STATUSTEXT WARN |
| 2 | Inspect tracking_state | transitions to `DEGRADED` for 1 frame; recovers to `NORMAL` on next valid frame |
| 3 | SUT process | does NOT crash |

**Expected outcome**: process alive; no GPS_INPUT spike with bad data; tracking_state returns to NORMAL within 1 frame of recovery.

**Max execution time**: 30 s.

---

### FT-N-02: Object-localize invalid pixel

**Traces to**: AC-7.1, results_report row 28. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /objects/locate` with `pixel_x = -10` (out of frame) | HTTP 422 + error body |

**Expected outcome**: status == 422; body contains a structured error code.

**Max execution time**: 1 s.

---

### FT-N-03: Unauthenticated `POST /sessions`

**Traces to**: results_report row 33, security restrictions. Tier: T1.

**Steps**: `POST /sessions` without JWT → 401.

**Expected outcome**: status == 401.

**Max execution time**: 1 s.

---

### FT-N-04: Stale tile beyond grace — must NOT label `satellite_anchored`

**Traces to**: AC-8.2, AC-NEW-6. Tier: T1.

**Preconditions**: `stale_tile_scenarios` with 18-month-old active-conflict tile (well past 6 mo + 30-day grace).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay frames whose only candidate tile is the 18 mo stale one | matcher invocation skipped or scored 0 |
| 2 | Inspect source label on emitted GPS_INPUT | NEVER `satellite_anchored` |
| 3 | Inspect WARN STATUSTEXT | tile rejected event recorded |

**Expected outcome**: no `satellite_anchored` label across the run; rejection event recorded.

**Max execution time**: 60 s.

---

### FT-N-05: Stale tile in 30-day grace — confidence linearly decayed

**Traces to**: AC-NEW-6. Tier: T1.

**Preconditions**: tiles aged at +0, +15, +30 days past the 6-mo budget.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay each | confidence weight in sidecar metric: 1.0, 0.5, 0.0 |

**Expected outcome**: confidence weight decays linearly as specified.

**Max execution time**: 60 s.

---

### FT-N-06: Sharp-turn negative — must trigger satellite re-loc, not silently fail

**Traces to**: AC-3.2 (negative case). Tier: T1.

**Steps**: same as FT-P-14 but assert that **before** re-loc the SUT emits a STATUSTEXT explaining VO loss; assert tracking_state transitions through DEGRADED.

**Expected outcome**: explicit STATUSTEXT log; recovery within 3 frames.

**Max execution time**: 60 s.

---

### FT-N-07: 3-consecutive-failure → RELOC_REQ on STATUSTEXT

**Traces to**: AC-3.4, results_report rows 20, 46. Tier: T1.

**Steps**: see FT-P-08; additionally verify the regex `RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m`.

**Expected outcome**: regex matches at least one STATUSTEXT after 3 failures; emitted within 2 s of the third failure (per AC-3.4 timing).

**Max execution time**: 60 s.

---

### FT-N-08: Re-loc waiting state behaviour

**Traces to**: AC-3.4, results_report row 21. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | After RELOC_REQ, observe SUT for 10 s | `fix_type == 0` GPS_INPUT continues; IMU-prediction-only label; satellite-match attempts continue (counter increments) |

**Expected outcome**: as above; SUT does NOT stop emitting GPS_INPUT.

**Max execution time**: 30 s.

---

### FT-N-09: Operator hint — used as 500 m seed

**Traces to**: AC-6.2, AC-3.4, results_report row 22. Tier: T1.

**Preconditions**: SUT in re-loc-waiting; operator hint scenario active.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `qgc-mock` sends STATUSTEXT `RELOC_HINT: lat=… lon=… sigma=500m` | SUT consumes hint; uses as seed for VPR/cross-view |
| 2 | First fix after hint | error ≤ 500 m initially |
| 3 | After next satellite match | error ≤ 50 m |

**Expected outcome**: as above.

**Max execution time**: 60 s.

---

### FT-N-10: Operator hint — malformed value rejected

**Traces to**: AC-6.2 (negative). Tier: T1.

**Steps**: send `RELOC_HINT: lat=NaN lon=… sigma=-10`.

**Expected outcome**: SUT emits STATUSTEXT WARN; hint NOT applied; pipeline state unchanged.

**Max execution time**: 30 s.

---

### FT-N-11: AC-4.3 — ODOMETRY intentionally absent in v1

**Traces to**: AC-4.3 (v1-scope clause), F-T9 Option A. Tier: T1.

**Preconditions**: SUT configured for v1 (default).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run any of FT-P-01 / FT-P-04 / FT-P-T2 | GPS_INPUT emitted |
| 2 | At qgc-mock, count ODOMETRY frames over the run | == 0 |
| 3 | Inspect `EK3_SRC1_*` configuration via FC parameter readback | `POSXY=GPS, VELXY=GPS, YAW=GPS+Compass` |

**Expected outcome**: ODOMETRY count == 0; FC parameters as configured.

**Max execution time**: 60 s.

---

### FT-N-12: Spoofed GPS — SUT promotes within 3 s

**Traces to**: AC-NEW-2, F-T12. Tier: T3 (`deferred-sitl`). `data_status: deferred-sitl`.

**Preconditions**: SITL + `gps-spoof-injector` configured.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | At t=0, inject a malicious `GPS_RAW_INT` with a 1 km offset | FC sees both spoof + SUT GPS_INPUT |
| 2 | Time from spoof onset to SUT promoting its `GPS_INPUT` to primary (raised `fix_type=3` AND STATUSTEXT promotion event) | t_promote |
| 3 | Repeat 50× | distribution |

**Expected outcome**: 95th percentile of t_promote < 3 s.

**Max execution time**: 30 min.

---

### FT-N-13: Failsafe at 3 s no-fix (AC-5.2)

**Traces to**: AC-5.2. Tier: T1+T3.

**Preconditions**: scripted scenario where SUT cannot produce ANY estimate for 3.5 s.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Force pipeline blackout | SUT logs failure |
| 2 | Verify FC behaviour | ArduPilot SITL logs fall-back to IMU-only dead reckoning |

**Expected outcome**: failsafe transition observable in `EKF_STATUS_REPORT` within 4 s of blackout.

**Max execution time**: 60 s.
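
Several tests gate on a 95th percentile over repeated trials (FT-P-04 drift, FT-P-16 `t_ttff`, FT-N-12 `t_promote`). The spec does not name a percentile method, so the sketch below assumes the nearest-rank definition, which always returns an observed sample. Under that assumption, p95 over 10 trials is simply the worst trial, and over 50 trials it is the 48th smallest value. The function names and the `strict` flag (to distinguish `<` in FT-N-12 from `≤` in FT-P-16) are illustrative.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct % of samples <= it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def p95_within(values, limit, strict=False):
    """True if the 95th percentile is below limit (strict) or at most limit (default)."""
    p95 = percentile_nearest_rank(values, 95)
    return p95 < limit if strict else p95 <= limit


# FT-P-16 style verdict: p95_within(ttff_seconds, 30.0)
# FT-N-12 style verdict: p95_within(promote_seconds, 3.0, strict=True)
```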

---

### FT-N-14: Refusal of unsigned MAVLink (S-T1 boundary)

**Traces to**: restrictions §Sensors. Tier: T3.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Send a GPS_INPUT with invalid signing tag from the runner | FC rejects |
| 2 | Inspect FC log + STATUSTEXT to GCS | rejection event recorded |

**Expected outcome**: rejected; FC continues to fly on prior valid source.

**Max execution time**: 30 s.

---

### FT-N-15: SITL F-T9 Option A regression — EKF3 fuses GPS_INPUT only, no double-fusion

**Traces to**: AC-4.3, F-T9 Option A. Tier: T3.

**Preconditions**: SITL with `EK3_SRC1_*=GPS+Compass`, `EK3_SRC2_*=GPS`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run a representative AerialVL replay | EKF3 fuses GPS_INPUT |
| 2 | Inspect EKF3 logs for double-fusion symptoms (issues #30076, #32506) | none |
| 3 | Trigger backup-GPS failover via SITL parameter | EKF3 switches to `EK3_SRC2_*` cleanly |

**Expected outcome**: no double-fusion; clean failover.

**Max execution time**: 30 min.

---

### FT-N-16: SITL F-T9 Option B regression (v1.1 candidate)

**Traces to**: AC-4.3 Option B (v1.1+), F-T9. Tier: T3 (`deferred-sitl`). `data_status: deferred-sitl`.

**Preconditions**: SITL with PR #30080-class build; SUT switched to ODOMETRY-primary mode (build flag).

**Steps**: ODOMETRY primary; GPS_INPUT held in reserve; verify clean source-switching, no double-fusion.

**Expected outcome**: as above. **Test runs but build flag is OFF for v1 release gate.**

**Max execution time**: 30 min.

---

### FT-N-17: AC-NEW-7 single-flight tile NOT promoted to trusted basemap

**Traces to**: AC-NEW-7 (Service-side voting). Tier: T1 (`service-stub`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run a single-flight upload to candidate pool | tile recorded as `trust_level=candidate` |
| 2 | Query `service-stub` for promoted basemap content | tile NOT in promoted basemap |

**Expected outcome**: as above; promotion only after N≥2 voting flights confirm.

**Max execution time**: 5 min.

---

### FT-N-18: AC-8.5 — raw frames are NOT retained in FDR

**Traces to**: AC-8.5, AC-NEW-3. Tier: T1.

**Preconditions**: replay 60-frame slice; `nav_cam_60_slice` written to camera input.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run the replay | FDR populates |
| 2 | Inspect `/probe/fdr/` for raw nav-cam frames | no JPEGs / no AI-cam frames |
| 3 | Inspect for thumbnail log of failed-tile-generation frames | present, ≤0.1 Hz, within FDR cap |

**Expected outcome**: no raw frames retained; only the failure thumbnail log within budget.

**Max execution time**: 60 s.

---

### FT-N-19: Free public Sentinel-2 tile rejected at cache boundary

**Traces to**: AC-8.1 (resolution floor), restrictions §Satellite. Tier: T1.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Inject a synthetic tile at 10 m/px into the cache | cache index marks as below-resolution |
| 2 | Replay frames over that area | matcher does NOT use the tile; never emits `satellite_anchored` from it |

**Expected outcome**: as above.

**Max execution time**: 60 s.

---

### FT-N-20: Photo-count cap removed — system runs without arbitrary cap

**Traces to**: restrictions §UAV ("no photo-count cap"). Tier: T1 (smoke).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay `synthetic_8h_load` for 30 min | SUT continues operating |
| 2 | Inspect logs for any "photo count exceeded" condition | none |

**Expected outcome**: no cap-related condition; pipeline degrades only against FDR cap (AC-NEW-3) and tile-cache cap.

**Max execution time**: 35 min.

---

### FT-N-21: AC-7.1 in maneuvering flight — uncertainty bound published, not a fixed number

**Traces to**: AC-7.1 second clause. Tier: T1.

**Steps**: as in FT-P-33, but assert that for bank ∈ {0°, 5°, 15°, 30°} the published `bank_pitch_bound_m` matches `altitude × |sin(bank)|` within 1 m.

**Expected outcome**: bound published correctly across the range.

**Max execution time**: 30 s.

---

## Coverage notes

- **Pipeline-correctness boundary**: T1 tests on the 60-image slice are NOT deployment-binding. Deployment numbers for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, and AC-NEW-8 come from T2 (FT-P-T2, FT-P-04 binding split, FT-P-10 binding, FT-P-11 binding).
- **Behavioural-shape tests**: FT-P-08, FT-P-15, FT-P-16, FT-P-18, FT-N-04, FT-N-11, FT-N-12, FT-N-13, FT-N-14, FT-N-17, FT-N-18, FT-N-19 use the behavioural shape (trigger + observable + quantifiable verdict); they require no input-data input/output mapping.
- **Untraced tests**: none. Every test traces to ≥1 AC or restriction.
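
The FT-N-21 assertion compares each published `bank_pitch_bound_m` against `altitude × |sin(bank)|` with a 1 m tolerance. A minimal sketch of that comparison is shown below. The helper names, the dictionary shape of the published values, and the reading of "within 1 m" as an inclusive bound are assumptions for illustration, not part of the SUT interface.

```python
import math

# Bank angles listed in FT-N-21, in degrees.
BANK_ANGLES_DEG = (0.0, 5.0, 15.0, 30.0)


def expected_bank_pitch_bound_m(altitude_m, bank_deg):
    """Expected bound per FT-N-21: altitude x |sin(bank)|."""
    return altitude_m * abs(math.sin(math.radians(bank_deg)))


def check_published_bounds(altitude_m, published, tolerance_m=1.0):
    """Hypothetical checker: list the bank angles whose bound is off.

    published maps bank angle in degrees to the published
    bank_pitch_bound_m value in metres. A deviation of exactly
    tolerance_m is treated as a pass (assumption).
    """
    failures = []
    for bank_deg in BANK_ANGLES_DEG:
        expected = expected_bank_pitch_bound_m(altitude_m, bank_deg)
        if abs(published[bank_deg] - expected) > tolerance_m:
            failures.append(bank_deg)
    return failures
```

For an illustrative altitude of 1000 m the expected bounds are approximately 0 m, 87.2 m, 258.8 m, and 500 m for the four bank angles, so a published value of 502 m at 30° would fail while 259.5 m at 15° would pass.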