Blackbox Tests

Tier markers (per environment.md): pipeline (T1), deferred-corpus (T2), deferred-sitl (T3), deferred-hil (T4), deferred-field (T5). Every test pairs an input/observable with a quantifiable expected result from _docs/00_problem/input_data/expected_results/results_report.md or directly from an AC. All tests run through the public interfaces defined in environment.md. No SUT-internal access.

Positive Scenarios

FT-P-01: 60-frame sequential pipeline — ≥80 % within 50 m (pipeline-correctness only)

Summary: Sequentially feed the 60 nav-cam JPGs through the SUT and verify the position-error CDF on this corpus. Traces to: AC-1.1 (pipeline-correctness only — see test-data.md caveat), results_report row 1. Category: Position Accuracy. Tier: T1 (pipeline).

Preconditions:

  • nav_cam_60_slice mounted; nav_cam_60_slice_imu synthesised; satellite_tiles_AD0000xx_z20 placeholder fixture present.
  • SUT booted; cuVSLAM warmed; ArduPilot SITL loaded with the corresponding IMU replay; first valid GPS_INPUT received (i.e., AC-NEW-1 cleared).

Input data: nav_cam_60_slice + coordinates.csv + nav_cam_60_slice_imu.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream the 60 JPGs at 3 fps via the camera-input shim into the SUT | SUT publishes GPS_INPUT for each frame |
| 2 | Capture each GPS_INPUT.lat / lon at the qgc-mock sniffer | Every input frame yields a GPS_INPUT message within the test window |
| 3 | Compute haversine error vs coordinates.csv ground truth per frame | Per-frame errors collected into a CDF |

Expected outcome: ≥80 % of frames have error < 50 m. Reported as pipeline-functional, not deployment-binding (per test-data.md caveat — deployment-binding number from FT-P-T2 / AerialVL). Max execution time: 60 s per run.
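
The error computation in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the test harness itself: the function names and the mean Earth radius constant are assumptions of this sketch, and the inputs are taken to be decimal degrees.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(errors_m, threshold_m):
    """Fraction of per-frame errors strictly below the threshold."""
    if not errors_m:
        raise ValueError("no per-frame errors collected")
    return sum(e < threshold_m for e in errors_m) / len(errors_m)
```

With per-frame errors collected this way, the FT-P-01 verdict would be `fraction_within(errors, 50.0) >= 0.80`, and the FT-P-02 verdict `fraction_within(errors, 20.0) >= 0.50`.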


FT-P-02: 60-frame sequential pipeline — ≥50 % within 20 m (pipeline-correctness only)

Summary: Same corpus as FT-P-01; tighter tolerance. Traces to: AC-1.2 (pipeline-correctness only), results_report row 2. Category: Position Accuracy. Tier: T1.

Preconditions / Input data: same as FT-P-01.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay the corpus end-to-end | per-frame GPS_INPUT |
| 2 | Compute haversine error CDF | |

Expected outcome: ≥50 % within 20 m on the 60-image slice (functional check). Deployment-binding number comes from AerialVL S03 in FT-P-T2. Max execution time: 60 s.


FT-P-T2: AC-1.1 / AC-1.2 deployment-binding accuracy on AerialVL S03

Summary: Re-run AC-1.1 / AC-1.2 on the deployment-binding corpus. Traces to: AC-1.1, AC-1.2. Tier: T2 (deferred-corpus). data_status: deferred-corpus.

Preconditions: aerialvl_s03 mounted with synced IMU + nav-cam stream + GPS truth.

Input data: AerialVL S03.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay AerialVL S03 (70 km of fixed-wing flight at 1 km AGL) | per-frame GPS_INPUT |
| 2 | Compute error CDF vs S03 GPS truth | |

Expected outcome: ≥80 % within 50 m AND ≥50 % within 20 m (deployment-binding). Max execution time: 90 min (replay + analysis).


FT-P-03: Per-frame error bound ≤100 m

Summary: No single frame exceeds 100 m error on the 60-image slice. Traces to: AC-1.1 (negative-tail bound), results_report row 3. Tier: T1.

Preconditions / Input: same as FT-P-01.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay 60 frames | per-frame GPS_INPUT |
| 2 | Compute max(haversine_err) over all frames | |

Expected outcome: max error ≤ 100 m. Pipeline-functional only. Max execution time: 60 s.


FT-P-04: VO drift bound between satellite anchors

Summary: VO drift between successive satellite-anchored fixes stays bounded. Traces to: AC-1.3, AC-NEW-8, results_report row 4, F-T1b. Tier: T1 functional + T2 binding.

Preconditions: cuVSLAM in mono+IMU mode (T1) AND mono-only mode (T2 split test). Input data: nav_cam_60_slice (T1) + AerialVL S03 (T2).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Identify successive satellite_anchored source-label transitions | series of anchor pairs |
| 2 | For each anchor pair, measure VO-extrapolated centre vs next-anchor centre | drift in metres |
| 3 | Compute 95th percentile across all pairs | |

Expected outcome:

  • mono+IMU: p95 drift ≤ 50 m (binding on T2 / AerialVL).
  • mono-only: p95 drift ≤ 100 m (binding on T2 / AerialVL).
  • T1 functional check: drift bounded (no monotonic growth) — exact numbers not deployment-binding.

Max execution time: 90 min (T2).
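
The p95 step can be sketched as below. The document does not say which percentile definition applies, so the nearest-rank method used here is an assumption, as are the function names and the mode keys. The same helper would serve the other p95 verdicts in this file (FT-P-16, FT-N-12).

```python
import math

# Limits taken from the expected outcome above; the dictionary keys are hypothetical.
P95_LIMIT_M = {"mono_imu": 50.0, "mono_only": 100.0}


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p % of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def drift_verdict(drifts_m, mode):
    """Return (p95 drift, pass/fail) for the given cuVSLAM mode."""
    p95 = percentile_nearest_rank(drifts_m, 95)
    return p95, p95 <= P95_LIMIT_M[mode]
```

Nearest-rank always returns an observed sample, which keeps the verdict easy to audit when the number of anchor pairs is small.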


FT-P-05: GPS_INPUT shape under normal tracking

Summary: GPS_INPUT messages emitted while tracking is healthy carry the correct schema and value ranges. Traces to: AC-1.4, AC-4.3, AC-6.3, results_report row 5. Tier: T1.

Preconditions: SUT in steady-state tracking with recent satellite anchor (<30 s old). Input data: any single frame from nav_cam_60_slice.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Sniff GPS_INPUT at qgc-mock | one frame per nav-cam frame |
| 2 | Decode fields: fix_type, horiz_accuracy, satellites_visible, lat, lon, alt, vel_acc | as per MAVLink GPS_INPUT spec |
| 3 | Inspect optional ODOMETRY: assert intentional absence in v1 (per AC-4.3 v1-scope clause) | no ODOMETRY frames present |

Expected outcome: fix_type == 3, horiz_accuracy ∈ [1, 50] m, satellites_visible == 10, lat / lon non-null, WGS84. ODOMETRY count == 0 across the run. Max execution time: 30 s.
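
A field-level check for this outcome might look like the sketch below. It assumes the sniffer hands over each decoded GPS_INPUT as a plain dict (how that dict is produced is outside this sketch) and that lat / lon arrive as MAVLink degE7 integers. The function name and the violation strings are hypothetical.

```python
def check_gps_input_normal(msg):
    """Return a list of violations for one decoded GPS_INPUT; an empty list means pass."""
    problems = []
    if msg.get("fix_type") != 3:
        problems.append("fix_type != 3")
    hacc = msg.get("horiz_accuracy")
    if hacc is None or not 1.0 <= hacc <= 50.0:
        problems.append("horiz_accuracy outside [1, 50] m")
    if msg.get("satellites_visible") != 10:
        problems.append("satellites_visible != 10")
    lat, lon = msg.get("lat"), msg.get("lon")
    if lat is None or lon is None:
        problems.append("lat/lon missing")
    elif not (-900_000_000 <= lat <= 900_000_000
              and -1_800_000_000 <= lon <= 1_800_000_000):
        # degE7 bounds for +/-90 deg latitude and +/-180 deg longitude
        problems.append("lat/lon outside WGS84 range")
    return problems
```

Returning the full list of violations, rather than failing on the first one, makes a failed run easier to diagnose from a single report.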


FT-P-06: GPS_INPUT shape during VO-only fallback

Summary: Fields adapt when no satellite anchor is available for >30 s. Traces to: AC-1.4, AC-4.3, results_report row 6. Tier: T1.

Preconditions: Force satellite-match failure for >30 s (cache poisoned with stale tiles).

Input data: nav_cam_60_slice with stale_tile_scenarios injected.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | After 30 s of failed matches, sniff GPS_INPUT | fix_type == 3, horiz_accuracy ∈ [20, 100] m, source-label vo_extrapolated |

Expected outcome: as above; horiz_accuracy grows monotonically until next successful match. Max execution time: 60 s.


FT-P-07: GPS_INPUT shape during dead-reckoning

Summary: VO lost AND no satellite → IMU-only dead reckoning. Traces to: AC-1.4, AC-5.2, results_report row 7. Tier: T1.

Preconditions: Inject cuVSLAM tracking-loss + cache poisoned.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Sniff GPS_INPUT | fix_type == 2, horiz_accuracy ≥ 50 m and growing |
| 2 | Source label | dead_reckoned |

Expected outcome: fix_type == 2, monotonically growing horiz_accuracy, source == dead_reckoned. Max execution time: 60 s.


FT-P-08: GPS_INPUT shape on total failure

Summary: 3+ consecutive failures — system signals total failure. Traces to: AC-3.4, results_report row 8. Tier: T1.

Preconditions: cache_poisoning_scenarios flavour that causes 3 sat failures + cuVSLAM lost.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Wait for 3 consecutive failures | GPS_INPUT continues at the configured rate |
| 2 | Inspect GPS_INPUT | fix_type == 0, horiz_accuracy == 999.0 |
| 3 | Inspect STATUSTEXT | RELOC_REQ regex emitted |

Expected outcome: as above. Max execution time: 60 s.


FT-P-09: Confidence tier transitions

Summary: Confidence tier label transitions match defined conditions. Traces to: AC-1.4, results_report rows 10–13. Tier: T1.

Preconditions: scripted scenario that walks (HIGH → MEDIUM → LOW → FAILED).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | At each scripted state, read the SSE stream confidence field AND the source-label field | matches expected tier |

Expected outcome:

  • Sat anchor <30 s + cov <400 m² → tier HIGH, source satellite_anchored.
  • cuVSLAM OK + no sat >30 s → tier MEDIUM, source vo_extrapolated.
  • cuVSLAM lost + IMU only → tier LOW, source dead_reckoned.
  • 3+ consecutive failures → tier FAILED, fix_type 0.

Max execution time: 5 min.
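
The four conditions above can be encoded as an oracle for the scripted scenario, roughly as sketched here. The function name, the argument names, and the order in which the conditions are evaluated are assumptions. State combinations that the list above does not cover return `None` rather than a guessed tier.

```python
def expected_tier(anchor_age_s, cov_m2, vo_ok, consecutive_failures):
    """Map a scripted state to (tier, source label), or None if the state is not listed."""
    if consecutive_failures >= 3:
        return "FAILED", None  # fix_type 0; no source label is specified for this tier
    if vo_ok and anchor_age_s < 30 and cov_m2 < 400:
        return "HIGH", "satellite_anchored"
    if vo_ok and anchor_age_s > 30:
        return "MEDIUM", "vo_extrapolated"
    if not vo_ok and anchor_age_s > 30:
        return "LOW", "dead_reckoned"
    return None  # e.g. fresh anchor with cov >= 400 m^2: not covered by the listed conditions
```

Each scripted state would then be compared against the confidence and source-label fields read from the SSE stream.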


FT-P-10: Image registration rate (functional)

Summary: Pipeline registers ≥95 % of normal-flight frames against the previous frame. Traces to: AC-2.1 (pipeline-functional only), results_report row 14. Tier: T1 functional + T2 binding.

Preconditions: SUT exposes registration outcome via STATUSTEXT or NAMED_VALUE_FLOAT (reg_pass_count, reg_total_count).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay nav_cam_60_slice (T1) or AerialVL S03 (T2) | registration metrics published |
| 2 | Compute reg_pass_count / reg_total_count | percentage |

Expected outcome: T1 ≥95 % (functional); T2 ≥95 % (deployment-binding) under normal-flight definition (nadir, ±10° bank/pitch, ≥40 % overlap, daytime, season-matched tile). Max execution time: 60 s (T1) / 90 min (T2).


FT-P-11: Mean Reprojection Error (MRE)

Summary: VO and cross-domain MRE under thresholds. Traces to: AC-2.2, results_report row 15. Tier: T1 functional + T2 binding.

Preconditions: SUT publishes mre_vo (frame-to-frame) and mre_cross (cross-view) on the metrics endpoint.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Scrape MRE metrics over a replay | per-frame samples |
| 2 | Compute mean across the run | |

Expected outcome: mean(mre_vo) < 1.0 px; mean(mre_cross) < 2.5 px. T1 numbers functional only. Max execution time: 60 s (T1) / 90 min (T2).


FT-P-12: Continuous output through turn area (frames 32–43)

Summary: SUT keeps producing position estimates through the turn segment of coordinates.csv. Traces to: AC-3.2, AC-4.4, results_report row 16. Tier: T1.

Preconditions: standard pipeline replay.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay frames 32–43 | per-frame GPS_INPUT |
| 2 | Count outputs vs frames | |

Expected outcome: ≥1 GPS_INPUT per nav-cam frame in the turn region. Max execution time: 30 s.


FT-P-13: 350 m outlier handled (AC-3.1)

Summary: Pipeline survives a synthetic 350 m gap between consecutive frames (caused by ±20° tilt outlier). Traces to: AC-3.1, results_report row 17. Tier: T1.

Input data: synthetic two-frame pair with 350 m gap injected into nav_cam_60_slice mid-replay.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject the outlier pair | SUT emits a vo_extrapolated or dead_reckoned frame, not corrupted output |
| 2 | Continue with next valid frame | error returns to ≤100 m within next valid frame |

Expected outcome: error ≤ 100 m on the next valid frame after the outlier. Max execution time: 60 s.


FT-P-14: Sharp-turn re-localization (AC-3.2)

Summary: Sharp turn (<5 % overlap, <70°, <200 m drift) — VO fails, satellite re-loc recovers. Traces to: AC-3.2, F-T7, results_report row 18. Tier: T1.

Input data: synthetic sharp-turn pair injected into nav_cam_60_slice.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject the sharp-turn pair | cuVSLAM tracking lost; VPR triggers; matcher re-localizes |
| 2 | Track error over next 3 frames | error ≤ 50 m within 3 frames |

Expected outcome: error ≤ 50 m within 3 frames of the turn. Max execution time: 60 s.


FT-P-15: VO loss → satellite recovery → tracking_state == NORMAL

Summary: After cuVSLAM tracking loss + sat match success, tracking_state returns to NORMAL. Traces to: AC-3.2, AC-3.3, results_report row 19. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Force cuVSLAM tracking-loss; deliver a fresh tile that matches | matcher emits absolute pose; calibrator emits satellite-anchored fix |
| 2 | Observe FC EKF3 reconvergence via EKF_STATUS_REPORT | EKF3 reconverges |
| 3 | Read SUT-published tracking_state | == NORMAL |

Expected outcome: tracking_state == NORMAL within bounded time. Max execution time: 60 s.


FT-P-16: Cold-start TTFF ≤30 s p95

Summary: From companion-computer boot, first valid GPS_INPUT within 30 s. Traces to: AC-NEW-1, results_report row 23, F-T11. Tier: T1 statistical (≤10 boots) + T4 binding (50 boots on real HW).

Preconditions: SUT image cold (no warmed engines); FC providing GLOBAL_POSITION_INT simulating IMU-extrapolated pose.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Boot SUT container | container start logged |
| 2 | Time from container start to first valid fix_type==3 GPS_INPUT | t_ttff |
| 3 | Repeat N times (N=10 T1 / N=50 T4) | distribution |

Expected outcome: 95th percentile of t_ttff ≤ 30 s. Max execution time: 10 min (T1) / 30 min (T4).


FT-P-17: Validate initial position via first satellite match

Summary: First satellite match after startup pulls position to ≤50 m. Traces to: AC-5.1, AC-NEW-1, results_report row 24. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Provide GLOBAL_POSITION_INT with a deliberate 200 m offset from truth | SUT seeds pipeline with 200 m uncertainty |
| 2 | Replay first frame with valid satellite tile | matcher succeeds; calibrator emits anchored fix |
| 3 | Read GPS_INPUT lat/lon | error ≤ 50 m |

Expected outcome: position error ≤ 50 m after first match. Max execution time: 90 s.


FT-P-18: Mid-flight reboot recovery ≤30 s

Summary: Process kill mid-flight; SUT recovers within AC-NEW-1 budget. Traces to: AC-5.3, AC-NEW-1, results_report row 25. Tier: T1.

Preconditions: SUT in steady-state tracking; FC continues to fly.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Send SIGKILL to SUT container | SUT restarts |
| 2 | Time from restart to first fix_type==3 GPS_INPUT | t_recovery |

Expected outcome: t_recovery ≤ 30 s. Max execution time: 60 s.


FT-P-19: Post-reboot first-match accuracy

Summary: After reboot, first satellite match restores accuracy. Traces to: AC-5.3, results_report row 26. Tier: T1.

Steps: same as FT-P-17 but starting from a reboot.

Expected outcome: error ≤ 50 m after first match. Max execution time: 90 s.


FT-P-20: Object localization (level flight)

Summary: POST /objects/locate returns lat/lon for an object pixel given known UAV pose. Traces to: AC-7.1, AC-7.2, results_report row 27. Tier: T1.

Preconditions: SUT has a known anchored fix; AI camera gimbal pose injected via FC ATTITUDE. Input data: pixel coordinates + gimbal angle + zoom + altitude in request body.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | POST /objects/locate with pixel_x, pixel_y, gimbal_pitch, gimbal_yaw, zoom, altitude | 200 + JSON {lat, lon, alt, accuracy_m, confidence} |
| 2 | Compare to ground truth | error ≤ accuracy_m |

Expected outcome: lat/lon within accuracy_m of ground truth; in level flight, accuracy_m consistent with frame-center accuracy of the GPS-Denied system. In maneuvering flight, response includes the altitude × |sin(unknown_bank_or_pitch)| bound (AC-7.1 second clause) when bank/pitch >5°. Max execution time: 5 s.


FT-P-21: Coordinate transform round-trip ≤0.1 m

Summary: GPS → NED → pixel → GPS round-trip preserves position. Traces to: AC-6.3, AC-7.2, results_report row 29. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Submit a known WGS84 point through the round-trip via /objects/locate (or a debug endpoint if exposed) | round-trip lat/lon |
| 2 | Compare to original | ≤ 0.1 m |

Expected outcome: round-trip error ≤ 0.1 m. Max execution time: 1 s.


FT-P-22: GET /health schema and content

Summary: Health endpoint returns 200 with required fields. Traces to: AC-6.1 (telemetry), results_report row 30. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | GET /health | HTTP 200, JSON body |
| 2 | Validate schema | contains status, memory_mb, gpu_temp_c, tracking_state, last_anchor_age_s, confidence_tier |

Expected outcome: as above; status ∈ {ok, degraded, failed}. Max execution time: 1 s.


FT-P-23: POST /sessions returns id

Traces to: AC-6.1, results_report row 31. Tier: T1.

Steps: POST /sessions (auth) → 200/201 with session id.

Expected outcome: status ∈ {200, 201}; body has session_id matching ^[a-f0-9-]{36}$. Max execution time: 1 s.
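
The body check can be sketched as below, using the pattern exactly as written in the expected outcome. Note that this pattern also accepts strings that are not canonical UUIDs (for example 36 hyphens). The stricter 8-4-4-4-12 pattern shown alongside it is an assumption of this sketch, not something the AC requires, and both function names are hypothetical.

```python
import re

SESSION_ID_RE = re.compile(r"^[a-f0-9-]{36}$")  # pattern from the expected outcome
# Optional stricter form; an assumption of this sketch, not part of the AC.
CANONICAL_UUID_RE = re.compile(
    r"^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
)


def session_id_ok(body):
    """True if the decoded JSON body carries a session_id matching the documented pattern."""
    sid = body.get("session_id")
    return isinstance(sid, str) and SESSION_ID_RE.fullmatch(sid) is not None


def session_id_is_canonical_uuid(body):
    """Stricter variant: lower-case hex in the 8-4-4-4-12 layout."""
    sid = body.get("session_id")
    return isinstance(sid, str) and CANONICAL_UUID_RE.fullmatch(sid) is not None
```

`fullmatch` is used so that a trailing newline after the 36 characters does not slip through the `$` anchor.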


FT-P-24: SSE stream emits per-second events

Traces to: AC-6.1, results_report row 32. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | GET /sessions/{id}/stream | SSE connection; events emitted at ~1 Hz |
| 2 | Sample 30 s of stream | each event matches schema: type, timestamp, lat, lon, alt, accuracy_h, confidence, vo_status |

Expected outcome: rate 1 Hz ± 0.2 Hz; all events conform to schema. Max execution time: 35 s.
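
One way to turn the sampled stream into a rate verdict is sketched below. It assumes each event's receive time is available as seconds in a float and measures the mean rate over the sampled window; both function names are hypothetical. The same check would apply to the NAMED_VALUE_FLOAT rates in FT-P-30.

```python
def event_rate_hz(timestamps_s):
    """Mean event rate over the window, from the first to the last timestamp."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two events to measure a rate")
    span = timestamps_s[-1] - timestamps_s[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    return (len(timestamps_s) - 1) / span


def rate_ok(timestamps_s, nominal_hz=1.0, tol_hz=0.2):
    """True if the mean rate lies within nominal_hz +/- tol_hz."""
    return abs(event_rate_hz(timestamps_s) - nominal_hz) <= tol_hz
```

A mean rate hides bursts and gaps, so a stricter harness might additionally bound the largest gap between consecutive events.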


FT-P-25: TRT engine load ≤10 s

Traces to: AC-NEW-1 (sub-budget), results_report row 39. Tier: T1 (synthetic timing) + T4 (real HW).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | At SUT boot, time from container start to "all engines ready" STATUSTEXT | t_engines |

Expected outcome: t_engines ≤ 10 s. Max execution time: 30 s.


FT-P-26: Tile storage size for the operational area

Traces to: AC-8.3, restrictions §UAV/Satellite, results_report row 40. Tier: T1.

Preconditions: a 200 km mission path × ±2 km buffer × z=18 + z=20 fixture loaded.

Steps: read total bytes under /probe/tiles/.

Expected outcome: 300 MB ≤ size ≤ 1000 MB. (Aligned with restriction's ~10 GB persistent cap for full 400 km².) Max execution time: 5 s.


FT-P-27: Tile mosaic coverage radius ≥500 m

Traces to: AC-8.3, results_report row 41. Tier: T1.

Preconditions: SUT given EKF position with σ_xy.

Steps: capture the assembled mosaic bbox via STATUSTEXT or a debug endpoint.

Expected outcome: mosaic radius ≥ 500 m around current position. Max execution time: 5 s.


FT-P-28: Tile dedup — ≤1 onboard tile per ground sector

Traces to: AC-8.4, F-T2. Tier: T1.

Preconditions: tile_dedup_replay (sectors visited ≥2×).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay the flight | onboard tiles written |
| 2 | Inspect MBTiles + sidecar in /probe/tiles/ | per-sector tile count |

Expected outcome: per-sector count ≤ 1; latest/highest-quality wins. Max execution time: 10 min.


FT-P-29: Post-flight upload to candidate pool

Traces to: AC-8.4, F-T3. Tier: T1.

Preconditions: service-stub running.

Steps: replay → on landing-event, SUT uploads tiles.

Expected outcome: service-stub records ≥1 tile with trust_level=candidate; promotion only after N≥2 voting flights (so a single flight does not promote). Max execution time: 5 min.


FT-P-30: NAMED_VALUE_FLOAT telemetry rate

Traces to: AC-6.1, results_report row 45. Tier: T1.

Steps: sniff gps_conf, gps_drift, gps_hacc NAMED_VALUE_FLOAT rates over 30 s.

Expected outcome: each at 1 Hz ± 0.2 Hz. Max execution time: 35 s.


FT-P-31: Disconnected segments — ≥3 connected via global retrieval

Traces to: AC-3.3, F-T8. Tier: T1.

Preconditions: disconnected_segments_replay with ≥3 segments.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay each segment with a synthetic gap | for each segment, VPR retrieves top-K candidates; matcher relocalizes |
| 2 | Verify segment-to-segment trajectory continuity | each segment connects to prior trajectory |

Expected outcome: 3/3 segments connect within 10 frames of segment start; tracking_state == NORMAL after each. Max execution time: 5 min.


FT-P-32: Position refinement / corrections (AC-4.5)

Traces to: AC-4.5. Tier: T1.

Preconditions: SUT in steady state; ability to refine prior fixes.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Capture sequence of GPS_INPUT for a 10-s window | per-frame fixes |
| 2 | After delayed loop closure / late satellite match, observe whether SUT emits a corrected fix or signals correction via STATUSTEXT | a follow-up GPS_INPUT for an earlier time_usec OR a STATUSTEXT correction record |

Expected outcome: at least one correction event where the corrected fix replaces the prior fix's h_acc (covariance shrinks). System never silently rewrites past output without recording the correction. Max execution time: 60 s.


FT-P-33: Object-localize bank/pitch >5° publishes uncertainty bound

Traces to: AC-7.1 (second clause). Tier: T1.

Preconditions: FC ATTITUDE published with bank > 5°.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | POST /objects/locate while bank > 5° | response includes bound = altitude × abs(sin(bank_or_pitch)) |

Expected outcome: response body includes bank_pitch_bound_m matching the formula within 1 m. Max execution time: 5 s.
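
The reference value for this comparison follows directly from the formula in step 1. A minimal sketch, assuming the angle is supplied in degrees and altitude in metres (the function names are hypothetical):

```python
import math


def bank_pitch_bound_m(altitude_m, angle_deg):
    """Uncertainty bound: altitude x |sin(bank_or_pitch)|, in metres."""
    return altitude_m * abs(math.sin(math.radians(angle_deg)))


def bound_matches(published_m, altitude_m, angle_deg, tol_m=1.0):
    """True if the published bank_pitch_bound_m is within tol_m of the formula."""
    return abs(published_m - bank_pitch_bound_m(altitude_m, angle_deg)) <= tol_m
```

For example, at 1000 m altitude and 30° bank the bound evaluates to 500 m. The absolute value makes the result independent of bank direction.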


FT-P-34: Mid-flight tile generation respects σ_xy ≤ 5 m hard gate

Traces to: AC-8.4, AC-NEW-7 hard gate. Tier: T1.

Preconditions: scripted scenarios with σ_xy ∈ {2, 4, 6, 8} m.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay σ_xy=2 m frames | tiles written |
| 2 | Replay σ_xy=8 m frames | NO tiles written |
| 3 | Inspect sidecar trust_level for σ_xy ∈ (3, 5] m | trust_level == soft |
| 4 | Inspect sidecar for σ_xy ≤ 3 m | trust_level == candidate |

Expected outcome: as above. Max execution time: 5 min.
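
The gate in the steps above reduces to a small oracle that the test can compare against the sidecar contents. This is a sketch of the expected mapping only, not the SUT's implementation. The function name is hypothetical, and `None` stands for "no tile written".

```python
def expected_trust_level(sigma_xy_m):
    """Expected sidecar trust_level for a given sigma_xy, or None if no tile may be written."""
    if sigma_xy_m < 0:
        raise ValueError("sigma_xy must be non-negative")
    if sigma_xy_m <= 3.0:
        return "candidate"
    if sigma_xy_m <= 5.0:
        return "soft"
    return None  # above the 5 m hard gate: tile generation must be suppressed
```

For the scripted σ_xy values {2, 4, 6, 8} m this gives candidate, soft, no tile, no tile.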


FT-P-35: NF-T4b cache-poisoning safety budget (Monte Carlo)

Traces to: AC-NEW-7. Tier: T2 (deferred-corpus). data_status: deferred-corpus.

Preconditions: ≥100 simulated flights worth of frames from AerialVL + Mavic + AerialExtreMatch with synthetic over-confidence injection (1.5×–3×).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay all flights | per-tile geo-misalignment captured |
| 2 | Compute P(misalign > 30 m) and P(misalign > 100 m) | |

Expected outcome: P(>30 m) < 1 %, P(>100 m) < 0.1 %. Max execution time: 4 h.
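
The two probabilities can be estimated as empirical exceedance frequencies over the captured per-tile misalignments, as in this sketch (function names are hypothetical). It computes point estimates only and does not attach a confidence interval.

```python
def exceedance_probability(misalign_m, threshold_m):
    """Empirical P(misalign > threshold) over all captured tiles."""
    if not misalign_m:
        raise ValueError("no misalignment samples")
    return sum(m > threshold_m for m in misalign_m) / len(misalign_m)


def safety_budget_ok(misalign_m):
    """True if P(>30 m) < 1 % and P(>100 m) < 0.1 %."""
    return (exceedance_probability(misalign_m, 30.0) < 0.01
            and exceedance_probability(misalign_m, 100.0) < 0.001)
```

Because the resolution of an empirical frequency is 1/n, the 0.1 % bound can only be demonstrated with well over 1000 tiles. With exactly 1000 tiles, a single tile beyond 100 m already fails the check.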


FT-P-36: AC-NEW-9 covariance calibration accuracy

Traces to: AC-NEW-9, F-T18. Tier: T2.

Preconditions: AerialVL S03 replay with ground truth.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | For each emitted GPS_INPUT, capture (h_acc, ground-truth error) | series of pairs |
| 2 | Compute fraction of frames where error ≤ h_acc × Mahalanobis-2D-95% factor | fraction |

Expected outcome: fraction ≥ 95 % (calibration neither over- nor under-claims). Max execution time: 90 min.
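
Step 2 can be sketched as below. The sketch assumes that h_acc is a 1-sigma, per-axis standard deviation of a circular 2D Gaussian; under that assumption the 95 % factor is the square root of the chi-square quantile with 2 degrees of freedom, sqrt(-2 ln 0.05) ≈ 2.448. If the SUT defines h_acc differently, the factor changes accordingly. The names are hypothetical.

```python
import math

# sqrt of the chi-square(2 dof) 95 % quantile; closed form for 2 dof is -2 * ln(1 - p).
K95_2D = math.sqrt(-2.0 * math.log(1.0 - 0.95))


def coverage_fraction(pairs):
    """Fraction of (h_acc_m, error_m) pairs whose error lies inside the 95 % radius."""
    if not pairs:
        raise ValueError("no (h_acc, error) pairs")
    return sum(err <= h_acc * K95_2D for h_acc, err in pairs) / len(pairs)
```

A fraction well below 95 % indicates over-confident h_acc values. Detecting under-confidence would need an additional upper bound, which the expected outcome does not quantify.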


FT-P-37: F-T18 calibrator regression (no state propagation)

Traces to: AC-NEW-9, F-T18. Tier: T2.

Preconditions: replay with logging hooks on Component 5 outputs (publicly exposed counters).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run replay | calibrator counters emitted |
| 2 | Assert state_propagation_invocations_total == 0 | no propagation |
| 3 | Assert mahalanobis_gate_rejections_total > 0 | gate active |

Expected outcome: as above. Max execution time: 90 min.


Negative Scenarios

FT-N-01: Corrupted nav-cam frame — no crash, degraded mode

Traces to: AC-3.x (resilience), restriction "fixed downward camera". Tier: T1.

Input data: a 60-frame replay with frame N replaced by a 10-byte random blob.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream the replay | SUT logs decode error; emits STATUSTEXT WARN |
| 2 | Inspect tracking_state | transitions to DEGRADED for 1 frame; recovers to NORMAL on next valid frame |
| 3 | | SUT process does NOT crash |

Expected outcome: process alive; no GPS_INPUT spike with bad data; tracking_state returns to NORMAL within 1 frame of recovery. Max execution time: 30 s.


FT-N-02: Object-localize invalid pixel

Traces to: AC-7.1, results_report row 28. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | POST /objects/locate with pixel_x = -10 (out of frame) | HTTP 422 + error body |

Expected outcome: status == 422; body contains a structured error code. Max execution time: 1 s.


FT-N-03: Unauthenticated POST /sessions

Traces to: results_report row 33, security restrictions. Tier: T1.

Steps: POST /sessions without JWT → 401.

Expected outcome: status == 401. Max execution time: 1 s.


FT-N-04: Stale tile beyond grace — must NOT label satellite_anchored

Traces to: AC-8.2, AC-NEW-6. Tier: T1.

Preconditions: stale_tile_scenarios with 18-month-old active-conflict tile (well past 6 mo + 30-day grace).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay frames whose only candidate tile is the 18 mo stale one | matcher invocation skipped or scored 0 |
| 2 | Inspect source label on emitted GPS_INPUT | NEVER satellite_anchored |
| 3 | Inspect WARN STATUSTEXT | tile rejected event recorded |

Expected outcome: no satellite_anchored label across the run; rejection event recorded. Max execution time: 60 s.


FT-N-05: Stale tile in 30-day grace — confidence linearly decayed

Traces to: AC-NEW-6. Tier: T1.

Preconditions: tiles aged at +0, +15, +30 days past the 6-mo budget.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay each | confidence weight in sidecar metric: 1.0, 0.5, 0.0 |

Expected outcome: confidence weight decays linearly as specified. Max execution time: 60 s.
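
The three expected weights are consistent with a straight line from 1.0 at the end of the 6-month budget to 0.0 at the end of the 30-day grace. A sketch of that oracle follows. The function name is hypothetical, and treating tiles still inside the budget as weight 1.0 is an assumption.

```python
GRACE_DAYS = 30.0  # grace window past the 6-month budget


def expected_confidence_weight(days_past_budget):
    """Linearly decayed confidence weight for a tile aged past the freshness budget."""
    if days_past_budget <= 0:
        return 1.0  # assumption: tiles inside the budget carry full weight
    if days_past_budget >= GRACE_DAYS:
        return 0.0
    return 1.0 - days_past_budget / GRACE_DAYS
```

At +0, +15, and +30 days this yields 1.0, 0.5, and 0.0, matching the values in step 1.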


FT-N-06: Sharp-turn negative — must trigger satellite re-loc, not silently fail

Traces to: AC-3.2 (negative case). Tier: T1.

Steps: same as FT-P-14 but assert that before re-loc the SUT emits a STATUSTEXT explaining VO loss; assert tracking_state transitions through DEGRADED.

Expected outcome: explicit STATUSTEXT log; recovery within 3 frames. Max execution time: 60 s.


FT-N-07: 3-consecutive-failure → RELOC_REQ on STATUSTEXT

Traces to: AC-3.4, results_report rows 20, 46. Tier: T1.

Steps: see FT-P-08; additionally verify the regex RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m.

Expected outcome: regex matches at least one STATUSTEXT after 3 failures; emitted within 2 s of the third failure (per AC-3.4 timing). Max execution time: 60 s.
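
Both parts of this outcome (pattern match and the 2 s deadline) can be checked together, as sketched below. The sketch assumes the sniffer provides `(timestamp_s, text)` tuples and that any multi-chunk STATUSTEXT has already been reassembled into a single string. The function name and the sample text in the usage note are hypothetical.

```python
import re

RELOC_REQ_RE = re.compile(r"RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m")


def reloc_req_ok(statustexts, t_third_failure_s, deadline_s=2.0):
    """True if a matching RELOC_REQ arrives within deadline_s after the third failure."""
    for t, text in statustexts:
        delay = t - t_third_failure_s
        if RELOC_REQ_RE.search(text) and 0.0 <= delay <= deadline_s:
            return True
    return False
```

For instance, a record such as `(11.5, "RELOC_REQ: last_lat=48.1 last_lon=37.2 uncertainty=350m")` passes when the third failure occurred at t = 10.0 s and fails when it occurred at t = 9.0 s.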


FT-N-08: Re-loc waiting state behaviour

Traces to: AC-3.4, results_report row 21. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | After RELOC_REQ, observe SUT for 10 s | fix_type == 0 GPS_INPUT continues; IMU-prediction-only label; satellite-match attempts continue (counter increments) |

Expected outcome: as above; SUT does NOT stop emitting GPS_INPUT. Max execution time: 30 s.


FT-N-09: Operator hint — used as 500 m seed

Traces to: AC-6.2, AC-3.4, results_report row 22. Tier: T1.

Preconditions: SUT in re-loc-waiting; operator hint scenario active.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | qgc-mock sends STATUSTEXT RELOC_HINT: lat=… lon=… sigma=500m | SUT consumes hint; uses as seed for VPR/cross-view |
| 2 | First fix after hint | error ≤ 500 m initially |
| 3 | After next satellite match | error ≤ 50 m |

Expected outcome: as above. Max execution time: 60 s.


FT-N-10: Operator hint — malformed value rejected

Traces to: AC-6.2 (negative). Tier: T1.

Steps: send RELOC_HINT: lat=NaN lon=… sigma=-10.

Expected outcome: SUT emits STATUSTEXT WARN; hint NOT applied; pipeline state unchanged. Max execution time: 30 s.
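
To build the malformed-hint cases, it helps to have a reference validator that states what "malformed" means. The sketch below is one possible reading: it rejects non-finite numbers, out-of-range coordinates, and non-positive sigma. The exact hint grammar is not specified in this document, so the regular expression, the optional trailing `m`, and the function name are all assumptions.

```python
import math
import re

# Hypothetical grammar, modelled on the RELOC_HINT example in FT-N-09.
HINT_RE = re.compile(r"RELOC_HINT:\s*lat=(\S+)\s+lon=(\S+)\s+sigma=(\S+?)m?")


def parse_reloc_hint(text):
    """Return (lat, lon, sigma_m) for a well-formed hint, or None if it must be rejected."""
    match = HINT_RE.fullmatch(text.strip())
    if match is None:
        return None
    try:
        lat, lon, sigma = (float(g) for g in match.groups())
    except ValueError:
        return None
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None  # rejects NaN and infinities
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    if sigma <= 0.0:
        return None
    return lat, lon, sigma
```

Every input for which this validator returns `None` is a candidate negative case: the SUT is expected to warn, ignore the hint, and leave pipeline state unchanged.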


FT-N-11: AC-4.3 — ODOMETRY intentionally absent in v1

Traces to: AC-4.3 (v1-scope clause), F-T9 Option A. Tier: T1.

Preconditions: SUT configured for v1 (default).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run any of FT-P-01 / FT-P-04 / FT-P-T2 | GPS_INPUT emitted |
| 2 | At qgc-mock, count ODOMETRY frames over the run | == 0 |
| 3 | Inspect EK3_SRC1_* configuration via FC parameter readback | POSXY=GPS, VELXY=GPS, YAW=GPS+Compass |

Expected outcome: ODOMETRY count == 0; FC parameters as configured. Max execution time: 60 s.


FT-N-12: Spoofed GPS — SUT promotes within 3 s

Traces to: AC-NEW-2, F-T12. Tier: T3 (deferred-sitl). data_status: deferred-sitl.

Preconditions: SITL + gps-spoof-injector configured.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | At t=0, inject a malicious GPS_RAW_INT with a 1 km offset | FC sees both spoof + SUT GPS_INPUT |
| 2 | Time from spoof onset to SUT promoting its GPS_INPUT to primary (raised fix_type=3 AND STATUSTEXT promotion event) | t_promote |
| 3 | Repeat 50× | distribution |

Expected outcome: 95th percentile of t_promote < 3 s. Max execution time: 30 min.


FT-N-13: Failsafe at 3 s no-fix (AC-5.2)

Traces to: AC-5.2. Tier: T1+T3.

Preconditions: scripted scenario where SUT cannot produce ANY estimate for 3.5 s.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Force pipeline blackout | SUT logs failure |
| 2 | Verify FC behaviour | ArduPilot SITL logs fall-back to IMU-only dead reckoning |

Expected outcome: failsafe transition observable in EKF_STATUS_REPORT within 4 s of blackout. Max execution time: 60 s.


FT-N-14: GPS_INPUT with invalid signing tag rejected by FC

Traces to: restrictions §Sensors. Tier: T3.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Send a GPS_INPUT with invalid signing tag from the runner | FC rejects |
| 2 | Inspect FC log + STATUSTEXT to GCS | rejection event recorded |

Expected outcome: rejected; FC continues to fly on prior valid source. Max execution time: 30 s.


FT-N-15: SITL F-T9 Option A regression — EKF3 fuses GPS_INPUT only, no double-fusion

Traces to: AC-4.3, F-T9 Option A. Tier: T3.

Preconditions: SITL with EK3_SRC1_*=GPS+Compass, EK3_SRC2_*=GPS.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run a representative AerialVL replay | EKF3 fuses GPS_INPUT |
| 2 | Inspect EKF3 logs for double-fusion symptoms (issues #30076, #32506) | none |
| 3 | Trigger backup-GPS failover via SITL parameter | EKF3 switches to EK3_SRC2_* cleanly |

Expected outcome: no double-fusion; clean failover. Max execution time: 30 min.


FT-N-16: SITL F-T9 Option B regression (v1.1 candidate)

Traces to: AC-4.3 Option B (v1.1+), F-T9. Tier: T3 (deferred-sitl). data_status: deferred-sitl.

Preconditions: SITL with PR #30080-class build; SUT switched to ODOMETRY-primary mode (build flag).

Steps: ODOMETRY primary; GPS_INPUT held in reserve; verify clean source-switching, no double-fusion.

Expected outcome: as above. Test runs but build flag is OFF for v1 release gate. Max execution time: 30 min.


FT-N-17: AC-NEW-7 single-flight tile NOT promoted to trusted basemap

Traces to: AC-NEW-7 (Service-side voting). Tier: T1 (service-stub).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run a single-flight upload to candidate pool | tile recorded as trust_level=candidate |
| 2 | Query service-stub for promoted basemap content | tile NOT in promoted basemap |

Expected outcome: as above; promotion only after N≥2 voting flights confirm. Max execution time: 5 min.


FT-N-18: AC-8.5 — raw frames are NOT retained in FDR

Traces to: AC-8.5, AC-NEW-3. Tier: T1.

Preconditions: replay 60-frame slice; nav_cam_60_slice written to camera input.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run the replay | FDR populates |
| 2 | Inspect /probe/fdr/ for raw nav-cam frames | no JPEGs / no AI-cam frames |
| 3 | Inspect for thumbnail log of failed-tile-generation frames | present, ≤0.1 Hz, within FDR cap |

Expected outcome: no raw frames retained; only the failure thumbnail log within budget. Max execution time: 60 s.


FT-N-19: Free public Sentinel-2 tile rejected at cache boundary

Traces to: AC-8.1 (resolution floor), restrictions §Satellite. Tier: T1.

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject a synthetic tile at 10 m/px into the cache | cache index marks as below-resolution |
| 2 | Replay frames over that area | matcher does NOT use the tile; never emits satellite_anchored from it |

Expected outcome: as above. Max execution time: 60 s.


FT-N-20: Photo-count cap removed — system runs without arbitrary cap

Traces to: restrictions §UAV ("no photo-count cap"). Tier: T1 (smoke).

Steps:

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay synthetic_8h_load for 30 min | SUT continues operating |
| 2 | Inspect logs for any "photo count exceeded" condition | none |

Expected outcome: no cap-related condition; pipeline degrades only against FDR cap (AC-NEW-3) and tile-cache cap. Max execution time: 35 min.


FT-N-21: AC-7.1 in maneuvering flight — uncertainty bound published, not a fixed number

Traces to: AC-7.1 second clause. Tier: T1.

Steps: like FT-P-33 but assert that for bank ∈ {0°, 5°, 15°, 30°}, the published bank_pitch_bound_m matches altitude × |sin(bank)| within 1 m.

Expected outcome: bound published correctly across the range. Max execution time: 30 s.


Coverage notes

  • Pipeline-correctness boundary: T1 tests on the 60-image slice are NOT deployment-binding. AC-1.1, AC-1.2, AC-2.1, AC-2.2, AC-1.3, AC-NEW-8 deployment numbers come from T2 (FT-P-T2, FT-P-04 binding split, FT-P-10 binding, FT-P-11 binding).
  • Behavioural-shape tests: FT-P-08, FT-P-15, FT-P-16, FT-P-18, FT-N-04, FT-N-11, FT-N-12, FT-N-13, FT-N-14, FT-N-17, FT-N-18, FT-N-19 use the behavioural shape (trigger + observable + quantifiable verdict) — no input-data input/output mapping required.
  • Untraced tests: none. Every test traces to ≥1 AC or restriction.