Blackbox Tests
Tier markers (per environment.md): pipeline (T1), deferred-corpus (T2), deferred-sitl (T3), deferred-hil (T4), deferred-field (T5). Every test pairs an input/observable with a quantifiable expected result from _docs/00_problem/input_data/expected_results/results_report.md or directly from an AC. All tests run through the public interfaces defined in environment.md. No SUT-internal access.
Positive Scenarios
FT-P-01: 60-frame sequential pipeline — ≥80 % within 50 m (pipeline-correctness only)
Summary: Sequentially feed the 60 nav-cam JPGs through the SUT and verify the position-error CDF on this corpus.
Traces to: AC-1.1 (pipeline-correctness only — see test-data.md caveat), results_report row 1.
Category: Position Accuracy. Tier: T1 (pipeline).
Preconditions:
- nav_cam_60_slice mounted; nav_cam_60_slice_imu synthesised; satellite_tiles_AD0000xx_z20 placeholder fixture present.
- SUT booted; cuVSLAM warmed; ArduPilot SITL loaded with the corresponding IMU replay; first valid GPS_INPUT received (i.e., AC-NEW-1 cleared).
Input data: nav_cam_60_slice + coordinates.csv + nav_cam_60_slice_imu.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream the 60 JPGs at 3 fps via the camera-input shim into the SUT | SUT publishes GPS_INPUT for each frame |
| 2 | Capture each GPS_INPUT.lat / lon at the qgc-mock sniffer | Every input frame yields a GPS_INPUT frame within the test window |
| 3 | Compute haversine error vs coordinates.csv ground truth per frame | Per-frame errors collected into a CDF |
Expected outcome: ≥80 % of frames have error < 50 m. Reported as pipeline-functional, not deployment-binding (per test-data.md caveat — deployment-binding number from FT-P-T2 / AerialVL).
Max execution time: 60 s per run.
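The error computation in step 3 and the verdict can be sketched as follows. This is a minimal illustration, not the harness implementation: the function names, the spherical Earth radius, and the strict `<` comparison (taken from the "error < 50 m" wording) are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Share of frames whose error is strictly below the threshold."""
    if not errors_m:
        raise ValueError("no frames captured")
    return sum(e < threshold_m for e in errors_m) / len(errors_m)
```

With the per-frame errors collected, the FT-P-01 verdict would be `fraction_within(errors, 50.0) >= 0.80`; the same helper serves FT-P-02 with a 20 m threshold and a 0.50 floor.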
FT-P-02: 60-frame sequential pipeline — ≥50 % within 20 m (pipeline-correctness only)
Summary: Same corpus as FT-P-01; tighter tolerance. Traces to: AC-1.2 (pipeline-correctness only), results_report row 2. Category: Position Accuracy. Tier: T1.
Preconditions / Input data: same as FT-P-01.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay the corpus end-to-end | per-frame GPS_INPUT |
| 2 | Compute haversine error CDF | — |
Expected outcome: ≥50 % within 20 m on the 60-image slice (functional check). Deployment-binding number comes from AerialVL S03 in FT-P-T2. Max execution time: 60 s.
FT-P-T2: AC-1.1 / AC-1.2 deployment-binding accuracy on AerialVL S03
Summary: Re-run AC-1.1 / AC-1.2 on the deployment-binding corpus.
Traces to: AC-1.1, AC-1.2. Tier: T2 (deferred-corpus). data_status: deferred-corpus.
Preconditions: aerialvl_s03 mounted with synced IMU + nav-cam stream + GPS truth.
Input data: AerialVL S03.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay AerialVL S03 (70 km of fixed-wing flight at 1 km AGL) | per-frame GPS_INPUT |
| 2 | Compute error CDF vs S03 GPS truth | — |
Expected outcome: ≥80 % within 50 m AND ≥50 % within 20 m (deployment-binding). Max execution time: 90 min (replay + analysis).
FT-P-03: Per-frame error bound ≤100 m
Summary: No single frame exceeds 100 m error on the 60-image slice. Traces to: AC-1.1 (negative-tail bound), results_report row 3. Tier: T1.
Preconditions / Input: same as FT-P-01.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay 60 frames | per-frame GPS_INPUT |
| 2 | Compute max(haversine_err) over all frames | — |
Expected outcome: max error ≤ 100 m. Pipeline-functional only. Max execution time: 60 s.
FT-P-04: VO drift bound between satellite anchors
Summary: VO drift between successive satellite-anchored fixes stays bounded. Traces to: AC-1.3, AC-NEW-8, results_report row 4, F-T1b. Tier: T1 functional + T2 binding.
Preconditions: cuVSLAM in mono+IMU mode (T1) AND mono-only mode (T2 split test).
Input data: nav_cam_60_slice (T1) + AerialVL S03 (T2).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Identify successive satellite_anchored source-label transitions | series of anchor pairs |
| 2 | For each anchor pair, measure VO-extrapolated centre vs next-anchor centre | drift in metres |
| 3 | Compute 95th percentile across all pairs | — |
Expected outcome:
- mono+IMU: p95 drift ≤ 50 m (binding on T2 / AerialVL).
- mono-only: p95 drift ≤ 100 m (binding on T2 / AerialVL).
- T1 functional check: drift bounded (no monotonic growth) — exact numbers not deployment-binding.
Max execution time: 90 min (T2).
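Step 3 leaves the percentile method open. A minimal sketch using the nearest-rank definition is shown below; the choice of nearest-rank (rather than an interpolating percentile) and the function names are assumptions of this sketch.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest sample with at least p % of samples at or below it."""
    if not values:
        raise ValueError("no anchor pairs found")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def drift_verdict(drifts_m, limit_m):
    """Return (p95 drift, pass/fail) for one cuVSLAM mode."""
    p95 = percentile_nearest_rank(drifts_m, 95)
    return p95, p95 <= limit_m
```

Under these assumptions the mono+IMU run would be judged with `drift_verdict(drifts, 50.0)` and the mono-only run with `drift_verdict(drifts, 100.0)`.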
FT-P-05: GPS_INPUT shape under normal tracking
Summary: GPS_INPUT messages emitted while tracking is healthy carry the correct schema and value ranges. Traces to: AC-1.4, AC-4.3, AC-6.3, results_report row 5. Tier: T1.
Preconditions: SUT in steady-state tracking with recent satellite anchor (<30 s old).
Input data: any single frame from nav_cam_60_slice.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Sniff GPS_INPUT at qgc-mock | one frame per nav-cam frame |
| 2 | Decode fields: fix_type, horiz_accuracy, satellites_visible, lat, lon, alt, vel_acc | as per MAVLink GPS_INPUT spec |
| 3 | Inspect optional ODOMETRY: assert intentional absence in v1 (per AC-4.3 v1-scope clause) | no ODOMETRY frames present |
Expected outcome: fix_type == 3, horiz_accuracy ∈ [1, 50] m, satellites_visible == 10, lat / lon non-null, WGS84. ODOMETRY count == 0 across the run.
Max execution time: 30 s.
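The expected-outcome checks can be expressed as a small validator over one decoded message. This sketch assumes the sniffer has already decoded GPS_INPUT into a plain dict keyed by MAVLink field names; the function name and the dict representation are hypothetical. In MAVLink, lat and lon are carried as degE7 integers, so the range checks below are in those units.

```python
def check_normal_tracking(msg):
    """Return a list of violations for one decoded GPS_INPUT (plain dict)."""
    problems = []
    if msg.get("fix_type") != 3:
        problems.append("fix_type must be 3")
    hacc = msg.get("horiz_accuracy")
    if hacc is None or not (1.0 <= hacc <= 50.0):
        problems.append("horiz_accuracy must be within [1, 50] m")
    if msg.get("satellites_visible") != 10:
        problems.append("satellites_visible must be 10")
    lat, lon = msg.get("lat"), msg.get("lon")  # degE7 integers
    if lat is None or not (-900_000_000 <= lat <= 900_000_000):
        problems.append("lat missing or outside WGS84 range")
    if lon is None or not (-1_800_000_000 <= lon <= 1_800_000_000):
        problems.append("lon missing or outside WGS84 range")
    return problems
```

An empty list means the frame passes; the ODOMETRY count is a separate check on the capture as a whole.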
FT-P-06: GPS_INPUT shape during VO-only fallback
Summary: Fields adapt when no satellite anchor is available for >30 s. Traces to: AC-1.4, AC-4.3, results_report row 6. Tier: T1.
Preconditions: Force satellite-match failure for >30 s (cache poisoned with stale tiles).
Input data: nav_cam_60_slice with stale_tile_scenarios injected.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | After 30 s of failed matches, sniff GPS_INPUT | fix_type == 3, horiz_accuracy ∈ [20, 100] m, source-label vo_extrapolated |
Expected outcome: as above; horiz_accuracy grows monotonically until next successful match. Max execution time: 60 s.
FT-P-07: GPS_INPUT shape during dead-reckoning
Summary: VO lost AND no satellite → IMU-only dead reckoning. Traces to: AC-1.4, AC-5.2, results_report row 7. Tier: T1.
Preconditions: Inject cuVSLAM tracking-loss + cache poisoned.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Sniff GPS_INPUT | fix_type == 2, horiz_accuracy ≥ 50 m and growing |
| 2 | Source label | dead_reckoned |
Expected outcome: fix_type == 2, monotonically growing horiz_accuracy, source == dead_reckoned.
Max execution time: 60 s.
FT-P-08: GPS_INPUT shape on total failure
Summary: 3+ consecutive failures — system signals total failure. Traces to: AC-3.4, results_report row 8. Tier: T1.
Preconditions: cache_poisoning_scenarios flavour that causes 3 sat failures + cuVSLAM lost.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Wait for 3 consecutive failures | GPS_INPUT continues at the configured rate |
| 2 | Inspect GPS_INPUT | fix_type == 0, horiz_accuracy == 999.0 |
| 3 | Inspect STATUSTEXT | RELOC_REQ regex emitted |
Expected outcome: as above. Max execution time: 60 s.
FT-P-09: Confidence tier transitions
Summary: Confidence tier label transitions match defined conditions. Traces to: AC-1.4, results_report rows 10–13. Tier: T1.
Preconditions: scripted scenario that walks (HIGH → MEDIUM → LOW → FAILED).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | At each scripted state, read the SSE stream confidence field AND the source-label field | matches expected tier |
Expected outcome:
- Sat anchor <30 s + cov <400 m² → tier HIGH, source satellite_anchored.
- cuVSLAM OK + no sat >30 s → tier MEDIUM, source vo_extrapolated.
- cuVSLAM lost + IMU only → tier LOW, source dead_reckoned.
- 3+ consecutive failures → tier FAILED, fix_type 0.
Max execution time: 5 min.
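The four conditions above can serve as a test oracle for the scripted walk. The sketch below is one possible encoding; the function name, the parameter names, and in particular the precedence between rules (failures first, then VO loss, then anchor freshness) are assumptions, since the listed conditions do not say how overlapping states are resolved.

```python
def expected_tier(anchor_age_s, cov_m2, vo_ok, consecutive_failures):
    """Return (tier, source) expected for one scripted state.

    anchor_age_s is None when no satellite anchor has been obtained yet.
    """
    if consecutive_failures >= 3:
        return ("FAILED", None)  # verified via fix_type == 0 rather than a source label
    if not vo_ok:
        return ("LOW", "dead_reckoned")
    if anchor_age_s is not None and anchor_age_s < 30 and cov_m2 < 400:
        return ("HIGH", "satellite_anchored")
    return ("MEDIUM", "vo_extrapolated")
```

At each scripted state the runner would compare the SSE confidence and source-label fields against this oracle's output.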
FT-P-10: Image registration rate (functional)
Summary: Pipeline registers ≥95 % of normal-flight frames against the previous frame. Traces to: AC-2.1 (pipeline-functional only), results_report row 14. Tier: T1 functional + T2 binding.
Preconditions: SUT exposes registration outcome via STATUSTEXT or NAMED_VALUE_FLOAT (reg_pass_count, reg_total_count).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay nav_cam_60_slice (T1) or AerialVL S03 (T2) | registration metrics published |
| 2 | Compute reg_pass_count / reg_total_count | percentage |
Expected outcome: T1 ≥95 % (functional); T2 ≥95 % (deployment-binding) under normal-flight definition (nadir, ±10° bank/pitch, ≥40 % overlap, daytime, season-matched tile). Max execution time: 60 s (T1) / 90 min (T2).
FT-P-11: Mean Reprojection Error (MRE)
Summary: VO and cross-domain MRE under thresholds. Traces to: AC-2.2, results_report row 15. Tier: T1 functional + T2 binding.
Preconditions: SUT publishes mre_vo (frame-to-frame) and mre_cross (cross-view) on the metrics endpoint.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Scrape MRE metrics over a replay | per-frame samples |
| 2 | Compute mean across the run | — |
Expected outcome: mean(mre_vo) < 1.0 px; mean(mre_cross) < 2.5 px. T1 numbers functional only.
Max execution time: 60 s (T1) / 90 min (T2).
FT-P-12: Continuous output through turn area (frames 32–43)
Summary: SUT keeps producing position estimates through the turn segment of coordinates.csv.
Traces to: AC-3.2, AC-4.4, results_report row 16. Tier: T1.
Preconditions: standard pipeline replay.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay frames 32–43 | per-frame GPS_INPUT |
| 2 | Count outputs vs frames | — |
Expected outcome: ≥1 GPS_INPUT per nav-cam frame in the turn region. Max execution time: 30 s.
FT-P-13: 350 m outlier handled (AC-3.1)
Summary: Pipeline survives a synthetic 350 m gap between consecutive frames (caused by ±20° tilt outlier). Traces to: AC-3.1, results_report row 17. Tier: T1.
Input data: synthetic two-frame pair with 350 m gap injected into nav_cam_60_slice mid-replay.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject the outlier pair | SUT emits a vo_extrapolated or dead_reckoned frame, not corrupted output |
| 2 | Continue with next valid frame | error returns to ≤100 m on the next valid frame |
Expected outcome: error ≤ 100 m on the next valid frame after the outlier. Max execution time: 60 s.
FT-P-14: Sharp-turn re-localization (AC-3.2)
Summary: Sharp turn (<5 % overlap, <70°, <200 m drift) — VO fails, satellite re-loc recovers. Traces to: AC-3.2, F-T7, results_report row 18. Tier: T1.
Input data: synthetic sharp-turn pair injected into nav_cam_60_slice.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject the sharp-turn pair | cuVSLAM tracking lost; VPR triggers; matcher re-localizes |
| 2 | Track error over next 3 frames | error ≤ 50 m within 3 frames |
Expected outcome: error ≤ 50 m within 3 frames of the turn. Max execution time: 60 s.
FT-P-15: VO loss → satellite recovery → tracking_state == NORMAL
Summary: After cuVSLAM tracking loss + sat match success, tracking_state returns to NORMAL. Traces to: AC-3.2, AC-3.3, results_report row 19. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Force cuVSLAM tracking-loss; deliver a fresh tile that matches | matcher emits absolute pose; calibrator emits satellite-anchored fix |
| 2 | Observe FC EKF3 reconvergence via EKF_STATUS_REPORT | EKF3 reconverges |
| 3 | Read SUT-published tracking_state | == NORMAL |
Expected outcome: tracking_state == NORMAL within bounded time. Max execution time: 60 s.
FT-P-16: Cold-start TTFF ≤30 s p95
Summary: From companion-computer boot, first valid GPS_INPUT within 30 s. Traces to: AC-NEW-1, results_report row 23, F-T11. Tier: T1 statistical (≤10 boots) + T4 binding (50 boots on real HW).
Preconditions: SUT image cold (no warmed engines); FC providing GLOBAL_POSITION_INT simulating IMU-extrapolated pose.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Boot SUT container | container start logged |
| 2 | Time from container start to first valid fix_type==3 GPS_INPUT | t_ttff |
| 3 | Repeat N times (N=10 T1 / N=50 T4) | distribution |
Expected outcome: 95th percentile of t_ttff ≤ 30 s. Max execution time: 10 min (T1) / 30 min (T4).
FT-P-17: Validate initial position via first satellite match
Summary: First satellite match after startup pulls position to ≤50 m. Traces to: AC-5.1, AC-NEW-1, results_report row 24. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Provide GLOBAL_POSITION_INT with a deliberate 200 m offset from truth | SUT seeds pipeline with 200 m uncertainty |
| 2 | Replay first frame with valid satellite tile | matcher succeeds; calibrator emits anchored fix |
| 3 | Read GPS_INPUT lat/lon | error ≤ 50 m |
Expected outcome: position error ≤ 50 m after first match. Max execution time: 90 s.
FT-P-18: Mid-flight reboot recovery ≤30 s
Summary: Process kill mid-flight; SUT recovers within AC-NEW-1 budget. Traces to: AC-5.3, AC-NEW-1, results_report row 25. Tier: T1.
Preconditions: SUT in steady-state tracking; FC continues to fly.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Send SIGKILL to SUT container | SUT restarts |
| 2 | Time from restart to first fix_type==3 GPS_INPUT | t_recovery |
Expected outcome: t_recovery ≤ 30 s. Max execution time: 60 s.
FT-P-19: Post-reboot first-match accuracy
Summary: After reboot, first satellite match restores accuracy. Traces to: AC-5.3, results_report row 26. Tier: T1.
Steps: same as FT-P-17 but starting from a reboot.
Expected outcome: error ≤ 50 m after first match. Max execution time: 90 s.
FT-P-20: Object localization (level flight)
Summary: POST /objects/locate returns lat/lon for an object pixel given known UAV pose.
Traces to: AC-7.1, AC-7.2, results_report row 27. Tier: T1.
Preconditions: SUT has a known anchored fix; AI camera gimbal pose injected via FC ATTITUDE.
Input data: pixel coordinates + gimbal angle + zoom + altitude in request body.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | POST /objects/locate with pixel_x, pixel_y, gimbal_pitch, gimbal_yaw, zoom, altitude | 200 + JSON {lat, lon, alt, accuracy_m, confidence} |
| 2 | Compare to ground truth | error ≤ accuracy_m |
Expected outcome: lat/lon within accuracy_m of ground truth; in level flight, accuracy_m consistent with frame-center accuracy of the GPS-Denied system. In maneuvering flight, response includes the altitude × |sin(unknown_bank_or_pitch)| bound (AC-7.1 second clause) when bank/pitch >5°.
Max execution time: 5 s.
FT-P-21: Coordinate transform round-trip ≤0.1 m
Summary: GPS → NED → pixel → GPS round-trip preserves position. Traces to: AC-6.3, AC-7.2, results_report row 29. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Submit a known WGS84 point through the round-trip via /objects/locate (or a debug endpoint if exposed) | round-trip lat/lon |
| 2 | Compare to original | ≤ 0.1 m |
Expected outcome: round-trip error ≤ 0.1 m. Max execution time: 1 s.
FT-P-22: GET /health schema and content
Summary: Health endpoint returns 200 with required fields. Traces to: AC-6.1 (telemetry), results_report row 30. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | GET /health | HTTP 200, JSON body |
| 2 | Validate schema | contains status, memory_mb, gpu_temp_c, tracking_state, last_anchor_age_s, confidence_tier |
Expected outcome: as above; status ∈ {ok, degraded, failed}.
Max execution time: 1 s.
FT-P-23: POST /sessions returns id
Traces to: AC-6.1, results_report row 31. Tier: T1.
Steps: POST /sessions (auth) → 200/201 with session id.
Expected outcome: status ∈ {200, 201}; body has session_id matching ^[a-f0-9-]{36}$.
Max execution time: 1 s.
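A minimal sketch of the response check, assuming the runner has the HTTP status and the parsed JSON body in hand (the function name is hypothetical). It uses `re.fullmatch` instead of `^…$` because `$` in Python also matches just before a trailing newline. Note that the pattern accepts any 36 characters drawn from lowercase hex digits and hyphens; it does not verify UUID structure.

```python
import re

SESSION_ID_RE = re.compile(r"[a-f0-9-]{36}")


def check_session_response(status, body):
    """True when status is 200/201 and body carries a well-shaped session_id."""
    session_id = body.get("session_id")
    return (
        status in (200, 201)
        and isinstance(session_id, str)
        and SESSION_ID_RE.fullmatch(session_id) is not None
    )
```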
FT-P-24: SSE stream emits per-second events
Traces to: AC-6.1, results_report row 32. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | GET /sessions/{id}/stream | SSE connection; events emitted at ~1 Hz |
| 2 | Sample 30 s of stream | each event matches schema: type, timestamp, lat, lon, alt, accuracy_h, confidence, vo_status |
Expected outcome: rate 1 Hz ± 0.2 Hz; all events conform to schema. Max execution time: 35 s.
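The rate verdict can be computed from event arrival times alone. The sketch below assumes the runner records one receive timestamp (in seconds) per event and estimates the rate as intervals over elapsed time; both the function names and that estimator are assumptions of this sketch.

```python
def event_rate_hz(timestamps_s):
    """Mean event rate over the sampled window: number of intervals / elapsed time."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two events to estimate a rate")
    span = timestamps_s[-1] - timestamps_s[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    return (len(timestamps_s) - 1) / span


def rate_ok(timestamps_s, nominal_hz=1.0, tol_hz=0.2):
    """True when the measured rate is within nominal ± tolerance."""
    return abs(event_rate_hz(timestamps_s) - nominal_hz) <= tol_hz
```

A mean rate hides bursts and gaps, so a stricter harness might also bound the largest single inter-event interval.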
FT-P-25: TRT engine load ≤10 s
Traces to: AC-NEW-1 (sub-budget), results_report row 39. Tier: T1 (synthetic timing) + T4 (real HW).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | At SUT boot, time from container start to "all engines ready" STATUSTEXT | t_engines |
Expected outcome: t_engines ≤ 10 s. Max execution time: 30 s.
FT-P-26: Tile storage size for the operational area
Traces to: AC-8.3, restrictions §UAV/Satellite, results_report row 40. Tier: T1.
Preconditions: a 200 km mission path × ±2 km buffer × z=18 + z=20 fixture loaded.
Steps: read total bytes under /probe/tiles/.
Expected outcome: 300 MB ≤ size ≤ 1000 MB. (Aligned with restriction's ~10 GB persistent cap for full 400 km².) Max execution time: 5 s.
FT-P-27: Tile mosaic coverage radius ≥500 m
Traces to: AC-8.3, results_report row 41. Tier: T1.
Preconditions: SUT given EKF position with σ_xy.
Steps: capture the assembled mosaic bbox via STATUSTEXT or a debug endpoint.
Expected outcome: mosaic radius ≥ 500 m around current position. Max execution time: 5 s.
FT-P-28: Tile dedup — ≤1 onboard tile per ground sector
Traces to: AC-8.4, F-T2. Tier: T1.
Preconditions: tile_dedup_replay (sectors visited ≥2×).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay the flight | onboard tiles written |
| 2 | Inspect MBTiles + sidecar in /probe/tiles/ | per-sector tile count |
Expected outcome: per-sector count ≤ 1; latest/highest-quality wins. Max execution time: 10 min.
FT-P-29: Post-flight upload to candidate pool
Traces to: AC-8.4, F-T3. Tier: T1.
Preconditions: service-stub running.
Steps: replay → on landing-event, SUT uploads tiles.
Expected outcome: service-stub records ≥1 tile with trust_level=candidate; promotion only after N≥2 voting flights (so a single flight does not promote).
Max execution time: 5 min.
FT-P-30: NAMED_VALUE_FLOAT telemetry rate
Traces to: AC-6.1, results_report row 45. Tier: T1.
Steps: sniff gps_conf, gps_drift, gps_hacc NAMED_VALUE_FLOAT rates over 30 s.
Expected outcome: each at 1 Hz ± 0.2 Hz. Max execution time: 35 s.
FT-P-31: Disconnected segments — ≥3 connected via global retrieval
Traces to: AC-3.3, F-T8. Tier: T1.
Preconditions: disconnected_segments_replay with ≥3 segments.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay each segment with a synthetic gap | for each segment, VPR retrieves top-K candidates; matcher relocalizes |
| 2 | Verify segment-to-segment trajectory continuity | each segment connects to prior trajectory |
Expected outcome: 3/3 segments connect within 10 frames of segment start; tracking_state == NORMAL after each.
Max execution time: 5 min.
FT-P-32: Position refinement / corrections (AC-4.5)
Traces to: AC-4.5. Tier: T1.
Preconditions: SUT in steady state; ability to refine prior fixes.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Capture sequence of GPS_INPUT for a 10-s window | per-frame fixes |
| 2 | After delayed loop closure / late satellite match, observe whether SUT emits a corrected fix or signals correction via STATUSTEXT | a follow-up GPS_INPUT for an earlier time_usec OR a STATUSTEXT correction record |
Expected outcome: at least one correction event in which the corrected fix supersedes the prior fix with a smaller h_acc (covariance shrinks). System never silently rewrites past output without recording the correction.
Max execution time: 60 s.
FT-P-33: Object-localize bank/pitch >5° publishes uncertainty bound
Traces to: AC-7.1 (second clause). Tier: T1.
Preconditions: FC ATTITUDE published with bank > 5°.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | POST /objects/locate while bank > 5° | response includes bound = altitude × abs(sin(bank_or_pitch)) |
Expected outcome: response body includes bank_pitch_bound_m matching the formula within 1 m.
Max execution time: 5 s.
FT-P-34: Mid-flight tile generation respects σ_xy ≤ 5 m hard gate
Traces to: AC-8.4, AC-NEW-7 hard gate. Tier: T1.
Preconditions: scripted scenarios with σ_xy ∈ {2, 4, 6, 8} m.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay σ_xy=2 m frames | tiles written |
| 2 | Replay σ_xy=8 m frames | NO tiles written |
| 3 | Inspect sidecar trust_level for σ_xy ∈ (3, 5] m | trust_level == soft |
| 4 | Inspect sidecar for σ_xy ≤ 3 m | trust_level == candidate |
Expected outcome: as above. Max execution time: 5 min.
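The gate described by the four steps reduces to a three-way decision on σ_xy. The sketch below is a test oracle only; the function name and the use of `None` to mean "no tile written" are assumptions.

```python
def expected_tile_trust(sigma_xy_m):
    """Expected sidecar trust_level for a frame, or None when no tile may be written."""
    if sigma_xy_m <= 3.0:
        return "candidate"
    if sigma_xy_m <= 5.0:
        return "soft"
    return None  # above the 5 m hard gate
```

For the scripted values {2, 4, 6, 8} m this yields candidate, soft, and no tile for the last two.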
FT-P-35: NF-T4b cache-poisoning safety budget (Monte Carlo)
Traces to: AC-NEW-7. Tier: T2 (deferred-corpus). data_status: deferred-corpus.
Preconditions: ≥100 simulated flights worth of frames from AerialVL + Mavic + AerialExtreMatch with synthetic over-confidence injection (1.5×–3×).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay all flights | per-tile geo-misalignment captured |
| 2 | Compute P(misalign > 30 m) and P(misalign > 100 m) | — |
Expected outcome: P(>30 m) < 1 %, P(>100 m) < 0.1 %. Max execution time: 4 h.
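Step 2 amounts to two empirical exceedance fractions over all captured tiles. A minimal sketch, with hypothetical function names and the thresholds taken from the expected outcome:

```python
def exceedance_fraction(misalign_m, threshold_m):
    """Empirical P(misalignment > threshold) over all captured tiles."""
    if not misalign_m:
        raise ValueError("no tiles captured")
    return sum(m > threshold_m for m in misalign_m) / len(misalign_m)


def within_safety_budget(misalign_m):
    """True when P(>30 m) < 1 % and P(>100 m) < 0.1 %."""
    return (
        exceedance_fraction(misalign_m, 30.0) < 0.01
        and exceedance_fraction(misalign_m, 100.0) < 0.001
    )
```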
FT-P-36: AC-NEW-9 covariance calibration accuracy
Traces to: AC-NEW-9, F-T18. Tier: T2.
Preconditions: AerialVL S03 replay with ground truth.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | For each emitted GPS_INPUT, capture (h_acc, ground-truth error) | series of pairs |
| 2 | Compute fraction of frames where error ≤ h_acc * Mahalanobis-2D-95% factor | fraction |
Expected outcome: fraction ≥ 95 % (calibration neither over- nor under-claims). Max execution time: 90 min.
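The coverage fraction in step 2 can be sketched as below. The factor is derived from the chi-square distribution with 2 degrees of freedom, whose 95 % quantile is −2·ln(0.05) ≈ 5.991, giving a radius factor of about 2.448. Treating h_acc as an isotropic per-axis 1σ value is an assumption of this sketch; if the SUT defines h_acc differently, the factor must change accordingly.

```python
import math

# 95 % quantile of chi-square with 2 degrees of freedom, from its closed-form CDF.
CHI2_2DOF_95 = -2.0 * math.log(1.0 - 0.95)
FACTOR_95 = math.sqrt(CHI2_2DOF_95)  # about 2.448


def coverage_fraction(pairs, factor=FACTOR_95):
    """Share of (h_acc_m, error_m) pairs whose error lies inside the 95 % radius."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no GPS_INPUT samples captured")
    return sum(err <= h_acc * factor for h_acc, err in pairs) / len(pairs)
```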
FT-P-37: F-T18 calibrator regression (no state propagation)
Traces to: AC-NEW-9, F-T18. Tier: T2.
Preconditions: replay with logging hooks on Component 5 outputs (publicly exposed counters).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run replay | calibrator counters emitted |
| 2 | Assert state_propagation_invocations_total == 0 | no propagation |
| 3 | Assert mahalanobis_gate_rejections_total > 0 | gate active |
Expected outcome: as above. Max execution time: 90 min.
Negative Scenarios
FT-N-01: Corrupted nav-cam frame — no crash, degraded mode
Traces to: AC-3.x (resilience), restriction "fixed downward camera". Tier: T1.
Input data: a 60-frame replay with frame N replaced by a 10-byte random blob.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream the replay | SUT logs decode error; emits STATUSTEXT WARN |
| 2 | Inspect tracking_state | transitions to DEGRADED for 1 frame; recovers to NORMAL on next valid frame |
| 3 | SUT process | does NOT crash |
Expected outcome: process alive; no GPS_INPUT spike with bad data; tracking_state returns to NORMAL within 1 frame of recovery. Max execution time: 30 s.
FT-N-02: Object-localize invalid pixel
Traces to: AC-7.1, results_report row 28. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | POST /objects/locate with pixel_x = -10 (out of frame) | HTTP 422 + error body |
Expected outcome: status == 422; body contains a structured error code. Max execution time: 1 s.
FT-N-03: Unauthenticated POST /sessions
Traces to: results_report row 33, security restrictions. Tier: T1.
Steps: POST /sessions without JWT → 401.
Expected outcome: status == 401. Max execution time: 1 s.
FT-N-04: Stale tile beyond grace — must NOT label satellite_anchored
Traces to: AC-8.2, AC-NEW-6. Tier: T1.
Preconditions: stale_tile_scenarios with 18-month-old active-conflict tile (well past 6 mo + 30-day grace).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay frames whose only candidate tile is the 18 mo stale one | matcher invocation skipped or scored 0 |
| 2 | Inspect source label on emitted GPS_INPUT | NEVER satellite_anchored |
| 3 | Inspect WARN STATUSTEXT | tile rejected event recorded |
Expected outcome: no satellite_anchored label across the run; rejection event recorded.
Max execution time: 60 s.
FT-N-05: Stale tile in 30-day grace — confidence linearly decayed
Traces to: AC-NEW-6. Tier: T1.
Preconditions: tiles aged at +0, +15, +30 days past the 6-mo budget.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay each | confidence weight in sidecar metric: 1.0, 0.5, 0.0 |
Expected outcome: confidence weight decays linearly as specified. Max execution time: 60 s.
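The three expected weights lie on a straight line from 1.0 at the budget boundary to 0.0 at the end of the grace window. A minimal oracle, assuming the weight is clamped outside that window (function and parameter names are hypothetical):

```python
def grace_weight(days_past_budget, grace_days=30.0):
    """Expected confidence weight for a tile that is days_past_budget over the age budget."""
    if days_past_budget <= 0:
        return 1.0
    if days_past_budget >= grace_days:
        return 0.0
    return 1.0 - days_past_budget / grace_days
```

For tiles aged +0, +15 and +30 days this returns 1.0, 0.5 and 0.0, matching step 1.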
FT-N-06: Sharp-turn negative — must trigger satellite re-loc, not silently fail
Traces to: AC-3.2 (negative case). Tier: T1.
Steps: same as FT-P-14 but assert that before re-loc the SUT emits a STATUSTEXT explaining VO loss; assert tracking_state transitions through DEGRADED.
Expected outcome: explicit STATUSTEXT log; recovery within 3 frames. Max execution time: 60 s.
FT-N-07: 3-consecutive-failure → RELOC_REQ on STATUSTEXT
Traces to: AC-3.4, results_report rows 20, 46. Tier: T1.
Steps: see FT-P-08; additionally verify the regex RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m.
Expected outcome: regex matches at least one STATUSTEXT after 3 failures; emitted within 2 s of the third failure (per AC-3.4 timing). Max execution time: 60 s.
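The regex and timing check can be combined in one pass over the captured STATUSTEXT stream. This sketch assumes the sniffer yields (receive time in seconds, text) tuples and that the time of the third failure is known to the runner; the function name and the sample message in the usage note are hypothetical.

```python
import re

RELOC_REQ_RE = re.compile(r"RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m")


def first_reloc_req(statustexts, third_failure_t, deadline_s=2.0):
    """Return the first (t, text) matching RELOC_REQ within the deadline, else None."""
    for t, text in statustexts:
        in_window = third_failure_t <= t <= third_failure_t + deadline_s
        if in_window and RELOC_REQ_RE.search(text):
            return (t, text)
    return None
```

For example, a capture containing `(12.4, "RELOC_REQ: last_lat=48.1 last_lon=37.2 uncertainty=350m")` with the third failure at t = 11.0 s would pass, while the same message at t = 14.0 s would not.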
FT-N-08: Re-loc waiting state behaviour
Traces to: AC-3.4, results_report row 21. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | After RELOC_REQ, observe SUT for 10 s | fix_type == 0 GPS_INPUT continues; IMU-prediction-only label; satellite-match attempts continue (counter increments) |
Expected outcome: as above; SUT does NOT stop emitting GPS_INPUT. Max execution time: 30 s.
FT-N-09: Operator hint — used as 500 m seed
Traces to: AC-6.2, AC-3.4, results_report row 22. Tier: T1.
Preconditions: SUT in re-loc-waiting; operator hint scenario active.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | qgc-mock sends STATUSTEXT RELOC_HINT: lat=… lon=… sigma=500m | SUT consumes hint; uses as seed for VPR/cross-view |
| 2 | First fix after hint | error ≤ 500 m initially |
| 3 | After next satellite match | error ≤ 50 m |
Expected outcome: as above. Max execution time: 60 s.
FT-N-10: Operator hint — malformed value rejected
Traces to: AC-6.2 (negative). Tier: T1.
Steps: send RELOC_HINT: lat=NaN lon=… sigma=-10.
Expected outcome: SUT emits STATUSTEXT WARN; hint NOT applied; pipeline state unchanged. Max execution time: 30 s.
FT-N-11: AC-4.3 — ODOMETRY intentionally absent in v1
Traces to: AC-4.3 (v1-scope clause), F-T9 Option A. Tier: T1.
Preconditions: SUT configured for v1 (default).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run any of FT-P-01 / FT-P-04 / FT-P-T2 | GPS_INPUT emitted |
| 2 | At qgc-mock, count ODOMETRY frames over the run | == 0 |
| 3 | Inspect EK3_SRC1_* configuration via FC parameter readback | POSXY=GPS, VELXY=GPS, YAW=GPS+Compass |
Expected outcome: ODOMETRY count == 0; FC parameters as configured. Max execution time: 60 s.
FT-N-12: Spoofed GPS — SUT promotes within 3 s
Traces to: AC-NEW-2, F-T12. Tier: T3 (deferred-sitl). data_status: deferred-sitl.
Preconditions: SITL + gps-spoof-injector configured.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | At t=0, inject a malicious GPS_RAW_INT with a 1 km offset | FC sees both spoof + SUT GPS_INPUT |
| 2 | Time from spoof onset to SUT promoting its GPS_INPUT to primary (raised fix_type=3 AND STATUSTEXT promotion event) | t_promote |
| 3 | Repeat 50× | distribution |
Expected outcome: 95th percentile of t_promote < 3 s. Max execution time: 30 min.
FT-N-13: Failsafe at 3 s no-fix (AC-5.2)
Traces to: AC-5.2. Tier: T1+T3.
Preconditions: scripted scenario where SUT cannot produce ANY estimate for 3.5 s.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Force pipeline blackout | SUT logs failure |
| 2 | Verify FC behaviour | ArduPilot SITL logs fall-back to IMU-only dead reckoning |
Expected outcome: failsafe transition observable in EKF_STATUS_REPORT within 4 s of blackout.
Max execution time: 60 s.
FT-N-14: Refusal of unsigned MAVLink (S-T1 boundary)
Traces to: restrictions §Sensors. Tier: T3.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Send a GPS_INPUT with invalid signing tag from the runner | FC rejects |
| 2 | Inspect FC log + STATUSTEXT to GCS | rejection event recorded |
Expected outcome: rejected; FC continues to fly on prior valid source. Max execution time: 30 s.
FT-N-15: SITL F-T9 Option A regression — EKF3 fuses GPS_INPUT only, no double-fusion
Traces to: AC-4.3, F-T9 Option A. Tier: T3.
Preconditions: SITL with EK3_SRC1_*=GPS+Compass, EK3_SRC2_*=GPS.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run a representative AerialVL replay | EKF3 fuses GPS_INPUT |
| 2 | Inspect EKF3 logs for double-fusion symptoms (issues #30076, #32506) | none |
| 3 | Trigger backup-GPS failover via SITL parameter | EKF3 switches to EK3_SRC2_* cleanly |
Expected outcome: no double-fusion; clean failover. Max execution time: 30 min.
FT-N-16: SITL F-T9 Option B regression (v1.1 candidate)
Traces to: AC-4.3 Option B (v1.1+), F-T9. Tier: T3 (deferred-sitl). data_status: deferred-sitl.
Preconditions: SITL with PR #30080-class build; SUT switched to ODOMETRY-primary mode (build flag).
Steps: ODOMETRY primary; GPS_INPUT held in reserve; verify clean source-switching, no double-fusion.
Expected outcome: as above. Test runs but build flag is OFF for v1 release gate. Max execution time: 30 min.
FT-N-17: AC-NEW-7 single-flight tile NOT promoted to trusted basemap
Traces to: AC-NEW-7 (Service-side voting). Tier: T1 (service-stub).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run a single-flight upload to candidate pool | tile recorded as trust_level=candidate |
| 2 | Query service-stub for promoted basemap content | tile NOT in promoted basemap |
Expected outcome: as above; promotion only after N≥2 voting flights confirm. Max execution time: 5 min.
FT-N-18: AC-8.5 — raw frames are NOT retained in FDR
Traces to: AC-8.5, AC-NEW-3. Tier: T1.
Preconditions: replay 60-frame slice; nav_cam_60_slice written to camera input.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Run the replay | FDR populates |
| 2 | Inspect /probe/fdr/ for raw nav-cam frames | no JPEGs / no AI-cam frames |
| 3 | Inspect for thumbnail log of failed-tile-generation frames | present, ≤0.1 Hz, within FDR cap |
Expected outcome: no raw frames retained; only the failure thumbnail log within budget. Max execution time: 60 s.
FT-N-19: Free public Sentinel-2 tile rejected at cache boundary
Traces to: AC-8.1 (resolution floor), restrictions §Satellite. Tier: T1.
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject a synthetic tile at 10 m/px into the cache | cache index marks as below-resolution |
| 2 | Replay frames over that area | matcher does NOT use the tile; never emits satellite_anchored from it |
Expected outcome: as above. Max execution time: 60 s.
FT-N-20: Photo-count cap removed — system runs without arbitrary cap
Traces to: restrictions §UAV ("no photo-count cap"). Tier: T1 (smoke).
Steps:
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay synthetic_8h_load for 30 min | SUT continues operating |
| 2 | Inspect logs for any "photo count exceeded" condition | none |
Expected outcome: no cap-related condition; pipeline degrades only against FDR cap (AC-NEW-3) and tile-cache cap. Max execution time: 35 min.
FT-N-21: AC-7.1 in maneuvering flight — uncertainty bound published, not a fixed number
Traces to: AC-7.1 second clause. Tier: T1.
Steps: like FT-P-33 but assert that for bank ∈ {0°, 5°, 15°, 30°}, the published bank_pitch_bound_m matches altitude × |sin(bank)| within 1 m.
Expected outcome: bound published correctly across the range. Max execution time: 30 s.
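The expected bound for each bank angle follows directly from altitude × |sin(bank)|. The sketch below computes it and checks a sweep of published values; the function names and the dict of published bounds keyed by angle are assumptions of this sketch.

```python
import math


def bank_pitch_bound_m(altitude_m, angle_deg):
    """Expected uncertainty bound: altitude × |sin(angle)|, angle in degrees."""
    return altitude_m * abs(math.sin(math.radians(angle_deg)))


def sweep_ok(altitude_m, published, tol_m=1.0):
    """published maps bank angle (deg) to the bank_pitch_bound_m value in the response."""
    return all(
        abs(bank_pitch_bound_m(altitude_m, angle) - bound) <= tol_m
        for angle, bound in published.items()
    )
```

At 1000 m altitude, for instance, the expected bounds for 0°, 5°, 15° and 30° are roughly 0 m, 87.2 m, 258.8 m and 500 m.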
Coverage notes
- Pipeline-correctness boundary: T1 tests on the 60-image slice are NOT deployment-binding. AC-1.1, AC-1.2, AC-2.1, AC-2.2, AC-1.3, AC-NEW-8 deployment numbers come from T2 (FT-P-T2, FT-P-04 binding split, FT-P-10 binding, FT-P-11 binding).
- Behavioural-shape tests: FT-P-08, FT-P-15, FT-P-16, FT-P-18, FT-N-04, FT-N-11, FT-N-12, FT-N-13, FT-N-14, FT-N-17, FT-N-18, FT-N-19 use the behavioural shape (trigger + observable + quantifiable verdict); they need no mapping from input data to expected output.
- Untraced tests: none. Every test traces to ≥1 AC or restriction.