Blackbox Tests

Positive Scenarios

FT-P-01: Still-Image Frame Center Geolocation

Summary: Validate that the system estimates WGS84 frame centers for the provided 60-image nadir dataset.

Traces to: AC-1.1, AC-1.2, AC-6.3, AC-8.1

Category: Position Accuracy

Preconditions:

  • Offline satellite cache fixture is available for the sample area.
  • Expected results are loaded from input_data/expected_results/results_report.md.

Input data: project_60_still_images, expected_frame_centers

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit AD000001.jpg through AD000060.jpg with height/camera metadata | System emits one WGS84 estimate per processed image |
| 2 | Compare each estimate to the mapped expected coordinate | Per-frame error is reported in meters |

Expected outcome: At least 80% of images are within 50 m and at least 50% are within 20 m.
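
The pass criterion can be checked mechanically once per-frame errors are available. The following is a minimal sketch, assuming a spherical-Earth haversine distance is accurate enough at the 20 m to 50 m scale and that "within" means less than or equal to. The function names are illustrative and are not part of the project.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, limit_m):
    """Fraction of per-frame errors that are <= limit_m."""
    return sum(e <= limit_m for e in errors_m) / len(errors_m)


def passes_ft_p_01(errors_m):
    """FT-P-01 criterion: >= 80% within 50 m and >= 50% within 20 m."""
    return fraction_within(errors_m, 50.0) >= 0.80 and fraction_within(errors_m, 20.0) >= 0.50
```

A harness would compute `haversine_m` for each estimate against its mapped expected coordinate and pass the resulting list to `passes_ft_p_01`.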

Max execution time: 15 minutes for the 60-image replay on the local replay environment.


FT-P-02: Position Confidence Output Contract

Summary: Validate that every emitted position estimate includes confidence and source-label fields required by the public contract.

Traces to: AC-1.3, AC-1.4, AC-4.4, AC-4.5

Category: Position Confidence

Preconditions:

  • Same fixture setup as FT-P-01.

Input data: project_60_still_images, expected_frame_centers

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit the 60-image replay | System emits estimates frame-by-frame, not batched |
| 2 | Inspect public output fields | Each estimate contains WGS84 coordinate, 95% covariance semi-major axis, source label, and last_satellite_anchor_age_ms |
| 3 | Submit a later correction for a prior frame if available | System emits updated estimate with timestamp and covariance without corrupting newer estimates |

Expected outcome: 100% of emitted estimates include the required confidence fields, and no reported horiz_accuracy (or equivalent field) is smaller than the 95% covariance semi-major axis.
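
The under-reporting check needs a reference value for the 95% semi-major axis. The sketch below assumes the estimate carries a 2x2 horizontal (east/north) covariance in square meters and that the error is Gaussian, so the 95% ellipse scales the largest eigenvalue by the chi-square quantile for 2 degrees of freedom. The source does not specify how the axis is derived, so this formulation and the function names are assumptions.

```python
import math

# 95% quantile of chi-square with 2 degrees of freedom: -2 * ln(1 - 0.95), about 5.991
CHI2_95_2DOF = -2.0 * math.log(0.05)


def semi_major_95_m(var_e, var_n, cov_en):
    """95% error-ellipse semi-major axis (m) for a 2x2 east/north covariance."""
    mean = 0.5 * (var_e + var_n)
    half_diff = 0.5 * (var_e - var_n)
    lam_max = mean + math.hypot(half_diff, cov_en)  # largest eigenvalue, closed form
    return math.sqrt(CHI2_95_2DOF * lam_max)


def under_reports(horiz_accuracy_m, var_e, var_n, cov_en, tol_m=1e-6):
    """True if the reported accuracy is smaller than the 95% semi-major axis."""
    return horiz_accuracy_m + tol_m < semi_major_95_m(var_e, var_n, cov_en)
```

For example, a covariance with variances 4 m² and 1 m² and no cross term gives a semi-major axis of roughly 4.9 m, so a reported accuracy of 3 m would count as under-reporting.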

Max execution time: 15 minutes.


FT-P-03: BASALT VIO Replay With Public Synchronized Data

Summary: Validate that BASALT + safety/anchor wrapper can process synchronized camera/IMU data and produce trajectory estimates with calibrated confidence.

Traces to: AC-1.3, AC-2.1a, AC-2.2, AC-4.1, AC-4.2

Category: VO / IMU Propagation

Preconditions:

  • Public synchronized dataset slice is pinned during implementation. Strongest candidates: MUN-FRL, ALTO, EPFL fixed-wing, Kagaru; EuRoC/UZH FPV are proxy-only.
  • Ground-truth trajectory or frame poses are available.

Input data: public_nadir_vio_candidates

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Replay synchronized camera and IMU stream | System emits frame-by-frame vo_extrapolated or satellite_anchored estimates |
| 2 | Compare output trajectory to dataset ground truth | Error and covariance calibration are reported per segment |
| 3 | Compare against OpenVINS reference replay | BASALT + wrapper does not materially under-report uncertainty relative to error |

Expected outcome: VO registration succeeds for >95% of normal overlapping frames in dataset-supported normal segments; VO homography MRE is <1.0 px where homography validation is applicable.
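
Where homography validation applies, the MRE figure can be computed from matched keypoints. This is a minimal sketch that assumes MRE means the mean one-way reprojection error in pixels: source points are mapped through the 3x3 homography and compared with their matched destination points. The source does not define the exact metric, and the helper names are hypothetical.

```python
import math


def project(H, x, y):
    """Apply a 3x3 homography (nested sequences) to a pixel coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return (u, v)


def mean_reprojection_error_px(H, src_pts, dst_pts):
    """Mean Euclidean distance between H-projected source points and matched destinations."""
    errs = [math.dist(project(H, sx, sy), d) for (sx, sy), d in zip(src_pts, dst_pts)]
    return sum(errs) / len(errs)
```

A replay harness could then assert `mean_reprojection_error_px(H, src, dst) < 1.0` for each frame pair where a homography model is valid.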

Max execution time: Dataset-dependent, but replay must report per-frame latency.


FT-P-04: Satellite Retrieval And Anchor Verification

Summary: Validate that relocalization uses global retrieval plus local verification and emits only verified satellite anchors.

Traces to: AC-2.1b, AC-2.2, AC-3.2, AC-3.3, AC-8.6

Category: Satellite Anchor

Preconditions:

  • AerialVL/ALTO/VPAir-style public dataset slice or project satellite-cache fixture is available.
  • VPR chunks and descriptors are precomputed.

Input data: Public aerial localization slice, cache fixture

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Trigger cold-start or relocalization query | System searches CPU FAISS top-K chunks |
| 2 | Present top-K candidates to local verification | System runs ALIKED/DISK+LightGlue and RANSAC |
| 3 | Inspect emitted anchor decision | Accepted anchors include source label, MRE, inlier count, covariance, and tile provenance |

Expected outcome: Cross-domain satellite-anchor MRE is <2.5 px for accepted anchors; rejected candidates do not produce satellite_anchored estimates.

Max execution time: Must be measured as part of performance tests.

Negative Scenarios

FT-N-01: Repetitive Or Low-Texture Imagery

Summary: Validate that visually ambiguous images do not produce confident false satellite anchors.

Traces to: AC-1.4, AC-3.1, AC-NEW-4, AC-8.6

Category: False Position Prevention

Input data: Repetitive agricultural or low-texture frames from project/public data.

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit ambiguous frame or sequence | System either emits degraded vo_extrapolated/dead_reckoned output or rejects low-confidence anchor |
| 2 | Inspect anchor and confidence outputs | No anchor is accepted unless local verification and covariance gates pass |

Expected outcome: 0 confident satellite_anchored outputs for candidates that fail local verification, freshness, or Mahalanobis gates.
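
A Mahalanobis gate compares the offset between a candidate anchor and the predicted position against the combined covariance. The sketch below assumes a 2D horizontal residual in meters and a 2x2 innovation covariance. The 99% chi-square threshold is a hypothetical choice for illustration, since the source does not state the gate value.

```python
import math

# Hypothetical gate: 99% quantile of chi-square with 2 degrees of freedom, about 9.21
GATE_CHI2 = -2.0 * math.log(0.01)


def mahalanobis_sq(residual, cov):
    """Squared Mahalanobis distance r^T S^-1 r for a 2D residual and 2x2 covariance."""
    rx, ry = residual
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def passes_mahalanobis_gate(residual, cov, threshold=GATE_CHI2):
    """True if the candidate anchor is statistically consistent with the prediction."""
    return mahalanobis_sq(residual, cov) <= threshold
```

In this scenario, any candidate for which `passes_mahalanobis_gate` returns `False` should never surface as a `satellite_anchored` estimate.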

Max execution time: 15 minutes per fixture.


FT-N-02: GPS Spoofing During Total Visual Blackout

Summary: Validate that spoofed GPS is not promoted during total camera occlusion/visual blackout and that output degrades honestly before unusable frames reach VIO.

Traces to: AC-3.5, AC-5.2, AC-NEW-2, AC-NEW-8

Category: Spoofing / Blackout

Input data: ArduPilot Plane SITL spoofing trace with camera blackout/total-occlusion frames.

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Start normal replay with trusted visual/satellite anchor | System emits normal estimates |
| 2 | Inject full visual blackout/total occlusion and spoofed GPS_RAW_INT | Camera gate sets usable_for_vio=false, BASALT is bypassed for occluded frames, and system switches to dead_reckoned within 1 processed frame or 400 ms |
| 3 | Continue blackout beyond thresholds | IMU-only covariance grows monotonically; system degrades fix type and emits failsafe status at specified covariance/time thresholds |

Expected outcome: Spoofed GPS is ignored; total occlusion never feeds BASALT as a usable VIO frame; fix_type=0, horiz_accuracy=999.0, and VISUAL_BLACKOUT_FAILSAFE are emitted when covariance >500 m or blackout >30 s.
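
The failsafe conditions in this outcome can be encoded as a small oracle for the test harness. This sketch assumes the covariance threshold is compared against a single horizontal uncertainty value in meters, that both thresholds are strict (greater than), and that "grows monotonically" means non-decreasing. The function names and the dictionary layout are illustrative only.

```python
from typing import Optional, Sequence

MAX_UNCERTAINTY_M = 500.0  # covariance threshold from the expected outcome
MAX_BLACKOUT_S = 30.0      # blackout duration threshold from the expected outcome


def covariance_grows_monotonically(samples_m: Sequence[float]) -> bool:
    """True if each IMU-only uncertainty sample is >= the previous one."""
    return all(b >= a for a, b in zip(samples_m, samples_m[1:]))


def failsafe_output(uncertainty_m: float, blackout_s: float) -> Optional[dict]:
    """Return the expected failsafe fields once either threshold is exceeded, else None."""
    if uncertainty_m > MAX_UNCERTAINTY_M or blackout_s > MAX_BLACKOUT_S:
        return {
            "fix_type": 0,
            "horiz_accuracy": 999.0,
            "status": "VISUAL_BLACKOUT_FAILSAFE",
        }
    return None
```

The harness can compare the system's emitted fields with `failsafe_output(...)` at each step of the blackout and check the logged uncertainty series with `covariance_grows_monotonically`.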

Max execution time: 10 minutes per SITL scenario.


FT-N-03: Invalid Or Stale Satellite Cache

Summary: Validate cache freshness, integrity, and provenance gates.

Traces to: AC-8.2, AC-8.3, AC-NEW-6, AC-NEW-7

Category: Cache Integrity

Input data: cache_integrity_fixtures

| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Replay with stale tile manifest | Tile is rejected or down-weighted in confidence; no stale tile emits satellite_anchored |
| 2 | Replay with hash-mismatched or unsigned manifest | Cache fixture is rejected and a security event is logged |
| 3 | Replay generated tile with weak parent-pose covariance | Tile is not promoted beyond allowed trust level |

Expected outcome: 0 invalid/stale/cache-poisoning fixtures produce trusted anchors or trusted basemap tiles.
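
The integrity and freshness gates can be illustrated with standard-library primitives. This is a minimal sketch, assuming the manifest stores a SHA-256 hex digest and a capture timestamp per tile. The manifest layout, the hash algorithm, and the age limit are assumptions, and signature verification is not shown.

```python
import hashlib
import hmac


def tile_hash_ok(tile_bytes: bytes, expected_sha256_hex: str) -> bool:
    """True if the tile content matches the digest recorded in the manifest."""
    actual = hashlib.sha256(tile_bytes).hexdigest()
    # compare_digest keeps the comparison time independent of where the strings differ.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def tile_is_fresh(captured_at_s: float, now_s: float, max_age_s: float) -> bool:
    """True if the tile age is non-negative and within the allowed maximum."""
    age_s = now_s - captured_at_s
    return 0.0 <= age_s <= max_age_s


def tile_may_anchor(tile_bytes, expected_sha256_hex, captured_at_s, now_s, max_age_s):
    """A tile may back a trusted anchor only if both gates pass."""
    return tile_hash_ok(tile_bytes, expected_sha256_hex) and tile_is_fresh(
        captured_at_s, now_s, max_age_s
    )
```

A fixture with a tampered tile or an expired timestamp should make `tile_may_anchor` return `False`, which corresponds to the zero-trusted-anchor outcome above.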

Max execution time: 15 minutes.