Blackbox Tests
Positive Scenarios
FT-P-01: Still-Image Frame Center Geolocation
Summary: Validate that the system estimates WGS84 frame centers for the provided 60-image nadir dataset.
Traces to: AC-1.1, AC-1.2, AC-6.3, AC-8.1
Category: Position Accuracy
Preconditions:
- Offline satellite cache fixture is available for the sample area.
- Expected results are loaded from input_data/expected_results/results_report.md.
Input data: project_60_still_images, expected_frame_centers
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Submit AD000001.jpg through AD000060.jpg with height/camera metadata | System emits one WGS84 estimate per processed image |
| 2 | Compare each estimate to the mapped expected coordinate | Per-frame error is reported in meters |
Expected outcome: At least 80% of images are within 50 m and at least 50% are within 20 m.
Max execution time: 15 minutes for the 60-image replay on the local replay environment.
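
The step 2 comparison and the expected-outcome thresholds can be sketched as follows. This is a minimal illustration, not the project harness: the function names, the dict-based inputs, and the haversine (spherical Earth) distance are assumptions, and an image with no estimate is assumed to count against both thresholds.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def evaluate_accuracy(estimates, expected):
    """Score estimates against expected frame centers.

    Both arguments map image name -> (lat_deg, lon_deg). An image without an
    estimate counts against both thresholds (assumption, not from the spec).
    """
    errors_m = {
        name: haversine_m(*estimates[name], *expected[name])
        for name in expected
        if name in estimates
    }
    total = len(expected)
    share_50 = sum(e <= 50.0 for e in errors_m.values()) / total
    share_20 = sum(e <= 20.0 for e in errors_m.values()) / total
    return {
        "errors_m": errors_m,  # per-frame error in meters
        "share_within_50m": share_50,
        "share_within_20m": share_20,
        "passed": share_50 >= 0.80 and share_20 >= 0.50,
    }
```

At the 50 m scale the spherical approximation error is far below the thresholds; a harness that needs ellipsoidal accuracy could swap in a geodesic library instead.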
FT-P-02: Position Confidence Output Contract
Summary: Validate that every emitted position estimate includes confidence and source-label fields required by the public contract.
Traces to: AC-1.3, AC-1.4, AC-4.4, AC-4.5
Category: Position Confidence
Preconditions:
- Same fixture setup as FT-P-01.
Input data: project_60_still_images, expected_frame_centers
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Submit the 60-image replay | System emits estimates frame-by-frame, not batched |
| 2 | Inspect public output fields | Each estimate contains WGS84 coordinate, 95% covariance semi-major axis, source label, and last_satellite_anchor_age_ms |
| 3 | Submit a later correction for a prior frame if available | System emits updated estimate with timestamp and covariance without corrupting newer estimates |
Expected outcome: 100% of emitted estimates include the required confidence fields, and no horiz_accuracy-equivalent field reports a value smaller than the 95% covariance semi-major axis.
Max execution time: 15 minutes.
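
A per-estimate contract check for step 2 and the expected outcome might look like the sketch below. Only last_satellite_anchor_age_ms, horiz_accuracy, and the three source labels come from this document; the other field names (lat_deg, lon_deg, cov95_semi_major_m, source_label) are hypothetical stand-ins for the real public schema, and treating the three labels as the complete set is an assumption.

```python
# Hypothetical field names, except last_satellite_anchor_age_ms.
REQUIRED_FIELDS = (
    "lat_deg",
    "lon_deg",
    "cov95_semi_major_m",
    "source_label",
    "last_satellite_anchor_age_ms",
)
# Labels that appear in this test plan; the full set is an assumption.
KNOWN_SOURCE_LABELS = {"satellite_anchored", "vo_extrapolated", "dead_reckoned"}


def contract_violations(estimate):
    """Return a list of human-readable contract problems for one estimate dict."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in estimate]
    label = estimate.get("source_label")
    if label is not None and label not in KNOWN_SOURCE_LABELS:
        problems.append(f"unknown source label: {label}")
    cov95 = estimate.get("cov95_semi_major_m")
    horiz = estimate.get("horiz_accuracy")
    # The reported accuracy must never be tighter than the 95% semi-major axis.
    if cov95 is not None and horiz is not None and horiz < cov95:
        problems.append("horiz_accuracy under-reports the 95% semi-major axis")
    return problems
```

The test passes when `contract_violations` returns an empty list for every emitted estimate.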
FT-P-03: BASALT VIO Replay With Synchronized Video/Telemetry
Summary: Validate that BASALT + safety/anchor wrapper can process synchronized nadir video, IMU, and trajectory telemetry and produce frame-by-frame estimates with honest confidence.
Traces to: AC-1.3, AC-2.1a, AC-2.2, AC-4.1, AC-4.2
Category: VO / IMU Propagation
Preconditions:
- Derkachi replay fixture is mounted from input_data/flight_derkachi/.
- flight_derkachi.mp4 is readable as cropped nadir video: 880 x 720, 30 fps, approximately 490.07 s.
- data_imu.csv contains monotonic 10 Hz Time, timestamp(ms), SCALED_IMU2.*, and GLOBAL_POSITION_INT.* fields for 4,900 rows.
- Production or Jetson VIO profile is configured for native mode; replay mode is allowed only for explicit development replay checks.
- Camera intrinsics, lens distortion, and camera-to-body transform are either pinned or the run is marked as calibration-limited.
- Public synchronized dataset slice remains useful for calibrated final comparison. Strongest candidates: MUN-FRL, ALTO, EPFL fixed-wing, Kagaru; EuRoC/UZH FPV are proxy-only.
Input data: derkachi_video_telemetry, public_nadir_vio_candidates
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Validate Derkachi video/telemetry alignment | Harness accepts the fixture only if MP4 duration and CSV duration differ by <=250 ms and there are exactly 3 video frames per telemetry row |
| 2 | Replay synchronized video frames and IMU stream | System emits frame-by-frame vo_extrapolated or satellite_anchored estimates without batching |
| 3 | Compare output trajectory to GLOBAL_POSITION_INT lat/lon/alt/heading | Error, covariance, source label, and anchor age are reported per segment |
| 4 | Compare calibrated public/representative replay against ground truth when available | BASALT + wrapper does not materially under-report uncertainty relative to error |
| 5 | Compare against OpenVINS reference replay when available | BASALT + wrapper does not materially under-report uncertainty relative to error |
| 6 | Start with production VIO profile when the BASALT-compatible runtime is not installed | System reports an explicit native runtime prerequisite error and emits no replay-derived successful VIO state |
| 7 | Start with explicit development replay profile | Replay VIO behavior is available only through the explicit replay profile and cannot satisfy production native-mode checks |
Expected outcome: Derkachi replay is accepted as a synchronized representative fixture and produces continuous estimates for >95% of normal overlapping frames when native prerequisites are available. Missing native runtime prerequisites block production VIO with an explicit error rather than replay success. Absolute geolocation and covariance pass/fail thresholds are calibration-gated until camera intrinsics, distortion, and camera-to-body transform are pinned. For calibrated datasets, VO homography MRE is <1.0 px where homography validation is applicable.
Max execution time: Dataset-dependent, but replay must report per-frame latency.
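
The step 1 fixture gate can be sketched as below. The function name and signature are hypothetical. Two interpretations are assumptions rather than statements from the spec: the CSV duration is taken as last minus first timestamp(ms) value, and "exactly 3 video frames per telemetry row" is read as frame count equal to 3 times the row count.

```python
def check_alignment(video_frame_count, video_fps, telemetry_ts_ms,
                    max_skew_ms=250, frames_per_row=3):
    """Return a list of reasons to reject the fixture; empty means accepted."""
    problems = []
    # Telemetry must be monotonic (strictly increasing is assumed here).
    if any(b <= a for a, b in zip(telemetry_ts_ms, telemetry_ts_ms[1:])):
        problems.append("telemetry timestamps are not strictly increasing")
    video_ms = 1000.0 * video_frame_count / video_fps
    csv_ms = telemetry_ts_ms[-1] - telemetry_ts_ms[0]  # assumed duration definition
    if abs(video_ms - csv_ms) > max_skew_ms:
        problems.append(f"duration skew {abs(video_ms - csv_ms):.1f} ms exceeds {max_skew_ms} ms")
    if video_frame_count != frames_per_row * len(telemetry_ts_ms):
        problems.append("frame count is not exactly frames_per_row times the row count")
    return problems
```

Under this strict reading, a clip that is even one or two frames longer than 3 frames per row is rejected, so the harness may need to trim trailing frames before applying the check.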
FT-P-04: Satellite Service And Anchor Verification
Summary: Validate that relocalization uses global retrieval plus local verification and emits only verified satellite anchors.
Traces to: AC-2.1b, AC-2.2, AC-3.2, AC-3.3, AC-8.6
Category: Satellite Anchor
Preconditions:
- AerialVL/ALTO/VPAir-style public dataset slice or project satellite-cache fixture is available.
- VPR chunks and descriptors are precomputed.
Input data: Public aerial localization slice, cache fixture
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Trigger cold-start or relocalization query | System searches CPU FAISS top-K chunks |
| 2 | Present top-K candidates to local verification | System runs ALIKED/DISK+LightGlue and RANSAC |
| 3 | Inspect emitted anchor decision | Accepted anchors include source label, MRE, inlier count, covariance, and tile provenance |
Expected outcome: Cross-domain satellite-anchor MRE is <2.5 px for accepted anchors; rejected candidates do not produce satellite_anchored estimates.
Max execution time: Must be measured as part of performance tests.
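
The MRE gate on accepted anchors can be illustrated with a small, dependency-free sketch. It assumes the local verification stage yields a 3x3 homography plus matched inlier points; the function names are hypothetical, and the minimum inlier count is left as a required argument because this document does not state one.

```python
import math


def project(H, pt):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def mean_reprojection_error(H, src_pts, dst_pts):
    """Mean Euclidean distance in pixels between H(src) and dst."""
    errors = [math.dist(project(H, s), d) for s, d in zip(src_pts, dst_pts)]
    return sum(errors) / len(errors)


def accept_anchor(H, src_inliers, dst_inliers, min_inliers, max_mre_px=2.5):
    """Accept only if there are enough inliers and the MRE is below the gate."""
    if len(src_inliers) < min_inliers:
        return False
    return mean_reprojection_error(H, src_inliers, dst_inliers) < max_mre_px
```

A rejected candidate should then fall through to a non-anchored source label rather than emit satellite_anchored.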
Negative Scenarios
FT-N-01: Repetitive Or Low-Texture Imagery
Summary: Validate that visually ambiguous images do not produce confident false satellite anchors.
Traces to: AC-1.4, AC-3.1, AC-NEW-4, AC-8.6
Category: False Position Prevention
Input data: Repetitive agricultural or low-texture frames from project/public data.
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Submit ambiguous frame or sequence | System either emits degraded vo_extrapolated/dead_reckoned output or rejects low-confidence anchor |
| 2 | Inspect anchor and confidence outputs | No anchor is accepted unless local verification and covariance gates pass |
Expected outcome: 0 confident satellite_anchored outputs for candidates that fail local verification, freshness, or Mahalanobis gates.
Max execution time: 15 minutes per fixture.
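
One way to picture the Mahalanobis gate named in the expected outcome is the 2D sketch below. It is an illustration under assumptions: positions are in a local metric frame, `cov` is the 2x2 innovation covariance (predicted plus measurement), and the 95% chi-square threshold for 2 degrees of freedom (5.991) stands in for whatever threshold the system actually uses.

```python
CHI2_95_2DOF = 5.991  # 95% quantile of the chi-square distribution, 2 DoF


def mahalanobis_sq(residual, cov):
    """Squared Mahalanobis distance of a 2D residual under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # r^T * inv(cov) * r, with inv(cov) = [[d, -b], [-c, a]] / det
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def passes_gate(candidate_xy, predicted_xy, cov, threshold=CHI2_95_2DOF):
    """True if the candidate anchor is statistically consistent with the prediction."""
    residual = (candidate_xy[0] - predicted_xy[0], candidate_xy[1] - predicted_xy[1])
    return mahalanobis_sq(residual, cov) <= threshold
```

For example, with a 10 m standard deviation on each axis, a candidate 10 m from the prediction passes (squared distance 1.0) while one 30 m away fails (squared distance 9.0).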
FT-N-02: GPS Spoofing During Total Visual Blackout
Summary: Validate that spoofed GPS is not promoted during total camera occlusion/visual blackout and that output degrades honestly before unusable frames reach VIO.
Traces to: AC-3.5, AC-5.2, AC-NEW-2, AC-NEW-8
Category: Spoofing / Blackout
Input data: ArduPilot Plane SITL spoofing trace with camera blackout/total-occlusion frames.
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Start normal replay with trusted visual/satellite anchor | System emits normal estimates |
| 2 | Inject full visual blackout/total occlusion and spoofed GPS_RAW_INT | Camera gate sets usable_for_vio=false, BASALT is bypassed for occluded frames, and system switches to dead_reckoned within <=1 processed frame or <=400 ms |
| 3 | Continue blackout beyond thresholds | IMU-only covariance grows monotonically; system degrades fix type and emits failsafe status at specified covariance/time thresholds |
Expected outcome: Spoofed GPS is ignored; total occlusion never feeds BASALT as a usable VIO frame; fix_type=0, horiz_accuracy=999.0, and VISUAL_BLACKOUT_FAILSAFE are emitted when covariance >500 m or blackout >30 s.
Max execution time: 10 minutes per SITL scenario.
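
A harness-side check of the blackout trace (steps 2 and 3 and the expected outcome) could be sketched like this. The sample dict layout and the keys cov_m, blackout_s, source_label, and status are hypothetical; "grows monotonically" is read as non-decreasing, which is an assumption.

```python
COV_LIMIT_M = 500.0
BLACKOUT_LIMIT_S = 30.0
FAILSAFE_FIELDS = {
    "fix_type": 0,
    "horiz_accuracy": 999.0,
    "status": "VISUAL_BLACKOUT_FAILSAFE",  # "status" key name is hypothetical
}


def failsafe_required(cov_m, blackout_s):
    """Failsafe applies when covariance exceeds 500 m or blackout exceeds 30 s."""
    return cov_m > COV_LIMIT_M or blackout_s > BLACKOUT_LIMIT_S


def check_blackout_trace(trace):
    """Check time-ordered blackout samples; return a list of problems."""
    problems = []
    prev_cov = None
    for i, sample in enumerate(trace):
        if prev_cov is not None and sample["cov_m"] < prev_cov:
            problems.append(f"sample {i}: covariance decreased during blackout")
        prev_cov = sample["cov_m"]
        if sample["source_label"] != "dead_reckoned":
            problems.append(f"sample {i}: expected dead_reckoned source label")
        if failsafe_required(sample["cov_m"], sample["blackout_s"]):
            for key, value in FAILSAFE_FIELDS.items():
                if sample.get(key) != value:
                    problems.append(f"sample {i}: {key} is not {value!r}")
    return problems
```

The trace passed to `check_blackout_trace` is assumed to contain only samples emitted after the blackout was injected.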
FT-N-03: Invalid Or Stale Satellite Cache
Summary: Validate cache freshness, integrity, and provenance gates.
Traces to: AC-8.2, AC-8.3, AC-NEW-6, AC-NEW-7
Category: Cache Integrity
Input data: cache_integrity_fixtures
| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay with stale tile manifest | Tile is rejected or down-weighted in confidence; no stale tile emits satellite_anchored |
| 2 | Replay with hash-mismatched or unsigned manifest | Cache fixture is rejected and security event is logged |
| 3 | Replay generated tile with weak parent-pose covariance | Tile is not promoted beyond allowed trust level |
Expected outcome: 0 invalid/stale/cache-poisoning fixtures produce trusted anchors or trusted basemap tiles.
Max execution time: 15 minutes.
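
The freshness and integrity gates from steps 1 and 2 can be sketched as a single tile check. Everything about the manifest layout here is hypothetical (the sha256, signature_verified, and captured_at_s keys, and SHA-256 as the hash); real signature verification is out of scope and is represented by a boolean the caller is assumed to have computed.

```python
import hashlib


def tile_rejection_reasons(tile_bytes, manifest_entry, now_s, max_age_s):
    """Return reasons a cached tile must not be trusted; empty means usable."""
    reasons = []
    # Integrity: the tile content must match the hash recorded in the manifest.
    if hashlib.sha256(tile_bytes).hexdigest() != manifest_entry.get("sha256"):
        reasons.append("hash_mismatch")
    # Provenance: an unsigned or unverified manifest is never trusted.
    if not manifest_entry.get("signature_verified", False):
        reasons.append("unsigned_manifest")
    # Freshness: tiles older than the allowed age cannot back an anchor.
    if now_s - manifest_entry["captured_at_s"] > max_age_s:
        reasons.append("stale")
    return reasons
```

Any non-empty result should both block satellite_anchored output for that tile and, for the integrity and provenance cases, trigger the logged security event described in step 2.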