Revise acceptance criteria and restrictions documentation to clarify recent updates and specifications. Key changes include enhanced definitions for position accuracy, image processing quality, and operational parameters, as well as updates to camera specifications and validation requirements. This revision aims to improve clarity and ensure alignment with project goals.

Oleksandr Bezdieniezhnykh
2026-05-01 16:24:46 +03:00
parent 3f173c1bb7
commit 7e15868d39
62 changed files with 6878 additions and 13 deletions
@@ -0,0 +1,168 @@
# Blackbox Tests
## Positive Scenarios
### FT-P-01: Still-Image Frame Center Geolocation
**Summary**: Validate that the system estimates WGS84 frame centers for the provided 60-image nadir dataset.
**Traces to**: AC-1.1, AC-1.2, AC-6.3, AC-8.1
**Category**: Position Accuracy
**Preconditions**:
- Offline satellite cache fixture is available for the sample area.
- Expected results are loaded from `input_data/expected_results/results_report.md`.
**Input data**: `project_60_still_images`, `expected_frame_centers`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit `AD000001.jpg` through `AD000060.jpg` with height/camera metadata | System emits one WGS84 estimate per processed image |
| 2 | Compare each estimate to the mapped expected coordinate | Per-frame error is reported in meters |
**Expected outcome**: At least 80% of images are within 50 m and at least 50% are within 20 m.
**Max execution time**: 15 minutes for the 60-image replay on the local replay environment.
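
The pass/fail rule above can be sketched as a small scoring helper. This is a minimal illustration, not the project's harness: the function names are hypothetical, the per-frame error is approximated with a spherical-Earth haversine distance, and "within" is assumed to be inclusive.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, limit_m):
    """Share of per-frame errors at or below limit_m (inclusive bound is an assumption)."""
    return sum(e <= limit_m for e in errors_m) / len(errors_m)


def passes_ft_p_01(errors_m):
    """FT-P-01 outcome: at least 80% within 50 m and at least 50% within 20 m."""
    return fraction_within(errors_m, 50.0) >= 0.80 and fraction_within(errors_m, 20.0) >= 0.50
```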
---
### FT-P-02: Position Confidence Output Contract
**Summary**: Validate that every emitted position estimate includes confidence and source-label fields required by the public contract.
**Traces to**: AC-1.3, AC-1.4, AC-4.4, AC-4.5
**Category**: Position Confidence
**Preconditions**:
- Same fixture setup as FT-P-01.
**Input data**: `project_60_still_images`, `expected_frame_centers`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit the 60-image replay | System emits estimates frame-by-frame, not batched |
| 2 | Inspect public output fields | Each estimate contains WGS84 coordinate, 95% covariance semi-major axis, source label, and `last_satellite_anchor_age_ms` |
| 3 | Submit a later correction for a prior frame if available | System emits updated estimate with timestamp and covariance without corrupting newer estimates |
**Expected outcome**: 100% of emitted estimates include the required confidence fields; no `horiz_accuracy` or equivalent field under-reports the 95% covariance semi-major axis.
**Max execution time**: 15 minutes.
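
A contract check of this kind might look like the sketch below. Only `last_satellite_anchor_age_ms`, `horiz_accuracy`, and the three source labels come from this document; the other field names (`lat_deg`, `lon_deg`, `cov95_semi_major_m`, `source_label`) and the assumption that the three labels form the complete set are hypothetical placeholders for whatever the public contract defines.

```python
# Hypothetical field names, except last_satellite_anchor_age_ms (taken from the test steps).
REQUIRED_FIELDS = (
    "lat_deg",
    "lon_deg",
    "cov95_semi_major_m",
    "source_label",
    "last_satellite_anchor_age_ms",
)
# Labels mentioned in this document; assumed here to be the full set.
KNOWN_SOURCE_LABELS = {"satellite_anchored", "vo_extrapolated", "dead_reckoned"}


def contract_violations(estimate):
    """Return a list of human-readable contract problems for one emitted estimate."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in estimate]
    if problems:
        return problems
    if not (-90.0 <= estimate["lat_deg"] <= 90.0 and -180.0 <= estimate["lon_deg"] <= 180.0):
        problems.append("coordinate outside WGS84 range")
    if estimate["source_label"] not in KNOWN_SOURCE_LABELS:
        problems.append("unknown source label")
    # horiz_accuracy is optional here; when present it must not be smaller than
    # the 95% covariance semi-major axis.
    horiz = estimate.get("horiz_accuracy")
    if horiz is not None and horiz < estimate["cov95_semi_major_m"]:
        problems.append("horiz_accuracy under-reports 95% semi-major axis")
    return problems
```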
---
### FT-P-03: BASALT VIO Replay With Public Synchronized Data
**Summary**: Validate that BASALT + safety/anchor wrapper can process synchronized camera/IMU data and produce trajectory estimates with calibrated confidence.
**Traces to**: AC-1.3, AC-2.1a, AC-2.2, AC-4.1, AC-4.2
**Category**: VO / IMU Propagation
**Preconditions**:
- Public synchronized dataset slice is pinned during implementation. Strongest candidates: MUN-FRL, ALTO, EPFL fixed-wing, Kagaru; EuRoC/UZH FPV are proxy-only.
- Ground-truth trajectory or frame poses are available.
**Input data**: `public_nadir_vio_candidates`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Replay synchronized camera and IMU stream | System emits frame-by-frame `vo_extrapolated` or `satellite_anchored` estimates |
| 2 | Compare output trajectory to dataset ground truth | Error and covariance calibration are reported per segment |
| 3 | Compare against OpenVINS reference replay | BASALT + wrapper does not materially under-report uncertainty relative to error |
**Expected outcome**: VO registration succeeds for >95% of normal overlapping frames in dataset-supported normal segments; VO homography MRE is <1.0 px where homography validation is applicable.
**Max execution time**: Dataset-dependent, but replay must report per-frame latency.
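
The two thresholds in the expected outcome can be evaluated roughly as follows. This is a sketch under assumptions: the homography is a 3x3 row-major nested list, the mean reprojection error (MRE) is taken over whatever correspondences the caller passes in, and all function names are illustrative.

```python
import math


def project(H, pt):
    """Apply a 3x3 homography (row-major nested lists) to a 2D pixel coordinate."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def homography_mre_px(H, src_pts, dst_pts):
    """Mean Euclidean distance in pixels between projected source points and their matches."""
    errors = [math.dist(project(H, s), d) for s, d in zip(src_pts, dst_pts)]
    return sum(errors) / len(errors)


def passes_ft_p_03(registered_flags, mre_values_px):
    """Registration rate must exceed 95% and every applicable MRE must be below 1.0 px."""
    rate = sum(registered_flags) / len(registered_flags)
    return rate > 0.95 and all(m < 1.0 for m in mre_values_px)
```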
---
### FT-P-04: Satellite Retrieval And Anchor Verification
**Summary**: Validate that relocalization uses global retrieval plus local verification and emits only verified satellite anchors.
**Traces to**: AC-2.1b, AC-2.2, AC-3.2, AC-3.3, AC-8.6
**Category**: Satellite Anchor
**Preconditions**:
- AerialVL/ALTO/VPAir-style public dataset slice or project satellite-cache fixture is available.
- VPR chunks and descriptors are precomputed.
**Input data**: Public aerial localization slice, cache fixture
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Trigger cold-start or relocalization query | System searches CPU FAISS top-K chunks |
| 2 | Present top-K candidates to local verification | System runs ALIKED/DISK+LightGlue and RANSAC |
| 3 | Inspect emitted anchor decision | Accepted anchors include source label, MRE, inlier count, covariance, and tile provenance |
**Expected outcome**: Cross-domain satellite-anchor MRE is <2.5 px for accepted anchors; rejected candidates do not produce `satellite_anchored` estimates.
**Max execution time**: Must be measured as part of performance tests.
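
The anchor decision in step 3 can be pictured as a filter over locally verified candidates. The sketch below is illustrative only: `VerifiedCandidate`, `select_anchor`, and the minimum inlier count are hypothetical, the 2.5 px limit comes from the expected outcome, and the fallback label is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_ANCHOR_MRE_PX = 2.5  # from the expected outcome above
MIN_INLIERS = 30         # hypothetical gate; the real value is not given in this document


@dataclass(frozen=True)
class VerifiedCandidate:
    tile_id: str       # tile provenance identifier (hypothetical shape)
    mre_px: float      # mean reprojection error after RANSAC
    inlier_count: int  # RANSAC inliers


def select_anchor(candidates: Sequence[VerifiedCandidate]) -> Optional[VerifiedCandidate]:
    """Return the best candidate that passes both gates, or None if none does."""
    accepted = [
        c for c in candidates
        if c.mre_px < MAX_ANCHOR_MRE_PX and c.inlier_count >= MIN_INLIERS
    ]
    if not accepted:
        return None
    # Prefer more inliers, then lower reprojection error.
    return max(accepted, key=lambda c: (c.inlier_count, -c.mre_px))


def source_label(anchor: Optional[VerifiedCandidate], fallback: str = "vo_extrapolated") -> str:
    """Only a verified anchor may yield the satellite_anchored label."""
    return "satellite_anchored" if anchor is not None else fallback
```

Returning `None` rather than a low-quality candidate keeps the rejection path explicit, so a rejected query cannot be labeled `satellite_anchored` by accident.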
## Negative Scenarios
### FT-N-01: Repetitive Or Low-Texture Imagery
**Summary**: Validate that visually ambiguous images do not produce confident false satellite anchors.
**Traces to**: AC-1.4, AC-3.1, AC-NEW-4, AC-8.6
**Category**: False Position Prevention
**Input data**: Repetitive agricultural or low-texture frames from project/public data.
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit ambiguous frame or sequence | System either emits degraded `vo_extrapolated`/`dead_reckoned` output or rejects low-confidence anchor |
| 2 | Inspect anchor and confidence outputs | No anchor is accepted unless local verification and covariance gates pass |
**Expected outcome**: 0 confident `satellite_anchored` outputs for candidates that fail local verification, freshness, or Mahalanobis gates.
**Max execution time**: 15 minutes per fixture.
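
One way to picture the Mahalanobis gate named in the expected outcome is the 2D sketch below. It assumes the residual is the horizontal offset in meters between a candidate anchor and the predicted position, that `cov` is the corresponding 2x2 innovation covariance, and that the gate uses a chi-square bound with 2 degrees of freedom; the actual gate definition and confidence level are not stated in this document.

```python
import math


def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T S^-1 r for a 2D residual and 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # Closed-form 2x2 inverse: S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def passes_gate(residual, cov, confidence=0.95):
    """Accept only if the squared distance is inside the chi-square bound.

    For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    """
    threshold = -2.0 * math.log(1.0 - confidence)
    return mahalanobis_sq_2d(residual, cov) <= threshold
```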
---
### FT-N-02: GPS Spoofing During Total Visual Blackout
**Summary**: Validate that spoofed GPS is not promoted during total camera occlusion/visual blackout and that output degrades honestly before unusable frames reach VIO.
**Traces to**: AC-3.5, AC-5.2, AC-NEW-2, AC-NEW-8
**Category**: Spoofing / Blackout
**Input data**: ArduPilot Plane SITL spoofing trace with camera blackout/total-occlusion frames.
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Start normal replay with trusted visual/satellite anchor | System emits normal estimates |
| 2 | Inject full visual blackout/total occlusion and spoofed `GPS_RAW_INT` | Camera gate sets `usable_for_vio=false`, BASALT is bypassed for occluded frames, and system switches to `dead_reckoned` within 1 processed frame or 400 ms |
| 3 | Continue blackout beyond thresholds | IMU-only covariance grows monotonically; system degrades fix type and emits failsafe status at specified covariance/time thresholds |
**Expected outcome**: Spoofed GPS is ignored; total occlusion never feeds BASALT as a usable VIO frame; `fix_type=0`, `horiz_accuracy=999.0`, and `VISUAL_BLACKOUT_FAILSAFE` are emitted when covariance >500 m or blackout >30 s.
**Max execution time**: 10 minutes per SITL scenario.
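
The failsafe condition in the expected outcome can be expressed as a simple predicate. In this sketch the "covariance >500 m" threshold is interpreted as a horizontal uncertainty in meters (an assumption), the function names are hypothetical, and "grows monotonically" is read as strictly increasing.

```python
COVARIANCE_LIMIT_M = 500.0  # from the expected outcome
BLACKOUT_LIMIT_S = 30.0     # from the expected outcome


def blackout_failsafe(cov_semi_major_m, blackout_s):
    """Return the failsafe output once either threshold is exceeded, else None."""
    if cov_semi_major_m > COVARIANCE_LIMIT_M or blackout_s > BLACKOUT_LIMIT_S:
        return {
            "fix_type": 0,
            "horiz_accuracy": 999.0,
            "status": "VISUAL_BLACKOUT_FAILSAFE",
        }
    return None


def covariance_grows(cov_series_m):
    """True if each IMU-only covariance sample is larger than the previous one."""
    return all(later > earlier for earlier, later in zip(cov_series_m, cov_series_m[1:]))
```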
---
### FT-N-03: Invalid Or Stale Satellite Cache
**Summary**: Validate cache freshness, integrity, and provenance gates.
**Traces to**: AC-8.2, AC-8.3, AC-NEW-6, AC-NEW-7
**Category**: Cache Integrity
**Input data**: `cache_integrity_fixtures`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Replay with stale tile manifest | Tile is rejected or its confidence is down-weighted; no stale tile produces a `satellite_anchored` estimate |
| 2 | Replay with hash-mismatched or unsigned manifest | Cache fixture is rejected and security event is logged |
| 3 | Replay generated tile with weak parent-pose covariance | Tile is not promoted beyond allowed trust level |
**Expected outcome**: 0 invalid/stale/cache-poisoning fixtures produce trusted anchors or trusted basemap tiles.
**Max execution time**: 15 minutes.
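
The integrity and freshness gates exercised by this scenario could be prototyped as below. The manifest layout is not specified in this document, so the sketch assumes a per-tile SHA-256 hex digest and a capture timestamp; signature verification is deliberately left out, and all names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def tile_matches_manifest(tile_bytes: bytes, expected_sha256_hex: str) -> bool:
    """Compare the tile's SHA-256 digest with the manifest entry in constant time."""
    actual = hashlib.sha256(tile_bytes).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def is_stale(captured_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """A tile is stale when its age exceeds the allowed maximum (limit is an assumption)."""
    return now - captured_at > max_age


def tile_is_trusted(tile_bytes, expected_sha256_hex, captured_at, now, max_age) -> bool:
    """Both gates must pass before a tile may back a trusted anchor."""
    return tile_matches_manifest(tile_bytes, expected_sha256_hex) and not is_stale(
        captured_at, now, max_age
    )
```

A harness built on this would still need to log a security event on mismatch, as step 2 requires; that part is omitted here.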