
Test Specification — Validation Harness

Acceptance Criteria Traceability

| AC ID | Acceptance Criterion | Test IDs | Coverage |
|---|---|---|---|
| AC-1.1 through AC-1.4 | Position accuracy, drift, confidence | IT-01, AT-01 | Covered |
| AC-2.1a/b, AC-2.2 | VO and satellite registration | IT-02, IT-03 | Covered |
| AC-3.1 through AC-3.5 | Resilience edge cases | IT-04, IT-05 | Covered |
| AC-4.1 through AC-4.5 | Latency, memory, MAVLink streaming | PT-01, IT-06 | Covered |
| AC-5.1 through AC-5.3 | Startup/failsafe/reboot | IT-07 | Covered |
| AC-6.1 through AC-6.3 | QGC/GCS/WGS84 | IT-06 | Covered |
| AC-7.1, AC-7.2 | Object coordinate contract | IT-08 | Covered at system boundary |
| AC-8.1 through AC-8.6 | Offline cache, freshness, tiles, VPR | IT-03, IT-09, ST-01 | Covered |
| AC-NEW-1 through AC-NEW-8 | Cold start, spoofing, FDR, false-position, thermal, freshness, poisoning, blackout | IT-05, IT-07, PT-02, ST-01, AT-02 | Covered |

Blackbox Tests

IT-01: Still-Image Accuracy Runner

Summary: Verify project still-image replay reports frame-center accuracy.

Traces to: AC-1.1, AC-1.2, AC-1.4

Input data: Project mapped images and expected_results/results_report.md.

Expected result: Report includes per-image error, aggregate 50 m/20 m pass rates, covariance, source label, and anchor age.

Max execution time: 15 minutes.
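
The per-image error and aggregate pass rates in the expected result could be computed roughly as follows. This is a minimal sketch, not the harness implementation: `haversine_m`, `pass_rates`, and the input shape (latitude/longitude in degrees) are hypothetical names chosen for illustration, the spherical-Earth distance is an approximation, and counting an error exactly equal to the threshold as a pass is an assumption. Only the 50 m and 20 m thresholds come from this specification.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pass_rates(errors_m, thresholds_m=(50.0, 20.0)):
    """Fraction of per-image errors at or below each threshold (inclusive by assumption)."""
    if not errors_m:
        raise ValueError("no per-image errors to aggregate")
    return {t: sum(e <= t for e in errors_m) / len(errors_m) for t in thresholds_m}
```

For example, `pass_rates([10, 30, 60, 15])` yields a 50 m pass rate of 0.75 and a 20 m pass rate of 0.5.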


IT-02: Synchronized VIO Replay Runner

Summary: Verify Derkachi and public/representative synchronized data can drive BASALT/wrapper tests.

Traces to: AC-1.3, AC-2.1a, AC-2.2

Input data: Derkachi cropped nadir video + telemetry fixture, MUN-FRL preferred slice, or representative synchronized dataset.

Expected result: Runner validates fixture alignment, trajectory comparison, VIO registration, latency, and covariance calibration where calibration data supports it.

Max execution time: Dataset-dependent.


IT-03: Satellite Anchor Replay Runner

Summary: Verify VPR and anchor verification test scenarios are executable.

Traces to: AC-2.1b, AC-2.2, AC-8.1, AC-8.2, AC-8.6

Input data: ALTO/AerialVL/representative aerial localization fixture plus cache.

Expected result: Runner reports retrieval recall, MRE, accepted/rejected anchors, and freshness behavior.

Max execution time: Dataset-dependent.
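
One common way to report retrieval recall is recall@K: the fraction of queries for which at least one correct tile appears among the top K retrieved candidates. The sketch below assumes that definition; this specification does not state which recall variant the runner uses, and `recall_at_k` and its dictionary inputs are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant tile in the top-k results.

    retrieved: query ID -> ranked list of tile IDs (best first).
    relevant:  query ID -> set of tile IDs considered correct for that query.
    """
    if not retrieved:
        raise ValueError("no queries to score")
    hits = 0
    for query_id, ranked in retrieved.items():
        if set(ranked[:k]) & relevant.get(query_id, set()):
            hits += 1
    return hits / len(retrieved)
```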


IT-04: Outlier/Sharp-Turn/Disconnected Runner

Summary: Verify resilience scenarios are executable and reported.

Traces to: AC-3.1, AC-3.2, AC-3.3, AC-3.4

Input data: Synthetic and public disconnected-segment fixtures.

Expected result: Runner validates relocalization and records degraded-mode timelines.

Max execution time: 30 minutes.


IT-05: Blackout And Spoofing Runner

Summary: Verify total blackout plus spoofing scenarios can be driven through SITL/replay.

Traces to: AC-3.5, AC-NEW-2, AC-NEW-8

Input data: Plane SITL spoofing scenario with 5 s, 15 s, and 35 s blackout windows.

Expected result: Runner verifies a mode switch within 400 ms (<=400 ms), promotion in under 3 s, monotonic covariance, and failsafe thresholds.

Max execution time: 30 minutes.
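
The timing and covariance checks for one blackout window might be asserted as in the sketch below. Several points are assumptions rather than statements from this specification: both latencies are measured from the start of the blackout, "monotonic covariance" is read as non-decreasing position covariance while the blackout lasts, and the function names and argument shapes are hypothetical.

```python
def is_non_decreasing(values, tolerance=0.0):
    """True if every sample is >= the previous one, within an optional tolerance."""
    return all(b >= a - tolerance for a, b in zip(values, values[1:]))


def check_blackout_window(blackout_start_s, mode_switch_s, promotion_s, covariances):
    """Evaluate one blackout window against the IT-05 thresholds.

    Returns named boolean checks so a report can show which one failed.
    """
    return {
        "mode_switch_le_400ms": (mode_switch_s - blackout_start_s) <= 0.400,
        "promotion_lt_3s": (promotion_s - blackout_start_s) < 3.0,
        "covariance_monotonic": is_non_decreasing(covariances),
    }
```

Returning a dictionary of checks rather than a single boolean keeps the degraded-mode report specific about which threshold was missed.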


IT-06: MAVLink/QGC Contract Runner

Summary: Verify MAVLink output and GCS status assertions are automated.

Traces to: AC-4.3, AC-4.4, AC-4.5, AC-6.1, AC-6.2, AC-6.3

Input data: Plane SITL, QGC observer/log parser, position fixtures.

Expected result: Runner validates v1 GPS_INPUT-only output, WGS84 coordinates, status rate, and command ingress.

Max execution time: 60 minutes.
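
Two of the assertions above, WGS84 coordinates and status rate, can be illustrated on values already extracted from a log. The sketch relies on the MAVLink convention that GPS_INPUT carries latitude and longitude as integers in degrees times 1e7; the helper names, the list-of-timestamps input, and the absence of any log parsing are simplifications for illustration, and the required status rate is not specified here.

```python
def is_valid_wgs84_dege7(lat_e7, lon_e7):
    """True if degE7-encoded latitude/longitude fall inside the WGS84 ranges."""
    lat = lat_e7 / 1e7
    lon = lon_e7 / 1e7
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def mean_rate_hz(timestamps_s):
    """Mean message rate over a sorted list of receive timestamps in seconds."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps to estimate a rate")
    span = timestamps_s[-1] - timestamps_s[0]
    if span <= 0:
        raise ValueError("timestamps must span a positive duration")
    return (len(timestamps_s) - 1) / span
```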


IT-07: Startup/Reboot Runner

Summary: Verify cold-start and reboot scenarios are measurable.

Traces to: AC-5.1, AC-5.2, AC-5.3, AC-NEW-1

Input data: 50 cold-start runs and companion reboot trace.

Expected result: Time to first valid GPS_INPUT is under 30 s at p95; after a companion reboot, the system reinitializes from flight-controller (FC) state.

Max execution time: Runset-dependent.
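
The p95 criterion over the cold-start runs could be evaluated as in this sketch. It uses the nearest-rank percentile definition, which is an assumption since the specification does not name a percentile method, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def cold_start_passes(times_to_first_fix_s, limit_s=30.0):
    """True if the p95 time to first valid GPS_INPUT is strictly under the limit."""
    return percentile_nearest_rank(times_to_first_fix_s, 95) < limit_s
```

With 50 runs, the nearest-rank p95 is the 48th fastest time, so at most two runs may reach or exceed 30 s before the check fails.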


IT-08: Object Coordinate Contract Runner

Summary: Verify AI-camera object coordinate request contract at system boundary.

Traces to: AC-7.1, AC-7.2

Input data: Frame-center estimate, object pixel/angle fixture, gimbal angle, altitude.

Expected result: Output coordinate includes frame-center-consistent accuracy and maneuvering-flight projection error bound.

Max execution time: 5 minutes.


IT-09: Tile Manager Runner

Summary: Verify cache, generated tiles, and storage tests are executable.

Traces to: AC-8.3, AC-8.4, AC-8.5, AC-NEW-6, AC-NEW-7

Input data: Cache integrity fixtures, generated tile scenarios, PostGIS manifest.

Expected result: Runner validates cache load, tile write gates, no raw-frame retention, stale rejection, and poisoning budget evidence.

Max execution time: Dataset-dependent.

Performance Tests

PT-01: End-To-End Release Gate Runner

Summary: Verify performance and resource tests can run in the proper environment.

Traces to: AC-4.1, AC-4.2, AC-NEW-5

Load scenario:

  • Environments: replay, Jetson hardware, SITL.
  • Duration: smoke, nightly, and release-gate profiles.

| Metric | Target | Failure Threshold |
|---|---|---|
| End-to-end p95 | <400 ms | >=400 ms |
| Memory | <8 GB | >=8 GB |
| Thermal throttle | 0 events in release gate | Any throttle event |
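
The failure thresholds above translate directly into a gate check. The sketch below is illustrative only: `GateSample` and `evaluate_release_gate` are hypothetical names, and how the runner actually collects the three measurements is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateSample:
    """Measurements from one release-gate run (hypothetical container)."""
    e2e_p95_ms: float
    peak_memory_gb: float
    thermal_throttle_events: int


def evaluate_release_gate(sample):
    """Return a list of failure descriptions; an empty list means the gate passes."""
    failures = []
    if sample.e2e_p95_ms >= 400.0:
        failures.append("end-to-end p95 >= 400 ms")
    if sample.peak_memory_gb >= 8.0:
        failures.append("memory >= 8 GB")
    if sample.thermal_throttle_events > 0:
        failures.append("thermal throttle event observed")
    return failures
```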

PT-02: FDR/Storage Runner

Summary: Verify 8-hour storage/endurance test orchestration.

Traces to: AC-NEW-3

| Metric | Target | Failure Threshold |
|---|---|---|
| FDR cap | <=64 GB | >64 GB |
| Rollover logging | Complete | Missing rollover event |

Security Tests

ST-01: Security Fixture Runner

Summary: Verify stale/tampered cache, spoofed MAVLink, and false-anchor scenarios are automated.

Traces to: AC-NEW-4, AC-NEW-6, AC-NEW-7

Attack vector: Cache tampering, stale imagery, spoofed GPS, impossible anchors.

Test procedure:

  1. Load each security fixture.
  2. Run scenario through public runtime interfaces.
  3. Validate output labels, FDR, and rejection reasons.

Expected behavior: No tampered/stale/spoofed input produces a trusted false fix.

Pass criteria: 0 accepted unsafe anchors or spoofed GPS promotions outside gates.
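
The pass criterion amounts to counting unsafe fixtures whose output was still labeled as trusted. The sketch below assumes each fixture run is summarized as a dictionary with `fixture_id`, `unsafe`, `label`, and `rejection_reason` keys, and that a trusted output carries the label `"trusted"`; none of these names are defined by this specification.

```python
TRUSTED_LABEL = "trusted"  # hypothetical label value


def find_unsafe_acceptances(results):
    """IDs of unsafe fixtures whose output was nevertheless labeled trusted."""
    return [r["fixture_id"] for r in results if r["unsafe"] and r["label"] == TRUSTED_LABEL]


def find_missing_rejection_reasons(results):
    """IDs of correctly rejected unsafe fixtures that recorded no rejection reason."""
    return [
        r["fixture_id"]
        for r in results
        if r["unsafe"] and r["label"] != TRUSTED_LABEL and not r.get("rejection_reason")
    ]
```

Under these assumptions, ST-01 passes only when `find_unsafe_acceptances` returns an empty list; the second helper covers step 3 of the procedure by flagging rejections that cannot be audited.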

Acceptance Tests

AT-01: Traceability Completeness Report

Summary: Verify every AC has executable or explicitly blocked test coverage.

Traces to: All ACs

| Step | Action | Expected Result |
|---|---|---|
| 1 | Read traceability matrix | All ACs mapped to tests |
| 2 | Run fixture validation | Missing public/representative data is reported as blocked, not passed |
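
Both steps can be expressed as small checks. In this sketch the traceability matrix is assumed to be a mapping from test ID to the AC IDs it traces to, and the status strings `"blocked"`, `"passed"`, and `"failed"` are illustrative; the function names are hypothetical.

```python
def find_unmapped(all_acs, trace_matrix):
    """Return ACs that no test traces to, preserving input order."""
    covered = {ac for acs in trace_matrix.values() for ac in acs}
    return [ac for ac in all_acs if ac not in covered]


def resolve_status(fixture_available, assertions_passed):
    """A missing fixture yields 'blocked' regardless of any assertion outcome."""
    if not fixture_available:
        return "blocked"
    return "passed" if assertions_passed else "failed"
```

For example, with a matrix containing only IT-01 (AC-1.1, AC-1.2, AC-1.4) and IT-08 (AC-7.1, AC-7.2), `find_unmapped(["AC-1.1", "AC-1.3", "AC-7.1"], matrix)` returns `["AC-1.3"]`.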

AT-02: Release Evidence Bundle

Summary: Verify release evidence can be assembled.

Traces to: AC-NEW-1 through AC-NEW-8

| Step | Action | Expected Result |
|---|---|---|
| 1 | Run release profile | Reports, tlogs, FDR summaries, cache reports are produced |
| 2 | Collate artifacts | Bundle contains pass/fail status and residual blockers |

Test Data Management

| Data Set | Description | Source | Size |
|---|---|---|---|
| project_60_still_images | Frame-center geolocation smoke | Project data | Project size |
| public_dataset_slices | MUN-FRL/ALTO/Kagaru/EPFL/AerialVL as licensed | Public pinned fixtures | Dataset-dependent |
| sitl_scenarios | Plane spoofing/failsafe traces | Generated | Small |
| security_fixtures | Stale/tampered/cache poisoning cases | Generated | Small |

Setup procedure: Create isolated run directory, restore PostgreSQL schema, mount fixtures read-only, and start requested environment.

Teardown procedure: Stop environments, archive reports, drop run schema, and delete temp volumes.

Data isolation strategy: Unique run ID, schema, ports, cache staging directory, and FDR directory per scenario.
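
A per-scenario run context implementing this isolation strategy might look like the sketch below. Everything named here is hypothetical: `RunContext`, `create_run_context`, the `run_<id>` naming scheme, and the directory layout are illustrative choices, and creating the PostgreSQL schema itself is left out. Ports are obtained by asking the operating system for a free loopback port, which is one possible approach and leaves a small window in which another process could claim the port before the environment binds it.

```python
import socket
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunContext:
    """Isolated resources for one scenario run (hypothetical structure)."""
    run_id: str
    schema: str
    port: int
    cache_staging_dir: Path
    fdr_dir: Path


def find_free_port():
    """Ask the OS for a currently unused TCP port on loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def create_run_context(base_dir, port_allocator=find_free_port):
    """Create unique directories and names for one scenario under base_dir."""
    run_id = uuid.uuid4().hex[:12]
    run_dir = Path(base_dir) / f"run_{run_id}"
    cache_dir = run_dir / "cache_staging"
    fdr_dir = run_dir / "fdr"
    cache_dir.mkdir(parents=True)
    fdr_dir.mkdir(parents=True)
    return RunContext(
        run_id=run_id,
        schema=f"run_{run_id}",  # schema name only; creation is not shown
        port=port_allocator(),
        cache_staging_dir=cache_dir,
        fdr_dir=fdr_dir,
    )
```

Passing the port allocator as a parameter lets a test harness substitute a deterministic allocator when the environment forbids socket binding.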