mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 17:21:13 +00:00
[AZ-234] [AZ-235] [AZ-236] [AZ-237] Add replay tests
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -1,88 +0,0 @@
# Replay Geolocation And Confidence Tests

**Task**: AZ-234_replay_geolocation_confidence_tests
**Name**: Replay Geolocation And Confidence Tests
**Description**: Implement blackbox tests for still-image geolocation, confidence/source-label output, and replay latency smoke.
**Complexity**: 3 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-234
**Epic**: AZ-218

## Problem

The project needs deterministic blackbox evidence that the 60-image replay path emits WGS84 frame-center estimates with required confidence fields and latency metrics.

## Outcome

- Still-image replay reports per-frame coordinate error and aggregate threshold results.
- Every emitted estimate includes covariance, source label, and anchor-age fields.
- Replay smoke latency and dropped-frame metrics are captured in the shared report format.

## Scope

### Included

- FT-P-01 Still-Image Frame Center Geolocation.
- FT-P-02 Position Confidence Output Contract.
- NFT-PERF-01 Per-Frame Latency On Project Still Images.
- CSV and Markdown evidence output for these scenarios.

### Excluded

- Synchronized VIO video/IMU replay.
- Satellite-anchor VPR/local matching.
- Jetson-only release-gate profiling.

## Acceptance Criteria

**AC-1: Still-image coordinates are validated**
Given the 60-image project fixture and expected frame-center coordinates
When the replay test runs
Then per-frame WGS84 error is reported and aggregate 50 m / 20 m thresholds are evaluated.

**AC-2: Confidence output contract is validated**
Given emitted position estimates from the replay
When the test inspects public output fields
Then each estimate includes WGS84 coordinates, 95% covariance semi-major axis, source label, and anchor age.

**AC-3: Replay latency is measured**
Given the still-image replay runs at the configured smoke rate
When processing completes
Then capture-to-output latency and dropped-frame rate are recorded with pass/fail or blocked status.
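
The AC-2 output contract could be checked with a small field validator. The following is a minimal sketch, assuming each emitted estimate is available as a dictionary; the key names (`lat_deg`, `lon_deg`, `cov_semi_major_95_m`, `source_label`, `anchor_age_s`) are hypothetical placeholders, not the project's actual output schema.

```python
# Hypothetical field names; the real public output schema may differ.
REQUIRED_FIELDS = (
    "lat_deg",
    "lon_deg",
    "cov_semi_major_95_m",
    "source_label",
    "anchor_age_s",
)


def missing_fields(estimate: dict) -> list[str]:
    """Return the required fields that are absent or None in one estimate."""
    return [name for name in REQUIRED_FIELDS if estimate.get(name) is None]


def contract_violations(estimates: list[dict]) -> dict[int, list[str]]:
    """Map estimate index to its missing fields; an empty dict means the contract holds."""
    violations = {}
    for index, estimate in enumerate(estimates):
        missing = missing_fields(estimate)
        if missing:
            violations[index] = missing
    return violations
```

Checking for `None` rather than truthiness keeps legitimate zero values, such as an anchor age of `0.0`, from being flagged as missing.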

## Non-Functional Requirements

**Performance**
- Replay smoke evidence includes p50/p95/p99 latency and dropped-frame rate.

**Reliability**
- Missing or invalid expected-coordinate fixtures fail fixture validation before scenario execution.
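
Fixture validation ahead of scenario execution might look like the sketch below. It assumes the expected frame centers are stored as CSV with `lat` and `lon` columns in decimal degrees; that layout and the function name are assumptions for illustration only.

```python
import csv
import io
import math


def validate_expected_centers(csv_text: str) -> list[str]:
    """Return human-readable problems; an empty list means the fixture is usable."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ["fixture has no rows"]
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the CSV header
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: missing or non-numeric lat/lon")
            continue
        # NaN and infinity fail the finiteness check and are reported as out of range.
        if not (math.isfinite(lat) and -90.0 <= lat <= 90.0):
            problems.append(f"line {line_no}: latitude out of range")
        if not (math.isfinite(lon) and -180.0 <= lon <= 180.0):
            problems.append(f"line {line_no}: longitude out of range")
    return problems
```

A runner would call this before replay and abort the scenario when the returned list is non-empty.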

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Expected-coordinate loader validation | Invalid coordinates are rejected before replay |
| AC-2 | Report field validation | Missing confidence/source fields fail the scenario |
| AC-3 | Latency metric aggregation | p50/p95/p99 and dropped-frame metrics are emitted |
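
The latency aggregation in the AC-3 row can be illustrated as follows. This sketch uses the nearest-rank percentile definition and treats any input frame without an output as dropped; both choices, and the report key names, are assumptions rather than the project's documented method.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float], frames_in: int) -> dict[str, float]:
    """Summarize capture-to-output latency; frames with no output count as dropped."""
    if frames_in <= 0:
        raise ValueError("frames_in must be positive")
    dropped = frames_in - len(latencies_ms)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "dropped_frame_rate": dropped / frames_in,
    }
```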

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | `project_60_still_images`, `expected_frame_centers` | FT-P-01 | >=80% within 50 m and >=50% within 20 m or explicit failure | Reliability |
| AC-2 | Same replay output | FT-P-02 | 100% of emitted estimates include required confidence fields | Reliability |
| AC-3 | Replay smoke run | NFT-PERF-01 | Latency and drop-rate metrics are recorded | Performance |
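
The FT-P-01 thresholds in the AC-1 row can be sketched as a distance computation plus an aggregate check. The example assumes a spherical-Earth haversine distance is an acceptable approximation of WGS84 ground distance at these scales, and that "within" is inclusive; neither assumption is stated in the task.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation of WGS84


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def evaluate_thresholds(errors_m: list[float]) -> dict:
    """Apply the >=80% within 50 m and >=50% within 20 m aggregate thresholds."""
    if not errors_m:
        raise ValueError("no per-frame errors to evaluate")
    n = len(errors_m)
    within_50 = sum(e <= 50.0 for e in errors_m) / n
    within_20 = sum(e <= 20.0 for e in errors_m) / n
    return {
        "within_50m": within_50,
        "within_20m": within_20,
        "passed": within_50 >= 0.80 and within_20 >= 0.50,
    }
```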

## Constraints

- Tests must use public replay input and output artifacts only.
- Input fixtures must be mounted read-only.
- Blocked prerequisites must be reported as `blocked`, not `passed`.
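
One way to keep `blocked` distinct from `passed` is a three-state status in which a missing prerequisite always takes precedence. The enum and function below are a hypothetical sketch, not the shared report format itself.

```python
from enum import Enum


class Status(str, Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"


def scenario_status(missing_prerequisites: list[str], assertions_ok: bool) -> Status:
    """A missing prerequisite wins: the scenario never ran, so it cannot pass."""
    if missing_prerequisites:
        return Status.BLOCKED
    return Status.PASSED if assertions_ok else Status.FAILED
```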

## Risks & Mitigation

**Risk 1: Calibration limits are mistaken for product failure**
- *Risk*: Fixture limits can make absolute accuracy inconclusive.
- *Mitigation*: Report the fixture source and threshold basis with each failure.
@@ -1,89 +0,0 @@
# VIO Replay Performance Tests

**Task**: AZ-235_vio_replay_performance_tests
**Name**: VIO Replay Performance Tests
**Description**: Implement synchronized video/IMU replay tests for VIO output, covariance evidence, and replay performance metrics.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure, AZ-240_native_vio_backend_integration
**Component**: Blackbox Tests
**Tracker**: AZ-235
**Epic**: AZ-218

## Problem

The runtime needs blackbox evidence that synchronized navigation video and flight-controller telemetry can drive VIO/wrapper output with honest confidence and measurable performance.

This test task must run after AZ-240 so it validates the real native VIO path rather than the deterministic scaffold.

## Outcome

- Derkachi video/telemetry fixture alignment is validated before replay.
- Synchronized replay produces frame-by-frame output or a clear blocked/failure reason.
- Latency, completion rate, memory, trajectory comparison, and calibration-gated checks are reported.

## Scope

### Included

- FT-P-03 BASALT VIO Replay With Synchronized Video/Telemetry.
- NFT-PERF-02 BASALT + Wrapper Replay Latency.
- Public/representative dataset prerequisite reporting.

### Excluded

- Satellite-anchor local verification.
- SITL spoofing/failsafe scenarios.
- Thermal/endurance release gates.

## Acceptance Criteria

**AC-1: Replay fixture alignment is validated**
Given the Derkachi MP4 and telemetry CSV
When fixture validation runs
Then duration, frame-to-telemetry ratio, and timestamp monotonicity are verified before replay.

**AC-2: Synchronized replay emits estimates**
Given a valid synchronized video/IMU replay fixture
When replay executes
Then estimates are emitted frame-by-frame with source labels, covariance, and segment evidence.

**AC-3: VIO performance evidence is reported**
Given replay completed or blocked
When reporting finishes
Then latency, completion rate, memory, and calibration/public-dataset prerequisite status are written.
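
The three AC-1 alignment checks (duration, frame-to-telemetry ratio, timestamp monotonicity) might be expressed as below. This is a sketch under stated assumptions: the video duration and frame count have already been read from the MP4 by some other tool, and the tolerance defaults are hypothetical values, not project thresholds.

```python
def validate_alignment(
    video_duration_s: float,
    frame_count: int,
    telemetry_ts_s: list[float],
    expected_ratio: float,
    duration_tol_s: float = 1.0,  # hypothetical absolute tolerance
    ratio_tol: float = 0.05,      # hypothetical relative tolerance
) -> list[str]:
    """Return alignment problems; an empty list means the fixture may be replayed."""
    if len(telemetry_ts_s) < 2:
        return ["telemetry has fewer than two samples"]
    problems = []
    # Monotonicity: every timestamp must be strictly later than the previous one.
    if any(b <= a for a, b in zip(telemetry_ts_s, telemetry_ts_s[1:])):
        problems.append("telemetry timestamps are not strictly increasing")
    telemetry_duration_s = telemetry_ts_s[-1] - telemetry_ts_s[0]
    if abs(video_duration_s - telemetry_duration_s) > duration_tol_s:
        problems.append("video and telemetry durations differ")
    ratio = frame_count / len(telemetry_ts_s)
    if abs(ratio - expected_ratio) > ratio_tol * expected_ratio:
        problems.append("frame-to-telemetry ratio is off")
    return problems
```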

## Non-Functional Requirements

**Performance**
- Reports include per-frame latency and memory metrics where the environment can measure them.

**Reliability**
- Calibration-gated absolute accuracy checks must be marked explicitly instead of silently passing.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Video/telemetry validator | Invalid duration or timestamp alignment blocks replay |
| AC-2 | Replay result parser | Missing per-frame confidence fields fail the scenario |
| AC-3 | Calibration gate reporting | Missing calibration/public data is reported as blocked |

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | `derkachi_video_telemetry` | FT-P-03 fixture validation | Fixture accepted only when alignment rules pass | Reliability |
| AC-2 | Valid synchronized replay | FT-P-03 output | Continuous estimates for normal overlapping segments or explicit degradation | Reliability |
| AC-3 | Replay performance run | NFT-PERF-02 | Latency, completion rate, and memory evidence are recorded | Performance |

## Constraints

- Tests must not import BASALT/OpenVINS/Kimera internals directly.
- Public/representative datasets are optional prerequisites and may produce blocked results.
- Raw input video and telemetry fixtures remain read-only.
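
If the test suite is written in Python, the no-internals constraint could be enforced with a static import scan. The sketch below is only an illustration: the package names `basalt`, `openvins`, and `kimera` are hypothetical, since the task does not say how (or whether) these backends are exposed as importable modules.

```python
import ast

# Hypothetical top-level package names for the VIO backends.
FORBIDDEN_ROOTS = {"basalt", "openvins", "kimera"}


def forbidden_imports(source: str) -> set[str]:
    """Return forbidden top-level packages imported by a test module's source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots = [node.module.split(".")[0]]
        else:
            continue
        found.update(root for root in roots if root.lower() in FORBIDDEN_ROOTS)
    return found
```

Relative imports (`level > 0`) are skipped because they can only refer to modules inside the test package itself.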

## Risks & Mitigation

**Risk 1: Hardware or dataset prerequisites are unavailable**
- *Risk*: The scenario cannot produce final accuracy evidence locally.
- *Mitigation*: Emit blocked results with exact missing prerequisite and continue other scenario groups.
@@ -1,102 +0,0 @@
# Satellite Anchor Cache Tests

**Task**: AZ-236_satellite_anchor_cache_tests
**Name**: Satellite Anchor Cache Tests
**Description**: Implement blackbox, security, and performance tests for satellite-anchor retrieval, local verification, cache integrity, and no in-flight external access.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
**Component**: Blackbox Tests
**Tracker**: AZ-236
**Epic**: AZ-218

## Problem

Satellite anchors and cache fixtures are safety-critical: invalid, stale, poisoned, or externally fetched data must not become trusted localization output.

This test task must run after AZ-241 and AZ-242 so it validates real local VPR retrieval and real anchor feature matching rather than scaffold evidence gates.

## Outcome

- Accepted anchors include retrieval, matching, geometry, freshness, and provenance evidence.
- Invalid/stale/poisoned cache fixtures cannot produce trusted anchors or trusted generated tiles.
- No in-flight Satellite Service or provider access occurs when cache data is missing.

## Scope

### Included

- FT-P-04 Satellite Service And Anchor Verification.
- FT-N-01 Repetitive Or Low-Texture Imagery.
- FT-N-03 Invalid Or Stale Satellite Cache.
- NFT-PERF-03 Relocalization Trigger Path Latency.
- NFT-RES-04 Tile Cache Freshness Degradation.
- NFT-SEC-01 Signed Cache Manifest Enforcement.
- NFT-SEC-02 Cache Poisoning Write Gate.
- NFT-SEC-04 No In-Flight Satellite Provider Access.
- NFT-RES-LIM-03 Satellite Cache Storage Budget.

### Excluded

- VIO synchronized replay.
- MAVLink spoofing/failsafe behavior.
- Jetson thermal endurance.

## Acceptance Criteria

**AC-1: Verified anchors include evidence**
Given a valid local cache/index fixture and relocalization trigger
When retrieval and verification run
Then accepted anchors include candidate IDs, scores, MRE, inliers, covariance, and tile provenance.

**AC-2: Unsafe candidates are rejected**
Given low-texture, stale, unsigned, hash-mismatched, or low-resolution fixtures
When anchor/cache tests run
Then no invalid candidate emits a trusted `satellite_anchored` estimate or trusted generated tile.

**AC-3: No in-flight external access occurs**
Given flight-mode replay with missing cache data
When relocalization is requested
Then the system reports degraded/no-candidate behavior without satellite-provider or Suite service network calls.

**AC-4: Cache and trigger-path metrics are reported**
Given cache and relocalization scenarios complete
When reporting finishes
Then latency, MRE, trust level, freshness, and storage-budget evidence are written.
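
The unsigned, hash-mismatched, and stale cases from AC-2 can be illustrated with a small rejection check. This is a sketch only: `TileEntry` and its fields are hypothetical, and the signature verification itself is assumed to happen elsewhere and is represented here by a boolean.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TileEntry:
    """Hypothetical cache manifest entry; field names are illustrative only."""
    sha256_hex: str
    captured_unix_s: float
    signature_ok: bool  # outcome of a manifest signature check performed elsewhere


def rejection_reasons(
    entry: TileEntry, payload: bytes, now_unix_s: float, max_age_s: float
) -> list[str]:
    """Return why a tile must not be trusted; an empty list means no objection."""
    reasons = []
    if not entry.signature_ok:
        reasons.append("unsigned")
    if hashlib.sha256(payload).hexdigest() != entry.sha256_hex:
        reasons.append("hash_mismatch")
    if now_unix_s - entry.captured_unix_s > max_age_s:
        reasons.append("stale")
    return reasons
```

A blackbox assertion would then require that any tile with a non-empty reason list never appears behind a trusted `satellite_anchored` estimate.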

## Non-Functional Requirements

**Security**
- Invalid cache data must not be trusted or promoted.

**Performance**
- Trigger-path latency and bounded top-K behavior are measured.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Anchor evidence parser | Required evidence fields are present |
| AC-2 | Invalid cache fixture generator | Stale/unsigned/hash-mismatched fixtures are produced deterministically |
| AC-3 | Network-block assertion | Unexpected external calls fail the scenario |
| AC-4 | Cache metrics report | Latency, freshness, and storage metrics are present |
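
For in-process Python code, the network-block assertion in the AC-3 row could be approximated by guarding `socket.socket.connect`. The sketch below is an assumption about how such a guard might be built; it does not cover `connect_ex`, connectionless sends, or traffic from other processes, so a real blackbox run would more likely rely on container or firewall isolation.

```python
import socket
from contextlib import contextmanager


class UnexpectedNetworkAccess(AssertionError):
    """Raised when code under test tries to open an outbound connection."""


@contextmanager
def block_network():
    """Fail any socket.connect() call made inside the with-block."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        raise UnexpectedNetworkAccess(f"blocked connection attempt to {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Restore the original method even if the scenario raised.
        socket.socket.connect = original_connect
```

Deriving the exception from `AssertionError` lets a test runner report the attempt as a scenario failure rather than an infrastructure error.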

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Public/cache fixture | FT-P-04 | Accepted anchors meet MRE/evidence requirements | Performance |
| AC-2 | Ambiguous and invalid cache fixtures | FT-N-01, FT-N-03, NFT-SEC-01, NFT-SEC-02 | 0 unsafe trusted outputs | Security |
| AC-3 | Network-blocked flight-mode replay | NFT-SEC-04 | Missing cache causes degraded behavior, not fetch | Security |
| AC-4 | Relocalization/cache runs | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Metrics and storage evidence are recorded | Performance |

## Constraints

- Tests must use local preloaded cache/index fixtures only.
- External network access during flight-mode scenarios is a failure.
- VPAir and UZH FPV licensing must be respected before use as commercial acceptance evidence.

## Risks & Mitigation

**Risk 1: Dataset licensing blocks final anchor evidence**
- *Risk*: Public dataset terms prevent commercial acceptance use.
- *Mitigation*: Mark dataset-specific checks blocked and keep generated cache fixtures for deterministic security coverage.
@@ -1,94 +0,0 @@
# MAVLink Blackout Spoofing Tests

**Task**: AZ-237_mavlink_blackout_spoofing_tests
**Name**: MAVLink Blackout Spoofing Tests
**Description**: Implement SITL/replay tests for visual blackout, spoofed GPS, MAVLink source validation, degraded covariance, no-fix thresholds, and QGC status.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-237
**Epic**: AZ-218

## Problem

The system must prove that spoofed GPS and unauthorized MAVLink messages cannot override estimator state during visual blackout or degraded operation.

## Outcome

- Blackout and spoofing traces drive visible degraded-mode transitions.
- Covariance, `GPS_INPUT`, QGC status, and FDR evidence match the safety thresholds.
- Unauthorized MAVLink sources are rejected and recorded.

## Scope

### Included

- FT-N-02 GPS Spoofing During Total Visual Blackout.
- NFT-RES-01 Total Visual Blackout With GPS Spoofing.
- NFT-SEC-03 MAVLink Source And Spoofing Rejection.

### Excluded

- Still-image geolocation accuracy.
- Satellite-anchor cache poisoning.
- Cold-start and restart trials.

## Acceptance Criteria

**AC-1: Blackout transitions to dead reckoning**
Given a replay/SITL trace with total camera blackout and spoofed GPS
When the scenario runs
Then the system enters `dead_reckoned` mode within the required frame or timing threshold.

**AC-2: Degraded output thresholds are enforced**
Given blackout continues beyond configured thresholds
When estimates are emitted
Then covariance grows monotonically and `GPS_INPUT` fields degrade to no-fix/failsafe values at the specified limits.

**AC-3: Spoofed or unauthorized MAVLink inputs are rejected**
Given spoofed real-GPS measurements or unauthorized MAVLink source IDs
When messages arrive during normal or blackout operation
Then no confident position estimate is produced from those inputs.

**AC-4: Operator and FDR evidence is visible**
Given degraded-mode transitions occur
When reporting completes
Then QGC status and FDR evidence show promotion, demotion, blackout, and failsafe events at expected rates.
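
The AC-2 checks on covariance growth and `GPS_INPUT` degradation could be sketched as follows. The example assumes "monotonically" means non-decreasing, that each sample is a dictionary with hypothetical keys `semi_major_m` and `fix_type`, and that the no-fix limit is supplied by configuration. In the MAVLink `GPS_INPUT` message, `fix_type` values 0 and 1 indicate no fix.

```python
def covariance_is_monotonic(semi_major_m: list[float]) -> bool:
    """True when the covariance semi-major axis never shrinks during blackout."""
    return all(b >= a for a, b in zip(semi_major_m, semi_major_m[1:]))


def degraded_output_ok(samples: list[dict], no_fix_limit_m: float) -> bool:
    """Check monotonic growth and that samples beyond the limit report no fix.

    Each sample is a dict with hypothetical keys 'semi_major_m' and 'fix_type'.
    """
    if not covariance_is_monotonic([s["semi_major_m"] for s in samples]):
        return False
    # GPS_INPUT fix_type 0 or 1 means no fix.
    return all(
        s["fix_type"] <= 1 for s in samples if s["semi_major_m"] > no_fix_limit_m
    )
```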

## Non-Functional Requirements

**Safety**
- Spoofed GPS must not be promoted during blackout without the documented recovery gates.

**Reliability**
- Missing SITL prerequisites are reported as blocked with exact setup evidence.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Scenario trigger builder | Blackout and spoofing events are generated deterministically |
| AC-2 | Threshold assertion logic | Fix type, covariance, and `horiz_accuracy` thresholds are checked |
| AC-3 | MAVLink source filter assertion | Unauthorized source messages fail the scenario |
| AC-4 | Status/FDR parser | Expected status events and rates are validated |
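
The source filter assertion in the AC-3 row can be illustrated with an allowlist of MAVLink system and component IDs. This is a minimal sketch: `MavMessage` is a stand-in for a decoded message header, and the allowlist contents are hypothetical (system 1, component 1 is the conventional autopilot identity, but the project's authorized sources are not given here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MavMessage:
    """Minimal stand-in for a decoded MAVLink message header."""
    system_id: int
    component_id: int
    msg_name: str


# Hypothetical allowlist of (system_id, component_id) pairs.
ALLOWED_SOURCES = frozenset({(1, 1)})


def split_by_source(
    messages: list[MavMessage],
) -> tuple[list[MavMessage], list[MavMessage]]:
    """Partition messages into (accepted, rejected) by source allowlist."""
    accepted, rejected = [], []
    for message in messages:
        key = (message.system_id, message.component_id)
        (accepted if key in ALLOWED_SOURCES else rejected).append(message)
    return accepted, rejected
```

A scenario assertion would require that every rejected message is recorded and that none of them contributes to a confident estimate.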

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | SITL or replay spoofing trace | FT-N-02, NFT-RES-01 | Dead-reckoned transition within timing threshold | Safety |
| AC-2 | Continued blackout | FT-N-02, NFT-RES-01 | Monotonic covariance and no-fix/failsafe fields | Safety |
| AC-3 | Unauthorized/spoofed MAVLink messages | NFT-SEC-03 | No confident estimate from bad source | Safety |
| AC-4 | QGC/FDR outputs | FT-N-02, NFT-SEC-03 | Status and evidence are visible and rate-limited | Reliability |

## Constraints

- ArduPilot Plane SITL is the authoritative autopilot target.
- v1 asserts `GPS_INPUT` output and intentional absence of ODOMETRY.
- Tests must not depend on Mission Planner or PX4 behavior.

## Risks & Mitigation

**Risk 1: SITL setup varies by environment**
- *Risk*: Local runs may not have SITL installed or configured.
- *Mitigation*: Report blocked prerequisites clearly and keep replay-level assertions runnable where possible.