# Test Specification — Validation Harness

## Acceptance Criteria Traceability

| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-1.1 through AC-1.4 | Position accuracy, drift, confidence | IT-01, IT-02, AT-01 | Covered |
| AC-2.1a/b, AC-2.2 | VO and satellite registration | IT-02, IT-03 | Covered |
| AC-3.1 through AC-3.5 | Resilience edge cases | IT-04, IT-05 | Covered |
| AC-4.1 through AC-4.5 | Latency, memory, MAVLink streaming | PT-01, IT-06 | Covered |
| AC-5.1 through AC-5.3 | Startup/failsafe/reboot | IT-07 | Covered |
| AC-6.1 through AC-6.3 | QGC/GCS/WGS84 | IT-06 | Covered |
| AC-7.1, AC-7.2 | Object coordinate contract | IT-08 | Covered at system boundary |
| AC-8.1 through AC-8.6 | Offline cache, freshness, tiles, VPR | IT-03, IT-09, ST-01 | Covered |
| AC-NEW-1 through AC-NEW-8 | Cold start, spoofing, FDR, false-position, thermal, freshness, poisoning, blackout | IT-05, IT-07, IT-09, PT-01, PT-02, ST-01, AT-02 | Covered |

## Blackbox Tests

### IT-01: Still-Image Accuracy Runner

**Summary**: Verify that replaying the project still images produces a frame-center accuracy report.

**Traces to**: AC-1.1, AC-1.2, AC-1.4

**Input data**: Project mapped images and `expected_results/results_report.md`.

**Expected result**: Report includes per-image error, aggregate 50 m/20 m pass rates, covariance, source label, and anchor age.

**Max execution time**: 15 minutes.
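A minimal sketch of the pass-rate aggregation, assuming estimated and reference frame centers are WGS84 latitude/longitude in degrees and that a spherical haversine distance is accurate enough at 20 m to 50 m error scales. `ImageResult` and `accuracy_summary` are hypothetical names, and counting an error exactly equal to a threshold as a pass is an assumption the spec does not settle.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


@dataclass
class ImageResult:
    image_id: str
    est_lat: float  # estimated frame-center latitude, degrees
    est_lon: float  # estimated frame-center longitude, degrees
    ref_lat: float  # reference (ground-truth) latitude, degrees
    ref_lon: float  # reference (ground-truth) longitude, degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def accuracy_summary(results, thresholds_m=(50.0, 20.0)):
    """Return per-image errors (m) and the pass rate for each threshold (error <= threshold)."""
    if not results:
        raise ValueError("no images to score")
    errors = {r.image_id: haversine_m(r.est_lat, r.est_lon, r.ref_lat, r.ref_lon) for r in results}
    pass_rates = {t: sum(e <= t for e in errors.values()) / len(errors) for t in thresholds_m}
    return errors, pass_rates
```

An offset of 0.0003° in latitude is roughly 33 m, so such an image passes the 50 m threshold but fails the 20 m one.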

---

### IT-02: Synchronized VIO Replay Runner

**Summary**: Verify that the Derkachi fixture and public or representative synchronized datasets can drive the BASALT and wrapper tests.

**Traces to**: AC-1.3, AC-2.1a, AC-2.2

**Input data**: Derkachi cropped nadir video with its telemetry fixture, the preferred MUN-FRL slice, or a representative synchronized dataset.

**Expected result**: Runner validates fixture alignment, trajectory comparison, VIO registration, latency, and covariance calibration where calibration data supports it.

**Max execution time**: Dataset-dependent.
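Fixture alignment can be checked by pairing each video frame with the nearest telemetry sample and flagging frames that have no sample within a skew tolerance. This sketch assumes both timestamp streams are sorted ascending and already on one clock in seconds; the 20 ms default tolerance and the function name are illustrative assumptions, not values taken from the Derkachi fixture.

```python
import bisect


def align_frames_to_telemetry(frame_ts, telem_ts, max_skew_s=0.02):
    """Pair each frame timestamp with the nearest telemetry timestamp.

    Returns (pairs, unmatched): pairs holds (frame_index, telem_index, skew_s);
    unmatched holds frame indices whose nearest telemetry sample is further
    away than max_skew_s (hypothetical default).
    """
    if not telem_ts:
        return [], list(range(len(frame_ts)))
    pairs, unmatched = [], []
    for i, t in enumerate(frame_ts):
        j = bisect.bisect_left(telem_ts, t)
        # Nearest sample is either the first one at/after t or the one just before it.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(telem_ts)]
        k = min(candidates, key=lambda c: abs(telem_ts[c] - t))
        skew = telem_ts[k] - t
        if abs(skew) <= max_skew_s:
            pairs.append((i, k, skew))
        else:
            unmatched.append(i)
    return pairs, unmatched
```

Reporting unmatched frames, rather than silently dropping them, keeps a misaligned fixture from passing as a short but clean replay.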

---

### IT-03: Satellite Anchor Replay Runner

**Summary**: Verify that visual place recognition (VPR) and anchor-verification scenarios are executable.

**Traces to**: AC-2.1b, AC-2.2, AC-8.1, AC-8.2, AC-8.6

**Input data**: ALTO, AerialVL, or a representative aerial-localization fixture, plus the offline cache.

**Expected result**: Runner reports retrieval recall, MRE, accepted/rejected anchors, and freshness behavior.

**Max execution time**: Dataset-dependent.
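Retrieval recall is commonly reported as recall@K: the fraction of query frames whose top-K retrieved tiles include at least one correct tile. This sketch assumes that definition and a hypothetical data layout (query ID to ranked tile IDs, and query ID to the set of correct tile IDs); the spec does not state which K values the runner reports.

```python
def recall_at_k(ranked, ground_truth, k):
    """Fraction of queries whose top-k retrieved tiles contain at least one true match.

    ranked: dict query_id -> list of tile ids, best first (hypothetical layout).
    ground_truth: dict query_id -> set of tile ids considered correct.
    Queries with no retrieval output count as misses.
    """
    if not ground_truth:
        raise ValueError("no queries to score")
    hits = sum(
        1 for q, truth in ground_truth.items() if truth & set(ranked.get(q, [])[:k])
    )
    return hits / len(ground_truth)
```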

---

### IT-04: Outlier/Sharp-Turn/Disconnected Runner

**Summary**: Verify that outlier, sharp-turn, and disconnected-segment resilience scenarios are executable and reported.

**Traces to**: AC-3.1, AC-3.2, AC-3.3, AC-3.4

**Input data**: Synthetic and public disconnected-segment fixtures.

**Expected result**: Runner validates relocalization and records degraded-mode timelines.

**Max execution time**: 30 minutes.
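One way to record a degraded-mode timeline is to convert the runtime's mode-transition events into closed intervals and total the time spent in each mode. The sketch assumes events arrive as sorted `(timestamp_s, mode)` pairs; the mode names used with it are hypothetical placeholders, and the runtime's actual mode labels may differ.

```python
def mode_timeline(events, end_time_s):
    """Convert sorted (timestamp_s, mode) transitions into (mode, start_s, end_s) intervals.

    The last interval is closed at end_time_s (end of the replay).
    """
    closing = [t for t, _ in events[1:]] + [end_time_s]
    return [(mode, t0, t1) for (t0, mode), t1 in zip(events, closing)]


def time_in_mode(intervals):
    """Total seconds spent in each mode across all intervals."""
    totals = {}
    for mode, t0, t1 in intervals:
        totals[mode] = totals.get(mode, 0.0) + (t1 - t0)
    return totals
```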

---

### IT-05: Blackout And Spoofing Runner

**Summary**: Verify that combined total-blackout and spoofing scenarios can be driven through SITL or replay.

**Traces to**: AC-3.5, AC-NEW-2, AC-NEW-8

**Input data**: Plane SITL spoofing scenario with 5 s, 15 s, and 35 s blackout windows.

**Expected result**: Runner checks a mode switch within 400 ms, promotion in under 3 s, monotonic covariance, and failsafe thresholds.

**Max execution time**: 30 minutes.
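Two of the IT-05 checks reduce to simple assertions over logged samples: latency from blackout onset to the first transition into the fallback mode, and a covariance value that never shrinks while no external fix is available. This sketch assumes the runner logs `(timestamp_s, mode)` events and one scalar covariance value per output sample (for example, the horizontal covariance trace); `VISUAL_NAV` is a hypothetical mode name, and treating the monotonic-covariance check as non-decreasing during the blackout is an interpretation.

```python
MODE_SWITCH_LIMIT_S = 0.4  # 400 ms mode-switch requirement


def mode_switch_latency_s(blackout_start_s, mode_events, target_mode):
    """Seconds from blackout onset to the first transition into target_mode, or None."""
    for t, mode in mode_events:
        if t >= blackout_start_s and mode == target_mode:
            return t - blackout_start_s
    return None


def covariance_non_decreasing(samples, tolerance=0.0):
    """True if no sample is smaller than its predecessor by more than tolerance."""
    return all(b >= a - tolerance for a, b in zip(samples, samples[1:]))
```

A `None` latency should be treated as a failure, since the fallback mode was never entered.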

---

### IT-06: MAVLink/QGC Contract Runner

**Summary**: Verify MAVLink output and GCS status assertions are automated.

**Traces to**: AC-4.3, AC-4.4, AC-4.5, AC-6.1, AC-6.2, AC-6.3

**Input data**: Plane SITL, QGC observer/log parser, position fixtures.

**Expected result**: Runner validates v1 `GPS_INPUT`-only output, WGS84 coordinates, status rate, and command ingress.

**Max execution time**: 60 minutes.
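In the MAVLink `GPS_INPUT` message, `lat` and `lon` are `int32` values in degE7 (degrees × 10^7), so the WGS84 range check is ±900,000,000 for latitude and ±1,800,000,000 for longitude. The sketch below assumes messages have already been decoded from the tlog into dicts keyed by MAVLink field name, plus a list of emitted message type names; the (0, 0) rejection and the set of competing position message types used for the `GPS_INPUT`-only check are assumptions.

```python
# Assumed set of position-carrying message types; only GPS_INPUT is allowed in v1.
POSITION_MESSAGE_TYPES = {
    "GPS_INPUT",
    "VISION_POSITION_ESTIMATE",
    "GLOBAL_VISION_POSITION_ESTIMATE",
    "ODOMETRY",
    "HIL_GPS",
}


def validate_gps_input(msg):
    """Return contract violations for one decoded GPS_INPUT message (dict of fields)."""
    errors = []
    lat_e7, lon_e7 = msg["lat"], msg["lon"]  # degE7 integers
    if not -900_000_000 <= lat_e7 <= 900_000_000:
        errors.append(f"lat out of WGS84 range: {lat_e7}")
    if not -1_800_000_000 <= lon_e7 <= 1_800_000_000:
        errors.append(f"lon out of WGS84 range: {lon_e7}")
    if lat_e7 == 0 and lon_e7 == 0:
        errors.append("null island (0, 0) position")
    return errors


def unexpected_position_types(message_types):
    """Position message types other than GPS_INPUT that appeared in the output."""
    return set(message_types) & (POSITION_MESSAGE_TYPES - {"GPS_INPUT"})


def message_rate_hz(timestamps_s):
    """Mean rate over a sorted list of receive timestamps (0.0 if fewer than two)."""
    if len(timestamps_s) < 2:
        return 0.0
    return (len(timestamps_s) - 1) / (timestamps_s[-1] - timestamps_s[0])
```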

---

### IT-07: Startup/Reboot Runner

**Summary**: Verify cold-start and reboot scenarios are measurable.

**Traces to**: AC-5.1, AC-5.2, AC-5.3, AC-NEW-1

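**Input data**: 50 cold-start runs and a companion-computer reboot trace.

**Expected result**: First valid `GPS_INPUT` arrives in under 30 s at p95 across the cold-start runs; after a reboot, the companion reinitializes from flight-controller (FC) state.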

**Max execution time**: Runset-dependent.
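The p95 first-fix gate can be evaluated over the 50 cold-start times. The spec does not name a percentile definition; this sketch assumes the nearest-rank method, under which p95 of 50 runs is the 48th-smallest value (ceil(0.95 × 50) = 48). The function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # integer arithmetic first avoids float drift
    return ordered[max(rank, 1) - 1]


def cold_start_gate(first_fix_times_s, limit_s=30.0):
    """True if p95 time to first valid GPS_INPUT is strictly under limit_s."""
    return percentile_nearest_rank(first_fix_times_s, 95) < limit_s
```

Other definitions (for example, linear interpolation between ranks) can give slightly different values on 50 samples, so the chosen method should be recorded with the result.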

---

### IT-08: Object Coordinate Contract Runner

**Summary**: Verify the AI-camera object-coordinate request contract at the system boundary.

**Traces to**: AC-7.1, AC-7.2

**Input data**: Frame-center estimate, object pixel/angle fixture, gimbal angle, altitude.

**Expected result**: Output coordinate carries an accuracy estimate consistent with the frame-center estimate and a projection error bound for maneuvering flight.

**Max execution time**: 5 minutes.
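A minimal flat-terrain sketch of the projection and of a first-order error bound, assuming the runtime has already combined heading, gimbal angle, and pixel offset into a single ray described by an off-nadir angle θ and an azimuth from true north, and that altitude is height above ground h. Ground range is then d = h tan θ, and an angle error δθ moves the ground point by approximately h δθ / cos²θ, which is why attitude error during maneuvering flight grows quickly away from nadir. This illustrates the geometry only; it is not the project's projection model, and it ignores terrain relief and frame-center position error.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # spherical approximation for small offsets


def project_to_ground(lat_deg, lon_deg, agl_m, off_nadir_deg, azimuth_deg):
    """Flat-terrain intersection of a camera ray with the ground.

    Returns (lat_deg, lon_deg, ground_range_m). The 89 degree cutoff is an
    illustrative guard against near-horizontal rays.
    """
    if not 0.0 <= off_nadir_deg < 89.0:
        raise ValueError("ray must point well below the horizon")
    d = agl_m * math.tan(math.radians(off_nadir_deg))
    north = d * math.cos(math.radians(azimuth_deg))
    east = d * math.sin(math.radians(azimuth_deg))
    lat = lat_deg + math.degrees(north / EARTH_RADIUS_M)
    lon = lon_deg + math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat, lon, d


def angle_error_bound_m(agl_m, off_nadir_deg, angle_error_deg):
    """First-order ground displacement from an off-nadir angle error: h * dtheta / cos^2(theta)."""
    theta = math.radians(off_nadir_deg)
    return agl_m * math.radians(angle_error_deg) / math.cos(theta) ** 2
```

At 100 m above ground, a 1° attitude error moves the point about 1.75 m at nadir and about 3.5 m at 45° off nadir.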

---

### IT-09: Tile Manager Runner

**Summary**: Verify that cache, generated-tile, and storage tests are executable.

**Traces to**: AC-8.3, AC-8.4, AC-8.5, AC-NEW-6, AC-NEW-7

**Input data**: Cache integrity fixtures, generated tile scenarios, PostGIS manifest.

**Expected result**: Runner validates cache load, tile write gates, absence of raw-frame retention, rejection of stale data, and poisoning-budget evidence.

**Max execution time**: Dataset-dependent.
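Cache-integrity fixtures can be checked against a manifest of SHA-256 digests. This sketch assumes a hypothetical manifest format mapping relative paths to hex digests; files present on disk but absent from the manifest are reported separately, which is one way to surface retained raw frames.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_cache(cache_dir, manifest):
    """Compare files under cache_dir with manifest {relative_posix_path: sha256_hex}.

    Returns sorted lists (missing, tampered, unexpected).
    """
    root = Path(cache_dir)
    present = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    expected = set(manifest)
    missing = sorted(expected - present)
    tampered = sorted(
        rel for rel in expected & present if sha256_file(root / rel) != manifest[rel]
    )
    unexpected = sorted(present - expected)
    return missing, tampered, unexpected
```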

## Performance Tests

### PT-01: End-To-End Release Gate Runner

**Summary**: Verify that performance and resource tests run in their target environments.

**Traces to**: AC-4.1, AC-4.2, AC-NEW-5

**Load scenario**:
- Environments: replay, Jetson hardware, SITL.
- Duration: smoke, nightly, and release-gate profiles.

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| End-to-end p95 | <400 ms | >=400 ms |
| Memory | <8 GB | >=8 GB |
| Thermal throttle | 0 events in release gate | Any throttle event |
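The table's boundaries are strict: exactly 400 ms or exactly 8 GB already fails. A sketch of the gate check, assuming the runner has already reduced each run to p95 latency, peak memory, and a throttle-event count; `ReleaseGateMetrics` is a hypothetical name, and the table does not say whether GB means 10^9 bytes or GiB.

```python
from dataclasses import dataclass


@dataclass
class ReleaseGateMetrics:
    latency_p95_ms: float  # end-to-end p95 latency
    peak_memory_gb: float  # peak memory over the run
    throttle_events: int   # thermal throttle events observed


def evaluate_release_gate(m):
    """Apply the PT-01 failure thresholds; returns {metric: passed}."""
    return {
        "end_to_end_p95": m.latency_p95_ms < 400.0,
        "memory": m.peak_memory_gb < 8.0,
        "thermal_throttle": m.throttle_events == 0,
    }
```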

---

### PT-02: FDR/Storage Runner

**Summary**: Verify 8-hour storage/endurance test orchestration.

**Traces to**: AC-NEW-3

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| FDR cap | <=64 GB | >64 GB |
| Rollover logging | Complete | Missing rollover event |
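The cap and rollover metrics can be checked against a simple model of segmented recording: append segments, evict the oldest once the total exceeds the cap, and log every eviction. `SegmentedRecorder` is hypothetical and does not describe the project's FDR implementation; it only shows the invariant PT-02 asserts (total never above 64 GB, every eviction logged).

```python
from collections import deque


class SegmentedRecorder:
    """Hypothetical FDR model: oldest segments are evicted when the cap is exceeded."""

    def __init__(self, cap_bytes):
        self.cap_bytes = cap_bytes
        self.segments = deque()    # (segment_id, size_bytes), oldest first
        self.total_bytes = 0
        self.rollover_events = []  # ids of evicted segments, in eviction order

    def append_segment(self, segment_id, size_bytes):
        if size_bytes > self.cap_bytes:
            raise ValueError("segment larger than the FDR cap")
        self.segments.append((segment_id, size_bytes))
        self.total_bytes += size_bytes
        # Evict oldest segments until back under the cap, logging each rollover.
        while self.total_bytes > self.cap_bytes:
            old_id, old_size = self.segments.popleft()
            self.total_bytes -= old_size
            self.rollover_events.append(old_id)
```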

## Security Tests

### ST-01: Security Fixture Runner

**Summary**: Verify stale/tampered cache, spoofed MAVLink, and false-anchor scenarios are automated.

**Traces to**: AC-NEW-4, AC-NEW-6, AC-NEW-7

**Attack vector**: Cache tampering, stale imagery, spoofed GPS, impossible anchors.

**Test procedure**:
1. Load each security fixture.
2. Run each scenario through the public runtime interfaces.
3. Validate output labels, FDR, and rejection reasons.

**Expected behavior**: No tampered/stale/spoofed input produces a trusted false fix.

**Pass criteria**: 0 accepted unsafe anchors or spoofed GPS promotions outside gates.
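The pass criterion can be evaluated per fixture once each run's output label and rejection reason are recorded. This sketch assumes a hypothetical `FixtureOutcome` record and reads "outside gates" as an unsafe fixture that produced a trusted fix without a documented gate accepting it; it also flags rejections logged without a reason, which the procedure's validation step would catch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixtureOutcome:
    fixture_id: str
    unsafe: bool                     # stale, tampered, or spoofed by construction
    trusted_fix: bool                # runtime labelled the resulting position as trusted
    inside_gate: bool                # acceptance happened inside a documented gate
    rejection_reason: Optional[str]  # reason logged when the input was rejected


def unsafe_acceptances(outcomes):
    """Fixtures violating ST-01: unsafe input produced a trusted fix outside the gates."""
    return [o.fixture_id for o in outcomes if o.unsafe and o.trusted_fix and not o.inside_gate]


def missing_rejection_reasons(outcomes):
    """Unsafe fixtures that were rejected but have no recorded rejection reason."""
    return [
        o.fixture_id
        for o in outcomes
        if o.unsafe and not o.trusted_fix and not o.rejection_reason
    ]
```

ST-01 passes only when `unsafe_acceptances` returns an empty list.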

## Acceptance Tests

### AT-01: Traceability Completeness Report

**Summary**: Verify every AC has executable or explicitly blocked test coverage.

**Traces to**: All ACs

| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | Read traceability matrix | All ACs mapped to tests |
| 2 | Run fixture validation | Missing public/representative data is reported as blocked, not passed |
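The blocked-not-passed rule can be made explicit by giving each AC one of three statuses and never counting a missing fixture as a pass. A sketch under assumed inputs (the matrix as AC ID to test IDs, fixture availability and test outcomes keyed by test ID): any failure dominates, then any blocked test, and only an AC whose tests all ran and passed is `PASSED`. ACs with no mapped test are returned separately as a step 1 failure.

```python
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"


def ac_status(test_ids, fixture_available, test_results):
    """Aggregate status for one AC from its mapped tests."""
    statuses = []
    for t in test_ids:
        if not fixture_available.get(t, False) or t not in test_results:
            statuses.append(Status.BLOCKED)  # missing data or not run: never a pass
        elif test_results[t]:
            statuses.append(Status.PASSED)
        else:
            statuses.append(Status.FAILED)
    if Status.FAILED in statuses:
        return Status.FAILED
    if Status.BLOCKED in statuses:
        return Status.BLOCKED
    return Status.PASSED


def coverage_report(ac_ids, matrix, fixture_available, test_results):
    """Return ({ac_id: Status}, unmapped_ac_ids)."""
    report, unmapped = {}, []
    for ac in ac_ids:
        tests = matrix.get(ac, [])
        if not tests:
            unmapped.append(ac)  # AC has no test at all
            continue
        report[ac] = ac_status(tests, fixture_available, test_results)
    return report, unmapped
```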

---

### AT-02: Release Evidence Bundle

**Summary**: Verify release evidence can be assembled.

**Traces to**: AC-NEW-1 through AC-NEW-8

| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | Run release profile | Reports, tlogs, FDR summaries, cache reports are produced |
| 2 | Collate artifacts | Bundle contains pass/fail status and residual blockers |

## Test Data Management

| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| `project_60_still_images` | Frame-center geolocation smoke | Project data | Project size |
| `public_dataset_slices` | MUN-FRL/ALTO/Kagaru/EPFL/AerialVL as licensed | Public pinned fixtures | Dataset-dependent |
| `sitl_scenarios` | Plane spoofing/failsafe traces | Generated | Small |
| `security_fixtures` | Stale/tampered/cache poisoning cases | Generated | Small |

**Setup procedure**: Create isolated run directory, restore PostgreSQL schema, mount fixtures read-only, and start requested environment.

**Teardown procedure**: Stop environments, archive reports, drop run schema, and delete temp volumes.

**Data isolation strategy**: Unique run ID, schema, ports, cache staging directory, and FDR directory per scenario.
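A sketch of per-scenario isolation, assuming each run gets a random run ID, a PostgreSQL schema name derived from it, a free local port for SITL, and private cache-staging and FDR directories. PostgreSQL truncates identifiers longer than 63 bytes, so the scenario slug is shortened before the run ID is appended. `RunContext` and `new_run_context` are hypothetical names, and asking the OS for port 0 can race with another process that takes the port before SITL binds it.

```python
import re
import socket
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunContext:
    run_id: str
    schema: str
    sitl_port: int
    cache_staging_dir: Path
    fdr_dir: Path


def free_tcp_port():
    """Let the OS pick an unused local port by binding to port 0 (racy by nature)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def new_run_context(scenario, base_dir):
    """Create isolated names and directories for one scenario run."""
    run_id = uuid.uuid4().hex[:12]
    # Lower-case letters, digits, and underscores only, so the schema needs no quoting.
    slug = re.sub(r"[^a-z0-9_]", "_", scenario.lower())
    schema = f"run_{slug[:40]}_{run_id}"  # at most 57 bytes, under the 63-byte limit
    root = Path(base_dir) / f"{slug}-{run_id}"
    cache_dir = root / "cache_staging"
    fdr_dir = root / "fdr"
    cache_dir.mkdir(parents=True)
    fdr_dir.mkdir(parents=True)
    return RunContext(run_id, schema, free_tcp_port(), cache_dir, fdr_dir)
```

Teardown then only needs the context: drop `schema`, stop whatever listens on `sitl_port`, and archive or delete the run's directory tree.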