# Test Specification — Validation Harness

## Acceptance Criteria Traceability

| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-1.1 through AC-1.4 | Position accuracy, drift, confidence | IT-01, AT-01 | Covered |
| AC-2.1a/b, AC-2.2 | VO and satellite registration | IT-02, IT-03 | Covered |
| AC-3.1 through AC-3.5 | Resilience edge cases | IT-04, IT-05 | Covered |
| AC-4.1 through AC-4.5 | Latency, memory, MAVLink streaming | PT-01, IT-06 | Covered |
| AC-5.1 through AC-5.3 | Startup/failsafe/reboot | IT-07 | Covered |
| AC-6.1 through AC-6.3 | QGC/GCS/WGS84 | IT-06 | Covered |
| AC-7.1, AC-7.2 | Object coordinate contract | IT-08 | Covered at system boundary |
| AC-8.1 through AC-8.6 | Offline cache, freshness, tiles, VPR | IT-03, IT-09, ST-01 | Covered |
| AC-NEW-1 through AC-NEW-8 | Cold start, spoofing, FDR, false-position, thermal, freshness, poisoning, blackout | IT-05, IT-07, PT-02, ST-01, AT-02 | Covered |

## Blackbox Tests

### IT-01: Still-Image Accuracy Runner

**Summary**: Verify project still-image replay reports frame-center accuracy.

**Traces to**: AC-1.1, AC-1.2, AC-1.4

**Input data**: Project mapped images and `expected_results/results_report.md`.

**Expected result**: Report includes per-image error, aggregate 50 m/20 m pass rates, covariance, source label, and anchor age.

**Max execution time**: 15 minutes.

---

### IT-02: Synchronized VIO Replay Runner

**Summary**: Verify Derkachi and public/representative synchronized data can drive BASALT/wrapper tests.

**Traces to**: AC-1.3, AC-2.1a, AC-2.2

**Input data**: Derkachi cropped nadir video + telemetry fixture, MUN-FRL preferred slice, or representative synchronized dataset.

**Expected result**: Runner validates fixture alignment, trajectory comparison, VIO registration, latency, and covariance calibration where calibration data supports it.

**Max execution time**: Dataset-dependent.
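
IT-01 and IT-02 both compare estimated positions against reference positions. The following is a minimal sketch of how per-sample horizontal error and the aggregate 50 m/20 m pass rates named in IT-01 might be computed. It assumes WGS84 latitude/longitude pairs in degrees, a haversine great-circle distance on a spherical Earth, and an inclusive (`<=`) threshold. None of these choices is stated in this specification, and the function names are hypothetical.

```python
import math

# Mean Earth radius in metres. Assumption: a spherical model is accurate
# enough at the 20 m / 50 m scale; the spec does not name a distance method.
EARTH_RADIUS_M = 6_371_000.0


def horizontal_error_m(estimate, reference):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, estimate)
    lat2, lon2 = map(math.radians, reference)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pass_rates(errors_m, thresholds_m=(50.0, 20.0)):
    """Fraction of samples whose error is within each threshold (inclusive)."""
    if not errors_m:
        raise ValueError("no errors to aggregate")
    total = len(errors_m)
    return {t: sum(e <= t for e in errors_m) / total for t in thresholds_m}
```

A runner could call `horizontal_error_m` once per image and pass the resulting list to `pass_rates`, then write both the per-image values and the aggregate rates into the report.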

---

### IT-03: Satellite Anchor Replay Runner

**Summary**: Verify VPR and anchor verification test scenarios are executable.

**Traces to**: AC-2.1b, AC-2.2, AC-8.1, AC-8.2, AC-8.6

**Input data**: ALTO/AerialVL/representative aerial localization fixture plus cache.

**Expected result**: Runner reports retrieval recall, MRE, accepted/rejected anchors, and freshness behavior.

**Max execution time**: Dataset-dependent.

---

### IT-04: Outlier/Sharp-Turn/Disconnected Runner

**Summary**: Verify resilience scenarios are executable and reported.

**Traces to**: AC-3.1, AC-3.2, AC-3.3, AC-3.4

**Input data**: Synthetic and public disconnected-segment fixtures.

**Expected result**: Runner validates relocalization and records degraded-mode timelines.

**Max execution time**: 30 minutes.

---

### IT-05: Blackout And Spoofing Runner

**Summary**: Verify total blackout plus spoofing scenarios can be driven through SITL/replay.

**Traces to**: AC-3.5, AC-NEW-2, AC-NEW-8

**Input data**: Plane SITL spoofing scenario with 5 s, 15 s, and 35 s blackout windows.

**Expected result**: Runner measures <=400 ms mode switch, <3 s promotion, monotonic covariance, and failsafe thresholds.

**Max execution time**: 30 minutes.

---

### IT-06: MAVLink/QGC Contract Runner

**Summary**: Verify MAVLink output and GCS status assertions are automated.

**Traces to**: AC-4.3, AC-4.4, AC-4.5, AC-6.1, AC-6.2, AC-6.3

**Input data**: Plane SITL, QGC observer/log parser, position fixtures.

**Expected result**: Runner validates v1 GPS_INPUT-only output, WGS84 coordinates, status rate, and command ingress.

**Max execution time**: 60 minutes.

---

### IT-07: Startup/Reboot Runner

**Summary**: Verify cold-start and reboot scenarios are measurable.

**Traces to**: AC-5.1, AC-5.2, AC-5.3, AC-NEW-1

**Input data**: 50 cold-start runs and companion reboot trace.

**Expected result**: First valid `GPS_INPUT` <30 s p95; reboot reinitializes from FC state.

**Max execution time**: Runset-dependent.
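
The IT-07 cold-start criterion (first valid `GPS_INPUT` in under 30 s at p95 over 50 runs) can be sketched as a small gate function. This is an illustration under stated assumptions: the specification does not say which percentile definition applies, so the sketch uses the nearest-rank method, and the names `p95_nearest_rank` and `cold_start_gate` are hypothetical.

```python
def p95_nearest_rank(samples):
    """95th percentile by the nearest-rank method (an assumed definition)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cold_start_gate(times_s, limit_s=30.0):
    """Evaluate time-to-first-valid-fix samples against a strict p95 limit."""
    p95 = p95_nearest_rank(times_s)
    return {"runs": len(times_s), "p95_s": p95, "passed": p95 < limit_s}
```

With 50 runs, nearest-rank selects the 48th smallest sample, so the gate tolerates at most two runs at or above the limit. An interpolating percentile definition would give slightly different results near the boundary, which is why the chosen method is worth recording in the report.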

---

### IT-08: Object Coordinate Contract Runner

**Summary**: Verify AI-camera object coordinate request contract at system boundary.

**Traces to**: AC-7.1, AC-7.2

**Input data**: Frame-center estimate, object pixel/angle fixture, gimbal angle, altitude.

**Expected result**: Output coordinate includes frame-center-consistent accuracy and maneuvering-flight projection error bound.

**Max execution time**: 5 minutes.

---

### IT-09: Tile Manager Runner

**Summary**: Verify cache, generated tiles, and storage tests are executable.

**Traces to**: AC-8.3, AC-8.4, AC-8.5, AC-NEW-6, AC-NEW-7

**Input data**: Cache integrity fixtures, generated tile scenarios, PostGIS manifest.

**Expected result**: Runner validates cache load, tile write gates, no raw-frame retention, stale rejection, and poisoning budget evidence.

**Max execution time**: Dataset-dependent.

## Performance Tests

### PT-01: End-To-End Release Gate Runner

**Summary**: Verify performance and resource tests can run in the proper environment.

**Traces to**: AC-4.1, AC-4.2, AC-NEW-5

**Load scenario**:

- Environments: replay, Jetson hardware, SITL.
- Duration: smoke, nightly, and release-gate profiles.

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| End-to-end p95 | <400 ms | >=400 ms |
| Memory | <8 GB | >=8 GB |
| Thermal throttle | 0 events in release gate | Any throttle event |

---

### PT-02: FDR/Storage Runner

**Summary**: Verify 8-hour storage/endurance test orchestration.

**Traces to**: AC-NEW-3

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| FDR cap | <=64 GB | >64 GB |
| Rollover logging | Complete | Missing rollover event |

## Security Tests

### ST-01: Security Fixture Runner

**Summary**: Verify stale/tampered cache, spoofed MAVLink, and false-anchor scenarios are automated.

**Traces to**: AC-NEW-4, AC-NEW-6, AC-NEW-7

**Attack vector**: Cache tampering, stale imagery, spoofed GPS, impossible anchors.

**Test procedure**:

1. Load each security fixture.
2. Run scenario through public runtime interfaces.
3. Validate output labels, FDR, and rejection reasons.

**Expected behavior**: No tampered/stale/spoofed input produces a trusted false fix.

**Pass criteria**: 0 accepted unsafe anchors or spoofed GPS promotions outside gates.

## Acceptance Tests

### AT-01: Traceability Completeness Report

**Summary**: Verify every AC has executable or explicitly blocked test coverage.

**Traces to**: All ACs

| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | Read traceability matrix | All ACs mapped to tests |
| 2 | Run fixture validation | Missing public/representative data is reported as blocked, not passed |

---

### AT-02: Release Evidence Bundle

**Summary**: Verify release evidence can be assembled.

**Traces to**: AC-NEW-1 through AC-NEW-8

| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | Run release profile | Reports, tlogs, FDR summaries, cache reports are produced |
| 2 | Collate artifacts | Bundle contains pass/fail status and residual blockers |

## Test Data Management

| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| `project_60_still_images` | Frame-center geolocation smoke | Project data | Project size |
| `public_dataset_slices` | MUN-FRL/ALTO/Kagaru/EPFL/AerialVL as licensed | Public pinned fixtures | Dataset-dependent |
| `sitl_scenarios` | Plane spoofing/failsafe traces | Generated | Small |
| `security_fixtures` | Stale/tampered/cache poisoning cases | Generated | Small |

**Setup procedure**: Create isolated run directory, restore PostgreSQL schema, mount fixtures read-only, and start requested environment.

**Teardown procedure**: Stop environments, archive reports, drop run schema, and delete temp volumes.

**Data isolation strategy**: Unique run ID, schema, ports, cache staging directory, and FDR directory per scenario.
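
The data isolation strategy above could be realized with a per-scenario context object. The sketch below is one possible shape, not the harness's actual implementation: `RunContext`, `make_run_context`, the directory names, and the `run_` schema prefix are all hypothetical. It only allocates names, directories, and free local ports; creating and dropping the PostgreSQL schema is left to the setup and teardown procedures.

```python
import socket
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class RunContext:
    """Isolated resources for one scenario run (hypothetical structure)."""
    run_id: str
    schema: str
    ports: Tuple[int, ...]
    cache_staging_dir: Path
    fdr_dir: Path


def _free_local_ports(count: int) -> Tuple[int, ...]:
    """Ask the OS for distinct free loopback ports.

    All sockets stay open until every port is chosen so the ports are
    distinct. Another process could still claim one after they close.
    """
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            s.bind(("127.0.0.1", 0))
        return tuple(s.getsockname()[1] for s in socks)
    finally:
        for s in socks:
            s.close()


def make_run_context(base_dir: Path, scenario: str, n_ports: int = 1) -> RunContext:
    """Create a unique run ID with its own schema name, ports, and directories."""
    run_id = f"{scenario}-{uuid.uuid4().hex[:12]}"
    run_dir = base_dir / run_id
    cache_dir = run_dir / "cache_staging"
    fdr_dir = run_dir / "fdr"
    # exist_ok=False makes an accidental run-ID collision fail loudly.
    cache_dir.mkdir(parents=True, exist_ok=False)
    fdr_dir.mkdir(parents=True, exist_ok=False)
    # Schema name derived from the run ID; hyphens are not valid in
    # unquoted SQL identifiers, so they are replaced with underscores.
    schema = "run_" + run_id.replace("-", "_").lower()
    return RunContext(run_id, schema, _free_local_ports(n_ports), cache_dir, fdr_dir)
```

Deriving the schema name and both directories from the same run ID keeps teardown simple: dropping the schema and deleting one run directory removes everything the scenario wrote.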