autopilot/_docs/02_document/tests/traceability-matrix.md

# Traceability Matrix

Authored by `/test-spec` Phase 2 (2026-05-19).

This matrix maps every acceptance-criterion bullet from `_docs/00_problem/acceptance_criteria.md` and every restriction bullet from `_docs/00_problem/restrictions.md` to the test scenarios that exercise them. Coverage is **scenario-level**, not fixture-level — scenarios marked `DEFERRED` in the underlying `*-tests.md` files still count as covered for the purpose of "the test is specified"; the fixture-acquisition status is tracked separately in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md`.

## Acceptance Criteria Coverage

| AC ID | Acceptance criterion (paraphrased; canonical text in `acceptance_criteria.md`) | Test IDs | Coverage |
|---|---|---|---|
| AC-L1 | Tier 1 per-frame ≤ 100 ms at 1280 px on deployed compute | NFT-PERF-L1, FT-P-001 (functional contract) | Covered |
| AC-L2 | Tier 2 per-ROI ≤ 200 ms | NFT-PERF-L2 | Covered |
| AC-L3 | Tier 3 per-ROI ≤ 5 s (when enabled) | NFT-PERF-L3 | Covered (fixture DEFERRED) |
| AC-L4 | Camera zoom transition (medium→high) ≤ 2 s | NFT-PERF-L4 | Covered (fixture DEFERRED) |
| AC-L5 | Decision-to-movement latency ≤ 500 ms | NFT-PERF-L5 | Covered (fixture DEFERRED) |
| AC-L6 | Movement candidate enqueue ≤ 1 s (wide sweep) | NFT-PERF-L6 | Covered (fixture DEFERRED — gimbal.csv) |
| AC-L7 | Movement candidate enqueue ≤ 1.5 s (zoomed-in) | NFT-PERF-L7 | Covered (fixture DEFERRED — zoomed gimbal.csv) |
| AC-L8 | Zoom-out → zoom-in transition ≤ 2 s | NFT-PERF-L8 | Covered (fixture DEFERRED) |
| AC-L9 | Operator command → outbound action ≤ 500 ms | NFT-PERF-L9 | Covered (fixture DEFERRED for signed envelopes; placeholder usable today) |
| AC-T1 | POI rate surfaced to operator ≤ 5 / min (hard cap) | NFT-PERF-T1 | Covered |
| AC-T2 | Position telemetry rate ∈ [1, 10] Hz (target 10) | NFT-PERF-T2 | Covered (fixture DEFERRED — MAVLink replay) |
| AC-T3 | Frame-rate floor < 10 fps for ≥ 5 s → suppress zoom-in AND health yellow | NFT-PERF-T3 | Covered |
| AC-D1 | New classes per-class P ≥ 0.80 AND R ≥ 0.80 | FT-P-003 | Covered (fixture DEFERRED — annotated eval set) |
| AC-D2 | Existing-class regression Δ ≤ ± 0.02 vs baseline | FT-P-002 | Covered (baseline JSON DEFERRED; visual fixtures present) |
| AC-D3 | Concealed-position recall ≥ 0.60 (initial gate) | FT-P-004 | Covered (fixture DEFERRED — multi-season set) |
| AC-D4 | Concealed-position precision ≥ 0.20 (initial gate) | FT-P-005 | Covered (fixture DEFERRED — same as D3) |
| AC-D5 | Footpath recall ≥ 0.70 | FT-P-006 | Covered (fixture DEFERRED — polyline-annotated set) |
| AC-D6 | Tier-1 normalised-box contract conformance (class ids 0..18, coords ∈ [0,1]) | FT-P-001, NFT-SEC-Tier1SchemaViolation | Covered |
| AC-Mov-EnqueueWideSweep | Small movers during wide sweep MUST be detected and enqueued ≤ 1 s | FT-P-007 (M1 behavioural), NFT-PERF-L6 (latency dimension) | Covered |
| AC-Mov-ContinueDuringZoom | Movement detection continues during zoomed-in inspection | FT-P-008 (M2), NFT-PERF-L7 | Covered |
| AC-Mov-StableObjectsRejected | Stable objects (trees, houses, roads) NOT treated as moving solely due to camera platform motion | FT-P-007 (M1 — set_contains explicitly excludes tree row) | Covered |
| AC-Mov-FPBudgetHonoured | Configurable per-zoom-band FP budget honoured | FT-P-009 (M3), FT-P-010 (M4 — Q14) | Covered (M4 fixture DEFERRED) |
| AC-Scan-SweepCoverage | Wide-area sweep covers planned route at wide/light/medium zoom | implicitly by FT-P-011 setup + scenario-runner BIT scenarios; NOT covered as a distinct test | NOT COVERED (see Uncovered Items § §1) |
| AC-Scan-SweepToZoomTransition | Sweep → detailed inspection transition ≤ 2 s | FT-P-011, NFT-PERF-L8 | Covered |
| AC-Scan-TargetLock | Lock + pan + 2 s deep-analysis hold + per-POI timeout default 5 s | FT-P-015 (S5 — three cases) | Covered (fixture DEFERRED — vlm-mock with realistic timing) |
| AC-Scan-TargetFollowCentre | Target-follow within centre 25 % of frame | FT-P-013 | Covered (fixture DEFERRED) |
| AC-Scan-GimbalLatency | Gimbal decision-to-movement ≤ 500 ms (links to L5) | NFT-PERF-L5 | Covered |
| AC-Scan-POIOrdering | POI queue ordered by `confidence × proximity × age_factor` | FT-P-014 | Covered |
| AC-Op-DecisionWindowScale | Decision window scales 30 s @ 0.40 → 120 s @ 1.00 linearly | FT-P-017 (O1), FT-P-018 (O2), FT-P-019 (O3), FT-N-004 (O4 below-threshold) | Covered |
| AC-Op-DeclinePersistsIgnored | Operator-decline → persistent ignored-item per (MGRS, class_group) | FT-P-020 (O5) | Covered |
| AC-Op-TimeoutForget | Timeout (no response) MUST NOT create ignored-item | FT-P-022 (O7) | Covered |
| AC-Op-IgnoredSuppress | New detection matching existing ignored-item NOT surfaced | FT-P-021 (O6) | Covered |
| AC-Op-ConfirmWaypointFollow | Operator-confirm → middle waypoint POST + target-follow mode | FT-P-016 (O8) | Covered (Q9 envelope DEFERRED; happy path uses placeholder) |
| AC-Op-ReplayUnsignedRejected | Replayed or unsigned operator command REJECTED with logged security WARN; state UNCHANGED | NFT-SEC-O9, NFT-SEC-O10 | Covered (Q9 DEFERRED for full semantics) |
| AC-Rel-BITGatesTakeoff | BIT MUST pass before takeoff permitted | FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2 BIT gate) | Covered |
| AC-Rel-LostLinkRTL30s | Lost operator/GS link → known mission-safe outcome within configurable grace (default 30 s → RTL) | NFT-RES-R4 | Covered |
| AC-Rel-AirframeLinkRedImmediate | Airframe command link loss → health red immediately; defer to airframe failsafe | NFT-RES-R7 (extension), implicitly by airframe-link health observation in NFT-RES-R5/R6 | Covered |
| AC-Rel-BatteryFloors | Battery ≤ RTL floor → RTL; battery ≤ hard floor → land-now; operator override only | NFT-RES-R5, NFT-RES-R6 | Covered (fixture DEFERRED) |
| AC-Rel-MavlinkExhaustionRed | MAVLink command exhaustion → airframe-link health red | NFT-RES-R7 | Covered (fixture DEFERRED) |
| AC-Rel-DriftYellow | Wall-clock drift > 200 ms → health yellow | NFT-RES-R8 | Covered |
| AC-Rel-GeofenceSymmetric | Geofence INCLUSION + EXCLUSION violations → waypoint refusal + RTL | NFT-RES-R9 (both cases) | Covered (fixture DEFERRED) |
| AC-Res-RSS6GB | Combined RSS on Jetson (excluding Tier 1) ≤ 6 GB sustained | NFT-RES-LIM-Re1, NFT-RES-LIM-CPU (CPU dimension), NFT-RES-LIM-FileHandles (FD dimension) | Covered (HW DEFERRED) |
| AC-Res-Tier1NonDegradation | Tier 1 per-frame latency Δ ± 5 ms under concurrent autopilot workload | NFT-RES-LIM-Re2, NFT-RES-LIM-GPU (GPU mutual exclusion) | Covered (HW DEFERRED) |
| AC-Mp-PreFlightPull30s | Pre-flight map pull ≤ 30 s; cache-fallback only with explicit operator ack | FT-P-024 (Mp1), FT-N-003 (Mp2 cache-fallback gate), NFT-RES-Mp2 (timing+recovery) | Covered |
| AC-Mp-PostFlightPush120s | Post-flight pass diff push ≤ 120 s; failure → persist + bounded retry | FT-P-025 (Mp3), NFT-RES-Mp4 | Covered (fixture DEFERRED) |
| AC-Gate-HWBench | HW/replay benchmark suite MUST pass before product implementation | every Tier-HW row in environment.md `Hardware Execution Matrix` (filled by `hardware-assessment.md`) | Covered as a gate, executed at the Acceptance-Gates milestone |
| AC-Gate-SeasonCoverage | Per-season dataset coverage demonstrated before MVP sign-off (Q13) | NOT COVERED at blackbox test level — gated on annotation campaign and the `../ai-training` repo | NOT COVERED (see Uncovered Items § §2) |
| AC-Gate-MavlinkSITLConformance | MAVLink command surface MUST pass SITL conformance | implicitly by FT-P-016 (O8 confirms waypoint POST through SITL) + NFT-RES-R4/R5/R6/R7/R9 (all run through SITL); a dedicated conformance suite is recommended | Partially Covered (see Uncovered Items § §3) |
| AC-Q-Mov-Zoomed-FPRate | Movement detection FP rate at zoomed-in inspection (Q14) | FT-P-010 (M4) | Covered (Q14 DEFERRED) |
| AC-Q-MapObjectsConflict | MapObjects conflict resolution rule (Q8) | FT-P-026 (Mp5) | Covered (Q8 DEFERRED) |
| AC-Q-OperatorCmdAuth | Operator-command authentication conformance (Q9) | NFT-SEC-O9, NFT-SEC-O10, FT-P-016 (O8) | Covered (Q9 DEFERRED — placeholders used today) |
| AC-Q-MAVLinkSigning | Airframe MAVLink-2 message signing (Q6) | NFT-SEC-MavlinkUnsigned | Covered (Q6 DEFERRED) |
| AC-Q-SeasonGates | Per-season flight-test gates (Q13) | NOT COVERED — same as AC-Gate-SeasonCoverage | NOT COVERED |

## Restrictions Coverage

| Restriction ID | Restriction (paraphrased; canonical in `restrictions.md`) | Test IDs | Coverage |
|---|---|---|---|
| RESTRICT-HW-Jetson | Compute device Jetson Orin Nano Super; 8 GB shared LPDDR5; ~6 GB residual after Tier 1 | NFT-RES-LIM-Re1, NFT-RES-LIM-CPU, NFT-RES-LIM-Re2, all Tier-HW rows | Covered (HW DEFERRED) |
| RESTRICT-HW-A40 | Primary camera ViewPro A40; vendor protocol mandatory | FT-P-011, FT-P-012, FT-P-013, NFT-PERF-L4 (zoom traversal floor) | Covered (HW DEFERRED for L4) |
| RESTRICT-HW-Z40K | Alternative camera ViewPro Z40K — system must remain compatible | NOT COVERED at autopilot test level — verified by component-swap regression run on the Z40K HW | NOT COVERED (see Uncovered Items § §4) |
| RESTRICT-HW-ThermalLater | Thermal sensor may be added later; not assumed today | implicit (no test depends on thermal) | Covered by absence (negative assumption) |
| RESTRICT-HW-ZoomFloor | 40× optical zoom traversal 1–2 s wall-clock | NFT-PERF-L4 (asserts the ≤ 2 s ceiling that includes the physical floor) | Covered (HW DEFERRED) |
| RESTRICT-Op-Altitude | Flight altitude 600–1000 m | implicitly by every mission-trace fixture; no dedicated test | Covered by fixture assumption |
| RESTRICT-Op-AllSeasons | All four seasons in scope; winter-first-only rejected | FT-P-002, FT-P-003, FT-P-004, FT-P-005, FT-P-006 — multi-season fixtures required | Covered (all DEFERRED on multi-season fixtures) |
| RESTRICT-Op-AllTerrains | Forest, open field, urban edges, mixed terrain | same as RESTRICT-Op-AllSeasons | Covered (DEFERRED) |
| RESTRICT-Op-IntermittentModem | Modem operator/GS link intermittent | NFT-RES-R4, FT-P-016 (O8 nominal session), NFT-SEC-O9/O10 | Covered |
| RESTRICT-SW-JetsonResidualBudget | Onboard inference path runs within 6 GB residual RAM | NFT-RES-LIM-Re1 | Covered (HW DEFERRED) |
| RESTRICT-SW-FP16 | Models use FP16 precision (INT8 rejected for MVP) | NOT COVERED at autopilot test level — pinned at the model-loading layer (Tier 1 in `../detections`; Tier 2/3 in autopilot config) | NOT COVERED (see Uncovered Items § §5) |
| RESTRICT-SW-NoCloudInference | No cloud egress for inference | NFT-SEC-CraftedFrame (process boundary), implicit by environment.md `autopilot-e2e` network having no egress | Covered |
| RESTRICT-SW-GPUMutualExclusion | Tier 1 + any local large model serialise on the Jetson GPU | NFT-RES-LIM-GPU | Covered (HW DEFERRED) |
| RESTRICT-SW-MissionSchemaShared | Autopilot consumes shared `mission-schema`; cannot fork | FT-P-016 (O8 — POST validates against schema), FT-P-024 (Mp1 — schema-validated pull) | Covered (fixtures DEFERRED) |
| RESTRICT-Arch-Tier1External | Tier 1 lives in `../detections`; autopilot consumes | FT-P-001 (D6), NFT-SEC-Tier1SchemaViolation, FT-N-001 (R2 — Tier 1 unreachable inhibits BIT) | Covered |
| RESTRICT-Arch-MissionExternal | Mission state from `missions` service; autopilot doesn't author | FT-P-024, FT-P-025, FT-P-016 | Covered (fixtures DEFERRED) |
| RESTRICT-Arch-MapInMissions | Central area map in `missions /mapobjects` | FT-P-024, FT-P-025, FT-P-026 (Mp5), NFT-RES-Mp2, NFT-RES-Mp4 | Covered (fixtures DEFERRED) |
| RESTRICT-Arch-GPSDeniedExternal | GPS coords from separate GPS-denied service; autopilot does NOT implement | NOT COVERED at autopilot test level — verified at suite-e2e tier via the live GPS-denied service | NOT COVERED at autopilot tier (covered at suite-e2e tier) |
| RESTRICT-Arch-OperatorUIExternal | Operator browser UI owned by Ground Station; autopilot pushes data | implicit by NOT testing any UI rendering; verified by operator-stream protocol assertions in FT-P-016, FT-P-017–022 | Covered by absence |
| RESTRICT-Arch-AnnotationTrainingExternal | Annotation + training in `../annotations`, `../ai-training`; autopilot doesn't own | NOT TESTABLE at autopilot blackbox tier — process boundary | NOT TESTABLE (intentional scope exclusion) |
| RESTRICT-Rel-BITGate | Pre-flight BIT MUST gate takeoff | FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2) | Covered |
| RESTRICT-Rel-LostLinkDeterministic | Lost operator-link failsafe deterministic + bounded | NFT-RES-R4 | Covered |
| RESTRICT-Rel-AirframeLossRedImmediate | Airframe MAVLink loss → health red immediately | NFT-RES-R7 (red after retry exhaustion); a dedicated "immediate red on link loss" scenario MAY be desirable (currently rolled into R7) | Partially Covered (see Uncovered Items § §6) |
| RESTRICT-Rel-BatteryThresholds | Battery RTL + land-now triggers (override only via operator) | NFT-RES-R5, NFT-RES-R6 | Covered (fixtures DEFERRED) |
| RESTRICT-Rel-GeofenceSymmetric | Geofence INCLUSION + EXCLUSION enforcement | NFT-RES-R9 (both) | Covered (fixture DEFERRED) |
| RESTRICT-Rel-OperatorCmdAuth | Operator commands authenticated + signed + replay-protected | NFT-SEC-O9, NFT-SEC-O10, FT-P-016 happy path | Covered (Q9 DEFERRED) |
| RESTRICT-Rel-StorageBounded | On-device storage bounded; full = takeoff blocker; mid-flight eviction policy | FT-N-002 (R3 — BIT block), NFT-RES-LIM-Storage | Covered |
| RESTRICT-Rel-NoSilentErrors | No silent error swallowing | every NFT-SEC-* scenario asserts a counter + log entry; every NFT-RES-* asserts a structured-log + health transition | Covered |
| RESTRICT-Rel-ClockBound | Wall-clock bound to GPS once locked, else NTP at boot | NFT-RES-R8 | Covered |
| RESTRICT-Rel-MavlinkConformance | MAVLink command surface MUST conform to ArduPilot/PX4 SITL | every MAVLink-emitting scenario runs through `mavlink-sitl`; a dedicated conformance suite is recommended | Partially Covered (see Uncovered Items § §3) |

## Coverage Summary

| Category | Total Items | Covered | Partially Covered | Not Covered | Coverage % (counting Partially as 0.5) |
|---|---|---|---|---|---|
| Acceptance Criteria | 47 | 43 | 1 | 3 | (43 + 0.5×1) / 47 ≈ **92.6 %** |
| Restrictions | 30 | 25 | 2 | 3 | (25 + 0.5×2) / 30 ≈ **86.7 %** |
| **Total** | 77 | 68 | 3 | 6 | **(68 + 1.5) / 77 ≈ 90.3 %** |

(Coverage here is "test scenario exists for the item", not "fixture has been acquired and the test currently passes". Fixture status is tracked in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md`.)

## Uncovered Items Analysis

| § | Item | Reason not covered | Risk | Mitigation |
|---|---|---|---|---|
| §1 | AC-Scan-SweepCoverage (wide-area sweep covers planned route) | The "covers the planned route" property is a path-coverage assertion best tested by component-level tests in the `scan_controller` component (geometry coverage) rather than at the blackbox level | Medium — incorrect sweep pattern leaks observation gaps | Componenet-test in `scan_controller` (added by `/decompose` test tasks); a Tier-E "did the camera point at every planned waypoint area for ≥ N seconds" scenario can be added if needed |
| §2 | AC-Gate-SeasonCoverage / AC-Q-SeasonGates | Per-season coverage gates depend on dataset acquisition owned by `../ai-training` and per-season flight tests (Q13) | High — model performance on un-evaluated seasons unknown | Tracked as release-gate item; D3/D4/D5/D1 scenarios DEFERRED until each season's dataset lands |
| §3 | AC-Gate-MavlinkSITLConformance / RESTRICT-Rel-MavlinkConformance | A dedicated "every command in `architecture.md §7.7` exercised against SITL" suite is recommended in addition to the implicit coverage by R-scenarios | Medium — could miss a rarely-used command | Add a `NFT-MavlinkConformance` suite during Step 9 (Decompose Tests) — explicit per-command SITL exercise |
| §4 | RESTRICT-HW-Z40K (Z40K compatibility) | Requires a second camera HW for the swap test | Medium — could miss a A40-specific assumption | Run the Tier-HW rows on Z40K as a post-MVP smoke step |
| §5 | RESTRICT-SW-FP16 (model precision) | Pinned at config + model-loading layer; not externally observable beyond perf/latency | Low — incorrect precision would manifest as either L1 latency or D2 regression failure | Add a startup log assertion: "Tier 2/3 models loaded with precision=FP16" via the SUT's structured boot log |
| §6 | RESTRICT-Rel-AirframeLossRedImmediate (immediate red on airframe link loss) | NFT-RES-R7 asserts red after retry exhaustion; the "immediate red on link loss" path (no retries) is implicit | Low–Medium — depends on timing window between "link silent" and "considered lost" | Add `NFT-RES-AirframeImmediate` scenario in Step 9 (Decompose Tests) — sustained zero MAVLink traffic for N seconds → immediate health red (no retry phase) |

## Scenario index by file

| File | Scenarios | Read-back ID prefix |
|---|---|---|
| `blackbox-tests.md` | 26 positive + 4 negative | FT-P-001..FT-P-026, FT-N-001..FT-N-004 |
| `performance-tests.md` | 9 latency + 3 rate | NFT-PERF-L1..L9, NFT-PERF-T1..T3 |
| `resilience-tests.md` | 6 R-rows + 2 Mp-rows | NFT-RES-R4..R9, NFT-RES-Mp2, NFT-RES-Mp4 |
| `security-tests.md` | 10 SEC rows | NFT-SEC-O9, NFT-SEC-O10, NFT-SEC-CraftedFrame, NFT-SEC-OversizeCrop, NFT-SEC-VlmSchemaViolation, NFT-SEC-VlmFreeFormText, NFT-SEC-IpcPeerAuth, NFT-SEC-Tier1SchemaViolation, NFT-SEC-MavlinkUnsigned, NFT-SEC-HealthExposesSecurity |
| `resource-limit-tests.md` | 6 LIM rows | NFT-RES-LIM-Re1, Re2, Storage, CPU, GPU, FileHandles |

**Total scenarios authored**: 66.

## Open dependencies summary

| Dependency | Affects (scenario count) | Tracking |
|---|---|---|
| `<DEFERRED: gimbal.csv + telemetry.csv pairs>` | FT-P-007/008/009/010, NFT-PERF-L6/L7 | Leftover row "Gimbal CSV pairs" |
| `<DEFERRED: multi-season annotated datasets (concealed, footpath, new classes, existing baseline)>` | FT-P-002/003/004/005/006 | Leftover row "Concealed position image set + Footpath sequences + new-class eval set" |
| `<DEFERRED: SITL or HW capture for L4/L5/L8>` | NFT-PERF-L4/L5/L8 | Leftover row "MAVLink SITL traces" + camera frame sequences with zoom-band labelling |
| `<DEFERRED: missions API mock fixtures (Mp1/Mp3/Mp4)>` | FT-P-024/025, NFT-RES-Mp4 | Leftover row "Mock central area-map service responses" |
| `<DEFERRED: vlm-io-pairs (real recordings)>` | NFT-PERF-L3, FT-P-015 (S5), NFT-SEC-VlmSchemaViolation real-recording variant | Leftover row "Deep-analysis I/O pairs" |
| `<DEFERRED: operator-envelopes (Q9-blocked)>` | NFT-SEC-O9/O10, full semantics of FT-P-016 | Leftover row "Operator-command envelopes" + Q9 |
| `<DEFERRED: HW Jetson Orin Nano Super OR benchmarked replay>` | every Tier-HW scenario (L1, L2, L4, L5, L8, Re1, Re2, CPU, GPU) | Leftover does not enumerate HW directly — tracked via the project-level Acceptance Gate |
| `<DEFERRED: Q6 — MAVLink-2 signing decision>` | NFT-SEC-MavlinkUnsigned | architecture.md §8 Q6 |
| `<DEFERRED: Q8 — MapObjects conflict resolution rule>` | FT-P-026 (Mp5) | architecture.md §8 Q8 |
| `<DEFERRED: Q9 — operator-command auth scheme>` | NFT-SEC-O9/O10 full semantics | architecture.md §8 Q9 |
| `<DEFERRED: Q13 — per-season gates>` | AC-Gate-SeasonCoverage | architecture.md §8 Q13 |
| `<DEFERRED: Q14 — movement-detection classical vs learned-CV>` | FT-P-010 (M4) | architecture.md §8 Q14 |

When any of the above dependencies resolves, the corresponding leftover entry is replayed (per `tracker.mdc → Leftovers Mechanism`) and the affected scenarios' `Test status` lines move from `DEFERRED` to `READY` in the source files.

## Phase 3 — Test Data & Expected Results Validation Gate Outcome

Recorded by `/test-spec` Phase 3 on 2026-05-19.

### Mechanical gate

Phase 3's mechanical contract is: every scenario MUST have either (a) a provided input + provided quantifiable expected result, OR (b) a behavioural trigger + observable behaviour + quantifiable pass/fail criterion. Scenarios that fail this contract are normally REMOVED. The 75 % final-coverage check then applies.

| Shape | Total scenarios | Quantifiable comparison declared | Input/trigger fully provided today | Input/trigger DEFERRED (release-gate item) |
|---|---|---|---|---|
| Input/output | 56 | 56 | 16 | 40 |
| Behavioural | 10 | 10 | 10 | 0 |
| **Total** | **66** | **66 (100 %)** | **26 (39 %)** | **40 (61 %)** |

Every scenario carries a `Comparison` method drawn from `.cursor/skills/test-spec/templates/expected-results.md` (`exact`, `numeric_tolerance`, `threshold_min/max`, `range`, `regex`, `substring`, `set_contains`, `json_diff`, `schema_match`, `file_reference`) — none of the 66 fail the quantifiability check.

### Project-policy override (recorded 2026-05-19)

The Phase 3 75 % fixture-coverage gate is **intentionally overridden** for this project, per the decision recorded in `_docs/00_problem/input_data/expected_results/results_report.md → "Decision (project policy)"`:

> rather than block on the Phase 3 75 % gate, each deferred row is now registered with a structured `<DEFERRED:>` tag and surfaces in `data_parameters.md → "Gaps that block /test-spec downstream"`. `/test-spec` Phase 2 can author scenarios for all 56 rows; deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.

Under this policy:

- **No scenarios are removed by Phase 3.** Every authored scenario remains in the spec; its `Test status` line in the source file (`blackbox-tests.md`, `performance-tests.md`, etc.) carries either `READY` or `DEFERRED — <reason>`.
- **Final coverage** is computed at the **scenario level**, not the fixture level. Per the matrix above:
  - AC coverage: 92.6 % (43 + 0.5 × 1 / 47)
  - RESTRICT coverage: 86.7 % (25 + 0.5 × 2 / 30)
  - **Total: 90.3 %** — well above the 75 % gate.
- **Fixture acquisition** is tracked as a release-gate concern in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md`; on every `/autodev` invocation the leftover-replay step re-evaluates whether any deferred fixture has landed and moves the affected scenarios from `DEFERRED` to `READY`.
- **The project-level Acceptance Gate** (`acceptance_criteria.md → "Acceptance Gates"` — HW/replay benchmark, per-season coverage, MAVLink SITL conformance) remains a hard release blocker. The override does NOT relax that gate.

### Phase 3 verdict

**PASSED** — scenario-level coverage 90.3 % ≥ 75 % gate; every scenario has a quantifiable comparison; deferred-fixture tracking handled via leftovers replay; no scenarios removed.

## Phase 4 — Test Runner Script Generation: SKIPPED in this invocation

Per `phases/04-runner-scripts.md → "Skip condition"`:

> If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.

This invocation is greenfield Step 5 (Test Spec) and no source code exists yet — the `_docs/02_document/components/*/description.md` files describe 13 Rust components that the Implement step (Step 7) will create. Producing runner scripts here would write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` against a binary that does not yet exist.

**Handoff to Step 6 (Decompose)**: the decomposer MUST create at least two task specs covering the test runner scripts:

1. A task to create `scripts/run-tests.sh` (Tier B/E orchestration; calls `docker compose -f e2e/docker-compose.autopilot-e2e.yml up` and runs `cargo test --release --test scenarios` in `e2e-consumer`).
2. A task to create `scripts/run-performance-tests.sh` (Tier HW orchestration; per `environment.md → Hardware Execution Matrix`).

Both tasks should be tagged as part of the test-infrastructure decomposition (`Step 1t` of decompose tests-only mode) so they land before any Tier-B test scenarios are implemented.