Traceability Matrix
Authored by /test-spec Phase 2 (2026-05-19).
This matrix maps every acceptance-criterion bullet from _docs/00_problem/acceptance_criteria.md and every restriction bullet from _docs/00_problem/restrictions.md to the test scenarios that exercise them. Coverage is scenario-level, not fixture-level — scenarios marked DEFERRED in the underlying *-tests.md files still count as covered for the purpose of "the test is specified"; the fixture-acquisition status is tracked separately in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.
Acceptance Criteria Coverage
| AC ID | Acceptance criterion (paraphrased; canonical text in acceptance_criteria.md) | Test IDs | Coverage |
|---|---|---|---|
| AC-L1 | Tier 1 per-frame ≤ 100 ms at 1280 px on deployed compute | NFT-PERF-L1, FT-P-001 (functional contract) | Covered |
| AC-L2 | Tier 2 per-ROI ≤ 200 ms | NFT-PERF-L2 | Covered |
| AC-L3 | Tier 3 per-ROI ≤ 5 s (when enabled) | NFT-PERF-L3 | Covered (fixture DEFERRED) |
| AC-L4 | Camera zoom transition (medium→high) ≤ 2 s | NFT-PERF-L4 | Covered (fixture DEFERRED) |
| AC-L5 | Decision-to-movement latency ≤ 500 ms | NFT-PERF-L5 | Covered (fixture DEFERRED) |
| AC-L6 | Movement candidate enqueue ≤ 1 s (wide sweep) | NFT-PERF-L6 | Covered (fixture DEFERRED — gimbal.csv) |
| AC-L7 | Movement candidate enqueue ≤ 1.5 s (zoomed-in) | NFT-PERF-L7 | Covered (fixture DEFERRED — zoomed gimbal.csv) |
| AC-L8 | Zoom-out → zoom-in transition ≤ 2 s | NFT-PERF-L8 | Covered (fixture DEFERRED) |
| AC-L9 | Operator command → outbound action ≤ 500 ms | NFT-PERF-L9 | Covered (fixture DEFERRED for signed envelopes; placeholder usable today) |
| AC-T1 | POI rate surfaced to operator ≤ 5 / min (hard cap) | NFT-PERF-T1 | Covered |
| AC-T2 | Position telemetry rate ∈ [1, 10] Hz (target 10) | NFT-PERF-T2 | Covered (fixture DEFERRED — MAVLink replay) |
| AC-T3 | Frame-rate floor < 10 fps for ≥ 5 s → suppress zoom-in AND health yellow | NFT-PERF-T3 | Covered |
| AC-D1 | New classes per-class P ≥ 0.80 AND R ≥ 0.80 | FT-P-003 | Covered (fixture DEFERRED — annotated eval set) |
| AC-D2 | Existing-class regression Δ ≤ ± 0.02 vs baseline | FT-P-002 | Covered (baseline JSON DEFERRED; visual fixtures present) |
| AC-D3 | Concealed-position recall ≥ 0.60 (initial gate) | FT-P-004 | Covered (fixture DEFERRED — multi-season set) |
| AC-D4 | Concealed-position precision ≥ 0.20 (initial gate) | FT-P-005 | Covered (fixture DEFERRED — same as D3) |
| AC-D5 | Footpath recall ≥ 0.70 | FT-P-006 | Covered (fixture DEFERRED — polyline-annotated set) |
| AC-D6 | Tier-1 normalised-box contract conformance (class ids 0..18, coords ∈ [0,1]) | FT-P-001, NFT-SEC-Tier1SchemaViolation | Covered |
| AC-Mov-EnqueueWideSweep | Small movers during wide sweep MUST be detected and enqueued ≤ 1 s | FT-P-007 (M1 behavioural), NFT-PERF-L6 (latency dimension) | Covered |
| AC-Mov-ContinueDuringZoom | Movement detection continues during zoomed-in inspection | FT-P-008 (M2), NFT-PERF-L7 | Covered |
| AC-Mov-StableObjectsRejected | Stable objects (trees, houses, roads) NOT treated as moving solely due to camera platform motion | FT-P-007 (M1 — set_contains explicitly excludes tree row) | Covered |
| AC-Mov-FPBudgetHonoured | Configurable per-zoom-band FP budget honoured | FT-P-009 (M3), FT-P-010 (M4 — Q14) | Covered (M4 fixture DEFERRED) |
| AC-Scan-SweepCoverage | Wide-area sweep covers planned route at wide/light/medium zoom | Implicitly exercised by FT-P-011 setup + scenario-runner BIT scenarios; NOT covered as a distinct test | NOT COVERED (see Uncovered Items §1) |
| AC-Scan-SweepToZoomTransition | Sweep → detailed inspection transition ≤ 2 s | FT-P-011, NFT-PERF-L8 | Covered |
| AC-Scan-TargetLock | Lock + pan + 2 s deep-analysis hold + per-POI timeout default 5 s | FT-P-015 (S5 — three cases) | Covered (fixture DEFERRED — vlm-mock with realistic timing) |
| AC-Scan-TargetFollowCentre | Target-follow within centre 25 % of frame | FT-P-013 | Covered (fixture DEFERRED) |
| AC-Scan-GimbalLatency | Gimbal decision-to-movement ≤ 500 ms (links to L5) | NFT-PERF-L5 | Covered |
| AC-Scan-POIOrdering | POI queue ordered by confidence × proximity × age_factor | FT-P-014 | Covered |
| AC-Op-DecisionWindowScale | Decision window scales 30 s @ 0.40 → 120 s @ 1.00 linearly | FT-P-017 (O1), FT-P-018 (O2), FT-P-019 (O3), FT-N-004 (O4 below-threshold) | Covered |
| AC-Op-DeclinePersistsIgnored | Operator-decline → persistent ignored-item per (MGRS, class_group) | FT-P-020 (O5) | Covered |
| AC-Op-TimeoutForget | Timeout (no response) MUST NOT create ignored-item | FT-P-022 (O7) | Covered |
| AC-Op-IgnoredSuppress | New detection matching existing ignored-item NOT surfaced | FT-P-021 (O6) | Covered |
| AC-Op-ConfirmWaypointFollow | Operator-confirm → middle waypoint POST + target-follow mode | FT-P-016 (O8) | Covered (Q9 envelope DEFERRED; happy path uses placeholder) |
| AC-Op-ReplayUnsignedRejected | Replayed or unsigned operator command REJECTED with logged security WARN; state UNCHANGED | NFT-SEC-O9, NFT-SEC-O10 | Covered (Q9 DEFERRED for full semantics) |
| AC-Rel-BITGatesTakeoff | BIT MUST pass before takeoff permitted | FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2 BIT gate) | Covered |
| AC-Rel-LostLinkRTL30s | Lost operator/GS link → known mission-safe outcome within configurable grace (default 30 s → RTL) | NFT-RES-R4 | Covered |
| AC-Rel-AirframeLinkRedImmediate | Airframe command link loss → health red immediately; defer to airframe failsafe | NFT-RES-R7 (extension), implicitly by airframe-link health observation in NFT-RES-R5/R6 | Covered |
| AC-Rel-BatteryFloors | Battery ≤ RTL floor → RTL; battery ≤ hard floor → land-now; operator override only | NFT-RES-R5, NFT-RES-R6 | Covered (fixture DEFERRED) |
| AC-Rel-MavlinkExhaustionRed | MAVLink command exhaustion → airframe-link health red | NFT-RES-R7 | Covered (fixture DEFERRED) |
| AC-Rel-DriftYellow | Wall-clock drift > 200 ms → health yellow | NFT-RES-R8 | Covered |
| AC-Rel-GeofenceSymmetric | Geofence INCLUSION + EXCLUSION violations → waypoint refusal + RTL | NFT-RES-R9 (both cases) | Covered (fixture DEFERRED) |
| AC-Res-RSS6GB | Combined RSS on Jetson (excluding Tier 1) ≤ 6 GB sustained | NFT-RES-LIM-Re1, NFT-RES-LIM-CPU (CPU dimension), NFT-RES-LIM-FileHandles (FD dimension) | Covered (HW DEFERRED) |
| AC-Res-Tier1NonDegradation | Tier 1 per-frame latency Δ ± 5 ms under concurrent autopilot workload | NFT-RES-LIM-Re2, NFT-RES-LIM-GPU (GPU mutual exclusion) | Covered (HW DEFERRED) |
| AC-Mp-PreFlightPull30s | Pre-flight map pull ≤ 30 s; cache-fallback only with explicit operator ack | FT-P-024 (Mp1), FT-N-003 (Mp2 cache-fallback gate), NFT-RES-Mp2 (timing+recovery) | Covered |
| AC-Mp-PostFlightPush120s | Post-flight pass diff push ≤ 120 s; failure → persist + bounded retry | FT-P-025 (Mp3), NFT-RES-Mp4 | Covered (fixture DEFERRED) |
| AC-Gate-HWBench | HW/replay benchmark suite MUST pass before product implementation | Every Tier-HW row in the environment.md Hardware Execution Matrix (filled by hardware-assessment.md) | Covered as a gate, executed at the Acceptance-Gates milestone |
| AC-Gate-SeasonCoverage | Per-season dataset coverage demonstrated before MVP sign-off (Q13) | NOT COVERED at blackbox test level; gated on the annotation campaign and the ../ai-training repo | NOT COVERED (see Uncovered Items §2) |
| AC-Gate-MavlinkSITLConformance | MAVLink command surface MUST pass SITL conformance | Implicitly by FT-P-016 (O8 confirms waypoint POST through SITL) + NFT-RES-R4/R5/R6/R7/R9 (all run through SITL); a dedicated conformance suite is recommended | Partially Covered (see Uncovered Items §3) |
| AC-Q-Mov-Zoomed-FPRate | Movement detection FP rate at zoomed-in inspection (Q14) | FT-P-010 (M4) | Covered (Q14 DEFERRED) |
| AC-Q-MapObjectsConflict | MapObjects conflict resolution rule (Q8) | FT-P-026 (Mp5) | Covered (Q8 DEFERRED) |
| AC-Q-OperatorCmdAuth | Operator-command authentication conformance (Q9) | NFT-SEC-O9, NFT-SEC-O10, FT-P-016 (O8) | Covered (Q9 DEFERRED — placeholders used today) |
| AC-Q-MAVLinkSigning | Airframe MAVLink-2 message signing (Q6) | NFT-SEC-MavlinkUnsigned | Covered (Q6 DEFERRED) |
| AC-Q-SeasonGates | Per-season flight-test gates (Q13) | NOT COVERED — same as AC-Gate-SeasonCoverage | NOT COVERED |
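AC-Op-DecisionWindowScale is the one criterion in this table that fixes a formula, which helps when reading O1 to O4 (FT-P-017..FT-P-019, FT-N-004). The following is a minimal Rust sketch, assuming the window is a straight-line interpolation between the two stated endpoints, that confidences above 1.00 are clamped, and that a confidence below 0.40 yields no window at all (the detection is not surfaced, which is what the O4 below-threshold case appears to check). The function and constant names are hypothetical and are not taken from the autopilot crates.

```rust
/// Surfacing threshold and the window it maps to (from AC-Op-DecisionWindowScale).
const MIN_CONFIDENCE: f64 = 0.40;
const MIN_WINDOW_S: f64 = 30.0;
/// Confidence ceiling and the window it maps to.
const MAX_CONFIDENCE: f64 = 1.00;
const MAX_WINDOW_S: f64 = 120.0;

/// Hypothetical helper: operator decision window in seconds for a detection
/// of the given confidence. Returns `None` below the 0.40 threshold
/// (assumption: nothing is surfaced to the operator in that case).
pub fn decision_window_s(confidence: f64) -> Option<f64> {
    // NaN compares false with everything, so reject it explicitly.
    if confidence.is_nan() || confidence < MIN_CONFIDENCE {
        return None;
    }
    // Clamp out-of-range confidences to the 1.00 ceiling (assumption).
    let c = confidence.min(MAX_CONFIDENCE);
    // Fraction of the way from 0.40 to 1.00, in [0, 1].
    let t = (c - MIN_CONFIDENCE) / (MAX_CONFIDENCE - MIN_CONFIDENCE);
    Some(MIN_WINDOW_S + t * (MAX_WINDOW_S - MIN_WINDOW_S))
}
```

Under this reading, a detection at confidence 0.70 sits halfway along the scale and gets a window of roughly 75 s.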
Restrictions Coverage
| Restriction ID | Restriction (paraphrased; canonical in restrictions.md) | Test IDs | Coverage |
|---|---|---|---|
| RESTRICT-HW-Jetson | Compute device Jetson Orin Nano Super; 8 GB shared LPDDR5; ~6 GB residual after Tier 1 | NFT-RES-LIM-Re1, NFT-RES-LIM-CPU, NFT-RES-LIM-Re2, all Tier-HW rows | Covered (HW DEFERRED) |
| RESTRICT-HW-A40 | Primary camera ViewPro A40; vendor protocol mandatory | FT-P-011, FT-P-012, FT-P-013, NFT-PERF-L4 (zoom traversal floor) | Covered (HW DEFERRED for L4) |
| RESTRICT-HW-Z40K | Alternative camera ViewPro Z40K — system must remain compatible | NOT COVERED at autopilot test level — verified by component-swap regression run on the Z40K HW | NOT COVERED (see Uncovered Items §4) |
| RESTRICT-HW-ThermalLater | Thermal sensor may be added later; not assumed today | implicit (no test depends on thermal) | Covered by absence (negative assumption) |
| RESTRICT-HW-ZoomFloor | 40× optical zoom traversal 1–2 s wall-clock | NFT-PERF-L4 (asserts the ≤ 2 s ceiling that includes the physical floor) | Covered (HW DEFERRED) |
| RESTRICT-Op-Altitude | Flight altitude 600–1000 m | implicitly by every mission-trace fixture; no dedicated test | Covered by fixture assumption |
| RESTRICT-Op-AllSeasons | All four seasons in scope; winter-first-only rejected | FT-P-002, FT-P-003, FT-P-004, FT-P-005, FT-P-006 — multi-season fixtures required | Covered (all DEFERRED on multi-season fixtures) |
| RESTRICT-Op-AllTerrains | Forest, open field, urban edges, mixed terrain | same as RESTRICT-Op-AllSeasons | Covered (DEFERRED) |
| RESTRICT-Op-IntermittentModem | Modem operator/GS link intermittent | NFT-RES-R4, FT-P-016 (O8 nominal session), NFT-SEC-O9/O10 | Covered |
| RESTRICT-SW-JetsonResidualBudget | Onboard inference path runs within 6 GB residual RAM | NFT-RES-LIM-Re1 | Covered (HW DEFERRED) |
| RESTRICT-SW-FP16 | Models use FP16 precision (INT8 rejected for MVP) | NOT COVERED at autopilot test level — pinned at the model-loading layer (Tier 1 in ../detections; Tier 2/3 in autopilot config) | NOT COVERED (see Uncovered Items §5) |
| RESTRICT-SW-NoCloudInference | No cloud egress for inference | NFT-SEC-CraftedFrame (process boundary), implicit by environment.md autopilot-e2e network having no egress | Covered |
| RESTRICT-SW-GPUMutualExclusion | Tier 1 + any local large model serialise on the Jetson GPU | NFT-RES-LIM-GPU | Covered (HW DEFERRED) |
| RESTRICT-SW-MissionSchemaShared | Autopilot consumes shared mission-schema; cannot fork | FT-P-016 (O8 — POST validates against schema), FT-P-024 (Mp1 — schema-validated pull) | Covered (fixtures DEFERRED) |
| RESTRICT-Arch-Tier1External | Tier 1 lives in ../detections; autopilot consumes | FT-P-001 (D6), NFT-SEC-Tier1SchemaViolation, FT-N-001 (R2 — Tier 1 unreachable inhibits BIT) | Covered |
| RESTRICT-Arch-MissionExternal | Mission state from missions service; autopilot doesn't author | FT-P-024, FT-P-025, FT-P-016 | Covered (fixtures DEFERRED) |
| RESTRICT-Arch-MapInMissions | Central area map in missions /mapobjects | FT-P-024, FT-P-025, FT-P-026 (Mp5), NFT-RES-Mp2, NFT-RES-Mp4 | Covered (fixtures DEFERRED) |
| RESTRICT-Arch-GPSDeniedExternal | GPS coords from separate GPS-denied service; autopilot does NOT implement | NOT COVERED at autopilot test level — verified at suite-e2e tier via the live GPS-denied service | NOT COVERED at autopilot tier (covered at suite-e2e tier) |
| RESTRICT-Arch-OperatorUIExternal | Operator browser UI owned by Ground Station; autopilot pushes data | implicit by NOT testing any UI rendering; verified by operator-stream protocol assertions in FT-P-016, FT-P-017–022 | Covered by absence |
| RESTRICT-Arch-AnnotationTrainingExternal | Annotation + training in ../annotations, ../ai-training; autopilot doesn't own | NOT TESTABLE at autopilot blackbox tier — process boundary | NOT TESTABLE (intentional scope exclusion) |
| RESTRICT-Rel-BITGate | Pre-flight BIT MUST gate takeoff | FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2) | Covered |
| RESTRICT-Rel-LostLinkDeterministic | Lost operator-link failsafe deterministic + bounded | NFT-RES-R4 | Covered |
| RESTRICT-Rel-AirframeLossRedImmediate | Airframe MAVLink loss → health red immediately | NFT-RES-R7 (red after retry exhaustion); a dedicated "immediate red on link loss" scenario MAY be desirable (currently rolled into R7) | Partially Covered (see Uncovered Items §6) |
| RESTRICT-Rel-BatteryThresholds | Battery RTL + land-now triggers (override only via operator) | NFT-RES-R5, NFT-RES-R6 | Covered (fixtures DEFERRED) |
| RESTRICT-Rel-GeofenceSymmetric | Geofence INCLUSION + EXCLUSION enforcement | NFT-RES-R9 (both) | Covered (fixture DEFERRED) |
| RESTRICT-Rel-OperatorCmdAuth | Operator commands authenticated + signed + replay-protected | NFT-SEC-O9, NFT-SEC-O10, FT-P-016 happy path | Covered (Q9 DEFERRED) |
| RESTRICT-Rel-StorageBounded | On-device storage bounded; full = takeoff blocker; mid-flight eviction policy | FT-N-002 (R3 — BIT block), NFT-RES-LIM-Storage | Covered |
| RESTRICT-Rel-NoSilentErrors | No silent error swallowing | every NFT-SEC-* scenario asserts a counter + log entry; every NFT-RES-* asserts a structured-log + health transition | Covered |
| RESTRICT-Rel-ClockBound | Wall-clock bound to GPS once locked, else NTP at boot | NFT-RES-R8 | Covered |
| RESTRICT-Rel-MavlinkConformance | MAVLink command surface MUST conform to ArduPilot/PX4 SITL | Every MAVLink-emitting scenario runs through mavlink-sitl; a dedicated conformance suite is recommended | Partially Covered (see Uncovered Items §3) |
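RESTRICT-Rel-LostLinkDeterministic and AC-Rel-LostLinkRTL30s together require the lost-operator-link outcome to be deterministic and bounded, with a configurable grace that defaults to 30 s before RTL, which is what NFT-RES-R4 checks. A minimal sketch of that property, assuming the decision is a pure function of the time since the last operator/GS message. `LinkAction`, `LostLinkPolicy`, and the choice of `>=` at the grace boundary are hypothetical and not the autopilot's actual API.

```rust
use std::time::Duration;

/// Outcome of the lost-link check (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkAction {
    Continue,
    ReturnToLaunch,
}

/// Lost operator/GS link policy; `grace` is configurable.
#[derive(Debug, Clone, Copy)]
pub struct LostLinkPolicy {
    pub grace: Duration,
}

impl Default for LostLinkPolicy {
    /// Default grace of 30 s, per AC-Rel-LostLinkRTL30s.
    fn default() -> Self {
        Self { grace: Duration::from_secs(30) }
    }
}

impl LostLinkPolicy {
    /// Pure function of the silence duration: the same input always yields
    /// the same action (deterministic), and RTL is chosen no later than
    /// `grace` after the last operator message (bounded).
    pub fn action(&self, since_last_operator_msg: Duration) -> LinkAction {
        if since_last_operator_msg >= self.grace {
            LinkAction::ReturnToLaunch
        } else {
            LinkAction::Continue
        }
    }
}
```

Keeping the decision free of hidden state is what makes a replay-driven scenario like R4 reproducible: the same telemetry trace always produces the same RTL timestamp.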
Coverage Summary
| Category | Total Items | Covered | Partially Covered | Not Covered | Coverage % (counting Partially as 0.5) |
|---|---|---|---|---|---|
| Acceptance Criteria | 47 | 43 | 1 | 3 | (43 + 0.5×1) / 47 ≈ 92.6 % |
| Restrictions | 30 | 25 | 2 | 3 | (25 + 0.5×2) / 30 ≈ 86.7 % |
| Total | 77 | 68 | 3 | 6 | (68 + 1.5) / 77 ≈ 90.3 % |
(Coverage here is "test scenario exists for the item", not "fixture has been acquired and the test currently passes". Fixture status is tracked in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.)
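All three percentages follow one rule: Covered counts as 1, Partially Covered as 0.5, Not Covered as 0. A small sketch that reproduces the summary rows and the 75 % Phase 3 gate comparison; the function and constant names are illustrative only.

```rust
/// Phase 3 final-coverage gate, in percent.
pub const PHASE3_GATE_PCT: f64 = 75.0;

/// Scenario-level coverage in percent: Partially Covered counts as 0.5,
/// Not Covered as 0, as in the Coverage Summary table.
pub fn coverage_pct(covered: u32, partial: u32, total: u32) -> f64 {
    assert!(total > 0 && covered + partial <= total, "inconsistent counts");
    100.0 * (f64::from(covered) + 0.5 * f64::from(partial)) / f64::from(total)
}

/// True if a coverage figure meets the Phase 3 gate.
pub fn passes_phase3_gate(pct: f64) -> bool {
    pct >= PHASE3_GATE_PCT
}
```

Formatted to one decimal place, `coverage_pct(43, 1, 47)`, `coverage_pct(25, 2, 30)` and `coverage_pct(68, 3, 77)` give 92.6, 86.7 and 90.3.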
Uncovered Items Analysis
| § | Item | Reason not covered | Risk | Mitigation |
|---|---|---|---|---|
| §1 | AC-Scan-SweepCoverage (wide-area sweep covers planned route) | The "covers the planned route" property is a path-coverage assertion best tested by component-level tests in the scan_controller component (geometry coverage) rather than at the blackbox level | Medium — an incorrect sweep pattern leaves observation gaps | Component test in scan_controller (added by /decompose test tasks); a Tier-E "did the camera point at every planned waypoint area for ≥ N seconds" scenario can be added if needed |
| §2 | AC-Gate-SeasonCoverage / AC-Q-SeasonGates | Per-season coverage gates depend on dataset acquisition owned by ../ai-training and per-season flight tests (Q13) | High — model performance on unevaluated seasons is unknown | Tracked as release-gate item; D3/D4/D5/D1 scenarios DEFERRED until each season's dataset lands |
| §3 | AC-Gate-MavlinkSITLConformance / RESTRICT-Rel-MavlinkConformance | A dedicated "every command in architecture.md §7.7 exercised against SITL" suite is recommended in addition to the implicit coverage by R-scenarios | Medium — could miss a rarely-used command | Add an NFT-MavlinkConformance suite during Step 9 (Decompose Tests), with an explicit per-command SITL exercise |
| §4 | RESTRICT-HW-Z40K (Z40K compatibility) | Requires a second camera HW for the swap test | Medium — could miss an A40-specific assumption | Run the Tier-HW rows on Z40K as a post-MVP smoke step |
| §5 | RESTRICT-SW-FP16 (model precision) | Pinned at config + model-loading layer; not externally observable beyond perf/latency | Low — incorrect precision would manifest as either L1 latency or D2 regression failure | Add a startup log assertion: "Tier 2/3 models loaded with precision=FP16" via the SUT's structured boot log |
| §6 | RESTRICT-Rel-AirframeLossRedImmediate (immediate red on airframe link loss) | NFT-RES-R7 asserts red after retry exhaustion; the "immediate red on link loss" path (no retries) is implicit | Low–Medium — depends on timing window between "link silent" and "considered lost" | Add NFT-RES-AirframeImmediate scenario in Step 9 (Decompose Tests) — sustained zero MAVLink traffic for N seconds → immediate health red (no retry phase) |
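The NFT-RES-AirframeImmediate scenario proposed in §6 needs a health signal whose red transition depends only on airframe silence, not on command-retry exhaustion (the path NFT-RES-R7 already covers). A minimal sketch, assuming health is derived from the time since the last MAVLink message received from the airframe and that the "N seconds" threshold is a configuration value. The type names are hypothetical, and the return to green on the next received message is also an assumption.

```rust
use std::time::{Duration, Instant};

/// Airframe-link health as seen by this monitor (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AirframeLinkHealth {
    Green,
    Red,
}

/// Tracks silence on the airframe MAVLink link, independent of any
/// command-retry state machine.
pub struct AirframeLinkMonitor {
    silence_threshold: Duration, // the "N seconds" from §6; value is configurable
    last_heard: Instant,
}

impl AirframeLinkMonitor {
    pub fn new(silence_threshold: Duration, now: Instant) -> Self {
        Self { silence_threshold, last_heard: now }
    }

    /// Call on every MAVLink message received from the airframe.
    pub fn on_message(&mut self, now: Instant) {
        self.last_heard = now;
    }

    /// Red as soon as the link has been silent for the threshold; no retry
    /// phase is consulted, which is the "immediate" part of the restriction.
    pub fn health(&self, now: Instant) -> AirframeLinkHealth {
        let silent_for = now.saturating_duration_since(self.last_heard);
        if silent_for >= self.silence_threshold {
            AirframeLinkHealth::Red
        } else {
            AirframeLinkHealth::Green
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, lets a scenario replay drive the clock deterministically.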
Scenario index by file
| File | Scenarios | Read-back ID prefix |
|---|---|---|
| blackbox-tests.md | 26 positive + 4 negative | FT-P-001..FT-P-026, FT-N-001..FT-N-004 |
| performance-tests.md | 9 latency + 3 rate | NFT-PERF-L1..L9, NFT-PERF-T1..T3 |
| resilience-tests.md | 6 R-rows + 2 Mp-rows | NFT-RES-R4..R9, NFT-RES-Mp2, NFT-RES-Mp4 |
| security-tests.md | 10 SEC rows | NFT-SEC-O9, NFT-SEC-O10, NFT-SEC-CraftedFrame, NFT-SEC-OversizeCrop, NFT-SEC-VlmSchemaViolation, NFT-SEC-VlmFreeFormText, NFT-SEC-IpcPeerAuth, NFT-SEC-Tier1SchemaViolation, NFT-SEC-MavlinkUnsigned, NFT-SEC-HealthExposesSecurity |
| resource-limit-tests.md | 6 LIM rows | NFT-RES-LIM-Re1, Re2, Storage, CPU, GPU, FileHandles |
Total scenarios authored: 66.
Open dependencies summary
| Dependency | Affected scenarios | Tracking |
|---|---|---|
| <DEFERRED: gimbal.csv + telemetry.csv pairs> | FT-P-007/008/009/010, NFT-PERF-L6/L7 | Leftover row "Gimbal CSV pairs" |
| <DEFERRED: multi-season annotated datasets (concealed, footpath, new classes, existing baseline)> | FT-P-002/003/004/005/006 | Leftover row "Concealed position image set + Footpath sequences + new-class eval set" |
| <DEFERRED: SITL or HW capture for L4/L5/L8> | NFT-PERF-L4/L5/L8 | Leftover row "MAVLink SITL traces" + camera frame sequences with zoom-band labelling |
| <DEFERRED: missions API mock fixtures (Mp1/Mp3/Mp4)> | FT-P-024/025, NFT-RES-Mp4 | Leftover row "Mock central area-map service responses" |
| <DEFERRED: vlm-io-pairs (real recordings)> | NFT-PERF-L3, FT-P-015 (S5), NFT-SEC-VlmSchemaViolation real-recording variant | Leftover row "Deep-analysis I/O pairs" |
| <DEFERRED: operator-envelopes (Q9-blocked)> | NFT-SEC-O9/O10, full semantics of FT-P-016 | Leftover row "Operator-command envelopes" + Q9 |
| <DEFERRED: HW Jetson Orin Nano Super OR benchmarked replay> | Every Tier-HW scenario (L1, L2, L4, L5, L8, Re1, Re2, CPU, GPU) | Leftover does not enumerate HW directly; tracked via the project-level Acceptance Gate |
| <DEFERRED: Q6 — MAVLink-2 signing decision> | NFT-SEC-MavlinkUnsigned | architecture.md §8 Q6 |
| <DEFERRED: Q8 — MapObjects conflict resolution rule> | FT-P-026 (Mp5) | architecture.md §8 Q8 |
| <DEFERRED: Q9 — operator-command auth scheme> | NFT-SEC-O9/O10 full semantics | architecture.md §8 Q9 |
| <DEFERRED: Q13 — per-season gates> | AC-Gate-SeasonCoverage | architecture.md §8 Q13 |
| <DEFERRED: Q14 — movement-detection classical vs learned-CV> | FT-P-010 (M4) | architecture.md §8 Q14 |
When any of the above dependencies resolves, the corresponding leftover entry is replayed (per tracker.mdc → Leftovers Mechanism) and the affected scenarios' Test status lines move from DEFERRED to READY in the source files.
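The replay rule amounts to a small status transition over the dependency table. A sketch, assuming a scenario moves to READY only once every dependency row that lists it has resolved (several scenarios, such as NFT-SEC-O9/O10, appear under more than one row) and that the DEFERRED reason records the still-open dependencies. `TestStatus` and `replay_leftovers` are hypothetical names; the real mechanism is whatever tracker.mdc defines.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Value of a scenario's Test status line (hypothetical model).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TestStatus {
    Ready,
    /// Reason string; here, the comma-separated still-open dependencies.
    Deferred(String),
}

/// `blocks` maps each dependency to the scenarios it blocks; `resolved`
/// holds dependencies that have landed. Every scenario in `status` is
/// recomputed: READY if no unresolved dependency lists it, else DEFERRED.
pub fn replay_leftovers<'a>(
    blocks: &BTreeMap<&'a str, Vec<&'a str>>,
    resolved: &BTreeSet<&'a str>,
    status: &mut BTreeMap<String, TestStatus>,
) {
    for (scenario, st) in status.iter_mut() {
        // Collect the dependencies that still block this scenario.
        let open: Vec<&str> = blocks
            .iter()
            .filter(|(dep, affected)| {
                !resolved.contains(*dep) && affected.contains(&scenario.as_str())
            })
            .map(|(dep, _)| *dep)
            .collect();
        *st = if open.is_empty() {
            TestStatus::Ready
        } else {
            TestStatus::Deferred(open.join(", "))
        };
    }
}
```

Recomputing every status from the table on each replay, instead of flipping individual entries, keeps the result independent of the order in which dependencies resolve.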
Phase 3 — Test Data & Expected Results Validation Gate Outcome
Recorded by /test-spec Phase 3 on 2026-05-19.
Mechanical gate
Phase 3's mechanical contract is: every scenario MUST have either (a) a provided input + provided quantifiable expected result, OR (b) a behavioural trigger + observable behaviour + quantifiable pass/fail criterion. Scenarios that fail this contract are normally REMOVED. The 75 % final-coverage check then applies.
| Shape | Total scenarios | Quantifiable comparison declared | Input/trigger fully provided today | Input/trigger DEFERRED (release-gate item) |
|---|---|---|---|---|
| Input/output | 56 | 56 | 16 | 40 |
| Behavioural | 10 | 10 | 10 | 0 |
| Total | 66 | 66 (100 %) | 26 (39 %) | 40 (61 %) |
Every scenario carries a Comparison method drawn from .cursor/skills/test-spec/templates/expected-results.md (exact, numeric_tolerance, threshold_min/max, range, regex, substring, set_contains, json_diff, schema_match, file_reference) — none of the 66 fail the quantifiability check.
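For the numeric members of that list, the pass/fail decision reduces to a one-line comparison against the measured value. A sketch covering only exact, numeric_tolerance, threshold_min, threshold_max and range, assuming threshold_min means actual ≥ bound, threshold_max means actual ≤ bound, and range is inclusive at both ends. The enum is hypothetical, and the authoritative semantics are the ones in expected-results.md; the string and structured methods (regex, substring, set_contains, json_diff, schema_match, file_reference) are left out.

```rust
/// Hypothetical model of the numeric comparison methods.
#[derive(Debug, Clone, Copy)]
pub enum NumericComparison {
    /// Pass if actual equals the expected value exactly.
    Exact(f64),
    /// Pass if |actual - expected| <= tolerance.
    NumericTolerance { expected: f64, tolerance: f64 },
    /// Pass if actual >= bound (assumed semantics).
    ThresholdMin(f64),
    /// Pass if actual <= bound (assumed semantics).
    ThresholdMax(f64),
    /// Pass if lo <= actual <= hi (assumed inclusive).
    Range { lo: f64, hi: f64 },
}

impl NumericComparison {
    pub fn passes(&self, actual: f64) -> bool {
        match *self {
            NumericComparison::Exact(e) => actual == e,
            NumericComparison::NumericTolerance { expected, tolerance } => {
                (actual - expected).abs() <= tolerance
            }
            NumericComparison::ThresholdMin(min) => actual >= min,
            NumericComparison::ThresholdMax(max) => actual <= max,
            NumericComparison::Range { lo, hi } => (lo..=hi).contains(&actual),
        }
    }
}
```

Read this way, AC-L1 is a `ThresholdMax(100.0)` on per-frame milliseconds, AC-D1 a `ThresholdMin(0.80)` on per-class precision and recall, and AC-T2 a `Range { lo: 1.0, hi: 10.0 }` on telemetry Hz.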
Project-policy override (recorded 2026-05-19)
The Phase 3 75 % fixture-coverage gate is intentionally overridden for this project, per the decision recorded in _docs/00_problem/input_data/expected_results/results_report.md → "Decision (project policy)":
> rather than block on the Phase 3 75 % gate, each deferred row is now registered with a structured <DEFERRED:> tag and surfaces in data_parameters.md → "Gaps that block /test-spec downstream". /test-spec Phase 2 can author scenarios for all 56 rows; deferred rows become release-gate items, not development-gate items. The acceptance_criteria.md → "Acceptance Gates (project-level)" hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.
Under this policy:
- No scenarios are removed by Phase 3. Every authored scenario remains in the spec; its Test status line in the source file (blackbox-tests.md, performance-tests.md, etc.) carries either READY or DEFERRED — <reason>.
- Final coverage is computed at the scenario level, not the fixture level. Per the matrix above:
  - AC coverage: 92.6 % ((43 + 0.5 × 1) / 47)
  - RESTRICT coverage: 86.7 % ((25 + 0.5 × 2) / 30)
  - Total: 90.3 % ((68 + 0.5 × 3) / 77), well above the 75 % gate.
- Fixture acquisition is tracked as a release-gate concern in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md. On every /autodev invocation, the leftover-replay step re-evaluates whether any deferred fixture has landed and moves the affected scenarios from DEFERRED to READY.
- The project-level Acceptance Gate (acceptance_criteria.md → "Acceptance Gates": HW/replay benchmark, per-season coverage, MAVLink SITL conformance) remains a hard release blocker. The override does NOT relax that gate.
Phase 3 verdict
PASSED — scenario-level coverage 90.3 % ≥ 75 % gate; every scenario has a quantifiable comparison; deferred-fixture tracking handled via leftovers replay; no scenarios removed.
Phase 4 — Test Runner Script Generation: SKIPPED in this invocation
Per phases/04-runner-scripts.md → "Skip condition":
> If this skill was invoked from the /plan skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
This invocation is greenfield Step 5 (Test Spec) and no source code exists yet — the _docs/02_document/components/*/description.md files describe 13 Rust components that the Implement step (Step 7) will create. Producing runner scripts here would write scripts/run-tests.sh and scripts/run-performance-tests.sh against a binary that does not yet exist.
Handoff to Step 6 (Decompose): the decomposer MUST create at least two task specs covering the test runner scripts:
- A task to create scripts/run-tests.sh (Tier B/E orchestration; calls `docker compose -f e2e/docker-compose.autopilot-e2e.yml up` and runs `cargo test --release --test scenarios` in e2e-consumer).
- A task to create scripts/run-performance-tests.sh (Tier HW orchestration; per environment.md → Hardware Execution Matrix).
Both tasks should be tagged as part of the test-infrastructure decomposition (Step 1t of decompose tests-only mode) so they land before any Tier-B test scenarios are implemented.