Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00

Traceability Matrix

Authored by /test-spec Phase 2 (2026-05-19).

This matrix maps every acceptance-criterion bullet from _docs/00_problem/acceptance_criteria.md and every restriction bullet from _docs/00_problem/restrictions.md to the test scenarios that exercise them. Coverage is scenario-level, not fixture-level — scenarios marked DEFERRED in the underlying *-tests.md files still count as covered for the purpose of "the test is specified"; the fixture-acquisition status is tracked separately in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.

Acceptance Criteria Coverage

AC ID Acceptance criterion (paraphrased; canonical text in acceptance_criteria.md) Test IDs Coverage
AC-L1 Tier 1 per-frame ≤ 100 ms at 1280 px on deployed compute NFT-PERF-L1, FT-P-001 (functional contract) Covered
AC-L2 Tier 2 per-ROI ≤ 200 ms NFT-PERF-L2 Covered
AC-L3 Tier 3 per-ROI ≤ 5 s (when enabled) NFT-PERF-L3 Covered (fixture DEFERRED)
AC-L4 Camera zoom transition (medium→high) ≤ 2 s NFT-PERF-L4 Covered (fixture DEFERRED)
AC-L5 Decision-to-movement latency ≤ 500 ms NFT-PERF-L5 Covered (fixture DEFERRED)
AC-L6 Movement candidate enqueue ≤ 1 s (wide sweep) NFT-PERF-L6 Covered (fixture DEFERRED — gimbal.csv)
AC-L7 Movement candidate enqueue ≤ 1.5 s (zoomed-in) NFT-PERF-L7 Covered (fixture DEFERRED — zoomed gimbal.csv)
AC-L8 Zoom-out → zoom-in transition ≤ 2 s NFT-PERF-L8 Covered (fixture DEFERRED)
AC-L9 Operator command → outbound action ≤ 500 ms NFT-PERF-L9 Covered (fixture DEFERRED for signed envelopes; placeholder usable today)
AC-T1 POI rate surfaced to operator ≤ 5 / min (hard cap) NFT-PERF-T1 Covered
AC-T2 Position telemetry rate ∈ [1, 10] Hz (target 10) NFT-PERF-T2 Covered (fixture DEFERRED — MAVLink replay)
AC-T3 Frame-rate floor < 10 fps for ≥ 5 s → suppress zoom-in AND health yellow NFT-PERF-T3 Covered
AC-D1 New classes per-class P ≥ 0.80 AND R ≥ 0.80 FT-P-003 Covered (fixture DEFERRED — annotated eval set)
AC-D2 Existing-class regression Δ ≤ ± 0.02 vs baseline FT-P-002 Covered (baseline JSON DEFERRED; visual fixtures present)
AC-D3 Concealed-position recall ≥ 0.60 (initial gate) FT-P-004 Covered (fixture DEFERRED — multi-season set)
AC-D4 Concealed-position precision ≥ 0.20 (initial gate) FT-P-005 Covered (fixture DEFERRED — same as D3)
AC-D5 Footpath recall ≥ 0.70 FT-P-006 Covered (fixture DEFERRED — polyline-annotated set)
AC-D6 Tier-1 normalised-box contract conformance (class ids 0..18, coords ∈ [0,1]) FT-P-001, NFT-SEC-Tier1SchemaViolation Covered
AC-Mov-EnqueueWideSweep Small movers during wide sweep MUST be detected and enqueued ≤ 1 s FT-P-007 (M1 behavioural), NFT-PERF-L6 (latency dimension) Covered
AC-Mov-ContinueDuringZoom Movement detection continues during zoomed-in inspection FT-P-008 (M2), NFT-PERF-L7 Covered
AC-Mov-StableObjectsRejected Stable objects (trees, houses, roads) NOT treated as moving solely due to camera platform motion FT-P-007 (M1 — set_contains explicitly excludes tree row) Covered
AC-Mov-FPBudgetHonoured Configurable per-zoom-band FP budget honoured FT-P-009 (M3), FT-P-010 (M4 — Q14) Covered (M4 fixture DEFERRED)
AC-Scan-SweepCoverage Wide-area sweep covers planned route at wide/light/medium zoom implicitly by FT-P-011 setup + scenario-runner BIT scenarios; NOT covered as a distinct test NOT COVERED (see Uncovered Items §1)
AC-Scan-SweepToZoomTransition Sweep → detailed inspection transition ≤ 2 s FT-P-011, NFT-PERF-L8 Covered
AC-Scan-TargetLock Lock + pan + 2 s deep-analysis hold + per-POI timeout default 5 s FT-P-015 (S5 — three cases) Covered (fixture DEFERRED — vlm-mock with realistic timing)
AC-Scan-TargetFollowCentre Target-follow within centre 25 % of frame FT-P-013 Covered (fixture DEFERRED)
AC-Scan-GimbalLatency Gimbal decision-to-movement ≤ 500 ms (links to L5) NFT-PERF-L5 Covered
AC-Scan-POIOrdering POI queue ordered by confidence × proximity × age_factor FT-P-014 Covered
AC-Op-DecisionWindowScale Decision window scales 30 s @ 0.40 → 120 s @ 1.00 linearly FT-P-017 (O1), FT-P-018 (O2), FT-P-019 (O3), FT-N-004 (O4 below-threshold) Covered
AC-Op-DeclinePersistsIgnored Operator-decline → persistent ignored-item per (MGRS, class_group) FT-P-020 (O5) Covered
AC-Op-TimeoutForget Timeout (no response) MUST NOT create ignored-item FT-P-022 (O7) Covered
AC-Op-IgnoredSuppress New detection matching existing ignored-item NOT surfaced FT-P-021 (O6) Covered
AC-Op-ConfirmWaypointFollow Operator-confirm → middle waypoint POST + target-follow mode FT-P-016 (O8) Covered (Q9 envelope DEFERRED; happy path uses placeholder)
AC-Op-ReplayUnsignedRejected Replayed or unsigned operator command REJECTED with logged security WARN; state UNCHANGED NFT-SEC-O9, NFT-SEC-O10 Covered (Q9 DEFERRED for full semantics)
AC-Rel-BITGatesTakeoff BIT MUST pass before takeoff permitted FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2 BIT gate) Covered
AC-Rel-LostLinkRTL30s Lost operator/GS link → known mission-safe outcome within configurable grace (default 30 s → RTL) NFT-RES-R4 Covered
AC-Rel-AirframeLinkRedImmediate Airframe command link loss → health red immediately; defer to airframe failsafe NFT-RES-R7 (extension), implicitly by airframe-link health observation in NFT-RES-R5/R6 Covered
AC-Rel-BatteryFloors Battery ≤ RTL floor → RTL; battery ≤ hard floor → land-now; operator override only NFT-RES-R5, NFT-RES-R6 Covered (fixture DEFERRED)
AC-Rel-MavlinkExhaustionRed MAVLink command exhaustion → airframe-link health red NFT-RES-R7 Covered (fixture DEFERRED)
AC-Rel-DriftYellow Wall-clock drift > 200 ms → health yellow NFT-RES-R8 Covered
AC-Rel-GeofenceSymmetric Geofence INCLUSION + EXCLUSION violations → waypoint refusal + RTL NFT-RES-R9 (both cases) Covered (fixture DEFERRED)
AC-Res-RSS6GB Combined RSS on Jetson (excluding Tier 1) ≤ 6 GB sustained NFT-RES-LIM-Re1, NFT-RES-LIM-CPU (CPU dimension), NFT-RES-LIM-FileHandles (FD dimension) Covered (HW DEFERRED)
AC-Res-Tier1NonDegradation Tier 1 per-frame latency Δ ± 5 ms under concurrent autopilot workload NFT-RES-LIM-Re2, NFT-RES-LIM-GPU (GPU mutual exclusion) Covered (HW DEFERRED)
AC-Mp-PreFlightPull30s Pre-flight map pull ≤ 30 s; cache-fallback only with explicit operator ack FT-P-024 (Mp1), FT-N-003 (Mp2 cache-fallback gate), NFT-RES-Mp2 (timing+recovery) Covered
AC-Mp-PostFlightPush120s Post-flight pass diff push ≤ 120 s; failure → persist + bounded retry FT-P-025 (Mp3), NFT-RES-Mp4 Covered (fixture DEFERRED)
AC-Gate-HWBench HW/replay benchmark suite MUST pass before product implementation every Tier-HW row in environment.md Hardware Execution Matrix (filled by hardware-assessment.md) Covered as a gate, executed at the Acceptance-Gates milestone
AC-Gate-SeasonCoverage Per-season dataset coverage demonstrated before MVP sign-off (Q13) NOT COVERED at blackbox test level — gated on the annotation campaign and the ../ai-training repo NOT COVERED (see Uncovered Items §2)
AC-Gate-MavlinkSITLConformance MAVLink command surface MUST pass SITL conformance implicitly by FT-P-016 (O8 confirms waypoint POST through SITL) + NFT-RES-R4/R5/R6/R7/R9 (all run through SITL); a dedicated conformance suite is recommended Partially Covered (see Uncovered Items §3)
AC-Q-Mov-Zoomed-FPRate Movement detection FP rate at zoomed-in inspection (Q14) FT-P-010 (M4) Covered (Q14 DEFERRED)
AC-Q-MapObjectsConflict MapObjects conflict resolution rule (Q8) FT-P-026 (Mp5) Covered (Q8 DEFERRED)
AC-Q-OperatorCmdAuth Operator-command authentication conformance (Q9) NFT-SEC-O9, NFT-SEC-O10, FT-P-016 (O8) Covered (Q9 DEFERRED — placeholders used today)
AC-Q-MAVLinkSigning Airframe MAVLink-2 message signing (Q6) NFT-SEC-MavlinkUnsigned Covered (Q6 DEFERRED)
AC-Q-SeasonGates Per-season flight-test gates (Q13) NOT COVERED — same as AC-Gate-SeasonCoverage NOT COVERED
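
AC-Op-DecisionWindowScale fixes only the two endpoints (30 s at confidence 0.40, 120 s at 1.00) and states that the window scales linearly between them. A minimal Rust sketch of that interpolation follows, under stated assumptions: the function and constant names are hypothetical, 0.40 is assumed to double as the surfacing threshold exercised by FT-N-004 (O4 below-threshold), and confidences reported above 1.00 are assumed to clamp to the 120 s ceiling.

```rust
use std::time::Duration;

// Endpoints from AC-Op-DecisionWindowScale. Treating MIN_CONF as the
// surfacing threshold is an assumption (see FT-N-004 / O4).
const MIN_CONF: f64 = 0.40;
const MAX_CONF: f64 = 1.00;
const MIN_WINDOW_S: f64 = 30.0;
const MAX_WINDOW_S: f64 = 120.0;

/// Hypothetical helper: returns `None` for items below the threshold
/// (not surfaced), otherwise the linearly interpolated decision window.
fn decision_window(confidence: f64) -> Option<Duration> {
    // `!(x >= MIN_CONF)` also rejects NaN, which compares false with everything.
    if !(confidence >= MIN_CONF) {
        return None;
    }
    // Assumed clamp: confidences reported above 1.00 get the 120 s ceiling.
    let c = confidence.min(MAX_CONF);
    let t = (c - MIN_CONF) / (MAX_CONF - MIN_CONF);
    let secs = MIN_WINDOW_S + t * (MAX_WINDOW_S - MIN_WINDOW_S);
    Some(Duration::from_secs_f64(secs))
}
```

Under these assumptions a confidence of 0.70 maps to a 75 s window, halfway between the two endpoints.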

Restrictions Coverage

Restriction ID Restriction (paraphrased; canonical in restrictions.md) Test IDs Coverage
RESTRICT-HW-Jetson Compute device Jetson Orin Nano Super; 8 GB shared LPDDR5; ~6 GB residual after Tier 1 NFT-RES-LIM-Re1, NFT-RES-LIM-CPU, NFT-RES-LIM-Re2, all Tier-HW rows Covered (HW DEFERRED)
RESTRICT-HW-A40 Primary camera ViewPro A40; vendor protocol mandatory FT-P-011, FT-P-012, FT-P-013, NFT-PERF-L4 (zoom traversal floor) Covered (HW DEFERRED for L4)
RESTRICT-HW-Z40K Alternative camera ViewPro Z40K — system must remain compatible NOT COVERED at autopilot test level — verified by component-swap regression run on the Z40K HW NOT COVERED (see Uncovered Items §4)
RESTRICT-HW-ThermalLater Thermal sensor may be added later; not assumed today implicit (no test depends on thermal) Covered by absence (negative assumption)
RESTRICT-HW-ZoomFloor 40× optical zoom traversal 1–2 s wall-clock NFT-PERF-L4 (asserts the ≤ 2 s ceiling that includes the physical floor) Covered (HW DEFERRED)
RESTRICT-Op-Altitude Flight altitude 600–1000 m implicitly by every mission-trace fixture; no dedicated test Covered by fixture assumption
RESTRICT-Op-AllSeasons All four seasons in scope; winter-first-only rejected FT-P-002, FT-P-003, FT-P-004, FT-P-005, FT-P-006 — multi-season fixtures required Covered (all DEFERRED on multi-season fixtures)
RESTRICT-Op-AllTerrains Forest, open field, urban edges, mixed terrain same as RESTRICT-Op-AllSeasons Covered (DEFERRED)
RESTRICT-Op-IntermittentModem Modem operator/GS link intermittent NFT-RES-R4, FT-P-016 (O8 nominal session), NFT-SEC-O9/O10 Covered
RESTRICT-SW-JetsonResidualBudget Onboard inference path runs within 6 GB residual RAM NFT-RES-LIM-Re1 Covered (HW DEFERRED)
RESTRICT-SW-FP16 Models use FP16 precision (INT8 rejected for MVP) NOT COVERED at autopilot test level — pinned at the model-loading layer (Tier 1 in ../detections; Tier 2/3 in autopilot config) NOT COVERED (see Uncovered Items §5)
RESTRICT-SW-NoCloudInference No cloud egress for inference NFT-SEC-CraftedFrame (process boundary), implicit by environment.md autopilot-e2e network having no egress Covered
RESTRICT-SW-GPUMutualExclusion Tier 1 + any local large model serialise on the Jetson GPU NFT-RES-LIM-GPU Covered (HW DEFERRED)
RESTRICT-SW-MissionSchemaShared Autopilot consumes shared mission-schema; cannot fork FT-P-016 (O8 — POST validates against schema), FT-P-024 (Mp1 — schema-validated pull) Covered (fixtures DEFERRED)
RESTRICT-Arch-Tier1External Tier 1 lives in ../detections; autopilot consumes FT-P-001 (D6), NFT-SEC-Tier1SchemaViolation, FT-N-001 (R2 — Tier 1 unreachable inhibits BIT) Covered
RESTRICT-Arch-MissionExternal Mission state from missions service; autopilot doesn't author FT-P-024, FT-P-025, FT-P-016 Covered (fixtures DEFERRED)
RESTRICT-Arch-MapInMissions Central area map in missions /mapobjects FT-P-024, FT-P-025, FT-P-026 (Mp5), NFT-RES-Mp2, NFT-RES-Mp4 Covered (fixtures DEFERRED)
RESTRICT-Arch-GPSDeniedExternal GPS coords from separate GPS-denied service; autopilot does NOT implement NOT COVERED at autopilot test level — verified at suite-e2e tier via the live GPS-denied service NOT COVERED at autopilot tier (covered at suite-e2e tier)
RESTRICT-Arch-OperatorUIExternal Operator browser UI owned by Ground Station; autopilot pushes data implicit by NOT testing any UI rendering; verified by operator-stream protocol assertions in FT-P-016, FT-P-017–022 Covered by absence
RESTRICT-Arch-AnnotationTrainingExternal Annotation + training in ../annotations, ../ai-training; autopilot doesn't own NOT TESTABLE at autopilot blackbox tier — process boundary NOT TESTABLE (intentional scope exclusion)
RESTRICT-Rel-BITGate Pre-flight BIT MUST gate takeoff FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2) Covered
RESTRICT-Rel-LostLinkDeterministic Lost operator-link failsafe deterministic + bounded NFT-RES-R4 Covered
RESTRICT-Rel-AirframeLossRedImmediate Airframe MAVLink loss → health red immediately NFT-RES-R7 (red after retry exhaustion); a dedicated "immediate red on link loss" scenario MAY be desirable (currently rolled into R7) Partially Covered (see Uncovered Items §6)
RESTRICT-Rel-BatteryThresholds Battery RTL + land-now triggers (override only via operator) NFT-RES-R5, NFT-RES-R6 Covered (fixtures DEFERRED)
RESTRICT-Rel-GeofenceSymmetric Geofence INCLUSION + EXCLUSION enforcement NFT-RES-R9 (both) Covered (fixture DEFERRED)
RESTRICT-Rel-OperatorCmdAuth Operator commands authenticated + signed + replay-protected NFT-SEC-O9, NFT-SEC-O10, FT-P-016 happy path Covered (Q9 DEFERRED)
RESTRICT-Rel-StorageBounded On-device storage bounded; full = takeoff blocker; mid-flight eviction policy FT-N-002 (R3 — BIT block), NFT-RES-LIM-Storage Covered
RESTRICT-Rel-NoSilentErrors No silent error swallowing every NFT-SEC-* scenario asserts a counter + log entry; every NFT-RES-* asserts a structured-log + health transition Covered
RESTRICT-Rel-ClockBound Wall-clock bound to GPS once locked, else NTP at boot NFT-RES-R8 Covered
RESTRICT-Rel-MavlinkConformance MAVLink command surface MUST conform to ArduPilot/PX4 SITL every MAVLink-emitting scenario runs through mavlink-sitl; a dedicated conformance suite is recommended Partially Covered (see Uncovered Items §3)
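
RESTRICT-Rel-GeofenceSymmetric and AC-Rel-GeofenceSymmetric require that INCLUSION and EXCLUSION violations lead to the same outcome: waypoint refusal + RTL. The following is a minimal sketch of that symmetric check, assuming the waypoint and fence vertices are already projected into a local planar frame. The types and function names are hypothetical, and the containment test is the standard even-odd ray-casting algorithm rather than anything this spec prescribes. The document does not say whether a waypoint must lie inside every inclusion fence or just one; the sketch assumes every.

```rust
// Hypothetical types; assumes waypoints and fence vertices are already
// projected into a local planar frame (e.g. metres east/north).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Debug, PartialEq)]
enum GeofenceVerdict {
    Accept,
    RefuseAndRtl,
}

/// Even-odd ray-casting test: cast a ray towards +x and count edge crossings.
/// Points exactly on an edge may fall either way; a real check needs a policy.
fn contains(polygon: &[Point], p: Point) -> bool {
    let n = polygon.len();
    if n < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (a, b) = (polygon[i], polygon[j]);
        // Only edges that straddle the ray's y can cross it (this also
        // guarantees a.y != b.y, so the division below is safe).
        if (a.y > p.y) != (b.y > p.y) {
            let x_cross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if p.x < x_cross {
                inside = !inside;
            }
        }
        j = i;
    }
    inside
}

/// Symmetric rule: leaving an INCLUSION fence or entering an EXCLUSION fence
/// produces the same refusal + RTL verdict. Requiring *every* inclusion fence
/// to contain the waypoint is an assumption.
fn check_waypoint(wp: Point, inclusions: &[Vec<Point>], exclusions: &[Vec<Point>]) -> GeofenceVerdict {
    let inside_inclusions = inclusions.iter().all(|f| contains(f, wp));
    let inside_exclusion = exclusions.iter().any(|f| contains(f, wp));
    if inside_inclusions && !inside_exclusion {
        GeofenceVerdict::Accept
    } else {
        GeofenceVerdict::RefuseAndRtl
    }
}
```

Collapsing both violation kinds into a single `RefuseAndRtl` verdict is what makes the rule symmetric: a test only has to vary the fence type, not the expected outcome.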

Coverage Summary

Category Total Items Covered Partially Covered Not Covered Coverage % (counting Partially as 0.5)
Acceptance Criteria 47 43 1 3 (43 + 0.5×1) / 47 ≈ 92.6 %
Restrictions 30 25 2 3 (25 + 0.5×2) / 30 ≈ 86.7 %
Total 77 68 3 6 (68 + 1.5) / 77 ≈ 90.3 %

(Coverage here is "test scenario exists for the item", not "fixture has been acquired and the test currently passes". Fixture status is tracked in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.)
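
The Coverage Summary weights each Partially Covered item as half an item. The arithmetic behind its percentages, and behind the 75 % gate used in the Phase 3 verdict, is sketched below in Rust; the function and constant names are illustrative, not part of the spec.

```rust
/// Scenario-level coverage as defined in the Coverage Summary: each
/// Partially Covered item counts as 0.5, Not Covered items count as 0.
fn coverage_pct(covered: u32, partial: u32, total: u32) -> f64 {
    assert!(total > 0, "total must be non-zero");
    assert!(covered + partial <= total, "covered + partial exceeds total");
    100.0 * (covered as f64 + 0.5 * partial as f64) / total as f64
}

/// Phase 3 final-coverage gate (75 %).
const GATE_PCT: f64 = 75.0;

fn passes_gate(pct: f64) -> bool {
    pct >= GATE_PCT
}
```

For the Total row, `coverage_pct(68, 3, 77)` evaluates to about 90.26, which rounds to the 90.3 % reported in the table.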

Uncovered Items Analysis

§ Item Reason not covered Risk Mitigation
§1 AC-Scan-SweepCoverage (wide-area sweep covers planned route) The "covers the planned route" property is a path-coverage assertion best tested by component-level tests in the scan_controller component (geometry coverage) rather than at the blackbox level Medium — incorrect sweep pattern leaves observation gaps Component test in scan_controller (added by /decompose test tasks); a Tier-E "did the camera point at every planned waypoint area for ≥ N seconds" scenario can be added if needed
§2 AC-Gate-SeasonCoverage / AC-Q-SeasonGates Per-season coverage gates depend on dataset acquisition owned by ../ai-training and per-season flight tests (Q13) High — model performance on un-evaluated seasons unknown Tracked as release-gate item; D3/D4/D5/D1 scenarios DEFERRED until each season's dataset lands
§3 AC-Gate-MavlinkSITLConformance / RESTRICT-Rel-MavlinkConformance A dedicated "every command in architecture.md §7.7 exercised against SITL" suite is recommended in addition to the implicit coverage by R-scenarios Medium — could miss a rarely-used command Add an NFT-MavlinkConformance suite during Step 9 (Decompose Tests) — explicit per-command SITL exercise
§4 RESTRICT-HW-Z40K (Z40K compatibility) Requires a second camera HW for the swap test Medium — could miss an A40-specific assumption Run the Tier-HW rows on Z40K as a post-MVP smoke step
§5 RESTRICT-SW-FP16 (model precision) Pinned at config + model-loading layer; not externally observable beyond perf/latency Low — incorrect precision would manifest as either L1 latency or D2 regression failure Add a startup log assertion: "Tier 2/3 models loaded with precision=FP16" via the SUT's structured boot log
§6 RESTRICT-Rel-AirframeLossRedImmediate (immediate red on airframe link loss) NFT-RES-R7 asserts red after retry exhaustion; the "immediate red on link loss" path (no retries) is implicit Low–Medium — depends on the timing window between "link silent" and "considered lost" Add an NFT-RES-AirframeImmediate scenario in Step 9 (Decompose Tests) — sustained zero MAVLink traffic for N seconds → immediate health red (no retry phase)
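
§6 proposes an NFT-RES-AirframeImmediate scenario in which sustained zero MAVLink traffic for N seconds turns health red immediately, with no retry phase. Below is a hypothetical sketch of the watchdog such a scenario would observe. N is left as a parameter because this document does not fix it, and the two-state `Health` enum is a simplification (the real health model also has yellow); all names are illustrative.

```rust
use std::time::{Duration, Instant};

/// Reduced health model for this sketch (yellow omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Green,
    Red,
}

/// Hypothetical watchdog for the proposed NFT-RES-AirframeImmediate scenario:
/// once no MAVLink traffic has arrived for `silence_limit` (the unspecified
/// "N seconds"), health goes red straight away, with no retry phase.
struct AirframeLinkWatchdog {
    silence_limit: Duration,
    last_rx: Instant,
}

impl AirframeLinkWatchdog {
    fn new(silence_limit: Duration, now: Instant) -> Self {
        Self { silence_limit, last_rx: now }
    }

    /// Call on every inbound MAVLink message from the airframe.
    fn on_message(&mut self, now: Instant) {
        self.last_rx = now;
    }

    /// `now` is passed in rather than read from the clock so that tests can
    /// drive time deterministically.
    fn health(&self, now: Instant) -> Health {
        if now.saturating_duration_since(self.last_rx) >= self.silence_limit {
            Health::Red
        } else {
            Health::Green
        }
    }
}
```

Passing `now` explicitly keeps the logic independent of the wall clock, so the "link silent" versus "considered lost" window that §6 flags as the risk can be pinned down exactly in a test.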

Scenario index by file

File Scenarios Read-back ID prefix
blackbox-tests.md 26 positive + 4 negative FT-P-001..FT-P-026, FT-N-001..FT-N-004
performance-tests.md 9 latency + 3 rate NFT-PERF-L1..L9, NFT-PERF-T1..T3
resilience-tests.md 6 R-rows + 2 Mp-rows NFT-RES-R4..R9, NFT-RES-Mp2, NFT-RES-Mp4
security-tests.md 10 SEC rows NFT-SEC-O9, NFT-SEC-O10, NFT-SEC-CraftedFrame, NFT-SEC-OversizeCrop, NFT-SEC-VlmSchemaViolation, NFT-SEC-VlmFreeFormText, NFT-SEC-IpcPeerAuth, NFT-SEC-Tier1SchemaViolation, NFT-SEC-MavlinkUnsigned, NFT-SEC-HealthExposesSecurity
resource-limit-tests.md 6 LIM rows NFT-RES-LIM-Re1, Re2, Storage, CPU, GPU, FileHandles

Total scenarios authored: 66.

Open dependencies summary

Dependency Affects (scenario count) Tracking
<DEFERRED: gimbal.csv + telemetry.csv pairs> FT-P-007/008/009/010, NFT-PERF-L6/L7 Leftover row "Gimbal CSV pairs"
<DEFERRED: multi-season annotated datasets (concealed, footpath, new classes, existing baseline)> FT-P-002/003/004/005/006 Leftover row "Concealed position image set + Footpath sequences + new-class eval set"
<DEFERRED: SITL or HW capture for L4/L5/L8> NFT-PERF-L4/L5/L8 Leftover row "MAVLink SITL traces" + camera frame sequences with zoom-band labelling
<DEFERRED: missions API mock fixtures (Mp1/Mp3/Mp4)> FT-P-024/025, NFT-RES-Mp4 Leftover row "Mock central area-map service responses"
<DEFERRED: vlm-io-pairs (real recordings)> NFT-PERF-L3, FT-P-015 (S5), NFT-SEC-VlmSchemaViolation real-recording variant Leftover row "Deep-analysis I/O pairs"
<DEFERRED: operator-envelopes (Q9-blocked)> NFT-SEC-O9/O10, full semantics of FT-P-016 Leftover row "Operator-command envelopes" + Q9
<DEFERRED: HW Jetson Orin Nano Super OR benchmarked replay> every Tier-HW scenario (L1, L2, L4, L5, L8, Re1, Re2, CPU, GPU) Leftover does not enumerate HW directly — tracked via the project-level Acceptance Gate
<DEFERRED: Q6 — MAVLink-2 signing decision> NFT-SEC-MavlinkUnsigned architecture.md §8 Q6
<DEFERRED: Q8 — MapObjects conflict resolution rule> FT-P-026 (Mp5) architecture.md §8 Q8
<DEFERRED: Q9 — operator-command auth scheme> NFT-SEC-O9/O10 full semantics architecture.md §8 Q9
<DEFERRED: Q13 — per-season gates> AC-Gate-SeasonCoverage architecture.md §8 Q13
<DEFERRED: Q14 — movement-detection classical vs learned-CV> FT-P-010 (M4) architecture.md §8 Q14

When any of the above dependencies resolves, the corresponding leftover entry is replayed (per tracker.mdc → Leftovers Mechanism) and the affected scenarios' Test status lines move from DEFERRED to READY in the source files.
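
The replay rule (a dependency resolves, its scenarios move from DEFERRED to READY) can be modelled as a map from each open dependency to the scenario IDs it blocks. The sketch below is a hypothetical illustration, not the tracker.mdc Leftovers Mechanism itself: the `TestStatus` type, the `replay` function and the dependency keys are invented for the example. One design choice worth stating: a scenario listed under several dependencies (NFT-SEC-O9, for instance, appears under both the operator-envelopes row and the Q9 row) stays DEFERRED until every one of them resolves, and scenarios deferred for reasons absent from the table are left alone.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq)]
enum TestStatus {
    Ready,
    /// Carries the `<reason>` text from the Test status line.
    Deferred(String),
}

/// Hypothetical replay step. `blocks` maps each open dependency to the
/// scenario IDs it blocks; `resolved` holds dependencies that have landed.
/// Returns the scenario IDs promoted from DEFERRED to READY, in ID order.
fn replay(
    blocks: &BTreeMap<&str, Vec<&str>>,
    resolved: &BTreeSet<&str>,
    statuses: &mut BTreeMap<String, TestStatus>,
) -> Vec<String> {
    // Scenarios mentioned by any dependency row, and those still blocked
    // by at least one unresolved dependency.
    let mut tracked = BTreeSet::new();
    let mut still_blocked = BTreeSet::new();
    for (dep, scenarios) in blocks {
        for &s in scenarios {
            tracked.insert(s);
            if !resolved.contains(*dep) {
                still_blocked.insert(s);
            }
        }
    }

    let mut promoted = Vec::new();
    for (id, status) in statuses.iter_mut() {
        let id = id.as_str();
        // Only touch scenarios the table tracks; others keep their status.
        if matches!(*status, TestStatus::Deferred(_))
            && tracked.contains(id)
            && !still_blocked.contains(id)
        {
            *status = TestStatus::Ready;
            promoted.push(id.to_string());
        }
    }
    promoted
}
```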

Phase 3 — Test Data & Expected Results Validation Gate Outcome

Recorded by /test-spec Phase 3 on 2026-05-19.

Mechanical gate

Phase 3's mechanical contract is: every scenario MUST have either (a) a provided input + provided quantifiable expected result, OR (b) a behavioural trigger + observable behaviour + quantifiable pass/fail criterion. Scenarios that fail this contract are normally REMOVED. The 75 % final-coverage check then applies.

Shape Total scenarios Quantifiable comparison declared Input/trigger fully provided today Input/trigger DEFERRED (release-gate item)
Input/output 56 56 16 40
Behavioural 10 10 10 0
Total 66 66 (100 %) 26 (39 %) 40 (61 %)

Every scenario carries a Comparison method drawn from .cursor/skills/test-spec/templates/expected-results.md (exact, numeric_tolerance, threshold_min/max, range, regex, substring, set_contains, json_diff, schema_match, file_reference) — none of the 66 fail the quantifiability check.
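
The mechanical contract and the list of comparison methods can be expressed as a small predicate. In this hypothetical sketch the `ComparisonMethod` variants mirror the method identifiers from expected-results.md (with threshold_min/max split into two variants), while the `Availability`/`Scenario` model and the `allow_deferred` switch, which stands in for the project-policy override recorded in the next section, are assumptions made for illustration.

```rust
/// Comparison methods listed in expected-results.md.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComparisonMethod {
    Exact,
    NumericTolerance,
    ThresholdMin,
    ThresholdMax,
    Range,
    Regex,
    Substring,
    SetContains,
    JsonDiff,
    SchemaMatch,
    FileReference,
}

/// Hypothetical availability of an input, trigger or expected result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Availability {
    Provided,
    Deferred,
    Missing,
}

enum Shape {
    /// (a) provided input + provided quantifiable expected result
    InputOutput { input: Availability, expected: Availability },
    /// (b) behavioural trigger + observable behaviour + pass/fail criterion
    Behavioural { trigger: Availability, observable: bool },
}

struct Scenario {
    shape: Shape,
    /// `None` means no quantifiable pass/fail comparison was declared.
    comparison: Option<ComparisonMethod>,
}

/// Returns true if the scenario survives the mechanical gate. With
/// `allow_deferred = false` a DEFERRED fixture fails the contract (the scenario
/// would normally be removed); with `true` it is kept as a release-gate item.
fn keep_scenario(s: &Scenario, allow_deferred: bool) -> bool {
    let ok = |a: Availability| match a {
        Availability::Provided => true,
        Availability::Deferred => allow_deferred,
        Availability::Missing => false,
    };
    let quantifiable = s.comparison.is_some();
    quantifiable
        && match &s.shape {
            Shape::InputOutput { input, expected } => ok(*input) && ok(*expected),
            Shape::Behavioural { trigger, observable } => ok(*trigger) && *observable,
        }
}
```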

Project-policy override (recorded 2026-05-19)

The Phase 3 75 % fixture-coverage gate is intentionally overridden for this project, per the decision recorded in _docs/00_problem/input_data/expected_results/results_report.md → "Decision (project policy)":

rather than block on the Phase 3 75 % gate, each deferred row is now registered with a structured <DEFERRED:> tag and surfaces in data_parameters.md → "Gaps that block /test-spec downstream". /test-spec Phase 2 can author scenarios for all 56 rows; deferred rows become release-gate items, not development-gate items. The acceptance_criteria.md → "Acceptance Gates (project-level)" hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.

Under this policy:

  • No scenarios are removed by Phase 3. Every authored scenario remains in the spec; its Test status line in the source file (blackbox-tests.md, performance-tests.md, etc.) carries either READY or DEFERRED — <reason>.
  • Final coverage is computed at the scenario level, not the fixture level. Per the matrix above:
    • AC coverage: 92.6 % ((43 + 0.5 × 1) / 47)
    • RESTRICT coverage: 86.7 % ((25 + 0.5 × 2) / 30)
    • Total: 90.3 % — well above the 75 % gate.
  • Fixture acquisition is tracked as a release-gate concern in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md; on every /autodev invocation the leftover-replay step re-evaluates whether any deferred fixture has landed and moves the affected scenarios from DEFERRED to READY.
  • The project-level Acceptance Gate (acceptance_criteria.md → "Acceptance Gates" — HW/replay benchmark, per-season coverage, MAVLink SITL conformance) remains a hard release blocker. The override does NOT relax that gate.

Phase 3 verdict

PASSED — scenario-level coverage 90.3 % ≥ 75 % gate; every scenario has a quantifiable comparison; deferred-fixture tracking handled via leftovers replay; no scenarios removed.

Phase 4 — Test Runner Script Generation: SKIPPED in this invocation

Per phases/04-runner-scripts.md → "Skip condition":

If this skill was invoked from the /plan skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.

This invocation is greenfield Step 5 (Test Spec) and no source code exists yet — the _docs/02_document/components/*/description.md files describe 13 Rust components that the Implement step (Step 7) will create. Producing runner scripts here would write scripts/run-tests.sh and scripts/run-performance-tests.sh against a binary that does not yet exist.

Handoff to Step 6 (Decompose): the decomposer MUST create at least two task specs covering the test runner scripts:

  1. A task to create scripts/run-tests.sh (Tier B/E orchestration; calls docker compose -f e2e/docker-compose.autopilot-e2e.yml up and runs cargo test --release --test scenarios in e2e-consumer).
  2. A task to create scripts/run-performance-tests.sh (Tier HW orchestration; per environment.md → Hardware Execution Matrix).

Both tasks should be tagged as part of the test-infrastructure decomposition (Step 1t of decompose tests-only mode) so they land before any Tier-B test scenarios are implemented.