mirror of https://github.com/azaion/autopilot.git synced 2026-06-21 19:11:10 +00:00

Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-19 11:02:01 +03:00

Traceability Matrix

Authored by /test-spec Phase 2 (2026-05-19).

This matrix maps every acceptance-criterion bullet from _docs/00_problem/acceptance_criteria.md and every restriction bullet from _docs/00_problem/restrictions.md to the test scenarios that exercise them. Coverage is scenario-level, not fixture-level — scenarios marked DEFERRED in the underlying *-tests.md files still count as covered for the purpose of "the test is specified"; the fixture-acquisition status is tracked separately in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.

Acceptance Criteria Coverage

AC ID	Acceptance criterion (paraphrased; canonical text in `acceptance_criteria.md`)	Test IDs	Coverage
AC-L1	Tier 1 per-frame ≤ 100 ms at 1280 px on deployed compute	NFT-PERF-L1, FT-P-001 (functional contract)	Covered
AC-L2	Tier 2 per-ROI ≤ 200 ms	NFT-PERF-L2	Covered
AC-L3	Tier 3 per-ROI ≤ 5 s (when enabled)	NFT-PERF-L3	Covered (fixture DEFERRED)
AC-L4	Camera zoom transition (medium→high) ≤ 2 s	NFT-PERF-L4	Covered (fixture DEFERRED)
AC-L5	Decision-to-movement latency ≤ 500 ms	NFT-PERF-L5	Covered (fixture DEFERRED)
AC-L6	Movement candidate enqueue ≤ 1 s (wide sweep)	NFT-PERF-L6	Covered (fixture DEFERRED — gimbal.csv)
AC-L7	Movement candidate enqueue ≤ 1.5 s (zoomed-in)	NFT-PERF-L7	Covered (fixture DEFERRED — zoomed gimbal.csv)
AC-L8	Zoom-out → zoom-in transition ≤ 2 s	NFT-PERF-L8	Covered (fixture DEFERRED)
AC-L9	Operator command → outbound action ≤ 500 ms	NFT-PERF-L9	Covered (fixture DEFERRED for signed envelopes; placeholder usable today)
AC-T1	POI rate surfaced to operator ≤ 5 / min (hard cap)	NFT-PERF-T1	Covered
AC-T2	Position telemetry rate ∈ [1, 10] Hz (target 10)	NFT-PERF-T2	Covered (fixture DEFERRED — MAVLink replay)
AC-T3	Frame-rate floor < 10 fps for ≥ 5 s → suppress zoom-in AND health yellow	NFT-PERF-T3	Covered
AC-D1	New classes per-class P ≥ 0.80 AND R ≥ 0.80	FT-P-003	Covered (fixture DEFERRED — annotated eval set)
AC-D2	Existing-class regression Δ ≤ ± 0.02 vs baseline	FT-P-002	Covered (baseline JSON DEFERRED; visual fixtures present)
AC-D3	Concealed-position recall ≥ 0.60 (initial gate)	FT-P-004	Covered (fixture DEFERRED — multi-season set)
AC-D4	Concealed-position precision ≥ 0.20 (initial gate)	FT-P-005	Covered (fixture DEFERRED — same as D3)
AC-D5	Footpath recall ≥ 0.70	FT-P-006	Covered (fixture DEFERRED — polyline-annotated set)
AC-D6	Tier-1 normalised-box contract conformance (class ids 0..18, coords ∈ [0,1])	FT-P-001, NFT-SEC-Tier1SchemaViolation	Covered
AC-Mov-EnqueueWideSweep	Small movers during wide sweep MUST be detected and enqueued ≤ 1 s	FT-P-007 (M1 behavioural), NFT-PERF-L6 (latency dimension)	Covered
AC-Mov-ContinueDuringZoom	Movement detection continues during zoomed-in inspection	FT-P-008 (M2), NFT-PERF-L7	Covered
AC-Mov-StableObjectsRejected	Stable objects (trees, houses, roads) NOT treated as moving solely due to camera platform motion	FT-P-007 (M1 — set_contains explicitly excludes tree row)	Covered
AC-Mov-FPBudgetHonoured	Configurable per-zoom-band FP budget honoured	FT-P-009 (M3), FT-P-010 (M4 — Q14)	Covered (M4 fixture DEFERRED)
AC-Scan-SweepCoverage	Wide-area sweep covers planned route at wide/light/medium zoom	implicitly by FT-P-011 setup + scenario-runner BIT scenarios; NOT covered as a distinct test	NOT COVERED (see Uncovered Items § §1)
AC-Scan-SweepToZoomTransition	Sweep → detailed inspection transition ≤ 2 s	FT-P-011, NFT-PERF-L8	Covered
AC-Scan-TargetLock	Lock + pan + 2 s deep-analysis hold + per-POI timeout default 5 s	FT-P-015 (S5 — three cases)	Covered (fixture DEFERRED — vlm-mock with realistic timing)
AC-Scan-TargetFollowCentre	Target-follow within centre 25 % of frame	FT-P-013	Covered (fixture DEFERRED)
AC-Scan-GimbalLatency	Gimbal decision-to-movement ≤ 500 ms (links to L5)	NFT-PERF-L5	Covered
AC-Scan-POIOrdering	POI queue ordered by `confidence × proximity × age_factor`	FT-P-014	Covered
AC-Op-DecisionWindowScale	Decision window scales 30 s @ 0.40 → 120 s @ 1.00 linearly	FT-P-017 (O1), FT-P-018 (O2), FT-P-019 (O3), FT-N-004 (O4 below-threshold)	Covered
AC-Op-DeclinePersistsIgnored	Operator-decline → persistent ignored-item per (MGRS, class_group)	FT-P-020 (O5)	Covered
AC-Op-TimeoutForget	Timeout (no response) MUST NOT create ignored-item	FT-P-022 (O7)	Covered
AC-Op-IgnoredSuppress	New detection matching existing ignored-item NOT surfaced	FT-P-021 (O6)	Covered
AC-Op-ConfirmWaypointFollow	Operator-confirm → middle waypoint POST + target-follow mode	FT-P-016 (O8)	Covered (Q9 envelope DEFERRED; happy path uses placeholder)
AC-Op-ReplayUnsignedRejected	Replayed or unsigned operator command REJECTED with logged security WARN; state UNCHANGED	NFT-SEC-O9, NFT-SEC-O10	Covered (Q9 DEFERRED for full semantics)
AC-Rel-BITGatesTakeoff	BIT MUST pass before takeoff permitted	FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2 BIT gate)	Covered
AC-Rel-LostLinkRTL30s	Lost operator/GS link → known mission-safe outcome within configurable grace (default 30 s → RTL)	NFT-RES-R4	Covered
AC-Rel-AirframeLinkRedImmediate	Airframe command link loss → health red immediately; defer to airframe failsafe	NFT-RES-R7 (extension), implicitly by airframe-link health observation in NFT-RES-R5/R6	Covered
AC-Rel-BatteryFloors	Battery ≤ RTL floor → RTL; battery ≤ hard floor → land-now; operator override only	NFT-RES-R5, NFT-RES-R6	Covered (fixture DEFERRED)
AC-Rel-MavlinkExhaustionRed	MAVLink command exhaustion → airframe-link health red	NFT-RES-R7	Covered (fixture DEFERRED)
AC-Rel-DriftYellow	Wall-clock drift > 200 ms → health yellow	NFT-RES-R8	Covered
AC-Rel-GeofenceSymmetric	Geofence INCLUSION + EXCLUSION violations → waypoint refusal + RTL	NFT-RES-R9 (both cases)	Covered (fixture DEFERRED)
AC-Res-RSS6GB	Combined RSS on Jetson (excluding Tier 1) ≤ 6 GB sustained	NFT-RES-LIM-Re1, NFT-RES-LIM-CPU (CPU dimension), NFT-RES-LIM-FileHandles (FD dimension)	Covered (HW DEFERRED)
AC-Res-Tier1NonDegradation	Tier 1 per-frame latency Δ ± 5 ms under concurrent autopilot workload	NFT-RES-LIM-Re2, NFT-RES-LIM-GPU (GPU mutual exclusion)	Covered (HW DEFERRED)
AC-Mp-PreFlightPull30s	Pre-flight map pull ≤ 30 s; cache-fallback only with explicit operator ack	FT-P-024 (Mp1), FT-N-003 (Mp2 cache-fallback gate), NFT-RES-Mp2 (timing+recovery)	Covered
AC-Mp-PostFlightPush120s	Post-flight pass diff push ≤ 120 s; failure → persist + bounded retry	FT-P-025 (Mp3), NFT-RES-Mp4	Covered (fixture DEFERRED)
AC-Gate-HWBench	HW/replay benchmark suite MUST pass before product implementation	every Tier-HW row in environment.md `Hardware Execution Matrix` (filled by `hardware-assessment.md`)	Covered as a gate, executed at the Acceptance-Gates milestone
AC-Gate-SeasonCoverage	Per-season dataset coverage demonstrated before MVP sign-off (Q13)	NOT COVERED at blackbox test level — gated on annotation campaign and the `../ai-training` repo	NOT COVERED (see Uncovered Items § §2)
AC-Gate-MavlinkSITLConformance	MAVLink command surface MUST pass SITL conformance	implicitly by FT-P-016 (O8 confirms waypoint POST through SITL) + NFT-RES-R4/R5/R6/R7/R9 (all run through SITL); a dedicated conformance suite is recommended	Partially Covered (see Uncovered Items § §3)
AC-Q-Mov-Zoomed-FPRate	Movement detection FP rate at zoomed-in inspection (Q14)	FT-P-010 (M4)	Covered (Q14 DEFERRED)
AC-Q-MapObjectsConflict	MapObjects conflict resolution rule (Q8)	FT-P-026 (Mp5)	Covered (Q8 DEFERRED)
AC-Q-OperatorCmdAuth	Operator-command authentication conformance (Q9)	NFT-SEC-O9, NFT-SEC-O10, FT-P-016 (O8)	Covered (Q9 DEFERRED — placeholders used today)
AC-Q-MAVLinkSigning	Airframe MAVLink-2 message signing (Q6)	NFT-SEC-MavlinkUnsigned	Covered (Q6 DEFERRED)
AC-Q-SeasonGates	Per-season flight-test gates (Q13)	NOT COVERED — same as AC-Gate-SeasonCoverage	NOT COVERED
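The AC-Op-DecisionWindowScale row above describes a linear scale from 30 s at confidence 0.40 to 120 s at confidence 1.00. A minimal sketch of that interpolation follows. The function name, parameter names, and the choice to return `None` below the threshold are illustrative assumptions, not the autopilot's actual API; the canonical behaviour is defined in `acceptance_criteria.md`.

```python
from typing import Optional


def decision_window_s(
    confidence: float,
    c_min: float = 0.40,    # confidence at which the window is shortest
    c_max: float = 1.00,    # confidence at which the window is longest
    w_min_s: float = 30.0,  # window length at c_min, in seconds
    w_max_s: float = 120.0, # window length at c_max, in seconds
) -> Optional[float]:
    """Linearly interpolate the operator decision window in seconds."""
    if confidence < c_min:
        # Assumption: a below-threshold detection gets no decision window.
        return None
    clamped = min(confidence, c_max)
    fraction = (clamped - c_min) / (c_max - c_min)
    return w_min_s + fraction * (w_max_s - w_min_s)


for c in (0.40, 0.70, 1.00):
    print(f"confidence {c:.2f} -> {decision_window_s(c):.1f} s")
```

The two endpoints reproduce the 30 s and 120 s values in the criterion, and the midpoint confidence of 0.70 lands at roughly 75 s.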

Restrictions Coverage

Restriction ID	Restriction (paraphrased; canonical in `restrictions.md`)	Test IDs	Coverage
RESTRICT-HW-Jetson	Compute device Jetson Orin Nano Super; 8 GB shared LPDDR5; ~6 GB residual after Tier 1	NFT-RES-LIM-Re1, NFT-RES-LIM-CPU, NFT-RES-LIM-Re2, all Tier-HW rows	Covered (HW DEFERRED)
RESTRICT-HW-A40	Primary camera ViewPro A40; vendor protocol mandatory	FT-P-011, FT-P-012, FT-P-013, NFT-PERF-L4 (zoom traversal floor)	Covered (HW DEFERRED for L4)
RESTRICT-HW-Z40K	Alternative camera ViewPro Z40K — system must remain compatible	NOT COVERED at autopilot test level — verified by component-swap regression run on the Z40K HW	NOT COVERED (see Uncovered Items § §4)
RESTRICT-HW-ThermalLater	Thermal sensor may be added later; not assumed today	implicit (no test depends on thermal)	Covered by absence (negative assumption)
RESTRICT-HW-ZoomFloor	40× optical zoom traversal 1–2 s wall-clock	NFT-PERF-L4 (asserts the ≤ 2 s ceiling that includes the physical floor)	Covered (HW DEFERRED)
RESTRICT-Op-Altitude	Flight altitude 600–1000 m	implicitly by every mission-trace fixture; no dedicated test	Covered by fixture assumption
RESTRICT-Op-AllSeasons	All four seasons in scope; winter-first-only rejected	FT-P-002, FT-P-003, FT-P-004, FT-P-005, FT-P-006 — multi-season fixtures required	Covered (all DEFERRED on multi-season fixtures)
RESTRICT-Op-AllTerrains	Forest, open field, urban edges, mixed terrain	same as RESTRICT-Op-AllSeasons	Covered (DEFERRED)
RESTRICT-Op-IntermittentModem	Modem operator/GS link intermittent	NFT-RES-R4, FT-P-016 (O8 nominal session), NFT-SEC-O9/O10	Covered
RESTRICT-SW-JetsonResidualBudget	Onboard inference path runs within 6 GB residual RAM	NFT-RES-LIM-Re1	Covered (HW DEFERRED)
RESTRICT-SW-FP16	Models use FP16 precision (INT8 rejected for MVP)	NOT COVERED at autopilot test level — pinned at the model-loading layer (Tier 1 in `../detections`; Tier 2/3 in autopilot config)	NOT COVERED (see Uncovered Items § §5)
RESTRICT-SW-NoCloudInference	No cloud egress for inference	NFT-SEC-CraftedFrame (process boundary), implicit by environment.md `autopilot-e2e` network having no egress	Covered
RESTRICT-SW-GPUMutualExclusion	Tier 1 + any local large model serialise on the Jetson GPU	NFT-RES-LIM-GPU	Covered (HW DEFERRED)
RESTRICT-SW-MissionSchemaShared	Autopilot consumes shared `mission-schema`; cannot fork	FT-P-016 (O8 — POST validates against schema), FT-P-024 (Mp1 — schema-validated pull)	Covered (fixtures DEFERRED)
RESTRICT-Arch-Tier1External	Tier 1 lives in `../detections`; autopilot consumes	FT-P-001 (D6), NFT-SEC-Tier1SchemaViolation, FT-N-001 (R2 — Tier 1 unreachable inhibits BIT)	Covered
RESTRICT-Arch-MissionExternal	Mission state from `missions` service; autopilot doesn't author	FT-P-024, FT-P-025, FT-P-016	Covered (fixtures DEFERRED)
RESTRICT-Arch-MapInMissions	Central area map in `missions /mapobjects`	FT-P-024, FT-P-025, FT-P-026 (Mp5), NFT-RES-Mp2, NFT-RES-Mp4	Covered (fixtures DEFERRED)
RESTRICT-Arch-GPSDeniedExternal	GPS coords from separate GPS-denied service; autopilot does NOT implement	NOT COVERED at autopilot test level — verified at suite-e2e tier via the live GPS-denied service	NOT COVERED at autopilot tier (covered at suite-e2e tier)
RESTRICT-Arch-OperatorUIExternal	Operator browser UI owned by Ground Station; autopilot pushes data	implicit by NOT testing any UI rendering; verified by operator-stream protocol assertions in FT-P-016, FT-P-017–022	Covered by absence
RESTRICT-Arch-AnnotationTrainingExternal	Annotation + training in `../annotations`, `../ai-training`; autopilot doesn't own	NOT TESTABLE at autopilot blackbox tier — process boundary	NOT TESTABLE (intentional scope exclusion)
RESTRICT-Rel-BITGate	Pre-flight BIT MUST gate takeoff	FT-P-023 (R1), FT-N-001 (R2), FT-N-002 (R3), FT-N-003 (Mp2)	Covered
RESTRICT-Rel-LostLinkDeterministic	Lost operator-link failsafe deterministic + bounded	NFT-RES-R4	Covered
RESTRICT-Rel-AirframeLossRedImmediate	Airframe MAVLink loss → health red immediately	NFT-RES-R7 (red after retry exhaustion); a dedicated "immediate red on link loss" scenario MAY be desirable (currently rolled into R7)	Partially Covered (see Uncovered Items § §6)
RESTRICT-Rel-BatteryThresholds	Battery RTL + land-now triggers (override only via operator)	NFT-RES-R5, NFT-RES-R6	Covered (fixtures DEFERRED)
RESTRICT-Rel-GeofenceSymmetric	Geofence INCLUSION + EXCLUSION enforcement	NFT-RES-R9 (both)	Covered (fixture DEFERRED)
RESTRICT-Rel-OperatorCmdAuth	Operator commands authenticated + signed + replay-protected	NFT-SEC-O9, NFT-SEC-O10, FT-P-016 happy path	Covered (Q9 DEFERRED)
RESTRICT-Rel-StorageBounded	On-device storage bounded; full = takeoff blocker; mid-flight eviction policy	FT-N-002 (R3 — BIT block), NFT-RES-LIM-Storage	Covered
RESTRICT-Rel-NoSilentErrors	No silent error swallowing	every NFT-SEC-* scenario asserts a counter + log entry; every NFT-RES-* asserts a structured-log + health transition	Covered
RESTRICT-Rel-ClockBound	Wall-clock bound to GPS once locked, else NTP at boot	NFT-RES-R8	Covered
RESTRICT-Rel-MavlinkConformance	MAVLink command surface MUST conform to ArduPilot/PX4 SITL	every MAVLink-emitting scenario runs through `mavlink-sitl`; a dedicated conformance suite is recommended	Partially Covered (see Uncovered Items § §3)

Coverage Summary

Category	Total Items	Covered	Partially Covered	Not Covered	Coverage % (counting Partially as 0.5)
Acceptance Criteria	47	43	1	3	(43 + 0.5×1) / 47 ≈ 92.6 %
Restrictions	30	25	2	3	(25 + 0.5×2) / 30 ≈ 86.7 %
Total	77	68	3	6	(68 + 1.5) / 77 ≈ 90.3 %

(Coverage here is "test scenario exists for the item", not "fixture has been acquired and the test currently passes". Fixture status is tracked in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.)
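The percentages in the summary table can be reproduced with a few lines of arithmetic. The sketch below weights each partially covered item as 0.5, as the column header states; the class and variable names are illustrative only, and the counts are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageRow:
    category: str
    covered: int
    partial: int
    not_covered: int

    @property
    def total(self) -> int:
        return self.covered + self.partial + self.not_covered

    @property
    def coverage_pct(self) -> float:
        # Partially covered items count as half an item.
        return 100.0 * (self.covered + 0.5 * self.partial) / self.total


rows = [
    CoverageRow("Acceptance Criteria", covered=43, partial=1, not_covered=3),
    CoverageRow("Restrictions", covered=25, partial=2, not_covered=3),
]
overall = CoverageRow(
    "Total",
    covered=sum(r.covered for r in rows),
    partial=sum(r.partial for r in rows),
    not_covered=sum(r.not_covered for r in rows),
)

for row in rows + [overall]:
    print(f"{row.category}: {row.total} items, {row.coverage_pct:.1f} %")
```

Run as written, this yields 92.6 %, 86.7 % and 90.3 % for the three rows, matching the table.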

Uncovered Items Analysis

§	Item	Reason not covered	Risk	Mitigation
§1	AC-Scan-SweepCoverage (wide-area sweep covers planned route)	The "covers the planned route" property is a path-coverage assertion best tested by component-level tests in the `scan_controller` component (geometry coverage) rather than at the blackbox level	Medium — incorrect sweep pattern leaves observation gaps	Component test in `scan_controller` (added by `/decompose` test tasks); a Tier-E "did the camera point at every planned waypoint area for ≥ N seconds" scenario can be added if needed
§2	AC-Gate-SeasonCoverage / AC-Q-SeasonGates	Per-season coverage gates depend on dataset acquisition owned by `../ai-training` and per-season flight tests (Q13)	High — model performance on un-evaluated seasons unknown	Tracked as release-gate item; D3/D4/D5/D1 scenarios DEFERRED until each season's dataset lands
§3	AC-Gate-MavlinkSITLConformance / RESTRICT-Rel-MavlinkConformance	A dedicated "every command in `architecture.md §7.7` exercised against SITL" suite is recommended in addition to the implicit coverage by R-scenarios	Medium — could miss a rarely-used command	Add an `NFT-MavlinkConformance` suite during Step 9 (Decompose Tests) — explicit per-command SITL exercise
§4	RESTRICT-HW-Z40K (Z40K compatibility)	Requires a second camera HW for the swap test	Medium — could miss an A40-specific assumption	Run the Tier-HW rows on Z40K as a post-MVP smoke step
§5	RESTRICT-SW-FP16 (model precision)	Pinned at config + model-loading layer; not externally observable beyond perf/latency	Low — incorrect precision would manifest as either L1 latency or D2 regression failure	Add a startup log assertion: "Tier 2/3 models loaded with precision=FP16" via the SUT's structured boot log
§6	RESTRICT-Rel-AirframeLossRedImmediate (immediate red on airframe link loss)	NFT-RES-R7 asserts red after retry exhaustion; the "immediate red on link loss" path (no retries) is implicit	Low–Medium — depends on timing window between "link silent" and "considered lost"	Add `NFT-RES-AirframeImmediate` scenario in Step 9 (Decompose Tests) — sustained zero MAVLink traffic for N seconds → immediate health red (no retry phase)

Scenario index by file

File	Scenarios	Read-back ID prefix
`blackbox-tests.md`	26 positive + 4 negative	FT-P-001..FT-P-026, FT-N-001..FT-N-004
`performance-tests.md`	9 latency + 3 rate	NFT-PERF-L1..L9, NFT-PERF-T1..T3
`resilience-tests.md`	6 R-rows + 2 Mp-rows	NFT-RES-R4..R9, NFT-RES-Mp2, NFT-RES-Mp4
`security-tests.md`	10 SEC rows	NFT-SEC-O9, NFT-SEC-O10, NFT-SEC-CraftedFrame, NFT-SEC-OversizeCrop, NFT-SEC-VlmSchemaViolation, NFT-SEC-VlmFreeFormText, NFT-SEC-IpcPeerAuth, NFT-SEC-Tier1SchemaViolation, NFT-SEC-MavlinkUnsigned, NFT-SEC-HealthExposesSecurity
`resource-limit-tests.md`	6 LIM rows	NFT-RES-LIM-Re1, Re2, Storage, CPU, GPU, FileHandles

Total scenarios authored: 66.
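The total of 66 can be cross-checked against the index. The sketch below expands the fully written ID ranges (such as `FT-P-001..FT-P-026`) and adds the per-file counts stated in the table for the remaining files. The helper name `expand_id_range` is hypothetical, and it deliberately does not handle the abbreviated `L1..L9` form.

```python
import re
from typing import List


def expand_id_range(spec: str) -> List[str]:
    """Expand 'FT-P-001..FT-P-026' into individual zero-padded IDs."""
    first, last = spec.split("..")
    m_first = re.fullmatch(r"(.*?)(\d+)", first)
    m_last = re.fullmatch(r"(.*?)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"unsupported range spec: {spec!r}")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # keep the original zero padding
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


blackbox = expand_id_range("FT-P-001..FT-P-026") + expand_id_range("FT-N-001..FT-N-004")

# Counts for the other files are taken directly from the index table.
scenario_counts = {
    "blackbox-tests.md": len(blackbox),
    "performance-tests.md": 9 + 3,
    "resilience-tests.md": 6 + 2,
    "security-tests.md": 10,
    "resource-limit-tests.md": 6,
}
print(sum(scenario_counts.values()))
```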

Open dependencies summary

Dependency	Affects (scenario count)	Tracking
`<DEFERRED: gimbal.csv + telemetry.csv pairs>`	FT-P-007/008/009/010, NFT-PERF-L6/L7	Leftover row "Gimbal CSV pairs"
`<DEFERRED: multi-season annotated datasets (concealed, footpath, new classes, existing baseline)>`	FT-P-002/003/004/005/006	Leftover row "Concealed position image set + Footpath sequences + new-class eval set"
`<DEFERRED: SITL or HW capture for L4/L5/L8>`	NFT-PERF-L4/L5/L8	Leftover row "MAVLink SITL traces" + camera frame sequences with zoom-band labelling
`<DEFERRED: missions API mock fixtures (Mp1/Mp3/Mp4)>`	FT-P-024/025, NFT-RES-Mp4	Leftover row "Mock central area-map service responses"
`<DEFERRED: vlm-io-pairs (real recordings)>`	NFT-PERF-L3, FT-P-015 (S5), NFT-SEC-VlmSchemaViolation real-recording variant	Leftover row "Deep-analysis I/O pairs"
`<DEFERRED: operator-envelopes (Q9-blocked)>`	NFT-SEC-O9/O10, full semantics of FT-P-016	Leftover row "Operator-command envelopes" + Q9
`<DEFERRED: HW Jetson Orin Nano Super OR benchmarked replay>`	every Tier-HW scenario (L1, L2, L4, L5, L8, Re1, Re2, CPU, GPU)	Leftover does not enumerate HW directly — tracked via the project-level Acceptance Gate
`<DEFERRED: Q6 — MAVLink-2 signing decision>`	NFT-SEC-MavlinkUnsigned	architecture.md §8 Q6
`<DEFERRED: Q8 — MapObjects conflict resolution rule>`	FT-P-026 (Mp5)	architecture.md §8 Q8
`<DEFERRED: Q9 — operator-command auth scheme>`	NFT-SEC-O9/O10 full semantics	architecture.md §8 Q9
`<DEFERRED: Q13 — per-season gates>`	AC-Gate-SeasonCoverage	architecture.md §8 Q13
`<DEFERRED: Q14 — movement-detection classical vs learned-CV>`	FT-P-010 (M4)	architecture.md §8 Q14

When any of the above dependencies resolves, the corresponding leftover entry is replayed (per tracker.mdc → Leftovers Mechanism) and the affected scenarios' Test status lines move from DEFERRED to READY in the source files.
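The DEFERRED to READY transition described above can be pictured as follows. This is a sketch under assumptions: the `Scenario` class, the `replay_leftover` function, and the short dependency keys are hypothetical and do not reflect the actual leftovers mechanism in `tracker.mdc`. It illustrates one detail visible in the table: a scenario listed under two dependencies (FT-P-010 appears under both the gimbal CSV pairs and Q14) stays DEFERRED until both resolve.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Scenario:
    test_id: str
    blocked_on: Set[str] = field(default_factory=set)

    @property
    def status(self) -> str:
        # A scenario is READY only when nothing blocks it any more.
        return "DEFERRED" if self.blocked_on else "READY"


def replay_leftover(scenarios: List[Scenario], resolved: str) -> List[str]:
    """Clear one resolved dependency; return IDs that became READY."""
    moved = []
    for scenario in scenarios:
        if resolved in scenario.blocked_on:
            scenario.blocked_on.discard(resolved)
            if not scenario.blocked_on:
                moved.append(scenario.test_id)
    return moved


# Hypothetical dependency keys standing in for the <DEFERRED: ...> tags.
scenarios = [
    Scenario("FT-P-007", {"gimbal-csv-pairs"}),
    Scenario("FT-P-010", {"gimbal-csv-pairs", "Q14"}),
]

print(replay_leftover(scenarios, "gimbal-csv-pairs"))
print({s.test_id: s.status for s in scenarios})
```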

Phase 3 — Test Data & Expected Results Validation Gate Outcome

Recorded by /test-spec Phase 3 on 2026-05-19.

Mechanical gate

Phase 3's mechanical contract is: every scenario MUST have either (a) a provided input + provided quantifiable expected result, OR (b) a behavioural trigger + observable behaviour + quantifiable pass/fail criterion. Scenarios that fail this contract are normally REMOVED. The 75 % final-coverage check then applies.

Shape	Total scenarios	Quantifiable comparison declared	Input/trigger fully provided today	Input/trigger DEFERRED (release-gate item)
Input/output	56	56	16	40
Behavioural	10	10	10	0
Total	66	66 (100 %)	26 (39 %)	40 (61 %)
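The mechanical contract and the two shapes in the table can be expressed as a simple predicate. The sketch below is illustrative only: the field names and the example scenarios are hypothetical placeholders, not entries from the `*-tests.md` files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScenarioSpec:
    test_id: str
    # Shape (a): input/output
    provided_input: Optional[str] = None
    expected_result: Optional[str] = None
    # Shape (b): behavioural
    trigger: Optional[str] = None
    observable: Optional[str] = None
    pass_fail_criterion: Optional[str] = None


def satisfies_phase3_contract(spec: ScenarioSpec) -> bool:
    """True if the scenario fits shape (a) or shape (b) of the contract."""
    input_output = bool(spec.provided_input and spec.expected_result)
    behavioural = bool(
        spec.trigger and spec.observable and spec.pass_fail_criterion
    )
    return input_output or behavioural


# Hypothetical examples, one per shape plus one that fails the contract.
examples = [
    ScenarioSpec("EXAMPLE-IO", provided_input="frame.jpg",
                 expected_result="latency_ms <= 100"),
    ScenarioSpec("EXAMPLE-BEHAVIOURAL", trigger="link drops",
                 observable="health state", pass_fail_criterion="red within N s"),
    ScenarioSpec("EXAMPLE-INCOMPLETE", trigger="link drops"),
]
for spec in examples:
    print(spec.test_id, satisfies_phase3_contract(spec))
```

Under the default Phase 3 rules, a scenario like the third example would be removed. Under the project-policy override recorded in this document, no scenarios are removed.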

Every scenario carries a Comparison method drawn from .cursor/skills/test-spec/templates/expected-results.md (exact, numeric_tolerance, threshold_min/max, range, regex, substring, set_contains, json_diff, schema_match, file_reference) — none of the 66 fail the quantifiability check.
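To make the comparison methods concrete, here is a sketch of how a subset of them might be evaluated. The method names come from the list above. Their semantics and parameter names are assumptions; the authoritative definitions live in `expected-results.md`. The sample values are illustrative and only echo thresholds from the matrix (100 ms for AC-L1, the [1, 10] Hz band for AC-T2, 0.60 recall for AC-D3).

```python
import re
from typing import Any, Callable, Dict

# Assumed semantics for a subset of the comparison methods.
COMPARATORS: Dict[str, Callable[..., bool]] = {
    "exact": lambda actual, expected: actual == expected,
    "numeric_tolerance": lambda actual, expected, tol: abs(actual - expected) <= tol,
    "threshold_min": lambda actual, minimum: actual >= minimum,
    "threshold_max": lambda actual, maximum: actual <= maximum,
    "range": lambda actual, low, high: low <= actual <= high,
    "regex": lambda actual, pattern: re.search(pattern, actual) is not None,
    "substring": lambda actual, needle: needle in actual,
    "set_contains": lambda actual, required: set(required) <= set(actual),
}


def compare(method: str, actual: Any, **params: Any) -> bool:
    """Dispatch to the named comparison method; unknown names are an error."""
    if method not in COMPARATORS:
        raise ValueError(f"unknown comparison method: {method}")
    return COMPARATORS[method](actual, **params)


print(compare("threshold_max", 87.0, maximum=100.0))   # latency ceiling style check
print(compare("range", 10.0, low=1.0, high=10.0))      # rate band style check
print(compare("threshold_min", 0.58, minimum=0.60))    # recall floor style check
print(compare("set_contains", ["car", "person"], required=["person"]))
```

Raising on an unknown method name, rather than returning `False`, keeps a misspelled method from silently passing as a failed comparison.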

Project-policy override (recorded 2026-05-19)

The Phase 3 75 % fixture-coverage gate is intentionally overridden for this project, per the decision recorded in _docs/00_problem/input_data/expected_results/results_report.md → "Decision (project policy)":

> rather than block on the Phase 3 75 % gate, each deferred row is now registered with a structured <DEFERRED:> tag and surfaces in data_parameters.md → "Gaps that block /test-spec downstream". /test-spec Phase 2 can author scenarios for all 56 rows; deferred rows become release-gate items, not development-gate items. The acceptance_criteria.md → "Acceptance Gates (project-level)" hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.

Under this policy:

- No scenarios are removed by Phase 3. Every authored scenario remains in the spec; its Test status line in the source file (blackbox-tests.md, performance-tests.md, etc.) carries either READY or DEFERRED — <reason>.
- Final coverage is computed at the scenario level, not the fixture level. Per the matrix above:
  - AC coverage: 92.6 % ((43 + 0.5 × 1) / 47)
  - RESTRICT coverage: 86.7 % ((25 + 0.5 × 2) / 30)
  - Total: 90.3 % ((68 + 0.5 × 3) / 77), well above the 75 % gate.
- Fixture acquisition is tracked as a release-gate concern in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md; on every /autodev invocation the leftover-replay step re-evaluates whether any deferred fixture has landed and moves the affected scenarios from DEFERRED to READY.
- The project-level Acceptance Gate (acceptance_criteria.md → "Acceptance Gates" — HW/replay benchmark, per-season coverage, MAVLink SITL conformance) remains a hard release blocker. The override does NOT relax that gate.

Phase 3 verdict

PASSED — scenario-level coverage 90.3 % ≥ 75 % gate; every scenario has a quantifiable comparison; deferred-fixture tracking handled via leftovers replay; no scenarios removed.

Phase 4 — Test Runner Script Generation: SKIPPED in this invocation

Per phases/04-runner-scripts.md → "Skip condition":

> If this skill was invoked from the /plan skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.

This invocation is greenfield Step 5 (Test Spec) and no source code exists yet — the _docs/02_document/components/*/description.md files describe 13 Rust components that the Implement step (Step 7) will create. Producing runner scripts here would write scripts/run-tests.sh and scripts/run-performance-tests.sh against a binary that does not yet exist.

Handoff to Step 6 (Decompose): the decomposer MUST create at least two task specs covering the test runner scripts:

- A task to create `scripts/run-tests.sh` (Tier B/E orchestration; calls `docker compose -f e2e/docker-compose.autopilot-e2e.yml up` and runs `cargo test --release --test scenarios` in `e2e-consumer`).
- A task to create `scripts/run-performance-tests.sh` (Tier HW orchestration; per environment.md → Hardware Execution Matrix).

Both tasks should be tagged as part of the test-infrastructure decomposition (Step 1t of decompose tests-only mode) so they land before any Tier-B test scenarios are implemented.
