Sync .cursor from suite (autodev orchestrator + monorepo skills)

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-04-18 22:04:23 +03:00
parent d883fdb3cc
commit 14a81b0fcc
60 changed files with 4232 additions and 1728 deletions
+38 -331
@@ -35,14 +35,19 @@ Analyze input data completeness and produce detailed black-box test specificatio
## Context Resolution
Fixed paths — no mode detection needed:
Fixed paths:
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`
Announce the resolved paths to the user before proceeding.
Announce the resolved paths and the detected invocation mode (below) to the user before proceeding.
### Invocation Modes
- **full** (default): runs all 4 phases against the whole `PROBLEM_DIR` + `DOCUMENT_DIR`. Used in greenfield Plan Step 1 and existing-code Step 3.
- **cycle-update**: runs only a scoped refresh of the existing test-spec artifacts against the current feature cycle's completed tasks. Used by the existing-code flow's per-cycle sync step. See `modes/cycle-update.md` for the narrowed workflow.
## Input Specification
@@ -62,7 +67,7 @@ Every input data item MUST have a corresponding expected result that defines wha
Expected results live inside `_docs/00_problem/input_data/` in one or both of:
1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file
@@ -77,7 +82,8 @@ input_data/
└── data_parameters.md
```
**Quantifiability requirements** (see template for full format and examples):
**Quantifiability requirements** (see `templates/expected-results.md` for full format and examples):
- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
@@ -98,7 +104,7 @@ input_data/
1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
@@ -136,6 +142,7 @@ TESTS_OUTPUT_DIR/
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Hardware Assessment | Test Execution section | `environment.md` (updated) |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |
@@ -150,339 +157,44 @@ If TESTS_OUTPUT_DIR already contains files:
## Progress Tracking
At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.
At the start of execution, create a TodoWrite with all four phases (plus the hardware assessment between Phase 3 and Phase 4). Update status as each phase completes.
## Workflow
### Phase 1: Input Data Completeness Analysis
### Phase 1: Input Data & Expected Results Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
- Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present input-to-expected-result pairing assessment:
| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|------------|--------------------------|---------------|----------------|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 75% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient.
Read and follow `phases/01-input-data-analysis.md`.
---
### Phase 2: Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Test environment matches project constraints (see Hardware-Dependency & Execution Environment Assessment below)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
**BLOCKING**: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
Read and follow `phases/02-test-scenarios.md`.
---
### Phase 3: Test Data Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
#### Step 1 — Build the requirements checklist
Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, classify its shape (input/output or behavioral) and extract:
**Input/output tests:**
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
**Behavioral tests:**
| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
|---|-----------------|-----------|-------------------|--------------------|--------------------|---------------|
| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |
Present both tables to the user.
#### Step 2 — Ask user to provide missing test data AND expected results
For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing item.
#### Step 3 — Validate provided data and expected results
For each item where the user chose **Option A**:
**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
- Pattern matches (regex, substring, JSON schema)
- Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
- Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
#### Step 4 — Remove tests without data or expected results
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
#### Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
**Decision**:
- **Coverage ≥ 75%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 75%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.
#### Phase 3 Completion
When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent
Read and follow `phases/03-data-validation-gate.md`.
---
### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs before Phase 4)
### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs between Phase 3 and Phase 4)
Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). However, hardware-dependent projects may require local execution to exercise the real code paths. This assessment determines the right execution strategy by scanning both documentation and source code.
#### Step 1 — Documentation scan
Check the following files for mentions of hardware-specific requirements:
| File | Look for |
|------|----------|
| `_docs/00_problem/restrictions.md` | Platform requirements, hardware constraints, OS-specific features |
| `_docs/01_solution/solution.md` | Engine selection logic, platform-dependent paths, hardware acceleration |
| `_docs/02_document/architecture.md` | Component diagrams showing hardware layers, engine adapters |
| `_docs/02_document/components/*/description.md` | Per-component hardware mentions |
| `TESTS_OUTPUT_DIR/environment.md` | Existing environment decisions |
#### Step 2 — Code scan
Search the project source for indicators of hardware dependence. The project is **hardware-dependent** if ANY of the following are found:
| Category | Code indicators (imports, APIs, config) |
|----------|-----------------------------------------|
| GPU / CUDA | `import pycuda`, `import tensorrt`, `import pynvml`, `torch.cuda`, `nvidia-smi`, `CUDA_VISIBLE_DEVICES`, `runtime: nvidia` |
| Apple Neural Engine / CoreML | `import coremltools`, `CoreML`, `MLModel`, `ComputeUnit`, `MPS`, `sys.platform == "darwin"`, `platform.machine() == "arm64"` |
| OpenCL / Vulkan | `import pyopencl`, `clCreateContext`, vulkan headers |
| TPU / FPGA | `tf.distribute.TPUStrategy`, FPGA bitstream loaders |
| Sensors / Cameras | `cv2.VideoCapture(0)` (device index), serial port access, GPIO, V4L2 |
| OS-specific services | Kernel modules (`modprobe`), host-level drivers, platform-gated code (`sys.platform` branches selecting different backends) |
Also check dependency files (`requirements.txt`, `setup.py`, `pyproject.toml`, `Cargo.toml`, `*.csproj`) for hardware-specific packages.
#### Step 3 — Classify the project
Based on Steps 1 and 2, classify the project:
- **Not hardware-dependent**: no indicators found → use Docker (preferred default), skip to "Record the decision" below
- **Hardware-dependent**: one or more indicators found → proceed to Step 4
#### Step 4 — Present execution environment choice
Present the findings and ask the user using Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Test execution environment
══════════════════════════════════════
Hardware dependencies detected:
- [list each indicator found, with file:line]
══════════════════════════════════════
Running in Docker means these hardware code paths
are NOT exercised — Docker uses a Linux VM where
[specific hardware, e.g. CoreML / CUDA] is unavailable.
The system would fall back to [fallback engine/path].
══════════════════════════════════════
A) Local execution only (tests the real hardware path)
B) Docker execution only (tests the fallback path)
C) Both local and Docker (tests both paths, requires
two test runs — recommended for CI with heterogeneous
runners)
══════════════════════════════════════
Recommendation: [A, B, or C] — [reason]
══════════════════════════════════════
```
#### Step 5 — Record the decision
Write or update a **"Test Execution"** section in `TESTS_OUTPUT_DIR/environment.md` with:
1. **Decision**: local / docker / both
2. **Hardware dependencies found**: list with file references
3. **Execution instructions** per chosen mode:
- **Local mode**: prerequisites (OS, SDK, hardware), how to start services, how to run the test runner, environment variables
- **Docker mode**: docker-compose profile/command, required images, how results are collected
- **Both mode**: instructions for each, plus guidance on which CI runner type runs which mode
Read and follow `phases/hardware-assessment.md`.
---
### Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Docker Suitability Assessment decision above.
#### Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files:
- Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
- .NET: `dotnet test` (*.csproj, *.sln)
- Rust: `cargo test` (Cargo.toml)
- Node: `npm test` or `vitest` / `jest` (package.json)
2. Check Docker Suitability Assessment result:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
#### Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
#### Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Docker Suitability Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
#### Step 4 — Verify scripts
1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
Read and follow `phases/04-runner-scripts.md`.
---
### cycle-update mode
If invoked in `cycle-update` mode (see "Invocation Modes" above), read and follow `modes/cycle-update.md` instead of the full 4-phase workflow.
## Escalation Rules
| Situation | Action |
@@ -511,6 +223,7 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:
## Trigger Conditions
When the user wants to:
- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria
@@ -527,36 +240,30 @@ When the user wants to:
│ → verify AC, restrictions, input_data (incl. expected_results/results_report.md) │
│ │
│ Phase 1: Input Data & Expected Results Completeness Analysis │
│ → assess input_data/ coverage vs AC scenarios (≥75%)
│ → verify every input has a quantifiable expected result │
│ → present input→expected-result pairing assessment │
│ → phases/01-input-data-analysis.md
│ [BLOCKING: user confirms input data + expected results coverage] │
│ │
│ Phase 2: Test Scenario Specification │
│ → environment.md
│ → test-data.md (with expected results mapping)
│ → blackbox-tests.md (positive + negative)
│ → performance-tests.md
│ → resilience-tests.md │
│ → security-tests.md │
│ → resource-limit-tests.md │
│ → traceability-matrix.md │
│ → phases/02-test-scenarios.md
│ → environment.md · test-data.md · blackbox-tests.md
│ → performance-tests.md · resilience-tests.md · security-tests.md
│ → resource-limit-tests.md · traceability-matrix.md
│ [BLOCKING: user confirms test coverage] │
│ │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │
│ → build test-data + expected-result requirements checklist
│ → ask user: provide data+result (A) or remove test (B) │
│ → validate input data (quality + quantity) │
│ → validate expected results (quantifiable + comparison method) │
│ → remove tests without data or expected result, warn user │
│ → final coverage check (≥75% or FAIL + loop back) │
│ → phases/03-data-validation-gate.md
│ [BLOCKING: coverage ≥ 75% required to pass] │
│ │
│ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4) │
│ → phases/hardware-assessment.md │
│ │
│ Phase 4: Test Runner Script Generation │
│ → detect test runner + docker-compose + load tool
│ → phases/04-runner-scripts.md
│ → scripts/run-tests.sh (unit + blackbox) │
│ → scripts/run-performance-tests.sh (load/perf scenarios) │
│ → verify scripts are valid and executable │
│ cycle-update mode (scoped refresh) │
│ → modes/cycle-update.md │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately │
│ Ask don't assume · Spec don't code │
@@ -0,0 +1,26 @@
# Mode: cycle-update
A scoped refresh of existing test-spec artifacts against the current feature cycle's completed tasks. Used by the `existing-code` flow's per-cycle sync step.
## Inputs
- The list of task spec files in `_docs/02_tasks/done/` implemented in the current cycle
- `_docs/03_implementation/implementation_report_{feature_slug}_cycle{N}.md`
## Phases that run
- Skip Phase 1 (input data analysis)
- Skip Phase 4 (script generation)
- Run a **narrowed** Phase 2 and Phase 3 per the rules below
## Narrowed rules
1. For each new AC in the cycle's task specs, check `traceability-matrix.md`. If not covered, append one row.
2. For each new component surface exposed in the cycle (new endpoint, event, DTO field — detectable from task Scope and from diffs against `module-layout.md`), append scenarios to the relevant `blackbox-tests.md` / `performance-tests.md` / `security-tests.md` / `resilience-tests.md` / `resource-limit-tests.md` category. Reuse the existing test template shapes.
3. For each NFR declared in a cycle task spec, propagate it to the matching spec file. If the NFR conflicts with an existing spec entry, present via the Choose format.
4. Do NOT rewrite unaffected sections. Preserve existing traceability IDs.
5. Save only the files that changed; update `traceability-matrix.md` last.
## Save action
Save only the changed test artifact files under `TESTS_OUTPUT_DIR`. Update `traceability-matrix.md` last, after all per-category files are written.
@@ -0,0 +1,39 @@
# Phase 1: Input Data & Expected Results Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios, and whether every input is paired with a quantifiable expected result.
**Constraints**: Analysis only — no test specs yet.
## Steps
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from `solution.md` (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
- Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present input-to-expected-result pairing assessment:
| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|------------|--------------------------|---------------|----------------|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 75% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
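Where the mapping cites reference files (step 7), a minimal existence check might look like the sketch below; the report path and the assumption that citations are plain `.json`/`.csv` filenames relative to `expected_results/` are illustrative, not prescribed by this skill.

```bash
#!/usr/bin/env bash
# Sketch: flag reference files cited in results_report.md that do not exist on disk.
# Assumes citations appear as bare .json/.csv filenames relative to expected_results/.
set -euo pipefail
report="_docs/00_problem/input_data/expected_results/results_report.md"
dir="$(dirname "$report")"
grep -oE '[A-Za-z0-9_.-]+\.(json|csv)' "$report" | sort -u | while read -r ref; do
  [ -f "$dir/$ref" ] || echo "MISSING reference file: $ref"
done
```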
## Blocking
**BLOCKING**: Do NOT proceed to Phase 2 until the user confirms both input data coverage AND expected results completeness are sufficient.
## No save action
Phase 1 does not write an artifact. Findings feed Phase 2.
@@ -0,0 +1,49 @@
# Phase 2: Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios.
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
## Steps
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
## Self-verification
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Test environment matches project constraints (see `phases/hardware-assessment.md`, which runs before Phase 4)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
## Save action
Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
## Blocking
**BLOCKING**: Present test coverage summary (from `traceability-matrix.md`) to user. Do NOT proceed to Phase 3 until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
@@ -0,0 +1,118 @@
# Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
## Step 1 — Build the requirements checklist
Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, classify its shape (input/output or behavioral) and extract:
**Input/output tests:**
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
**Behavioral tests:**
| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
|---|-----------------|-----------|-------------------|--------------------|--------------------|---------------|
| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |
Present both tables to the user.
## Step 2 — Ask user to provide missing test data AND expected results
For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing item.
## Step 3 — Validate provided data and expected results
For each item where the user chose **Option A**:
**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
- Pattern matches (regex, substring, JSON schema)
- Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
- Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
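As an illustration of two of the comparison methods above, the sketch below shows an exact structural comparison and a threshold check in bash; the file names (`actual.json`, `expected_response_01.json`, `latency_ms.txt`) and the 500 ms bound are hypothetical.

```bash
# Exact structural comparison: normalize key order with jq, then diff against the reference file.
diff <(jq -S . actual.json) <(jq -S . expected_response_01.json) \
  && echo "PASS: JSON matches reference" || echo "FAIL: JSON differs"

# Threshold comparison: a single measured value checked against an upper bound.
awk -v limit=500 '{ exit !($1 < limit) }' latency_ms.txt \
  && echo "PASS: processing time < 500ms" || echo "FAIL: threshold exceeded"
```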
## Step 4 — Remove tests without data or expected results
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
## Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
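A worked example of the calculation, with hypothetical counts:

```bash
covered=18; total=22                     # hypothetical: items traced by remaining tests / total AC + restrictions
coverage=$(( covered * 100 / total ))    # integer percentage, rounds down
echo "Coverage: ${coverage}%"            # -> Coverage: 81%
[ "$coverage" -ge 75 ] && echo "Phase 3 PASSED" || echo "Phase 3 FAILED"
```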
**Decision**:
- **Coverage ≥ 75%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 75%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.
## Phase 3 Completion
When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent
After Phase 3 completion, run `phases/hardware-assessment.md` before Phase 4.
@@ -0,0 +1,60 @@
# Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely; script creation is instead planned during decompose, which creates a dedicated task for it. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so autodev and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Hardware-Dependency Assessment decision recorded in `environment.md`.
**Prerequisite**: `phases/hardware-assessment.md` must have completed and written the "Test Execution" section to `TESTS_OUTPUT_DIR/environment.md`.
## Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files:
- Python: `pytest` (`pyproject.toml`, `setup.cfg`, `pytest.ini`)
- .NET: `dotnet test` (`*.csproj`, `*.sln`)
- Rust: `cargo test` (`Cargo.toml`)
- Node: `npm test` or `vitest` / `jest` (`package.json`)
2. Check the Hardware-Dependency Assessment result recorded in `environment.md`:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
- If **both** was chosen → generate both
3. Identify performance/load testing tools from dependencies (`k6`, `locust`, `artillery`, `wrk`, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
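A minimal detection sketch, assuming manifests live at the repository root and one primary language per project:

```bash
#!/usr/bin/env bash
# Sketch: pick the test runner from manifest files found at the repo root.
set -euo pipefail
if   [ -f pyproject.toml ] || [ -f setup.cfg ] || [ -f pytest.ini ]; then runner="pytest"
elif compgen -G "*.sln" > /dev/null || compgen -G "*.csproj" > /dev/null; then runner="dotnet test"
elif [ -f Cargo.toml ];   then runner="cargo test"
elif [ -f package.json ]; then runner="npm test"
else echo "No recognised test manifest found" >&2; exit 1
fi
echo "Detected test runner: $runner"
```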
## Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance (a minimal sketch follows the list below). The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
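A minimal sketch satisfying the list above, assuming a pytest-based project with optional e2e requirements; the paths, flags, and test layout are illustrative:

```bash
#!/usr/bin/env bash
# Sketch of scripts/run-tests.sh for a pytest project; adapt runner and paths per Step 1.
set -euo pipefail
cleanup() { echo "Cleaning up test artifacts..."; }
trap cleanup EXIT

UNIT_ONLY=0
[ "${1:-}" = "--unit-only" ] && UNIT_ONLY=1

# Install project and test dependencies so collection does not fail on a fresh environment.
[ -d .venv ] && source .venv/bin/activate
[ -f requirements.txt ]     && pip install -q -r requirements.txt
[ -f e2e/requirements.txt ] && pip install -q -r e2e/requirements.txt

if [ "$UNIT_ONLY" -eq 1 ]; then
  pytest tests/unit -q          # unit tests only
else
  pytest tests -q               # unit + blackbox; pytest prints the summary and exits non-zero on any failure
fi
```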
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
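A hedged usage example for the Docker path, assuming the compose file defines a `tests` service (the service name is illustrative):

```bash
# Build, run the suite, and propagate the test runner's exit code to CI.
docker compose -f docker-compose.test.yml up --build --exit-code-from tests
```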
## Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root; a sketch follows the list below. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Hardware-Dependency Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
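A minimal sketch assuming a k6-based load test and a p95 latency threshold read from the spec; the threshold parsing, scenario file, and compose-based startup are illustrative and should match the recorded execution decision:

```bash
#!/usr/bin/env bash
# Sketch of scripts/run-performance-tests.sh using k6; adapt the tool and startup per Step 1.
set -euo pipefail
cleanup() { docker compose -f docker-compose.test.yml down -v >/dev/null 2>&1 || true; }
trap cleanup EXIT

SPEC="_docs/02_document/tests/performance-tests.md"
# CLI arg overrides the threshold parsed from the spec (illustrative parsing).
P95_MS="${1:-$(grep -oE 'p95[^0-9]*[0-9]+' "$SPEC" | head -1 | grep -oE '[0-9]+$')}"

docker compose -f docker-compose.test.yml up -d --build        # start the system under test
k6 run --summary-export=perf-summary.json e2e/load_test.js     # hypothetical scenario file

actual=$(jq '.metrics.http_req_duration["p(95)"]' perf-summary.json)
if awk -v a="$actual" -v t="$P95_MS" 'BEGIN { exit !(a <= t) }'; then
  echo "PASS: p95 ${actual}ms <= ${P95_MS}ms"
else
  echo "FAIL: p95 ${actual}ms > ${P95_MS}ms"; exit 1
fi
```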
## Step 4 — Verify scripts
1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
## Save action
Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
@@ -0,0 +1,78 @@
# Hardware-Dependency & Execution Environment Assessment (BLOCKING)
Runs between Phase 3 and Phase 4.
Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). However, hardware-dependent projects may require local execution to exercise the real code paths. This assessment determines the right execution strategy by scanning both documentation and source code.
## Step 1 — Documentation scan
Check the following files for mentions of hardware-specific requirements:
| File | Look for |
|------|----------|
| `_docs/00_problem/restrictions.md` | Platform requirements, hardware constraints, OS-specific features |
| `_docs/01_solution/solution.md` | Engine selection logic, platform-dependent paths, hardware acceleration |
| `_docs/02_document/architecture.md` | Component diagrams showing hardware layers, engine adapters |
| `_docs/02_document/components/*/description.md` | Per-component hardware mentions |
| `TESTS_OUTPUT_DIR/environment.md` | Existing environment decisions |
## Step 2 — Code scan
Search the project source for indicators of hardware dependence. The project is **hardware-dependent** if ANY of the following are found:
| Category | Code indicators (imports, APIs, config) |
|----------|-----------------------------------------|
| GPU / CUDA | `import pycuda`, `import tensorrt`, `import pynvml`, `torch.cuda`, `nvidia-smi`, `CUDA_VISIBLE_DEVICES`, `runtime: nvidia` |
| Apple Neural Engine / CoreML | `import coremltools`, `CoreML`, `MLModel`, `ComputeUnit`, `MPS`, `sys.platform == "darwin"`, `platform.machine() == "arm64"` |
| OpenCL / Vulkan | `import pyopencl`, `clCreateContext`, vulkan headers |
| TPU / FPGA | `tf.distribute.TPUStrategy`, FPGA bitstream loaders |
| Sensors / Cameras | `cv2.VideoCapture(0)` (device index), serial port access, GPIO, V4L2 |
| OS-specific services | Kernel modules (`modprobe`), host-level drivers, platform-gated code (`sys.platform` branches selecting different backends) |
Also check dependency files (`requirements.txt`, `setup.py`, `pyproject.toml`, `Cargo.toml`, `*.csproj`) for hardware-specific packages.
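One hedged way to automate this scan is a grep over a representative subset of the indicators above; the pattern list, file globs, and search root are illustrative:

```bash
# Sketch: flag hardware-dependence indicators in source and dependency files.
patterns='pycuda|tensorrt|pynvml|torch\.cuda|CUDA_VISIBLE_DEVICES|coremltools|MLModel|pyopencl|TPUStrategy|VideoCapture\(0\)|modprobe'
grep -rnE "$patterns" --include='*.py' --include='*.toml' --include='*.txt' \
     --include='*.csproj' --include='*.yml' . \
  && echo "Hardware indicators found (see file:line above)" \
  || echo "No hardware indicators found: Docker is the default"
```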
## Step 3 — Classify the project
Based on Steps 1 and 2, classify the project:
- **Not hardware-dependent**: no indicators found → use Docker (preferred default), skip to Step 5 "Record the decision"
- **Hardware-dependent**: one or more indicators found → proceed to Step 4
## Step 4 — Present execution environment choice
Present the findings and ask the user using Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Test execution environment
══════════════════════════════════════
Hardware dependencies detected:
- [list each indicator found, with file:line]
══════════════════════════════════════
Running in Docker means these hardware code paths
are NOT exercised — Docker uses a Linux VM where
[specific hardware, e.g. CoreML / CUDA] is unavailable.
The system would fall back to [fallback engine/path].
══════════════════════════════════════
A) Local execution only (tests the real hardware path)
B) Docker execution only (tests the fallback path)
C) Both local and Docker (tests both paths, requires
two test runs — recommended for CI with heterogeneous
runners)
══════════════════════════════════════
Recommendation: [A, B, or C] — [reason]
══════════════════════════════════════
```
## Step 5 — Record the decision
Write or update a **"Test Execution"** section in `TESTS_OUTPUT_DIR/environment.md` with:
1. **Decision**: local / docker / both
2. **Hardware dependencies found**: list with file references
3. **Execution instructions** per chosen mode:
- **Local mode**: prerequisites (OS, SDK, hardware), how to start services, how to run the test runner, environment variables
- **Docker mode**: docker-compose profile/command, required images, how results are collected
- **Both mode**: instructions for each, plus guidance on which CI runner type runs which mode
The decision is consumed by Phase 4 to choose between local `scripts/run-tests.sh` and `docker-compose.test.yml`.