Sync .cursor from suite (autodev orchestrator + monorepo skills)

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-04-18 22:04:23 +03:00
parent d883fdb3cc
commit 14a81b0fcc
60 changed files with 4232 additions and 1728 deletions
+38 -331
@@ -35,14 +35,19 @@ Analyze input data completeness and produce detailed black-box test specificatio
## Context Resolution
Fixed paths — no mode detection needed:
Fixed paths:
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`
Announce the resolved paths to the user before proceeding.
Announce the resolved paths and the detected invocation mode (below) to the user before proceeding.
### Invocation Modes
- **full** (default): runs all 4 phases against the whole `PROBLEM_DIR` + `DOCUMENT_DIR`. Used in greenfield Plan Step 1 and existing-code Step 3.
- **cycle-update**: runs only a scoped refresh of the existing test-spec artifacts against the current feature cycle's completed tasks. Used by the existing-code flow's per-cycle sync step. See `modes/cycle-update.md` for the narrowed workflow.
## Input Specification
@@ -62,7 +67,7 @@ Every input data item MUST have a corresponding expected result that defines wha
Expected results live inside `_docs/00_problem/input_data/` in one or both of:
1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file
@@ -77,7 +82,8 @@ input_data/
└── data_parameters.md
```
**Quantifiability requirements** (see template for full format and examples):
**Quantifiability requirements** (see `templates/expected-results.md` for full format and examples):
- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
@@ -98,7 +104,7 @@ input_data/
1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
@@ -136,6 +142,7 @@ TESTS_OUTPUT_DIR/
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Hardware Assessment | Test Execution section | `environment.md` (updated) |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |
@@ -150,339 +157,44 @@ If TESTS_OUTPUT_DIR already contains files:
## Progress Tracking
At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.
At the start of execution, create a TodoWrite with all four phases (plus the hardware assessment between Phase 3 and Phase 4). Update status as each phase completes.
## Workflow
### Phase 1: Input Data Completeness Analysis
### Phase 1: Input Data & Expected Results Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
- Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present input-to-expected-result pairing assessment:
| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|------------|--------------------------|---------------|----------------|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 75% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient.
Read and follow `phases/01-input-data-analysis.md`.
---
### Phase 2: Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Test environment matches project constraints (see Hardware-Dependency & Execution Environment Assessment below)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
**BLOCKING**: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
Read and follow `phases/02-test-scenarios.md`.
---
### Phase 3: Test Data Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
#### Step 1 — Build the requirements checklist
Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, classify its shape (input/output or behavioral) and extract:
**Input/output tests:**
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
**Behavioral tests:**
| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
|---|-----------------|-----------|-------------------|--------------------|--------------------|---------------|
| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |
Present both tables to the user.
#### Step 2 — Ask user to provide missing test data AND expected results
For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing item.
#### Step 3 — Validate provided data and expected results
For each item where the user chose **Option A**:
**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
- Pattern matches (regex, substring, JSON schema)
- Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
- Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
#### Step 4 — Remove tests without data or expected results
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
#### Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
**Decision**:
- **Coverage ≥ 75%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 75%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.
#### Phase 3 Completion
When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent
Read and follow `phases/03-data-validation-gate.md`.
---
### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs before Phase 4)
### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs between Phase 3 and Phase 4)
Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). However, hardware-dependent projects may require local execution to exercise the real code paths. This assessment determines the right execution strategy by scanning both documentation and source code.
#### Step 1 — Documentation scan
Check the following files for mentions of hardware-specific requirements:
| File | Look for |
|------|----------|
| `_docs/00_problem/restrictions.md` | Platform requirements, hardware constraints, OS-specific features |
| `_docs/01_solution/solution.md` | Engine selection logic, platform-dependent paths, hardware acceleration |
| `_docs/02_document/architecture.md` | Component diagrams showing hardware layers, engine adapters |
| `_docs/02_document/components/*/description.md` | Per-component hardware mentions |
| `TESTS_OUTPUT_DIR/environment.md` | Existing environment decisions |
#### Step 2 — Code scan
Search the project source for indicators of hardware dependence. The project is **hardware-dependent** if ANY of the following are found:
| Category | Code indicators (imports, APIs, config) |
|----------|-----------------------------------------|
| GPU / CUDA | `import pycuda`, `import tensorrt`, `import pynvml`, `torch.cuda`, `nvidia-smi`, `CUDA_VISIBLE_DEVICES`, `runtime: nvidia` |
| Apple Neural Engine / CoreML | `import coremltools`, `CoreML`, `MLModel`, `ComputeUnit`, `MPS`, `sys.platform == "darwin"`, `platform.machine() == "arm64"` |
| OpenCL / Vulkan | `import pyopencl`, `clCreateContext`, vulkan headers |
| TPU / FPGA | `tf.distribute.TPUStrategy`, FPGA bitstream loaders |
| Sensors / Cameras | `cv2.VideoCapture(0)` (device index), serial port access, GPIO, V4L2 |
| OS-specific services | Kernel modules (`modprobe`), host-level drivers, platform-gated code (`sys.platform` branches selecting different backends) |
Also check dependency files (`requirements.txt`, `setup.py`, `pyproject.toml`, `Cargo.toml`, `*.csproj`) for hardware-specific packages.
#### Step 3 — Classify the project
Based on Steps 1 and 2, classify the project:
- **Not hardware-dependent**: no indicators found → use Docker (preferred default), skip to "Record the decision" below
- **Hardware-dependent**: one or more indicators found → proceed to Step 4
#### Step 4 — Present execution environment choice
Present the findings and ask the user using Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Test execution environment
══════════════════════════════════════
Hardware dependencies detected:
- [list each indicator found, with file:line]
══════════════════════════════════════
Running in Docker means these hardware code paths
are NOT exercised — Docker uses a Linux VM where
[specific hardware, e.g. CoreML / CUDA] is unavailable.
The system would fall back to [fallback engine/path].
══════════════════════════════════════
A) Local execution only (tests the real hardware path)
B) Docker execution only (tests the fallback path)
C) Both local and Docker (tests both paths, requires
two test runs — recommended for CI with heterogeneous
runners)
══════════════════════════════════════
Recommendation: [A, B, or C] — [reason]
══════════════════════════════════════
```
#### Step 5 — Record the decision
Write or update a **"Test Execution"** section in `TESTS_OUTPUT_DIR/environment.md` with:
1. **Decision**: local / docker / both
2. **Hardware dependencies found**: list with file references
3. **Execution instructions** per chosen mode:
- **Local mode**: prerequisites (OS, SDK, hardware), how to start services, how to run the test runner, environment variables
- **Docker mode**: docker-compose profile/command, required images, how results are collected
- **Both mode**: instructions for each, plus guidance on which CI runner type runs which mode
Read and follow `phases/hardware-assessment.md`.
---
### Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Docker Suitability Assessment decision above.
#### Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files:
- Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
- .NET: `dotnet test` (*.csproj, *.sln)
- Rust: `cargo test` (Cargo.toml)
- Node: `npm test` or `vitest` / `jest` (package.json)
2. Check Docker Suitability Assessment result:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
#### Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
#### Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Docker Suitability Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
#### Step 4 — Verify scripts
1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
Read and follow `phases/04-runner-scripts.md`.
---
### cycle-update mode
If invoked in `cycle-update` mode (see "Invocation Modes" above), read and follow `modes/cycle-update.md` instead of the full 4-phase workflow.
## Escalation Rules
| Situation | Action |
@@ -511,6 +223,7 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:
## Trigger Conditions
When the user wants to:
- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria
@@ -527,36 +240,30 @@ When the user wants to:
│ → verify AC, restrictions, input_data (incl. expected_results/results_report.md) │
│ │
│ Phase 1: Input Data & Expected Results Completeness Analysis │
│ → assess input_data/ coverage vs AC scenarios (≥75%)
│ → verify every input has a quantifiable expected result │
│ → present input→expected-result pairing assessment │
│ → phases/01-input-data-analysis.md
│ [BLOCKING: user confirms input data + expected results coverage] │
│ │
│ Phase 2: Test Scenario Specification │
│ → environment.md
│ → test-data.md (with expected results mapping)
│ → blackbox-tests.md (positive + negative)
│ → performance-tests.md
│ → resilience-tests.md │
│ → security-tests.md │
│ → resource-limit-tests.md │
│ → traceability-matrix.md │
│ → phases/02-test-scenarios.md
│ → environment.md · test-data.md · blackbox-tests.md
│ → performance-tests.md · resilience-tests.md · security-tests.md
│ → resource-limit-tests.md · traceability-matrix.md
│ [BLOCKING: user confirms test coverage] │
│ │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │
│ → build test-data + expected-result requirements checklist
│ → ask user: provide data+result (A) or remove test (B) │
│ → validate input data (quality + quantity) │
│ → validate expected results (quantifiable + comparison method) │
│ → remove tests without data or expected result, warn user │
│ → final coverage check (≥75% or FAIL + loop back) │
│ → phases/03-data-validation-gate.md
│ [BLOCKING: coverage ≥ 75% required to pass] │
│ │
│ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4) │
│ → phases/hardware-assessment.md │
│ │
│ Phase 4: Test Runner Script Generation │
│ → detect test runner + docker-compose + load tool
│ → phases/04-runner-scripts.md
│ → scripts/run-tests.sh (unit + blackbox) │
│ → scripts/run-performance-tests.sh (load/perf scenarios) │
│ → verify scripts are valid and executable │
│ cycle-update mode (scoped refresh) │
│ → modes/cycle-update.md │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately │
│ Ask don't assume · Spec don't code │
@@ -0,0 +1,26 @@
# Mode: cycle-update
A scoped refresh of existing test-spec artifacts against the current feature cycle's completed tasks. Used by the `existing-code` flow's per-cycle sync step.
## Inputs
- The list of task spec files in `_docs/02_tasks/done/` implemented in the current cycle
- `_docs/03_implementation/implementation_report_{feature_slug}_cycle{N}.md`
## Phases that run
- Skip Phase 1 (input data analysis)
- Skip Phase 4 (script generation)
- Run a **narrowed** Phase 2 and Phase 3 per the rules below
## Narrowed rules
1. For each new AC in the cycle's task specs, check `traceability-matrix.md`. If not covered, append one row.
2. For each new component surface exposed in the cycle (new endpoint, event, DTO field — detectable from task Scope and from diffs against `module-layout.md`), append scenarios to the relevant `blackbox-tests.md` / `performance-tests.md` / `security-tests.md` / `resilience-tests.md` / `resource-limit-tests.md` category. Reuse the existing test template shapes.
3. For each NFR declared in a cycle task spec, propagate it to the matching spec file. If the NFR conflicts with an existing spec entry, present via the Choose format.
4. Do NOT rewrite unaffected sections. Preserve existing traceability IDs.
5. Save only the files that changed; update `traceability-matrix.md` last.
## Save action
Save only the changed test artifact files under `TESTS_OUTPUT_DIR`. Update `traceability-matrix.md` last, after all per-category files are written.
@@ -0,0 +1,39 @@
# Phase 1: Input Data & Expected Results Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios, and whether every input is paired with a quantifiable expected result.
**Constraints**: Analysis only — no test specs yet.
## Steps
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from `solution.md` (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
- Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present input-to-expected-result pairing assessment:
| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|------------|--------------------------|---------------|----------------|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 75% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
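Where the mapping cites reference files (step 7), a minimal existence check might look like the sketch below; the report path and the assumption that citations are plain `.json`/`.csv` filenames relative to `expected_results/` are illustrative, not prescribed by this skill.

```bash
#!/usr/bin/env bash
# Sketch: flag reference files cited in results_report.md that do not exist on disk.
# Assumes citations appear as bare .json/.csv filenames relative to expected_results/.
set -euo pipefail
report="_docs/00_problem/input_data/expected_results/results_report.md"
dir="$(dirname "$report")"
grep -oE '[A-Za-z0-9_.-]+\.(json|csv)' "$report" | sort -u | while read -r ref; do
  [ -f "$dir/$ref" ] || echo "MISSING reference file: $ref"
done
```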
## Blocking
**BLOCKING**: Do NOT proceed to Phase 2 until the user confirms both input data coverage AND expected results completeness are sufficient.
## No save action
Phase 1 does not write an artifact. Findings feed Phase 2.
@@ -0,0 +1,49 @@
# Phase 2: Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios.
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
## Steps
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
## Self-verification
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Test environment matches project constraints (see `phases/hardware-assessment.md`, which runs before Phase 4)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
## Save action
Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
## Blocking
**BLOCKING**: Present test coverage summary (from `traceability-matrix.md`) to user. Do NOT proceed to Phase 3 until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
@@ -0,0 +1,118 @@
# Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
## Step 1 — Build the requirements checklist
Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, classify its shape (input/output or behavioral) and extract:
**Input/output tests:**
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
**Behavioral tests:**
| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
|---|-----------------|-----------|-------------------|--------------------|--------------------|---------------|
| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |
Present both tables to the user.
## Step 2 — Ask user to provide missing test data AND expected results
For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing item.
## Step 3 — Validate provided data and expected results
For each item where the user chose **Option A**:
**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
- Pattern matches (regex, substring, JSON schema)
- Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
- Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
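As an illustration of two of the comparison methods above, the sketch below shows an exact structural comparison and a threshold check in bash; the file names (`actual.json`, `expected_response_01.json`, `latency_ms.txt`) and the 500 ms bound are hypothetical.

```bash
# Exact structural comparison: normalize key order with jq, then diff against the reference file.
diff <(jq -S . actual.json) <(jq -S . expected_response_01.json) \
  && echo "PASS: JSON matches reference" || echo "FAIL: JSON differs"

# Threshold comparison: a single measured value checked against an upper bound.
awk -v limit=500 '{ exit !($1 < limit) }' latency_ms.txt \
  && echo "PASS: processing time < 500ms" || echo "FAIL: threshold exceeded"
```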
## Step 4 — Remove tests without data or expected results
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
## Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
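A worked example of the calculation, with hypothetical counts:

```bash
covered=18; total=22                     # hypothetical: items traced by remaining tests / total AC + restrictions
coverage=$(( covered * 100 / total ))    # integer percentage, rounds down
echo "Coverage: ${coverage}%"            # -> Coverage: 81%
[ "$coverage" -ge 75 ] && echo "Phase 3 PASSED" || echo "Phase 3 FAILED"
```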
**Decision**:
- **Coverage ≥ 75%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 75%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.
## Phase 3 Completion
When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent
After Phase 3 completion, run `phases/hardware-assessment.md` before Phase 4.
@@ -0,0 +1,60 @@
# Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely; script creation is instead planned during decompose, which creates a dedicated task for it. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so autodev and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Hardware-Dependency Assessment decision recorded in `environment.md`.
**Prerequisite**: `phases/hardware-assessment.md` must have completed and written the "Test Execution" section to `TESTS_OUTPUT_DIR/environment.md`.
## Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files:
- Python: `pytest` (`pyproject.toml`, `setup.cfg`, `pytest.ini`)
- .NET: `dotnet test` (`*.csproj`, `*.sln`)
- Rust: `cargo test` (`Cargo.toml`)
- Node: `npm test` or `vitest` / `jest` (`package.json`)
2. Check the Hardware-Dependency Assessment result recorded in `environment.md`:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
- If **both** was chosen → generate both
3. Identify performance/load testing tools from dependencies (`k6`, `locust`, `artillery`, `wrk`, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
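A minimal detection sketch, assuming manifests live at the repository root and one primary language per project:

```bash
#!/usr/bin/env bash
# Sketch: pick the test runner from manifest files found at the repo root.
set -euo pipefail
if   [ -f pyproject.toml ] || [ -f setup.cfg ] || [ -f pytest.ini ]; then runner="pytest"
elif compgen -G "*.sln" > /dev/null || compgen -G "*.csproj" > /dev/null; then runner="dotnet test"
elif [ -f Cargo.toml ];   then runner="cargo test"
elif [ -f package.json ]; then runner="npm test"
else echo "No recognised test manifest found" >&2; exit 1
fi
echo "Detected test runner: $runner"
```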
## Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance (a minimal sketch follows the list below). The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
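A minimal sketch satisfying the list above, assuming a pytest-based project with optional e2e requirements; the paths, flags, and test layout are illustrative:

```bash
#!/usr/bin/env bash
# Sketch of scripts/run-tests.sh for a pytest project; adapt runner and paths per Step 1.
set -euo pipefail
cleanup() { echo "Cleaning up test artifacts..."; }
trap cleanup EXIT

UNIT_ONLY=0
[ "${1:-}" = "--unit-only" ] && UNIT_ONLY=1

# Install project and test dependencies so collection does not fail on a fresh environment.
[ -d .venv ] && source .venv/bin/activate
[ -f requirements.txt ]     && pip install -q -r requirements.txt
[ -f e2e/requirements.txt ] && pip install -q -r e2e/requirements.txt

if [ "$UNIT_ONLY" -eq 1 ]; then
  pytest tests/unit -q          # unit tests only
else
  pytest tests -q               # unit + blackbox; pytest prints the summary and exits non-zero on any failure
fi
```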
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
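A hedged usage example for the Docker path, assuming the compose file defines a `tests` service (the service name is illustrative):

```bash
# Build, run the suite, and propagate the test runner's exit code to CI.
docker compose -f docker-compose.test.yml up --build --exit-code-from tests
```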
## Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root; a sketch follows the list below. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Hardware-Dependency Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
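A minimal sketch assuming a k6-based load test and a p95 latency threshold read from the spec; the threshold parsing, scenario file, and compose-based startup are illustrative and should match the recorded execution decision:

```bash
#!/usr/bin/env bash
# Sketch of scripts/run-performance-tests.sh using k6; adapt the tool and startup per Step 1.
set -euo pipefail
cleanup() { docker compose -f docker-compose.test.yml down -v >/dev/null 2>&1 || true; }
trap cleanup EXIT

SPEC="_docs/02_document/tests/performance-tests.md"
# CLI arg overrides the threshold parsed from the spec (illustrative parsing).
P95_MS="${1:-$(grep -oE 'p95[^0-9]*[0-9]+' "$SPEC" | head -1 | grep -oE '[0-9]+$')}"

docker compose -f docker-compose.test.yml up -d --build        # start the system under test
k6 run --summary-export=perf-summary.json e2e/load_test.js     # hypothetical scenario file

actual=$(jq '.metrics.http_req_duration["p(95)"]' perf-summary.json)
if awk -v a="$actual" -v t="$P95_MS" 'BEGIN { exit !(a <= t) }'; then
  echo "PASS: p95 ${actual}ms <= ${P95_MS}ms"
else
  echo "FAIL: p95 ${actual}ms > ${P95_MS}ms"; exit 1
fi
```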
## Step 4 — Verify scripts
1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
## Save action
Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
@@ -0,0 +1,78 @@
# Hardware-Dependency & Execution Environment Assessment (BLOCKING)
Runs between Phase 3 and Phase 4.
Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). However, hardware-dependent projects may require local execution to exercise the real code paths. This assessment determines the right execution strategy by scanning both documentation and source code.
## Step 1 — Documentation scan
Check the following files for mentions of hardware-specific requirements:
| File | Look for |
|------|----------|
| `_docs/00_problem/restrictions.md` | Platform requirements, hardware constraints, OS-specific features |
| `_docs/01_solution/solution.md` | Engine selection logic, platform-dependent paths, hardware acceleration |
| `_docs/02_document/architecture.md` | Component diagrams showing hardware layers, engine adapters |
| `_docs/02_document/components/*/description.md` | Per-component hardware mentions |
| `TESTS_OUTPUT_DIR/environment.md` | Existing environment decisions |
## Step 2 — Code scan
Search the project source for indicators of hardware dependence. The project is **hardware-dependent** if ANY of the following are found:
| Category | Code indicators (imports, APIs, config) |
|----------|-----------------------------------------|
| GPU / CUDA | `import pycuda`, `import tensorrt`, `import pynvml`, `torch.cuda`, `nvidia-smi`, `CUDA_VISIBLE_DEVICES`, `runtime: nvidia` |
| Apple Neural Engine / CoreML | `import coremltools`, `CoreML`, `MLModel`, `ComputeUnit`, `MPS`, `sys.platform == "darwin"`, `platform.machine() == "arm64"` |
| OpenCL / Vulkan | `import pyopencl`, `clCreateContext`, vulkan headers |
| TPU / FPGA | `tf.distribute.TPUStrategy`, FPGA bitstream loaders |
| Sensors / Cameras | `cv2.VideoCapture(0)` (device index), serial port access, GPIO, V4L2 |
| OS-specific services | Kernel modules (`modprobe`), host-level drivers, platform-gated code (`sys.platform` branches selecting different backends) |
Also check dependency files (`requirements.txt`, `setup.py`, `pyproject.toml`, `Cargo.toml`, `*.csproj`) for hardware-specific packages.
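One hedged way to automate this scan is a grep over a representative subset of the indicators above; the pattern list, file globs, and search root are illustrative:

```bash
# Sketch: flag hardware-dependence indicators in source and dependency files.
patterns='pycuda|tensorrt|pynvml|torch\.cuda|CUDA_VISIBLE_DEVICES|coremltools|MLModel|pyopencl|TPUStrategy|VideoCapture\(0\)|modprobe'
grep -rnE "$patterns" --include='*.py' --include='*.toml' --include='*.txt' \
     --include='*.csproj' --include='*.yml' . \
  && echo "Hardware indicators found (see file:line above)" \
  || echo "No hardware indicators found: Docker is the default"
```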
## Step 3 — Classify the project
Based on Steps 1 and 2, classify the project:
- **Not hardware-dependent**: no indicators found → use Docker (preferred default), skip to Step 5 "Record the decision"
- **Hardware-dependent**: one or more indicators found → proceed to Step 4
## Step 4 — Present execution environment choice
Present the findings and ask the user using Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Test execution environment
══════════════════════════════════════
Hardware dependencies detected:
- [list each indicator found, with file:line]
══════════════════════════════════════
Running in Docker means these hardware code paths
are NOT exercised — Docker uses a Linux VM where
[specific hardware, e.g. CoreML / CUDA] is unavailable.
The system would fall back to [fallback engine/path].
══════════════════════════════════════
A) Local execution only (tests the real hardware path)
B) Docker execution only (tests the fallback path)
C) Both local and Docker (tests both paths, requires
two test runs — recommended for CI with heterogeneous
runners)
══════════════════════════════════════
Recommendation: [A, B, or C] — [reason]
══════════════════════════════════════
```
## Step 5 — Record the decision
Write or update a **"Test Execution"** section in `TESTS_OUTPUT_DIR/environment.md` with:
1. **Decision**: local / docker / both
2. **Hardware dependencies found**: list with file references
3. **Execution instructions** per chosen mode:
- **Local mode**: prerequisites (OS, SDK, hardware), how to start services, how to run the test runner, environment variables
- **Docker mode**: docker-compose profile/command, required images, how results are collected
- **Both mode**: instructions for each, plus guidance on which CI runner type runs which mode
The decision is consumed by Phase 4 to choose between local `scripts/run-tests.sh` and `docker-compose.test.yml`.