Sync .cursor from suite (autodev orchestrator + monorepo skills)

Oleksandr Bezdieniezhnykh
2026-04-18 22:04:19 +03:00
parent 229188acbf
commit c48dbfbb81
60 changed files with 4232 additions and 1728 deletions
@@ -0,0 +1,39 @@
# Phase 1: Input Data & Expected Results Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios, and whether every input is paired with a quantifiable expected result.
**Constraints**: Analysis only — no test specs yet.
## Steps
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from `solution.md` (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
- Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present the input-to-expected-result pairing assessment:
| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|------------|--------------------------|---------------|----------------|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 75% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess its quality with the user, and, if the user agrees, add it to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask the user to provide them before proceeding
## Blocking
**BLOCKING**: Do NOT proceed to Phase 2 until the user confirms both input data coverage AND expected results completeness are sufficient.
## No save action
Phase 1 does not write an artifact. Findings feed Phase 2.
@@ -0,0 +1,49 @@
# Phase 2: Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed test specifications covering blackbox (functional), performance, resilience, security, and resource-limit scenarios.
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
## Steps
Based on all acquired data, the acceptance criteria, and the restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
## Self-verification
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Test environment matches project constraints (see `phases/hardware-assessment.md`, which runs before Phase 4)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
## Save action
Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
## Blocking
**BLOCKING**: Present test coverage summary (from `traceability-matrix.md`) to user. Do NOT proceed to Phase 3 until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
@@ -0,0 +1,118 @@
# Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage remains at or above 75%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
## Step 1 — Build the requirements checklist
Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, classify its shape (input/output or behavioral) and extract:
**Input/output tests:**
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
**Behavioral tests:**
| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
|---|-----------------|-----------|-------------------|--------------------|--------------------|---------------|
| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |
Present both tables to the user.
## Step 2 — Ask user to provide missing test data AND expected results
For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing item.
## Step 3 — Validate provided data and expected results
For each item where the user chose **Option A**:
**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically (see the comparison sketch after this step), meaning it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
- Pattern matches (regex, substring, JSON schema)
- Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
- Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
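Each comparison method maps to a small scripted check. A minimal bash sketch, with hypothetical file names and values (none drawn from a real spec):
```
# Hypothetical comparison checks, one per method; names and values are
# illustrative only.
set -euo pipefail

# Threshold: processing time must stay under 500 ms.
actual_ms=412
[ "$actual_ms" -lt 500 ] && echo "PASS threshold" || { echo "FAIL threshold"; exit 1; }

# Tolerance: score within ±0.05 of the expected 0.85.
awk -v a=0.87 -v e=0.85 -v tol=0.05 'BEGIN { exit (a-e <= tol && e-a <= tol) ? 0 : 1 }' \
  && echo "PASS tolerance" || { echo "FAIL tolerance"; exit 1; }

# Reference file: actual output must exactly match the stored expected CSV.
diff -q actual_output.csv input_data/expected_results/reference_01.csv \
  && echo "PASS reference" || { echo "FAIL reference"; exit 1; }
```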
## Step 4 — Remove tests without data or expected results
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
## Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate the coverage percentage: `covered_items / total_items * 100` (worked example below the table)
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
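A worked instance of the calculation, with invented counts:
```
# Hypothetical: 18 of 22 AC + restrictions covered -> 81.8%, above the 75% gate.
awk -v covered=18 -v total=22 'BEGIN { printf "Coverage: %.1f%%\n", covered/total*100 }'
```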
**Decision**:
- **Coverage ≥ 75%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 75%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.
## Phase 3 Completion
When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent
After Phase 3 completion, run `phases/hardware-assessment.md` before Phase 4.
@@ -0,0 +1,60 @@
# Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely; script creation is instead planned during decompose, where the decomposer creates a dedicated task for it. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so autodev and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Hardware-Dependency Assessment decision recorded in `environment.md`.
**Prerequisite**: `phases/hardware-assessment.md` must have completed and written the "Test Execution" section to `TESTS_OUTPUT_DIR/environment.md`.
## Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files (a detection sketch follows this step list):
- Python: `pytest` (`pyproject.toml`, `setup.cfg`, `pytest.ini`)
- .NET: `dotnet test` (`*.csproj`, `*.sln`)
- Rust: `cargo test` (`Cargo.toml`)
- Node: `npm test` or `vitest` / `jest` (`package.json`)
2. Check the Hardware-Dependency Assessment result recorded in `environment.md`:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
- If **both** was chosen → generate both
3. Identify performance/load testing tools from dependencies (`k6`, `locust`, `artillery`, `wrk`, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
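A minimal detection sketch in bash, covering only the manifest heuristics above (a real implementation would also inspect `package.json` scripts and `pyproject.toml` tool sections):
```
# Hypothetical runner detection based on manifest files.
if [ -f pyproject.toml ] || [ -f pytest.ini ] || [ -f setup.cfg ]; then
  runner="pytest"
elif compgen -G "*.sln" > /dev/null || compgen -G "*.csproj" > /dev/null; then
  runner="dotnet test"
elif [ -f Cargo.toml ]; then
  runner="cargo test"
elif [ -f package.json ]; then
  runner="npm test"   # refine to vitest/jest by reading package.json scripts
else
  echo "No known test runner detected" >&2
  exit 1
fi
echo "Detected test runner: $runner"
```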
## Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If a local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance; a minimal sketch follows this list. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
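A minimal sketch satisfying these requirements, assuming a Python project with `requirements.txt` and `e2e/requirements.txt` (the template file remains the authoritative structure):
```
#!/usr/bin/env bash
# Sketch of scripts/run-tests.sh for a hypothetical Python/pytest project.
set -euo pipefail
trap 'echo "cleanup done"' EXIT

UNIT_ONLY=0
[ "${1:-}" = "--unit-only" ] && UNIT_ONLY=1

# Activate a local virtualenv if present, then install all dependencies
# to avoid collection-time import errors on fresh environments.
[ -f .venv/bin/activate ] && source .venv/bin/activate
pip install -q -r requirements.txt -r e2e/requirements.txt

# Run unit tests, plus blackbox tests unless --unit-only was passed.
# pytest prints a pass/fail/skip summary (-ra) and exits non-zero on
# any failure, which set -e propagates as the script's exit code.
if [ "$UNIT_ONLY" -eq 1 ]; then
  pytest tests/unit -ra
else
  pytest tests/unit tests/blackbox -ra
fi
```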
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
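The invocation, assuming the compose file defines a `tests` service (the service name is an assumption), would look like:
```
# --exit-code-from makes the run exit with the tests service's exit code,
# so autodev/CI sees failures directly.
docker compose -f docker-compose.test.yml up \
  --build --abort-on-container-exit --exit-code-from tests
```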
## Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root (a minimal sketch follows this list). The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Hardware-Dependency Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
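A minimal sketch, assuming k6 as the detected tool, Docker execution, and a `perf/scenario.js` that declares k6 `thresholds` (all names hypothetical):
```
#!/usr/bin/env bash
# Sketch of scripts/run-performance-tests.sh; tool and paths are assumptions.
set -euo pipefail
trap 'docker compose -f docker-compose.test.yml down --volumes' EXIT

# Threshold from the test spec, overridable as the first CLI argument.
P95_MS="${1:-500}"

# Start the system under test (Docker mode; local mode would start it on host).
docker compose -f docker-compose.test.yml up -d app

# k6 evaluates the thresholds declared in the scenario and exits non-zero
# when any is breached, which set -e propagates.
k6 run --env P95_MS="$P95_MS" perf/scenario.js
echo "All performance thresholds met."
```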
## Step 4 — Verify scripts
1. Verify each script is syntactically valid (`bash -n scripts/run-tests.sh`, `bash -n scripts/run-performance-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
## Save action
Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
@@ -0,0 +1,78 @@
# Hardware-Dependency & Execution Environment Assessment (BLOCKING)
Runs between Phase 3 and Phase 4.
Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). However, hardware-dependent projects may require local execution to exercise the real code paths. This assessment determines the right execution strategy by scanning both documentation and source code.
## Step 1 — Documentation scan
Check the following files for mentions of hardware-specific requirements:
| File | Look for |
|------|----------|
| `_docs/00_problem/restrictions.md` | Platform requirements, hardware constraints, OS-specific features |
| `_docs/01_solution/solution.md` | Engine selection logic, platform-dependent paths, hardware acceleration |
| `_docs/02_document/architecture.md` | Component diagrams showing hardware layers, engine adapters |
| `_docs/02_document/components/*/description.md` | Per-component hardware mentions |
| `TESTS_OUTPUT_DIR/environment.md` | Existing environment decisions |
## Step 2 — Code scan
Search the project source for indicators of hardware dependence. The project is **hardware-dependent** if ANY of the following are found:
| Category | Code indicators (imports, APIs, config) |
|----------|-----------------------------------------|
| GPU / CUDA | `import pycuda`, `import tensorrt`, `import pynvml`, `torch.cuda`, `nvidia-smi`, `CUDA_VISIBLE_DEVICES`, `runtime: nvidia` |
| Apple Neural Engine / CoreML | `import coremltools`, `CoreML`, `MLModel`, `ComputeUnit`, `MPS`, `sys.platform == "darwin"`, `platform.machine() == "arm64"` |
| OpenCL / Vulkan | `import pyopencl`, `clCreateContext`, vulkan headers |
| TPU / FPGA | `tf.distribute.TPUStrategy`, FPGA bitstream loaders |
| Sensors / Cameras | `cv2.VideoCapture(0)` (device-index capture), serial port access, GPIO, V4L2 |
| OS-specific services | Kernel modules (`modprobe`), host-level drivers, platform-gated code (`sys.platform` branches selecting different backends) |
Also check dependency files (`requirements.txt`, `setup.py`, `pyproject.toml`, `Cargo.toml`, `*.csproj`) for hardware-specific packages.
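One way to automate part of this scan, assuming ripgrep (`rg`) is available; the patterns below mirror the indicator table and are a starting point, not exhaustive:
```
# Hypothetical indicator scan; extend the patterns per the table above.
rg -n 'torch\.cuda|tensorrt|pycuda|coremltools|pyopencl|CUDA_VISIBLE_DEVICES' src/ \
  || echo "no source-level indicators found"
rg -n 'tensorrt|coremltools|pynvml|pyopencl' \
  requirements.txt pyproject.toml Cargo.toml 2>/dev/null \
  || echo "no dependency-level indicators found"
```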
## Step 3 — Classify the project
Based on Steps 1 and 2, classify the project:
- **Not hardware-dependent**: no indicators found → use Docker (preferred default), skip to Step 5 "Record the decision"
- **Hardware-dependent**: one or more indicators found → proceed to Step 4
## Step 4 — Present execution environment choice
Present the findings and ask the user using the Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Test execution environment
══════════════════════════════════════
Hardware dependencies detected:
- [list each indicator found, with file:line]
══════════════════════════════════════
Running in Docker means these hardware code paths
are NOT exercised: the container runs Linux, where
[specific hardware, e.g. CoreML / CUDA] is unavailable.
The system would fall back to [fallback engine/path].
══════════════════════════════════════
A) Local execution only (tests the real hardware path)
B) Docker execution only (tests the fallback path)
C) Both local and Docker (tests both paths, requires
two test runs — recommended for CI with heterogeneous
runners)
══════════════════════════════════════
Recommendation: [A, B, or C] — [reason]
══════════════════════════════════════
```
## Step 5 — Record the decision
Write or update a **"Test Execution"** section in `TESTS_OUTPUT_DIR/environment.md` with:
1. **Decision**: local / docker / both
2. **Hardware dependencies found**: list with file references
3. **Execution instructions** per chosen mode:
- **Local mode**: prerequisites (OS, SDK, hardware), how to start services, how to run the test runner, environment variables
- **Docker mode**: docker-compose profile/command, required images, how results are collected
- **Both mode**: instructions for each, plus guidance on which CI runner type runs which mode
The decision is consumed by Phase 4 to choose between local `scripts/run-tests.sh` and `docker-compose.test.yml`.
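For concreteness, a hypothetical "Test Execution" section (every path, dependency, and runner detail below is invented for illustration):
```
## Test Execution
**Decision**: both

**Hardware dependencies found**:
- `coremltools` import in `src/engine/coreml_adapter.py` (hypothetical path)

**Local mode**: requires macOS on Apple silicon; run `scripts/run-tests.sh`.
**Docker mode**: run `docker compose -f docker-compose.test.yml up --exit-code-from tests`;
results are collected from the `tests` container output.
**Both mode**: CI runs Docker mode on Linux runners and local mode on a
self-hosted macOS runner.
```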