mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 22:06:33 +00:00
555 lines
32 KiB
Markdown
---
name: test-spec
description: |
  Test specification skill. Analyzes input data and expected results completeness,
  then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits)
  that treat the system as a black box. Every test pairs input data with quantifiable expected results
  so tests can verify correctness, not just execution.
  4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate,
  test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/.
  Trigger phrases:
  - "test spec", "test specification", "test scenarios"
  - "blackbox test spec", "black box tests", "blackbox tests"
  - "performance tests", "resilience tests", "security tests"
category: build
tags: [testing, black-box, blackbox-tests, test-specification, qa]
disable-model-invocation: true
---

# Test Scenario Specification

Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.

## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed

## Context Resolution

Fixed paths — no mode detection needed:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`

Announce the resolved paths to the user before proceeding.
## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |

### Expected Results Specification

Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.

Expected results live inside `_docs/00_problem/input_data/` in one or both of:

1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`

2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file

```
input_data/
├── expected_results/            ← required: expected results folder
│   ├── results_report.md        ← required: input→expected result mapping
│   ├── image_01_expected.csv    ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```

**Quantifiability requirements** (see template for full format and examples):

- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
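
Tolerance and threshold checks of the kind these requirements call for reduce to small numeric comparisons. A sketch in shell/awk, where the `within` and `atleast` helper names and the sample values are illustrative, not part of the skill:

```shell
# within A E TOL  -> exit 0 iff |A - E| <= TOL   (tolerance comparison)
# atleast A T     -> exit 0 iff A >= T           (threshold comparison)
within()  { awk -v a="$1" -v e="$2" -v tol="$3" 'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= tol) }'; }
atleast() { awk -v a="$1" -v t="$2" 'BEGIN { exit !(a >= t) }'; }

# Example: position must match within ± 10px, confidence must meet >= 0.85
actual_x=103; expected_x=100
within "$actual_x" "$expected_x" 10 && echo "position: PASS"

actual_conf=0.91
atleast "$actual_conf" 0.85 && echo "confidence: PASS"
```

awk handles the float arithmetic, so the same helpers work for pixel tolerances and confidence thresholds alike.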

### Optional Files (used when available)

| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |

### Prerequisite Checks (BLOCKING)

1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
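
Checks 1–7 reduce to "exists and is non-empty" plus one directory creation. A minimal sketch of the gate, where `require_nonempty` and `check_prereqs` are hypothetical helpers, not part of the skill:

```shell
# Return non-zero (STOP) if any required file is missing or empty.
require_nonempty() {
  [ -s "$1" ] || { echo "STOP: missing or empty: $1"; return 1; }
}

check_prereqs() {
  local root="${1:-.}" status=0
  for f in \
    "$root/_docs/00_problem/problem.md" \
    "$root/_docs/00_problem/acceptance_criteria.md" \
    "$root/_docs/00_problem/restrictions.md" \
    "$root/_docs/00_problem/input_data/expected_results/results_report.md" \
    "$root/_docs/01_solution/solution.md"; do
    require_nonempty "$f" || status=1
  done
  # Check 3: input_data/ contains at least one entry
  ls "$root/_docs/00_problem/input_data/"* >/dev/null 2>&1 \
    || { echo "STOP: input_data/ is empty"; status=1; }
  # Check 7: create TESTS_OUTPUT_DIR if absent
  mkdir -p "$root/_docs/02_document/tests"
  return $status
}
```

A non-zero return from `check_prereqs` is the STOP condition; check 8 (resume vs fresh) stays interactive.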

## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test-data.md` |
| Phase 2 | Blackbox tests | `blackbox-tests.md` |
| Phase 2 | Performance tests | `performance-tests.md` |
| Phase 2 | Resilience tests | `resilience-tests.md` |
| Phase 2 | Security tests | `security-tests.md` |
| Phase 2 | Resource limit tests | `resource-limit-tests.md` |
| Phase 2 | Traceability matrix | `traceability-matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped
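
The matching in steps 1–2 can be sketched as a loop over the expected artifact list (the `report_artifacts` helper name is illustrative):

```shell
# Report which Phase 2 artifacts already exist (skip) and which remain (todo).
report_artifacts() {
  local dir="$1"
  for f in environment.md test-data.md blackbox-tests.md performance-tests.md \
           resilience-tests.md security-tests.md resource-limit-tests.md \
           traceability-matrix.md; do
    if [ -s "$dir/$f" ]; then
      echo "skip (exists): $f"
    else
      echo "todo: $f"
    fi
  done
}

# Example: report_artifacts _docs/02_document/tests
```

The first `todo` line is the resume point; the `skip` lines are what to announce to the user.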

## Progress Tracking

At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.

## Workflow

### Phase 1: Input Data & Expected Results Completeness Analysis

**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet

1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read the testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
   - Coverage of acceptance criteria scenarios
   - Coverage of restriction edge cases
   - Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
   - Every input data item has a corresponding expected result row in the mapping
   - Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
   - Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
   - Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present the input-to-expected-result pairing assessment:

   | Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
   |------------|--------------------------|---------------|----------------|
   | [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |

9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess its quality with the user, and if the user agrees, add it to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask the user to provide them before proceeding

**BLOCKING**: Do NOT proceed until the user confirms both input data coverage AND expected results completeness are sufficient.

---

### Phase 2: Test Scenario Specification

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.

Based on all acquired data, acceptance criteria, and restrictions, form detailed test scenarios:

1. Define the test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build the traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure

**Self-verification**:

- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Test environment matches project constraints (see Hardware-Dependency & Execution Environment Assessment below)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions

**Save action**: Write all files under TESTS_OUTPUT_DIR:

- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`

**BLOCKING**: Present the test coverage summary (from traceability-matrix.md) to the user. Do NOT proceed until confirmed.

Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).

---

### Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)

**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data and a quantifiable expected result. Remove tests that lack either. Verify final coverage stays at or above 70%.
**Constraints**: This phase is MANDATORY and cannot be skipped.

#### Step 1 — Build the test-data and expected-result requirements checklist

Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, extract:

| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |

Present this table to the user.

#### Step 2 — Ask user to provide missing test data AND expected results

For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:

> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.

**BLOCKING**: Wait for the user's response for every missing item.

#### Step 3 — Validate provided data and expected results

For each item where the user chose **Option A**:

**Input data validation**:

1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)

**Expected result validation**:

4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
   - Exact values (counts, strings, status codes)
   - Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
   - Pattern matches (regex, substring, JSON schema)
   - Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
   - A reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to

If any validation fails, report the specific issue and loop back to Step 2 for that item.

#### Step 4 — Remove tests without data or expected results

For each item where the user chose **Option B**:

1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal

**Save action**: Write updated files under TESTS_OUTPUT_DIR:

- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)

#### Step 5 — Final coverage check

After all removals, recalculate coverage:

1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate the coverage percentage: `covered_items / total_items * 100`
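
The percentage computation in step 3, as integer shell arithmetic (the sample counts are illustrative):

```shell
# coverage % = covered_items / total_items * 100
covered_items=12   # AC + restrictions still covered by remaining tests
total_items=15     # total AC + restrictions
coverage=$(( covered_items * 100 / total_items ))
echo "Coverage: ${coverage}%"    # 12/15 -> 80%
```

With these sample counts, 80% clears the 70% gate below; 10 of 15 would sit exactly on the floor at 66%.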

| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |

**Decision**:

- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 70%** → Phase 3 **FAILED**. Report:

> ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.

**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.

#### Phase 3 Completion

When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results:

1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent

---

### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs before Phase 4)

Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). However, hardware-dependent projects may require local execution to exercise the real code paths. This assessment determines the right execution strategy by scanning both documentation and source code.

#### Step 1 — Documentation scan

Check the following files for mentions of hardware-specific requirements:

| File | Look for |
|------|----------|
| `_docs/00_problem/restrictions.md` | Platform requirements, hardware constraints, OS-specific features |
| `_docs/01_solution/solution.md` | Engine selection logic, platform-dependent paths, hardware acceleration |
| `_docs/02_document/architecture.md` | Component diagrams showing hardware layers, engine adapters |
| `_docs/02_document/components/*/description.md` | Per-component hardware mentions |
| `TESTS_OUTPUT_DIR/environment.md` | Existing environment decisions |

#### Step 2 — Code scan

Search the project source for indicators of hardware dependence. The project is **hardware-dependent** if ANY of the following are found:

| Category | Code indicators (imports, APIs, config) |
|----------|-----------------------------------------|
| GPU / CUDA | `import pycuda`, `import tensorrt`, `import pynvml`, `torch.cuda`, `nvidia-smi`, `CUDA_VISIBLE_DEVICES`, `runtime: nvidia` |
| Apple Neural Engine / CoreML | `import coremltools`, `CoreML`, `MLModel`, `ComputeUnit`, `MPS`, `sys.platform == "darwin"`, `platform.machine() == "arm64"` |
| OpenCL / Vulkan | `import pyopencl`, `clCreateContext`, Vulkan headers |
| TPU / FPGA | `tf.distribute.TPUStrategy`, FPGA bitstream loaders |
| Sensors / Cameras | `cv2.VideoCapture(0)` (device index), serial port access, GPIO, V4L2 |
| OS-specific services | Kernel modules (`modprobe`), host-level drivers, platform-gated code (`sys.platform` branches selecting different backends) |

Also check dependency files (`requirements.txt`, `setup.py`, `pyproject.toml`, `Cargo.toml`, `*.csproj`) for hardware-specific packages.
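
The code scan can be approximated with a single grep over the source tree. The pattern below covers only a few of the indicators above, and the `scan_hw` helper name is illustrative:

```shell
# Classify a source tree as hardware-dependent if any indicator pattern matches.
scan_hw() {
  local dir="$1"
  local indicators='import pycuda|import tensorrt|torch\.cuda|CUDA_VISIBLE_DEVICES|import coremltools|import pyopencl|cv2\.VideoCapture\(0\)'
  if grep -rqE "$indicators" "$dir" 2>/dev/null; then
    echo "hardware-dependent"
  else
    echo "not hardware-dependent"
  fi
}

# Example: scan_hw src/
```

For the Step 4 report, drop `-q` and add `-n` to get the file:line list of each indicator found.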

#### Step 3 — Classify the project

Based on Steps 1–2, classify the project:

- **Not hardware-dependent**: no indicators found → use Docker (preferred default), skip to Step 5 ("Record the decision")
- **Hardware-dependent**: one or more indicators found → proceed to Step 4

#### Step 4 — Present execution environment choice

Present the findings and ask the user using Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Test execution environment
══════════════════════════════════════
Hardware dependencies detected:
- [list each indicator found, with file:line]
══════════════════════════════════════
Running in Docker means these hardware code paths
are NOT exercised — the container is a Linux
environment where [specific hardware, e.g. CoreML /
CUDA] is unavailable. The system would fall back
to [fallback engine/path].
══════════════════════════════════════
A) Local execution only (tests the real hardware path)
B) Docker execution only (tests the fallback path)
C) Both local and Docker (tests both paths, requires
   two test runs — recommended for CI with heterogeneous
   runners)
══════════════════════════════════════
Recommendation: [A, B, or C] — [reason]
══════════════════════════════════════
```

#### Step 5 — Record the decision

Write or update a **"Test Execution"** section in `TESTS_OUTPUT_DIR/environment.md` with:

1. **Decision**: local / docker / both
2. **Hardware dependencies found**: list with file references
3. **Execution instructions** per chosen mode:
   - **Local mode**: prerequisites (OS, SDK, hardware), how to start services, how to run the test runner, environment variables
   - **Docker mode**: docker-compose profile/command, required images, how results are collected
   - **Both mode**: instructions for each, plus guidance on which CI runner type runs which mode

---

### Phase 4: Test Runner Script Generation

**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.

**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit non-zero on failure. Respect the decision recorded by the Hardware-Dependency & Execution Environment Assessment above.

#### Step 1 — Detect test infrastructure

1. Identify the project's test runner from manifests and config files:
   - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
   - .NET: `dotnet test` (*.csproj, *.sln)
   - Rust: `cargo test` (Cargo.toml)
   - Node: `npm test` or `vitest` / `jest` (package.json)
2. Check the Hardware-Dependency & Execution Environment Assessment decision:
   - If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on the host
   - If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements

#### Step 2 — Generate the test runner

**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.

**If a local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:

1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate the virtualenv if present, run the test runner directly on the host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
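
A skeleton satisfying requirements 1–6 might look as follows. This is a sketch, not the template's content: it assumes pytest as the detected runner and an illustrative `tests/unit` / `tests/blackbox` layout. Writing it via heredoc lets the Step 4 `bash -n` check be shown in the same breath:

```shell
# Write a hypothetical scripts/run-tests.sh skeleton, then syntax-check it.
cd "$(mktemp -d)"        # scratch directory for this demonstration
mkdir -p scripts
cat > scripts/run-tests.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
cleanup() { :; }                     # stop services, remove temp dirs, etc.
trap cleanup EXIT

UNIT_ONLY=0
[ "${1:-}" = "--unit-only" ] && UNIT_ONLY=1

pip install -q -r requirements.txt   # project + test dependencies

if [ "$UNIT_ONLY" -eq 1 ]; then
  pytest tests/unit                  # skip blackbox tests
else
  pytest tests/unit tests/blackbox
fi
# pytest prints its own pass/fail summary and exits non-zero on failure,
# which set -e propagates as this script's exit code
EOF
chmod +x scripts/run-tests.sh
bash -n scripts/run-tests.sh && echo "syntax OK"
```

The generated script itself stays runner-specific; only the skeleton (strict mode, trap, flag, install, run, exit code) carries over between runners.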

**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.

#### Step 3 — Generate `scripts/run-performance-tests.sh`

Create `scripts/run-performance-tests.sh` at the project root. The script must:

1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept them as CLI args)
3. Start the system under test (local or docker-compose, matching the execution environment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against the threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds are met, exit 1 otherwise
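
The per-scenario comparison in steps 5–6 reduces to a numeric check; a minimal sketch, with illustrative values (in practice the measurement is parsed from the load tool's report):

```shell
threshold_ms=500     # from performance-tests.md, or a CLI arg
measured_ms=432      # e.g. p95 latency parsed from the load tool's output

if [ "$measured_ms" -le "$threshold_ms" ]; then
  verdict=PASS
else
  verdict=FAIL
fi
echo "p95 latency: ${measured_ms}ms (threshold ${threshold_ms}ms) -> ${verdict}"
```

Accumulating a failure flag across all such checks, and exiting 1 if any tripped, gives step 7.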

#### Step 4 — Verify scripts

1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`, `bash -n scripts/run-performance-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user

**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` under `scripts/` at the project root.

---

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide the expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search the internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept a reduced spec |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like
- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code
- **Tests without data**: every test scenario MUST have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed

## Trigger Conditions

When the user wants to:

- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria

**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"

## Methodology Quick Reference

```
┌──────────────────────────────────────────────────────────────────────┐
│                Test Scenario Specification (4-Phase)                 │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│   → verify AC, restrictions, input_data (incl. expected_results/)    │
│                                                                      │
│ Phase 1: Input Data & Expected Results Completeness Analysis         │
│   → assess input_data/ coverage vs AC scenarios (≥70%)               │
│   → verify every input has a quantifiable expected result            │
│   → present input→expected-result pairing assessment                 │
│   [BLOCKING: user confirms input data + expected results coverage]   │
│                                                                      │
│ Phase 2: Test Scenario Specification                                 │
│   → environment.md                                                   │
│   → test-data.md (with expected results mapping)                     │
│   → blackbox-tests.md (positive + negative)                          │
│   → performance-tests.md                                             │
│   → resilience-tests.md                                              │
│   → security-tests.md                                                │
│   → resource-limit-tests.md                                          │
│   → traceability-matrix.md                                           │
│   [BLOCKING: user confirms test coverage]                            │
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│   → build test-data + expected-result requirements checklist         │
│   → ask user: provide data+result (A) or remove test (B)             │
│   → validate input data (quality + quantity)                         │
│   → validate expected results (quantifiable + comparison method)     │
│   → remove tests without data or expected result, warn user          │
│   → final coverage check (≥70% or FAIL + loop back)                  │
│   [BLOCKING: coverage ≥ 70% required to pass]                        │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│   → detect test runner + docker-compose + load tool                  │
│   → scripts/run-tests.sh (unit + blackbox)                           │
│   → scripts/run-performance-tests.sh (load/perf scenarios)           │
│   → verify scripts are valid and executable                          │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│             Ask don't assume · Spec don't code                       │
│             No test without data · No test without expected result   │
└──────────────────────────────────────────────────────────────────────┘
```