| name | description | category | tags | disable-model-invocation |
|---|---|---|---|---|
| test-spec | Test specification skill. Analyzes input data and expected results completeness, then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits) that treat the system as a black box. Every test pairs input data with quantifiable expected results so tests can verify correctness, not just execution. 4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate, test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/. Trigger phrases: "test spec", "test specification", "test scenarios", "blackbox test spec", "black box tests", "blackbox tests", "performance tests", "resilience tests", "security tests" | build | | true |
Test Scenario Specification
Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
Core Principles
- Black-box only: tests describe observable behavior through public interfaces; no internal implementation details
- Traceability: every test traces to at least one acceptance criterion or restriction
- Save immediately: write artifacts to disk after each phase; never accumulate unsaved work
- Ask, don't assume: when requirements are ambiguous, ask the user before proceeding
- Spec, don't code: this workflow produces test specifications, never test implementation code
- No test without data: every test scenario MUST have concrete test data; tests without data are removed
- No test without expected result: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed
Context Resolution
Fixed paths — no mode detection needed:
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`
Announce the resolved paths to the user before proceeding.
Input Specification
Required Files
| File | Purpose |
|---|---|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |
Expected Results Specification
Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be quantifiable — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.
Expected results live inside `_docs/00_problem/input_data/` in one or both of:
- Mapping file (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
- Reference files folder (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file
```
input_data/
├── expected_results/            ← required: expected results folder
│   ├── results_report.md        ← required: input→expected result mapping
│   ├── image_01_expected.csv    ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```
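For illustration, a minimal `results_report.md` for the tree above might look like the table below. The column layout here is an assumption — the authoritative format lives in `.cursor/skills/test-spec/templates/expected-results.md`:

```markdown
| Input | Expected Result | Comparison Method |
|---|---|---|
| image_01.jpg | 3 detections; boxes per `image_01_expected.csv` | reference file, ± 10px tolerance |
| empty_scene.jpg | 0 detections, 0 errors | exact count |
```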
Quantifiability requirements (see template for full format and examples):
- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
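Each comparison method above can be sketched as a tiny shell helper. The function names and signatures here are illustrative, not part of the skill — a generated runner would wire these to the spec's expected values:

```shell
# Exact match (counts, strings, status codes)
assert_exact() { [ "$1" = "$2" ]; }

# Value ± tolerance, e.g. assert_tolerance "$actual" 0.85 0.05
assert_tolerance() {
  awk -v a="$1" -v e="$2" -v t="$3" 'BEGIN { d = a - e; if (d < 0) d = -d; exit (d <= t) ? 0 : 1 }'
}

# Regex pattern match for text outputs
assert_pattern() { printf '%s' "$1" | grep -Eq -- "$2"; }

# Threshold, e.g. assert_threshold "$response_ms" 500 (pass when actual <= limit)
assert_threshold() { awk -v a="$1" -v l="$2" 'BEGIN { exit (a <= l) ? 0 : 1 }'; }
```

Each helper exits 0 on pass and non-zero on fail, so a runner can chain them and report a verdict per test.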
Optional Files (used when available)
| File | Purpose |
|---|---|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |
Prerequisite Checks (BLOCKING)
- `acceptance_criteria.md` exists and is non-empty — STOP if missing
- `restrictions.md` exists and is non-empty — STOP if missing
- `input_data/` exists and contains at least one file — STOP if missing
- `input_data/expected_results/results_report.md` exists and is non-empty — STOP if missing. Prompt the user: "Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."
- `problem.md` exists and is non-empty — STOP if missing
- `solution.md` exists and is non-empty — STOP if missing
- Create TESTS_OUTPUT_DIR if it does not exist
- If TESTS_OUTPUT_DIR already contains files, ask user: resume from last checkpoint or start fresh?
Artifact Management
Directory Structure
```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```
Save Timing
| Phase | Save immediately after | Filename |
|---|---|---|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | environment.md |
| Phase 2 | Test data spec | test-data.md |
| Phase 2 | Blackbox tests | blackbox-tests.md |
| Phase 2 | Performance tests | performance-tests.md |
| Phase 2 | Resilience tests | resilience-tests.md |
| Phase 2 | Security tests | security-tests.md |
| Phase 2 | Resource limit tests | resource-limit-tests.md |
| Phase 2 | Traceability matrix | traceability-matrix.md |
| Phase 3 | Updated test data spec (if data added) | test-data.md |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | traceability-matrix.md |
| Phase 4 | Test runner script | scripts/run-tests.sh |
| Phase 4 | Performance test runner script | scripts/run-performance-tests.sh |
Resumability
If TESTS_OUTPUT_DIR already contains files:
- List existing files and match them to the save timing table above
- Identify which phase/artifacts are complete
- Resume from the next incomplete artifact
- Inform the user which artifacts are being skipped
Progress Tracking
At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.
Workflow
Phase 1: Input Data Completeness Analysis
Role: Professional Quality Assurance Engineer
Goal: Assess whether the available input data is sufficient to build comprehensive test scenarios
Constraints: Analysis only — no test specs yet
- Read `_docs/01_solution/solution.md`
- Read `acceptance_criteria.md`, `restrictions.md`
- Read testing strategy from solution.md (if present)
- If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
- Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
- Analyze `input_data/` contents against:
  - Coverage of acceptance criteria scenarios
  - Coverage of restriction edge cases
  - Coverage of testing strategy requirements
- Analyze `input_data/expected_results/results_report.md` completeness:
  - Every input data item has a corresponding expected result row in the mapping
  - Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
  - Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
  - Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
- Present input-to-expected-result pairing assessment:

| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|---|---|---|---|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |

- Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
- If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
- If expected results are missing or not quantifiable, ask user to provide them before proceeding
BLOCKING: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient.
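The per-input pairing check can be sketched as a small shell function. The grep-by-filename heuristic and the `*.md` exclusion are assumptions, not the skill's mandated mechanism — the real mapping format is defined in `templates/expected-results.md`:

```shell
# Flag input files that have no row mentioning them in results_report.md.
check_pairing() {
  local input_dir="$1" report="$2" status=0 name
  for f in "$input_dir"/*; do
    [ -f "$f" ] || continue                   # skip subdirectories such as expected_results/
    name=$(basename "$f")
    case "$name" in *.md) continue ;; esac    # docs like data_parameters.md need no mapping row
    if ! grep -q -- "$name" "$report"; then
      echo "MISSING expected result: $name"
      status=1
    fi
  done
  return "$status"
}
```

A non-zero return means at least one input lacks a mapping row, which is exactly the condition that blocks Phase 1.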
Phase 2: Test Scenario Specification
Role: Professional Quality Assurance Engineer
Goal: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
Constraints: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
- Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
- Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
- Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
- Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
- Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
- Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
- Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
- Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
Self-verification:
- Every acceptance criterion is covered by at least one test scenario
- Every restriction is verified by at least one test scenario
- Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
.cursor/skills/test-spec/templates/expected-results.md - Positive and negative scenarios are balanced
- Consumer app has no direct access to system internals
- Docker environment is self-contained (`docker compose up` sufficient)
- External dependencies have mock/stub services defined
- Traceability matrix has no uncovered AC or restrictions
Save action: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
BLOCKING: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
Phase 3: Test Data Validation Gate (HARD GATE)
Role: Professional Quality Assurance Engineer
Goal: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
Constraints: This phase is MANDATORY and cannot be skipped.
Step 1 — Build the test-data and expected-result requirements checklist
Scan blackbox-tests.md, performance-tests.md, resilience-tests.md, security-tests.md, and resource-limit-tests.md. For every test scenario, extract:
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|---|---|---|---|---|---|---|---|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
Present this table to the user.
Step 2 — Ask user to provide missing test data AND expected results
For each row where Input Provided? is No OR Expected Result Provided? is No, ask the user:
Option A — Provide the missing items:
- Missing input data: place test data files in `_docs/00_problem/input_data/` or indicate the location.
- Missing expected result: provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.

Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
- "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
- "HTTP 200 with JSON body matching `expected_response_01.json`"
- "Processing time < 500ms"
- "0 false positives in the output set"

Option B — Skip this test: if you cannot provide the data or expected result, this test scenario will be removed from the specification.
BLOCKING: Wait for the user's response for every missing item.
Step 3 — Validate provided data and expected results
For each item where the user chose Option A:
Input data validation:
1. Verify the data file(s) exist at the indicated location
2. Verify quality: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify quantity: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)

Expected result validation:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify quantifiability: the expected result can be evaluated programmatically — it must contain at least one of:
   - Exact values (counts, strings, status codes)
   - Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
   - Pattern matches (regex, substring, JSON schema)
   - Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
   - Reference file for structural comparison (JSON diff, CSV diff)
6. Verify completeness: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify consistency: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
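A first-pass quantifiability screen can be automated before the manual review. This regex heuristic is an assumption — it only detects the markers listed above (numbers, tolerances, comparison operators, reference-file names) and should be tuned to the formats your `results_report.md` actually uses:

```shell
# Heuristic: an expected result counts as quantifiable only if it carries a
# number, a tolerance/comparison marker, or a reference-file extension.
is_quantifiable() {
  printf '%s' "$1" | grep -Eq '[0-9]|±|<=|>=|<|>|\.(json|csv)'
}
```

Anything the heuristic rejects (e.g., "works correctly") goes back to Step 2; anything it accepts still needs the completeness and consistency checks above.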
Step 4 — Remove tests without data or expected results
For each item where the user chose Option B:
- Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
- Remove the test scenario from the respective test file
- Remove corresponding rows from `traceability-matrix.md`
- Update `test-data.md` to reflect the removal

Save action: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
Step 5 — Final coverage check
After all removals, recalculate coverage:
- Count remaining test scenarios that trace to acceptance criteria
- Count total acceptance criteria + restrictions
- Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|---|---|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| Coverage % | ?% |
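The calculation and the 70% gate can be wrapped in a small helper; the function names are illustrative:

```shell
# covered_items / total_items * 100, rounded to the nearest integer
coverage_pct() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%d", (t == 0) ? 0 : (c / t * 100 + 0.5) }'
}

# Gate: pass when coverage is at least 70%
passes_gate() { [ "$(coverage_pct "$1" "$2")" -ge 70 ]; }
```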
Decision:
- Coverage ≥ 70% → Phase 3 PASSED. Present final summary to user.
- Coverage < 70% → Phase 3 FAILED. Report:

  ❌ Test coverage dropped to X% (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:

  | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
  |---|---|---|
  | [item] | [AC/Restriction] | [data description] |

  Action required: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
BLOCKING: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.
Phase 3 Completion
When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results:
- Present the final coverage report
- List all removed tests (if any) with reasons
- Confirm every remaining test has: input data + quantifiable expected result + comparison method
- Confirm all artifacts are saved and consistent
Phase 4: Test Runner Script Generation
Role: DevOps engineer
Goal: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
Constraints: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.
Step 1 — Detect test infrastructure
- Identify the project's test runner from manifests and config files:
  - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
  - .NET: `dotnet test` (*.csproj, *.sln)
  - Rust: `cargo test` (Cargo.toml)
  - Node: `npm test` or `vitest`/`jest` (package.json)
- Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
- Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
- Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
Step 2 — Generate scripts/run-tests.sh
Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:
- Set `set -euo pipefail` and trap cleanup on EXIT
- Optionally accept a `--unit-only` flag to skip blackbox tests
- Run unit tests using the detected test runner
- If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down
- Print a summary of passed/failed/skipped tests
- Exit 0 on all pass, exit 1 on any failure
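The requirements above can be sketched as follows, assuming a pytest project. The compose file name, test paths, and pytest itself are assumptions from Step 1 detection — substitute whatever runner the detection actually found:

```shell
#!/usr/bin/env bash
set -euo pipefail

COMPOSE_FILE="docker-compose.test.yml"   # assumed name from Step 1 detection
UNIT_ONLY=0
if [ "${1:-}" = "--unit-only" ]; then UNIT_ONLY=1; fi
FAILED=0

cleanup() {
  # Tear down the blackbox environment even on early exit
  docker compose -f "$COMPOSE_FILE" down -v >/dev/null 2>&1 || true
}
trap cleanup EXIT

main() {
  # Unit tests: run only when the detected runner and its config are present
  if command -v pytest >/dev/null 2>&1 && { [ -f pyproject.toml ] || [ -f pytest.ini ]; }; then
    pytest -q || FAILED=1
  else
    echo "SKIP: no pytest configuration detected"
  fi

  # Blackbox suite: bring up the environment, wait for health, run; trap tears down
  if [ "$UNIT_ONLY" -eq 0 ] && [ -f "$COMPOSE_FILE" ]; then
    docker compose -f "$COMPOSE_FILE" up -d --wait
    pytest -q tests/blackbox || FAILED=1
  fi

  echo "SUMMARY: failed=$FAILED"
  return "$FAILED"   # script exit status: 0 = all pass, non-zero = failures
}
main "$@"
```

Returning `$FAILED` from `main` as the last command gives the required exit semantics while keeping each suite's failure from aborting the other prematurely.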
Step 3 — Generate scripts/run-performance-tests.sh
Create `scripts/run-performance-tests.sh` at the project root. The script must:
- Set `set -euo pipefail` and trap cleanup on EXIT
- Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
- Spin up the system under test (docker-compose or local)
- Run load/performance scenarios using the detected tool
- Compare results against threshold values from the test spec
- Print a pass/fail summary per scenario
- Exit 0 if all thresholds met, exit 1 otherwise
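A threshold-comparison sketch for this script, assuming k6 as the detected tool — the scenario path, the jq field path (which may vary by k6 version), and the default limit are all assumptions to be replaced with your spec's values:

```shell
#!/usr/bin/env bash
set -euo pipefail

P95_LIMIT_MS="${1:-500}"   # threshold from performance-tests.md, overridable via CLI

# pass when measured (ms) <= limit (ms)
meets_threshold() {
  awk -v m="$1" -v l="$2" 'BEGIN { exit (m <= l) ? 0 : 1 }'
}

if command -v k6 >/dev/null 2>&1 && [ -f perf/scenario.js ]; then
  k6 run --summary-export=perf-summary.json perf/scenario.js
  # jq path is an assumption; check your k6 version's summary format
  MEASURED=$(jq -r '.metrics.http_req_duration["p(95)"]' perf-summary.json)
else
  echo "SKIP: load tool or scenario not found; demonstrating the threshold check only"
  MEASURED=420   # placeholder measurement
fi

if meets_threshold "$MEASURED" "$P95_LIMIT_MS"; then
  echo "PASS: p95 ${MEASURED}ms <= ${P95_LIMIT_MS}ms"
else
  echo "FAIL: p95 ${MEASURED}ms > ${P95_LIMIT_MS}ms"
  exit 1
fi
```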
Step 4 — Verify scripts
- Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
- Mark both scripts as executable (`chmod +x`)
- Present a summary of what each script does to the user
Save action: Write scripts/run-tests.sh and scripts/run-performance-tests.sh to the project root.
Escalation Rules
| Situation | Action |
|---|---|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | STOP — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | STOP — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
Common Mistakes
- Referencing internals: tests must be black-box — no internal module names, no direct DB queries against the system under test
- Vague expected outcomes: "works correctly" is not a test outcome; use specific measurable values
- Missing expected results: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like
- Non-quantifiable expected results: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate
- Missing negative scenarios: every positive scenario category should have corresponding negative/edge-case tests
- Untraceable tests: every test should trace to at least one AC or restriction
- Writing test code: this skill produces specifications, never implementation code
- Tests without data: every test scenario MUST have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed
Trigger Conditions
When the user wants to:
- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria
Keywords: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"
Methodology Quick Reference
```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (4-Phase)                                │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│   → verify AC, restrictions, input_data (incl. expected_results/)    │
│                                                                      │
│ Phase 1: Input Data & Expected Results Completeness Analysis         │
│   → assess input_data/ coverage vs AC scenarios (≥70%)               │
│   → verify every input has a quantifiable expected result            │
│   → present input→expected-result pairing assessment                 │
│   [BLOCKING: user confirms input data + expected results coverage]   │
│                                                                      │
│ Phase 2: Test Scenario Specification                                 │
│   → environment.md                                                   │
│   → test-data.md (with expected results mapping)                     │
│   → blackbox-tests.md (positive + negative)                          │
│   → performance-tests.md                                             │
│   → resilience-tests.md                                              │
│   → security-tests.md                                                │
│   → resource-limit-tests.md                                          │
│   → traceability-matrix.md                                           │
│   [BLOCKING: user confirms test coverage]                            │
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│   → build test-data + expected-result requirements checklist         │
│   → ask user: provide data+result (A) or remove test (B)             │
│   → validate input data (quality + quantity)                         │
│   → validate expected results (quantifiable + comparison method)     │
│   → remove tests without data or expected result, warn user          │
│   → final coverage check (≥70% or FAIL + loop back)                  │
│   [BLOCKING: coverage ≥ 70% required to pass]                        │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│   → detect test runner + docker-compose + load tool                  │
│   → scripts/run-tests.sh (unit + blackbox)                           │
│   → scripts/run-performance-tests.sh (load/perf scenarios)           │
│   → verify scripts are valid and executable                          │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│             Ask don't assume · Spec don't code                       │
│             No test without data · No test without expected result   │
└──────────────────────────────────────────────────────────────────────┘
```