---
name: test-spec
description: |
  Test specification skill. Analyzes input data and expected results
  completeness, then produces detailed test scenarios (blackbox, performance,
  resilience, security, resource limits) that treat the system as a black box.
  Every test pairs input data with quantifiable expected results so tests can
  verify correctness, not just execution.
  4-phase workflow: input data + expected results analysis, test scenario
  specification, data + results validation gate, test runner script generation.
  Produces 8 artifacts under tests/ and 2 shell scripts under scripts/.
  Trigger phrases:
  - "test spec", "test specification", "test scenarios"
  - "blackbox test spec", "black box tests", "blackbox tests"
  - "performance tests", "resilience tests", "security tests"
category: build
tags: [testing, black-box, blackbox-tests, test-specification, qa]
disable-model-invocation: true
---

# Test Scenario Specification

Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed

## Context Resolution

Fixed paths — no mode detection needed:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`

Announce the resolved paths to the user before proceeding.

## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |

### Expected Results Specification

Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.

Expected results live inside `_docs/00_problem/input_data/` in one or both of:

1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file

```
input_data/
├── expected_results/            ← required: expected results folder
│   ├── results_report.md        ← required: input→expected result mapping
│   ├── image_01_expected.csv    ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```

**Quantifiability requirements** (see template for full format and examples):

- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status

### Optional Files (used when available)

| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |

### Prerequisite Checks (BLOCKING)

1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**

## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test-data.md` |
| Phase 2 | Blackbox tests | `blackbox-tests.md` |
| Phase 2 | Performance tests | `performance-tests.md` |
| Phase 2 | Resilience tests | `resilience-tests.md` |
| Phase 2 | Security tests | `security-tests.md` |
| Phase 2 | Resource limit tests | `resource-limit-tests.md` |
| Phase 2 | Traceability matrix | `traceability-matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4.
   Inform the user which artifacts are being skipped

## Progress Tracking

At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.

## Workflow

### Phase 1: Input Data Completeness Analysis

**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet

1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
   - Coverage of acceptance criteria scenarios
   - Coverage of restriction edge cases
   - Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
   - Every input data item has a corresponding expected result row in the mapping
   - Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
   - Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
   - Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present input-to-expected-result pairing assessment:

   | Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
   |------------|---------------------------|---------------|----------------|
   | [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |

9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding

**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient.

---

### Phase 2: Test Scenario Specification

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.

Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:

1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure

**Self-verification**:

- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions

**Save action**: Write all files under TESTS_OUTPUT_DIR:

- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`

**BLOCKING**: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed.

Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).

---

### Phase 3: Test Data Validation Gate (HARD GATE)

**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
**Constraints**: This phase is MANDATORY and cannot be skipped.

#### Step 1 — Build the test-data and expected-result requirements checklist

Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, extract:

| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|------------------|-----------|---------------------|--------------------------|----------------------|-------------------|-----------------|---------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |

Present this table to the user.

#### Step 2 — Ask user to provide missing test data AND expected results

For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:

> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.

**BLOCKING**: Wait for the user's response for every missing item.

#### Step 3 — Validate provided data and expected results

For each item where the user chose **Option A**:

**Input data validation**:

1. Verify the data file(s) exist at the indicated location
2.
   Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)

**Expected result validation**:

4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
   - Exact values (counts, strings, status codes)
   - Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
   - Pattern matches (regex, substring, JSON schema)
   - Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
   - Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to

If any validation fails, report the specific issue and loop back to Step 2 for that item.

#### Step 4 — Remove tests without data or expected results

For each item where the user chose **Option B**:

1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal

**Save action**: Write updated files under TESTS_OUTPUT_DIR:

- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)

#### Step 5 — Final coverage check

After all removals, recalculate coverage:

1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`

| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |

**Decision**:

- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 70%** → Phase 3 **FAILED**. Report:

  > ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
  >
  > | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
  > |---|---|---|
  >
  > **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.

**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.

#### Phase 3 Completion

When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results:

1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent

---

### Phase 4: Test Runner Script Generation

**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.

#### Step 1 — Detect test infrastructure

1. Identify the project's test runner from manifests and config files:
   - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
   - .NET: `dotnet test` (*.csproj, *.sln)
   - Rust: `cargo test` (Cargo.toml)
   - Node: `npm test` or `vitest` / `jest` (package.json)
2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
3.
   Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements

#### Step 2 — Generate `scripts/run-tests.sh`

Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:

1. Enable `set -euo pipefail` and trap cleanup on EXIT
2. Optionally accept a `--unit-only` flag to skip blackbox tests
3. Run unit tests using the detected test runner
4. If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure

#### Step 3 — Generate `scripts/run-performance-tests.sh`

Create `scripts/run-performance-tests.sh` at the project root. The script must:

1. Enable `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Spin up the system under test (docker-compose or local)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise

#### Step 4 — Verify scripts

1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user

**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
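As a structural illustration only, the Step 2 requirements might produce a script along these lines. The `pytest` and `docker compose` commands are assumptions standing in for whatever runner Step 1 detects, and the `DRY_RUN` switch exists purely so this sketch can be exercised without the real toolchain — the generated script would execute the commands directly:

```bash
#!/usr/bin/env bash
# Sketch of scripts/run-tests.sh per the Step 2 requirements.
# DRY_RUN=1 (the default here) prints each step instead of executing it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.test.yml}"
UNIT_ONLY=0
[[ "${1:-}" == "--unit-only" ]] && UNIT_ONLY=1
ran=""   # records which suites were invoked, for the final summary

run() {  # execute a step, or echo it when dry-running
  if [[ "$DRY_RUN" == "1" ]]; then echo "would run: $*"; else "$@"; fi
}

cleanup() {  # tear down the blackbox environment even if a step fails
  run docker compose -f "$COMPOSE_FILE" down --volumes
}

echo "== Unit tests =="
run pytest tests/unit          # replace with the detected runner
ran="$ran unit"

if [[ "$UNIT_ONLY" -eq 0 ]]; then
  trap cleanup EXIT
  echo "== Blackbox tests =="
  run docker compose -f "$COMPOSE_FILE" up -d --wait   # --wait blocks on health checks
  run pytest tests/blackbox
  ran="$ran blackbox"
fi

echo "suites run:$ran"
```

With `set -e` in effect, any failing suite aborts the script with a non-zero exit code, satisfying requirement 6; a fuller implementation would also count passed/failed/skipped tests for the summary in requirement 5.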
---

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like
- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code
- **Tests without data**: every test scenario MUST have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed

## Trigger Conditions

When the user wants to:

- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria

**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"

## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────────┐
│              Test Scenario Specification (4-Phase)                 │
├────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                       │
│   → verify AC, restrictions, input_data (incl. results_report.md)  │
│                                                                    │
│ Phase 1: Input Data & Expected Results Completeness Analysis       │
│   → assess input_data/ coverage vs AC scenarios (≥70%)             │
│   → verify every input has a quantifiable expected result          │
│   → present input→expected-result pairing assessment               │
│   [BLOCKING: user confirms input data + expected results coverage] │
│                                                                    │
│ Phase 2: Test Scenario Specification                               │
│   → environment.md                                                 │
│   → test-data.md (with expected results mapping)                   │
│   → blackbox-tests.md (positive + negative)                        │
│   → performance-tests.md                                           │
│   → resilience-tests.md                                            │
│   → security-tests.md                                              │
│   → resource-limit-tests.md                                        │
│   → traceability-matrix.md                                         │
│   [BLOCKING: user confirms test coverage]                          │
│                                                                    │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)  │
│   → build test-data + expected-result requirements checklist       │
│   → ask user: provide data+result (A) or remove test (B)           │
│   → validate input data (quality + quantity)                       │
│   → validate expected results (quantifiable + comparison method)   │
│   → remove tests without data or expected result, warn user        │
│   → final coverage check (≥70% or FAIL + loop back)                │
│   [BLOCKING: coverage ≥ 70% required to pass]                      │
│                                                                    │
│ Phase 4: Test Runner Script Generation                             │
│   → detect test runner + docker-compose + load tool                │
│   → scripts/run-tests.sh (unit + blackbox)                         │
│   → scripts/run-performance-tests.sh (load/perf scenarios)         │
│   → verify scripts are valid and executable                        │
├────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately       │
│             Ask don't assume · Spec don't code                     │
│             No test without data · No test without expected result │
└────────────────────────────────────────────────────────────────────┘
```
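For reference, the numeric comparison methods that Phase 3 validates (exact match, tolerance, threshold) reduce to small checks like the sketch below. This is illustrative only — the function names are hypothetical, the generated runner scripts would embed equivalent logic, and pattern-based methods would instead use `grep -E` or a JSON/CSV diff against a reference file:

```bash
#!/usr/bin/env bash
# Hypothetical comparison helpers for quantifiable expected results.
set -euo pipefail

# check_tolerance ACTUAL EXPECTED TOL → exit 0 if |actual - expected| <= tol
check_tolerance() {
  awk -v a="$1" -v e="$2" -v t="$3" \
    'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= t) }'
}

# check_threshold ACTUAL OP LIMIT → e.g. check_threshold 412 "<=" 500
check_threshold() {
  awk -v a="$1" -v op="$2" -v l="$3" 'BEGIN {
    if (op == "<=") exit !(a + 0 <= l + 0)
    if (op == ">=") exit !(a + 0 >= l + 0)
    exit 1   # unknown operator → fail
  }'
}

result=PASS
check_tolerance 103 100 10   || result=FAIL   # e.g. position within ± 10px
check_threshold 412 "<=" 500 || result=FAIL   # e.g. response time ≤ 500ms
echo "verdict: $result"
```

Each helper exits 0 on pass and non-zero on fail, so checks compose naturally with `||` accumulation and the script's overall exit code, matching the pass/fail verdict contract of the Expected Results Specification.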