---
name: test-spec
description: |
  Test specification skill. Analyzes input data and expected results
  completeness, then produces detailed test scenarios (blackbox,
  performance, resilience, security, resource limits) that treat the
  system as a black box. Every test pairs input data with quantifiable
  expected results so tests can verify correctness, not just execution.
  4-phase workflow: input data + expected results analysis, test scenario
  specification, data + results validation gate, test runner script
  generation. Produces 8 artifacts under tests/ and 2 shell scripts
  under scripts/.

  Trigger phrases:
  - "test spec", "test specification", "test scenarios"
  - "blackbox test spec", "black box tests", "blackbox tests"
  - "performance tests", "resilience tests", "security tests"
category: build
tags: [testing, black-box, blackbox-tests, test-specification, qa]
disable-model-invocation: true
---

# Test Scenario Specification

Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.

## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **Every test must have a pass/fail criterion**. Two acceptable shapes (illustrated after this list):
  - **Input/output shape**: concrete input data paired with a quantifiable expected result (exact value, tolerance, threshold, pattern, reference file). Typical for functional blackbox tests, performance tests with load data, data-processing pipelines.
  - **Behavioral shape**: a trigger condition + observable system behavior + quantifiable pass/fail criterion, with no input data required. Typical for startup/shutdown tests, retry/backoff policies, state transitions, logging/metrics emission, resilience scenarios. Example criteria: "startup logs `service ready` within 5s", "retry emits 3 attempts with exponential backoff (base 100ms ± 20ms)", "on SIGTERM, service drains in-flight requests within 30s grace period", "health endpoint returns 503 while migrations run".
  - For behavioral tests the observable (log line, metric value, state transition, emitted event, elapsed time) must still be quantifiable — the test must programmatically decide pass/fail.
  - A test that cannot produce a pass/fail verdict through either shape is not verifiable and must be removed.
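To illustrate the two shapes, here are two hypothetical test entries. The IDs and field layout (`BB-03`, `RES-01`, `AC-2`, `AC-7`, `R-3`) are invented for illustration; the actual entry format comes from the Phase 2 templates.

```
BB-03 (input/output shape)
  Input:    input_data/image_01.jpg
  Expected: 3 detections, each confidence ≥ 0.85; bounding boxes within
            ± 10px of expected_results/image_01_expected.csv
  Traces:   AC-2

RES-01 (behavioral shape)
  Trigger:    send SIGTERM to the running service
  Observable: in-flight requests drain; process exits cleanly
  Pass/fail:  all in-flight requests complete within the 30s grace period
              and the process exits with code 0
  Traces:     AC-7, R-3
```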
## Context Resolution

Fixed paths:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`

Announce the resolved paths and the detected invocation mode (below) to the user before proceeding.

### Invocation Modes

- **full** (default): runs all 4 phases against the whole `PROBLEM_DIR` + `DOCUMENT_DIR`. Used in greenfield Plan Step 1 and existing-code Step 3.
- **cycle-update**: runs only a scoped refresh of the existing test-spec artifacts against the current feature cycle's completed tasks. Used by the existing-code flow's per-cycle sync step. See `modes/cycle-update.md` for the narrowed workflow.

## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |

### Expected Results Specification

Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.

Expected results live inside `_docs/00_problem/input_data/` in one or both of:

1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file

```
input_data/
├── expected_results/             ← required: expected results folder
│   ├── results_report.md         ← required: input→expected result mapping
│   ├── image_01_expected.csv     ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```

**Quantifiability requirements** (see `templates/expected-results.md` for full format and examples; a sample mapping follows this list):

- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
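For a sense of what the mapping file might contain, here is a hypothetical `results_report.md` slice built from the example inputs above. The column names are illustrative assumptions; the binding format is `templates/expected-results.md`.

| Input | Expected result | Verified against |
|-------|-----------------|------------------|
| `image_01.jpg` | 3 detections, each `confidence ≥ 0.85` | `image_01_expected.csv`, positions ± 10px |
| `empty_scene.jpg` | 0 detections, 0 errors | exact counts |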
### Optional Files (used when available)

| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |

### Prerequisite Checks (BLOCKING)

1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**

## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test-data.md` |
| Phase 2 | Blackbox tests | `blackbox-tests.md` |
| Phase 2 | Performance tests | `performance-tests.md` |
| Phase 2 | Resilience tests | `resilience-tests.md` |
| Phase 2 | Security tests | `security-tests.md` |
| Phase 2 | Resource limit tests | `resource-limit-tests.md` |
| Phase 2 | Traceability matrix | `traceability-matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Hardware Assessment | Test Execution section | `environment.md` (updated) |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped

## Progress Tracking

At the start of execution, create a TodoWrite with all four phases (plus the hardware assessment between Phase 3 and Phase 4). Update status as each phase completes.

## Workflow

### Phase 1: Input Data & Expected Results Completeness Analysis

Read and follow `phases/01-input-data-analysis.md`.

---

### Phase 2: Test Scenario Specification

Read and follow `phases/02-test-scenarios.md`.

---

### Phase 3: Test Data Validation Gate (HARD GATE)

Read and follow `phases/03-data-validation-gate.md`.

---

### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs between Phase 3 and Phase 4)

Read and follow `phases/hardware-assessment.md`.

---

### Phase 4: Test Runner Script Generation

Read and follow `phases/04-runner-scripts.md`.
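To make Phase 4's output concrete, here is a minimal sketch of what a generated `scripts/run-tests.sh` might look like for the image example above. The real script is derived from the Phase 2 artifacts; the `./bin/sut` CLI, its flags, and the CSV comparison are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a generated black-box test runner.
# Assumes the system under test is exposed as ./bin/sut and expected
# results live under _docs/00_problem/input_data/expected_results/.
set -euo pipefail

EXPECTED_DIR="_docs/00_problem/input_data/expected_results"
FAILED=0

run_case() {
  local input="$1" expected="$2"
  # Black-box invocation: public CLI only, no internals.
  ./bin/sut --input "$input" --output /tmp/actual.csv
  # Quantifiable verdict: exact match against the reference file.
  if diff -q /tmp/actual.csv "$EXPECTED_DIR/$expected" >/dev/null; then
    echo "PASS: $input"
  else
    echo "FAIL: $input (output differs from $expected)"
    FAILED=1
  fi
}

run_case "_docs/00_problem/input_data/image_01.jpg" "image_01_expected.csv"

exit "$FAILED"
```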
---

### cycle-update mode

If invoked in `cycle-update` mode (see "Invocation Modes" above), read and follow `modes/cycle-update.md` instead of the full 4-phase workflow.

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 75% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 75% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing pass/fail criterion**: input/output tests without an expected result, OR behavioral tests without a measurable observable — both are unverifiable and must be removed
- **Non-quantifiable criteria**: "should return good results", "works correctly", "behaves properly" — not verifiable. Use exact values, tolerances, thresholds, pattern matches, or timing bounds that code can evaluate.
- **Forcing the wrong shape**: do not invent fake input data for a behavioral test (e.g., "input: SIGTERM signal") just to fit the input/output shape. Classify the test correctly and use the matching checklist.
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction (see the matrix slice after this list)
- **Writing test code**: this skill produces specifications, never implementation code
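As a reference point for the traceability rule, here is a hypothetical slice of `traceability-matrix.md`. The IDs reuse the invented examples above; the actual columns come from the Phase 2 template.

| Test ID | Traces to | Specified in |
|---------|-----------|--------------|
| BB-03 | AC-2 | `blackbox-tests.md` |
| RES-01 | AC-7, R-3 | `resilience-tests.md` |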
## Trigger Conditions

When the user wants to:

- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria

**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"

## Methodology Quick Reference

```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (4-Phase)                                │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│   → verify AC, restrictions, input_data (incl. results_report.md)    │
│                                                                      │
│ Phase 1: Input Data & Expected Results Completeness Analysis         │
│   → phases/01-input-data-analysis.md                                 │
│   [BLOCKING: user confirms input data + expected results coverage]   │
│                                                                      │
│ Phase 2: Test Scenario Specification                                 │
│   → phases/02-test-scenarios.md                                      │
│   → environment.md · test-data.md · blackbox-tests.md                │
│   → performance-tests.md · resilience-tests.md · security-tests.md   │
│   → resource-limit-tests.md · traceability-matrix.md                 │
│   [BLOCKING: user confirms test coverage]                            │
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│   → phases/03-data-validation-gate.md                                │
│   [BLOCKING: coverage ≥ 75% required to pass]                        │
│                                                                      │
│ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4)               │
│   → phases/hardware-assessment.md                                    │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│   → phases/04-runner-scripts.md                                      │
│   → scripts/run-tests.sh (unit + blackbox)                           │
│   → scripts/run-performance-tests.sh (load/perf scenarios)           │
│                                                                      │
│ cycle-update mode (scoped refresh)                                   │
│   → modes/cycle-update.md                                            │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│             Ask don't assume · Spec don't code                       │
│ No test without data · No test without expected result               │
└──────────────────────────────────────────────────────────────────────┘
```
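Finally, a similarly hedged sketch of the companion `scripts/run-performance-tests.sh`, using the "response ≤ 500ms" timing example from the Input Specification. The endpoint URL and threshold are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a generated performance test runner.
# The health-check URL and 500ms threshold are illustrative.
set -euo pipefail

URL="http://localhost:8080/health"
THRESHOLD_MS=500

# curl reports total elapsed time in seconds; convert to milliseconds.
elapsed_ms=$(curl -s -o /dev/null -w '%{time_total}' "$URL" \
  | awk '{ printf "%d", $1 * 1000 }')

if [ "$elapsed_ms" -le "$THRESHOLD_MS" ]; then
  echo "PASS: response ${elapsed_ms}ms <= ${THRESHOLD_MS}ms"
else
  echo "FAIL: response ${elapsed_ms}ms > ${THRESHOLD_MS}ms"
  exit 1
fi
```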