---
name: test-spec
description: |
  Test specification skill. Analyzes input data and expected results
  completeness, then produces detailed test scenarios (blackbox,
  performance, resilience, security, resource limits) that treat the
  system as a black box. Every test pairs input data with quantifiable
  expected results so tests can verify correctness, not just execution.
  4-phase workflow: input data + expected results analysis, test scenario
  specification, data + results validation gate, test runner script
  generation. Produces 8 artifacts under tests/ and 2 shell scripts
  under scripts/.

  Trigger phrases:
  - "test spec", "test specification", "test scenarios"
  - "blackbox test spec", "black box tests", "blackbox tests"
  - "performance tests", "resilience tests", "security tests"
category: build
tags: [testing, black-box, blackbox-tests, test-specification, qa]
disable-model-invocation: true
---

# Test Scenario Specification

Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.

## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **Every test must have a pass/fail criterion**. Two acceptable shapes (illustrated after this list):
  - **Input/output shape**: concrete input data paired with a quantifiable expected result (exact value, tolerance, threshold, pattern, reference file). Typical for functional blackbox tests, performance tests with load data, data-processing pipelines.
  - **Behavioral shape**: a trigger condition + observable system behavior + quantifiable pass/fail criterion, with no input data required. Typical for startup/shutdown tests, retry/backoff policies, state transitions, logging/metrics emission, resilience scenarios. Example criteria: "startup logs `service ready` within 5s", "retry emits 3 attempts with exponential backoff (base 100ms ± 20ms)", "on SIGTERM, service drains in-flight requests within 30s grace period", "health endpoint returns 503 while migrations run".
  - For behavioral tests the observable (log line, metric value, state transition, emitted event, elapsed time) must still be quantifiable — the test must programmatically decide pass/fail.
  - A test that cannot produce a pass/fail verdict through either shape is not verifiable and must be removed.
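To illustrate the two shapes, here are two hypothetical test entries. The IDs and field layout (`BB-03`, `RES-01`, `AC-2`, `AC-7`, `R-3`) are invented for illustration; the actual entry format comes from the Phase 2 templates.

```
BB-03 (input/output shape)
  Input:    input_data/image_01.jpg
  Expected: 3 detections, each confidence ≥ 0.85; bounding boxes within
            ± 10px of expected_results/image_01_expected.csv
  Traces:   AC-2

RES-01 (behavioral shape)
  Trigger:    send SIGTERM to the running service
  Observable: in-flight requests drain; process exits cleanly
  Pass/fail:  all in-flight requests complete within the 30s grace period
              and the process exits with code 0
  Traces:     AC-7, R-3
```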
## Context Resolution

Fixed paths:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`

Announce the resolved paths and the detected invocation mode (below) to the user before proceeding.

### Invocation Modes

- **full** (default): runs all 4 phases against the whole `PROBLEM_DIR` + `DOCUMENT_DIR`. Used in greenfield Plan Step 1 and existing-code Step 3.
- **cycle-update**: runs only a scoped refresh of the existing test-spec artifacts against the current feature cycle's completed tasks. Used by the existing-code flow's per-cycle sync step. See `modes/cycle-update.md` for the narrowed workflow.

## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |

### Expected Results Specification

Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.

Expected results live inside `_docs/00_problem/input_data/` in one or both of:

1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file

```
input_data/
├── expected_results/             ← required: expected results folder
│   ├── results_report.md         ← required: input→expected result mapping
│   ├── image_01_expected.csv     ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```

**Quantifiability requirements** (see `templates/expected-results.md` for full format and examples; a sample mapping follows this list):

- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
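For a sense of what the mapping file might contain, here is a hypothetical `results_report.md` slice built from the example inputs above. The column names are illustrative assumptions; the binding format is `templates/expected-results.md`.

| Input | Expected result | Verified against |
|-------|-----------------|------------------|
| `image_01.jpg` | 3 detections, each `confidence ≥ 0.85` | `image_01_expected.csv`, positions ± 10px |
| `empty_scene.jpg` | 0 detections, 0 errors | exact counts |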
### Optional Files (used when available)

| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |

### Prerequisite Checks (BLOCKING)

1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**

## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test-data.md` |
| Phase 2 | Blackbox tests | `blackbox-tests.md` |
| Phase 2 | Performance tests | `performance-tests.md` |
| Phase 2 | Resilience tests | `resilience-tests.md` |
| Phase 2 | Security tests | `security-tests.md` |
| Phase 2 | Resource limit tests | `resource-limit-tests.md` |
| Phase 2 | Traceability matrix | `traceability-matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Hardware Assessment | Test Execution section | `environment.md` (updated) |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped

## Progress Tracking

At the start of execution, create a TodoWrite with all four phases (plus the hardware assessment between Phase 3 and Phase 4). Update status as each phase completes.

## Workflow

### Phase 1: Input Data & Expected Results Completeness Analysis

Read and follow `phases/01-input-data-analysis.md`.

---

### Phase 2: Test Scenario Specification

Read and follow `phases/02-test-scenarios.md`.

---

### Phase 3: Test Data Validation Gate (HARD GATE)

Read and follow `phases/03-data-validation-gate.md`.

---

### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs between Phase 3 and Phase 4)

Read and follow `phases/hardware-assessment.md`.

---

### Phase 4: Test Runner Script Generation

Read and follow `phases/04-runner-scripts.md`.
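To make Phase 4's output concrete, here is a minimal sketch of what a generated `scripts/run-tests.sh` might look like for the image example above. The real script is derived from the Phase 2 artifacts; the `./bin/sut` CLI, its flags, and the CSV comparison are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a generated black-box test runner.
# Assumes the system under test is exposed as ./bin/sut and expected
# results live under _docs/00_problem/input_data/expected_results/.
set -euo pipefail

EXPECTED_DIR="_docs/00_problem/input_data/expected_results"
FAILED=0

run_case() {
  local input="$1" expected="$2"
  # Black-box invocation: public CLI only, no internals.
  ./bin/sut --input "$input" --output /tmp/actual.csv
  # Quantifiable verdict: exact match against the reference file.
  if diff -q /tmp/actual.csv "$EXPECTED_DIR/$expected" >/dev/null; then
    echo "PASS: $input"
  else
    echo "FAIL: $input (output differs from $expected)"
    FAILED=1
  fi
}

run_case "_docs/00_problem/input_data/image_01.jpg" "image_01_expected.csv"

exit "$FAILED"
```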
---

### cycle-update mode

If invoked in `cycle-update` mode (see "Invocation Modes" above), read and follow `modes/cycle-update.md` instead of the full 4-phase workflow.

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 75% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 75% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing pass/fail criterion**: input/output tests without an expected result, OR behavioral tests without a measurable observable — both are unverifiable and must be removed
- **Non-quantifiable criteria**: "should return good results", "works correctly", "behaves properly" — not verifiable. Use exact values, tolerances, thresholds, pattern matches, or timing bounds that code can evaluate.
- **Forcing the wrong shape**: do not invent fake input data for a behavioral test (e.g., "input: SIGTERM signal") just to fit the input/output shape. Classify the test correctly and use the matching checklist.
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction (see the matrix slice after this list)
- **Writing test code**: this skill produces specifications, never implementation code
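As a reference point for the traceability rule, here is a hypothetical slice of `traceability-matrix.md`. The IDs reuse the invented examples above; the actual columns come from the Phase 2 template.

| Test ID | Traces to | Specified in |
|---------|-----------|--------------|
| BB-03 | AC-2 | `blackbox-tests.md` |
| RES-01 | AC-7, R-3 | `resilience-tests.md` |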
## Trigger Conditions

When the user wants to:

- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria

**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"

## Methodology Quick Reference

```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (4-Phase)                                │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│   → verify AC, restrictions, input_data (incl. results_report.md)    │
│                                                                      │
│ Phase 1: Input Data & Expected Results Completeness Analysis         │
│   → phases/01-input-data-analysis.md                                 │
│   [BLOCKING: user confirms input data + expected results coverage]   │
│                                                                      │
│ Phase 2: Test Scenario Specification                                 │
│   → phases/02-test-scenarios.md                                      │
│   → environment.md · test-data.md · blackbox-tests.md                │
│   → performance-tests.md · resilience-tests.md · security-tests.md   │
│   → resource-limit-tests.md · traceability-matrix.md                 │
│   [BLOCKING: user confirms test coverage]                            │
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│   → phases/03-data-validation-gate.md                                │
│   [BLOCKING: coverage ≥ 75% required to pass]                        │
│                                                                      │
│ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4)               │
│   → phases/hardware-assessment.md                                    │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│   → phases/04-runner-scripts.md                                      │
│   → scripts/run-tests.sh (unit + blackbox)                           │
│   → scripts/run-performance-tests.sh (load/perf scenarios)           │
│                                                                      │
│ cycle-update mode (scoped refresh)                                   │
│   → modes/cycle-update.md                                            │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│             Ask don't assume · Spec don't code                       │
│ No test without data · No test without expected result               │
└──────────────────────────────────────────────────────────────────────┘
```
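Finally, a similarly hedged sketch of the companion `scripts/run-performance-tests.sh`, using the "response ≤ 500ms" timing example from the Input Specification. The endpoint URL and threshold are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a generated performance test runner.
# The health-check URL and 500ms threshold are illustrative.
set -euo pipefail

URL="http://localhost:8080/health"
THRESHOLD_MS=500

# curl reports total elapsed time in seconds; convert to milliseconds.
elapsed_ms=$(curl -s -o /dev/null -w '%{time_total}' "$URL" \
  | awk '{ printf "%d", $1 * 1000 }')

if [ "$elapsed_ms" -le "$THRESHOLD_MS" ]; then
  echo "PASS: response ${elapsed_ms}ms <= ${THRESHOLD_MS}ms"
else
  echo "FAIL: response ${elapsed_ms}ms > ${THRESHOLD_MS}ms"
  exit 1
fi
```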