---
name: test-spec
description: |
  Test specification skill. Analyzes input data and expected results
  completeness, then produces detailed test scenarios (blackbox, performance,
  resilience, security, resource limits) that treat the system as a black box.
  Every test pairs input data with quantifiable expected results so tests can
  verify correctness, not just execution. 4-phase workflow: input data +
  expected results analysis, test scenario specification, data + results
  validation gate, test runner script generation. Produces 8 artifacts under
  tests/ and 2 shell scripts under scripts/. Trigger phrases: "test spec",
  "test specification", "test scenarios", "blackbox test spec", "black box
  tests", "blackbox tests", "performance tests", "resilience tests",
  "security tests".
category: build
tags:
disable-model-invocation: true
---
# Test Scenario Specification
Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
## Core Principles

- Black-box only: tests describe observable behavior through public interfaces; no internal implementation details
- Traceability: every test traces to at least one acceptance criterion or restriction
- Save immediately: write artifacts to disk after each phase; never accumulate unsaved work
- Ask, don't assume: when requirements are ambiguous, ask the user before proceeding
- Spec, don't code: this workflow produces test specifications, never test implementation code
- Every test must have a pass/fail criterion. Two acceptable shapes:
  - Input/output shape: concrete input data paired with a quantifiable expected result (exact value, tolerance, threshold, pattern, reference file). Typical for functional blackbox tests, performance tests with load data, and data-processing pipelines.
  - Behavioral shape: a trigger condition + observable system behavior + quantifiable pass/fail criterion, with no input data required. Typical for startup/shutdown tests, retry/backoff policies, state transitions, logging/metrics emission, and resilience scenarios. Example criteria: "startup logs `service ready` within 5s", "retry emits 3 attempts with exponential backoff (base 100ms ± 20ms)", "on SIGTERM, service drains in-flight requests within 30s grace period", "health endpoint returns 503 while migrations run".
  - For behavioral tests the observable (log line, metric value, state transition, emitted event, elapsed time) must still be quantifiable — the test must programmatically decide pass/fail.
  - A test that cannot produce a pass/fail verdict through either shape is not verifiable and must be removed.
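As a sketch of how a runner script might turn a behavioral criterion into a programmatic verdict (the "startup logs `service ready` within 5s" example above), the function below polls a log file until a pattern appears or the deadline expires. The log path, pattern, and one-second polling interval are illustrative assumptions, not part of this skill.

```shell
# Hypothetical runner-script fragment (not mandated by this skill):
# poll a log file for a pattern until a deadline, then emit PASS/FAIL.
check_log_within() {
  local log_file="$1" pattern="$2" deadline="$3" elapsed=0
  while [ "$elapsed" -le "$deadline" ]; do
    if grep -q "$pattern" "$log_file" 2>/dev/null; then
      echo "PASS: '$pattern' seen within ${deadline}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "FAIL: '$pattern' not seen within ${deadline}s"
  return 1
}
```

The point is the shape, not the mechanism: the verdict is computed from an observable (a log line) against a quantified bound (the deadline), with no knowledge of internals.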
## Context Resolution

Fixed paths:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`
Announce the resolved paths and the detected invocation mode (below) to the user before proceeding.
## Invocation Modes

- full (default): runs all 4 phases against the whole `PROBLEM_DIR` + `DOCUMENT_DIR`. Used in greenfield Plan Step 1 and existing-code Step 3.
- cycle-update: runs only a scoped refresh of the existing test-spec artifacts against the current feature cycle's completed tasks. Used by the existing-code flow's per-cycle sync step. See `modes/cycle-update.md` for the narrowed workflow.
## Input Specification

### Required Files

| File | Purpose |
|---|---|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |
### Expected Results Specification

Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be quantifiable — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.

Expected results live inside `_docs/00_problem/input_data/` in one or both of:

- Mapping file (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `templates/expected-results.md`
- Reference files folder (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file

```
input_data/
├── expected_results/           ← required: expected results folder
│   ├── results_report.md       ← required: input→expected result mapping
│   ├── image_01_expected.csv   ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```
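Following that layout, a quick sanity check that every input image has a per-file expected-results CSV might look like the sketch below. The `.jpg` extension and the `<stem>_expected.csv` naming convention are assumptions read off the example filenames, not requirements of the skill.

```shell
# Hypothetical check: for each input file, look for a matching
# expected_results/<stem>_expected.csv (naming taken from the example tree).
# Returns the number of inputs without a per-file expected CSV.
check_expected_coverage() {
  local input_dir="$1" missing=0 stem
  for f in "$input_dir"/*.jpg; do
    [ -e "$f" ] || continue    # no .jpg inputs at all
    stem=$(basename "$f" .jpg)
    if [ ! -f "$input_dir/expected_results/${stem}_expected.csv" ]; then
      echo "missing expected results for: $stem"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Inputs covered only by a row in `results_report.md` (rather than a reference file) would be flagged by this sketch, so in practice the check would also consult the mapping file.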
Quantifiability requirements (see `templates/expected-results.md` for full format and examples):

- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
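To make "quantifiable" concrete, here is one minimal way a generated runner script could evaluate several of these forms. The helper names are invented for illustration; the skill does not prescribe them.

```shell
# Illustrative verdict helpers (names are hypothetical):
assert_ge()      { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a >= b) }'; }  # e.g. confidence >= 0.85
assert_count()   { [ "$(wc -l < "$1")" -eq "$2" ]; }                      # e.g. "3 detections" as 3 CSV rows
assert_matches() { printf '%s' "$1" | grep -Eq "$2"; }                    # exact string or regex pattern
assert_within()  { awk -v a="$1" -v e="$2" -v t="$3" \
                     'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= t) }'; }  # value +/- tolerance
```

Each helper exits zero on pass and nonzero on fail, which is exactly the "programmatic pass/fail verdict" the specification requires of every expected result.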
### Optional Files (used when available)

| File | Purpose |
|---|---|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |
## Prerequisite Checks (BLOCKING)

- `acceptance_criteria.md` exists and is non-empty — STOP if missing
- `restrictions.md` exists and is non-empty — STOP if missing
- `input_data/` exists and contains at least one file — STOP if missing
- `input_data/expected_results/results_report.md` exists and is non-empty — STOP if missing. Prompt the user: "Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `templates/expected-results.md` as the format reference."
- `problem.md` exists and is non-empty — STOP if missing
- `solution.md` exists and is non-empty — STOP if missing
- Create TESTS_OUTPUT_DIR if it does not exist
- If TESTS_OUTPUT_DIR already contains files, ask the user: resume from last checkpoint or start fresh?
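The blocking checks above reduce to a fail-fast loop over required files. The sketch below shows one possible scripting of them; it omits the `input_data/` non-empty check for brevity and is not the skill's mandated mechanism.

```shell
# Sketch: fail fast on the first missing or empty required file,
# then create TESTS_OUTPUT_DIR if it does not exist.
check_prereqs() {
  local f
  for f in \
    _docs/00_problem/problem.md \
    _docs/00_problem/acceptance_criteria.md \
    _docs/00_problem/restrictions.md \
    _docs/00_problem/input_data/expected_results/results_report.md \
    _docs/01_solution/solution.md
  do
    if [ ! -s "$f" ]; then    # -s: file exists and is non-empty
      echo "STOP: missing or empty $f"
      return 1
    fi
  done
  mkdir -p _docs/02_document/tests   # TESTS_OUTPUT_DIR
  echo "prerequisites OK"
}
```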
## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```
### Save Timing
| Phase | Save immediately after | Filename |
|---|---|---|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | environment.md |
| Phase 2 | Test data spec | test-data.md |
| Phase 2 | Blackbox tests | blackbox-tests.md |
| Phase 2 | Performance tests | performance-tests.md |
| Phase 2 | Resilience tests | resilience-tests.md |
| Phase 2 | Security tests | security-tests.md |
| Phase 2 | Resource limit tests | resource-limit-tests.md |
| Phase 2 | Traceability matrix | traceability-matrix.md |
| Phase 3 | Updated test data spec (if data added) | test-data.md |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | traceability-matrix.md |
| Hardware Assessment | Test Execution section | environment.md (updated) |
| Phase 4 | Test runner script | scripts/run-tests.sh |
| Phase 4 | Performance test runner script | scripts/run-performance-tests.sh |
### Resumability
If TESTS_OUTPUT_DIR already contains files:
- List existing files and match them to the save timing table above
- Identify which phase/artifacts are complete
- Resume from the next incomplete artifact
- Inform the user which artifacts are being skipped
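One way to mechanize this resume check, walking the Phase 2 artifacts in save order and reporting the first missing one, is sketched below; the artifact list and its order come from the save timing table, while the function name is invented.

```shell
# Sketch: report the first Phase 2 artifact (in save order) that is
# missing or empty, i.e. the point to resume from.
next_incomplete_artifact() {
  local dir="$1" a
  for a in environment.md test-data.md blackbox-tests.md \
           performance-tests.md resilience-tests.md security-tests.md \
           resource-limit-tests.md traceability-matrix.md; do
    if [ ! -s "$dir/$a" ]; then
      echo "resume at: $a"
      return 0
    fi
  done
  echo "all Phase 2 artifacts complete"
}
```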
### Progress Tracking
At the start of execution, create a TodoWrite with all four phases (plus the hardware assessment between Phase 3 and Phase 4). Update status as each phase completes.
## Workflow

### Phase 1: Input Data & Expected Results Completeness Analysis

Read and follow `phases/01-input-data-analysis.md`.

### Phase 2: Test Scenario Specification

Read and follow `phases/02-test-scenarios.md`.

### Phase 3: Test Data Validation Gate (HARD GATE)

Read and follow `phases/03-data-validation-gate.md`.

### Hardware-Dependency & Execution Environment Assessment (BLOCKING — runs between Phase 3 and Phase 4)

Read and follow `phases/hardware-assessment.md`.

### Phase 4: Test Runner Script Generation

Read and follow `phases/04-runner-scripts.md`.

### cycle-update mode

If invoked in cycle-update mode (see "Invocation Modes" above), read and follow `modes/cycle-update.md` instead of the full 4-phase workflow.
## Escalation Rules
| Situation | Action |
|---|---|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | STOP — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | STOP — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 75% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 75% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
## Common Mistakes
- Referencing internals: tests must be black-box — no internal module names, no direct DB queries against the system under test
- Vague expected outcomes: "works correctly" is not a test outcome; use specific measurable values
- Missing pass/fail criterion: input/output tests without an expected result, OR behavioral tests without a measurable observable — both are unverifiable and must be removed
- Non-quantifiable criteria: "should return good results", "works correctly", "behaves properly" — not verifiable. Use exact values, tolerances, thresholds, pattern matches, or timing bounds that code can evaluate.
- Forcing the wrong shape: do not invent fake input data for a behavioral test (e.g., "input: SIGTERM signal") just to fit the input/output shape. Classify the test correctly and use the matching checklist.
- Missing negative scenarios: every positive scenario category should have corresponding negative/edge-case tests
- Untraceable tests: every test should trace to at least one AC or restriction
- Writing test code: this skill produces specifications, never implementation code
## Trigger Conditions
When the user wants to:
- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria
Keywords: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"
## Methodology Quick Reference

```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (4-Phase)                                │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│   → verify AC, restrictions, input_data (incl. expected_results/)    │
│                                                                      │
│ Phase 1: Input Data & Expected Results Completeness Analysis         │
│   → phases/01-input-data-analysis.md                                 │
│   [BLOCKING: user confirms input data + expected results coverage]   │
│                                                                      │
│ Phase 2: Test Scenario Specification                                 │
│   → phases/02-test-scenarios.md                                      │
│   → environment.md · test-data.md · blackbox-tests.md                │
│   → performance-tests.md · resilience-tests.md · security-tests.md   │
│   → resource-limit-tests.md · traceability-matrix.md                 │
│   [BLOCKING: user confirms test coverage]                            │
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│   → phases/03-data-validation-gate.md                                │
│   [BLOCKING: coverage ≥ 75% required to pass]                        │
│                                                                      │
│ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4)               │
│   → phases/hardware-assessment.md                                    │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│   → phases/04-runner-scripts.md                                      │
│   → scripts/run-tests.sh (unit + blackbox)                           │
│   → scripts/run-performance-tests.sh (load/perf scenarios)           │
│                                                                      │
│ cycle-update mode (scoped refresh)                                   │
│   → modes/cycle-update.md                                            │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│             Ask don't assume · Spec don't code                       │
│             No test without data · No test without expected result   │
└──────────────────────────────────────────────────────────────────────┘
```