---
name: test-spec
description: |
  Test specification skill. Analyzes input data and expected results completeness,
  then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits)
  that treat the system as a black box. Every test pairs input data with quantifiable expected results
  so tests can verify correctness, not just execution.
  4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate,
  test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/.
  Trigger phrases:
  - "test spec", "test specification", "test scenarios"
  - "blackbox test spec", "black box tests", "blackbox tests"
  - "performance tests", "resilience tests", "security tests"
category: build
tags: [testing, black-box, blackbox-tests, test-specification, qa]
disable-model-invocation: true
---

# Test Scenario Specification

Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.

## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed

## Context Resolution

Fixed paths — no mode detection needed:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`

Announce the resolved paths to the user before proceeding.

## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |

### Expected Results Specification

Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.

Expected results live inside `_docs/00_problem/input_data/` in one or both of:

1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`

2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file

```
input_data/
├── expected_results/            ← required: expected results folder
│   ├── results_report.md        ← required: input→expected result mapping
│   ├── image_01_expected.csv    ← per-file expected detections
│   └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```

**Quantifiability requirements** (see template for full format and examples):

- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
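
For illustration, each style reduces to a check a runner can evaluate programmatically; the `assert_num` helper and all values below are invented for this sketch:

```shell
# Illustrative quantifiable comparisons; assert_num and all values are invented.
assert_num() {  # usage: assert_num <actual> <ge|le|eq> <expected>
  awk -v a="$1" -v op="$2" -v e="$3" 'BEGIN {
    ok = (op == "ge") ? (a >= e) : (op == "le") ? (a <= e) : (a == e)
    exit ok ? 0 : 1
  }'
}
assert_num 0.91 ge 0.85 && echo "confidence: PASS"   # numeric threshold
assert_num 412  le 500  && echo "latency: PASS"      # timing threshold
assert_num 3    eq 3    && echo "count: PASS"        # exact count
echo "ERR_TIMEOUT" | grep -Eq '^ERR_[A-Z]+$' && echo "error code: PASS"  # pattern
```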

### Optional Files (used when available)

| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |

### Prerequisite Checks (BLOCKING)

1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
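
As a sanity-pass sketch, the checks above reduce to a few shell tests (paths follow Context Resolution; the exact messages are illustrative):

```shell
# Sketch of the blocking prerequisite checks; STOP messages are illustrative.
check_prereqs() {
  p="_docs/00_problem"
  for f in "$p/problem.md" "$p/acceptance_criteria.md" "$p/restrictions.md" \
           "$p/input_data/expected_results/results_report.md" \
           "_docs/01_solution/solution.md"; do
    [ -s "$f" ] || { echo "STOP: missing or empty $f"; return 1; }
  done
  # input_data/ must contain at least one file
  [ -n "$(ls -A "$p/input_data" 2>/dev/null)" ] || { echo "STOP: input_data/ is empty"; return 1; }
  mkdir -p "_docs/02_document/tests"   # create TESTS_OUTPUT_DIR if absent
  echo "prerequisites OK"
}
check_prereqs || echo "prerequisite gate: BLOCKED"
```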

## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test-data.md` |
| Phase 2 | Blackbox tests | `blackbox-tests.md` |
| Phase 2 | Performance tests | `performance-tests.md` |
| Phase 2 | Resilience tests | `resilience-tests.md` |
| Phase 2 | Security tests | `security-tests.md` |
| Phase 2 | Resource limit tests | `resource-limit-tests.md` |
| Phase 2 | Traceability matrix | `traceability-matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped
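
A minimal sketch of the resume-point scan, assuming artifacts appear in Save Timing order:

```shell
# Sketch: find the first incomplete Phase 2 artifact to resume from.
out="_docs/02_document/tests"
resume=""
for f in environment.md test-data.md blackbox-tests.md performance-tests.md \
         resilience-tests.md security-tests.md resource-limit-tests.md \
         traceability-matrix.md; do
  if [ -s "$out/$f" ]; then
    echo "skip (already saved): $f"
  else
    resume="$f"
    break
  fi
done
echo "resume from: ${resume:-nothing, all Phase 2 artifacts present}"
```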

## Progress Tracking

At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.

## Workflow

### Phase 1: Input Data & Expected Results Completeness Analysis

**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data and expected results are sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet

1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read the testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
   - Coverage of acceptance criteria scenarios
   - Coverage of restriction edge cases
   - Coverage of testing strategy requirements
7. Analyze `input_data/expected_results/results_report.md` completeness:
   - Every input data item has a corresponding expected result row in the mapping
   - Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
   - Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
   - Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present the input-to-expected-result pairing assessment:

   | Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
   |------------|---------------------------|---------------|----------------|
   | [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |

9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
10. If coverage is low, search the internet for supplementary data, assess quality with the user, and if the user agrees, add it to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask the user to provide them before proceeding

**BLOCKING**: Do NOT proceed until the user confirms both input data coverage AND expected results completeness are sufficient.

---

### Phase 2: Test Scenario Specification

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.

Based on all acquired data, acceptance criteria, and restrictions, form detailed test scenarios:

1. Define the test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build the traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure

**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] The consumer app has no direct access to system internals
- [ ] The test environment matches project constraints (see Docker Suitability Assessment below)
- [ ] External dependencies have mock/stub services defined
- [ ] The traceability matrix has no uncovered AC or restrictions

**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`

**BLOCKING**: Present the test coverage summary (from traceability-matrix.md) to the user. Do NOT proceed until confirmed.

Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).

---

### Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)

**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data and a quantifiable expected result. Remove tests that lack either. Verify final coverage stays above 70%.
**Constraints**: This phase is MANDATORY and cannot be skipped.

#### Step 1 — Build the test-data and expected-result requirements checklist

Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, extract:

| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|------------------|-----------|---------------------|--------------------------|----------------------|-------------------|-----------------|---------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |

Present this table to the user.

#### Step 2 — Ask user to provide missing test data AND expected results

For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:

> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.

**BLOCKING**: Wait for the user's response for every missing item.

#### Step 3 — Validate provided data and expected results

For each item where the user chose **Option A**:

**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)

**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
   - Exact values (counts, strings, status codes)
   - Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
   - Pattern matches (regex, substring, JSON schema)
   - Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
   - Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to

If any validation fails, report the specific issue and loop back to Step 2 for that item.
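
For mapping-referenced files, the existence check in step 4 can be mechanized; this sketch reuses `image_01_expected.csv` from the earlier directory example and applies a crude CSV shape check:

```shell
# Sketch: check that a mapping-referenced expected-results file exists and is
# shaped like a CSV (every row has as many fields as the header row).
ref="_docs/00_problem/input_data/expected_results/image_01_expected.csv"
if [ ! -s "$ref" ]; then
  echo "FAIL: missing reference file $ref"
elif awk -F',' 'NR==1 { n = NF } NF != n { bad = 1 } END { exit bad }' "$ref"; then
  echo "OK: $ref is well-formed"
else
  echo "FAIL: $ref has inconsistent column counts"
fi
```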

#### Step 4 — Remove tests without data or expected results

For each item where the user chose **Option B**:

1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal

**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)

#### Step 5 — Final coverage check

After all removals, recalculate coverage:

1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
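
With illustrative counts, the recalculation is plain integer arithmetic:

```shell
# Sketch: Phase 3 coverage recalculation (counts are illustrative).
total_items=12      # total acceptance criteria + restrictions
covered_items=9     # AC/restrictions still covered by remaining tests
coverage=$(( covered_items * 100 / total_items ))
echo "coverage: ${coverage}%"
if [ "$coverage" -ge 70 ]; then
  echo "Phase 3: PASSED"
else
  echo "Phase 3: FAILED (loop back to Step 2)"
fi
```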

| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |

**Decision**:

- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present the final summary to the user.
- **Coverage < 70%** → Phase 3 **FAILED**. Report:

> ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.

**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.

#### Phase 3 Completion

When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results:

1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent

---

### Docker Suitability Assessment (BLOCKING — runs before Phase 4)

Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). Before generating scripts, check whether the project has any constraints that prevent easy Docker usage.

**Disqualifying factors** (any one is sufficient to fall back to local):
- Hardware bindings: GPU, MPS, TPU, FPGA, accelerators, sensors, cameras, serial devices, host-level drivers (CUDA, Metal, OpenCL, etc.)
- Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs not installable in a container
- Data/volume constraints: large files (> 100MB) that would be impractical to copy into a container, databases that must run on the host
- Network/environment: tests that require host networking, VPN access, or specific DNS/firewall rules
- Performance: Docker overhead would invalidate benchmarks or latency-sensitive measurements

**Assessment steps**:
1. Scan project source, config files, and dependencies for indicators of the factors above
2. Check `TESTS_OUTPUT_DIR/environment.md` for environment requirements
3. Check `_docs/00_problem/restrictions.md` and `_docs/01_solution/solution.md` for constraints
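
Step 1 can be partly mechanized; the keyword list and manifest set in this sketch are illustrative starting points, not an exhaustive scan:

```shell
# Sketch: scan common dependency manifests for disqualifying-factor keywords.
# The keyword pattern and file list are illustrative, not exhaustive.
pattern='cuda|metal|opencl|tensorrt|/dev/tty|gpu'
found=0
for m in requirements.txt pyproject.toml Cargo.toml package.json; do
  if [ -f "$m" ] && grep -Eiq "$pattern" "$m"; then
    found=1
  fi
done
if [ "$found" -eq 1 ]; then
  echo "disqualifying factor detected: recommend local execution"
else
  echo "no disqualifying factors found: use Docker (preferred)"
fi
```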

**Decision**:

- If ANY disqualifying factor is found → recommend **local test execution** as fallback. Present to the user using the Choose format:

  ```
  ══════════════════════════════════════
  DECISION REQUIRED: Test execution environment
  ══════════════════════════════════════
  Docker is preferred, but factors preventing easy
  Docker execution detected:
  - [list factors found]
  ══════════════════════════════════════
  A) Local execution (recommended)
  B) Docker execution (constraints may cause issues)
  ══════════════════════════════════════
  Recommendation: A — detected constraints prevent
  easy Docker execution
  ══════════════════════════════════════
  ```

- If NO disqualifying factors → use Docker (preferred default)
- Record the decision in `TESTS_OUTPUT_DIR/environment.md` under a "Test Execution" section

---

### Phase 4: Test Runner Script Generation

**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Docker Suitability Assessment decision above.

#### Step 1 — Detect test infrastructure

1. Identify the project's test runner from manifests and config files:
   - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
   - .NET: `dotnet test` (*.csproj, *.sln)
   - Rust: `cargo test` (Cargo.toml)
   - Node: `npm test` or `vitest` / `jest` (package.json)
2. Check the Docker Suitability Assessment result:
   - If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on the host
   - If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
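
A naive first-match sketch of the runner detection in step 1 (file list abbreviated from the table above):

```shell
# Sketch: first-match test-runner detection from manifest files.
detect_runner() {  # usage: detect_runner <project-dir>
  d="$1"
  if [ -f "$d/pyproject.toml" ] || [ -f "$d/setup.cfg" ] || [ -f "$d/pytest.ini" ]; then
    echo "pytest"
  elif ls "$d"/*.csproj "$d"/*.sln >/dev/null 2>&1; then
    echo "dotnet test"
  elif [ -f "$d/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$d/package.json" ]; then
    echo "npm test"
  else
    echo "unknown"
  fi
}
echo "detected runner: $(detect_runner .)"
```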

#### Step 2 — Generate `scripts/run-tests.sh`

Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:

1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Optionally accept a `--unit-only` flag to skip blackbox tests
3. Run unit/blackbox tests using the detected test runner:
   - **Local mode**: activate virtualenv (if present), run test runner directly on host
   - **Docker mode**: spin up docker-compose environment, wait for health checks, run test suite, tear down
4. Print a summary of passed/failed/skipped tests
5. Exit 0 on all pass, exit 1 on any failure
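
A skeletal shape that satisfies the requirements above; the `run` helper is a placeholder standing in for the detected runner invocation, and `pytest` is just an example:

```shell
#!/usr/bin/env bash
# Skeleton for scripts/run-tests.sh; `run` is a placeholder that a real
# script replaces with the detected test runner invocation.
set -euo pipefail

cleanup() { echo "cleanup: tear down docker-compose / temp state here"; }
trap cleanup EXIT

UNIT_ONLY=0
if [ "${1:-}" = "--unit-only" ]; then UNIT_ONLY=1; fi

run() { echo "would run: $*"; }   # placeholder invocation

run pytest tests/unit                 # always run unit tests
if [ "$UNIT_ONLY" -eq 0 ]; then
  run pytest tests/blackbox           # skipped with --unit-only
fi
echo "summary: dispatched suites (exit 0 = all pass, 1 = failure)"
```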

#### Step 3 — Generate `scripts/run-performance-tests.sh`

Create `scripts/run-performance-tests.sh` at the project root. The script must:

1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Docker Suitability Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
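
The threshold comparison in steps 2 and 5 can be sketched like this; both values are hardcoded stand-ins for the spec threshold and the load tool's measurement:

```shell
# Sketch: comparing one measured metric against a spec threshold.
# THRESHOLD_MS would come from performance-tests.md or a CLI arg;
# measured_ms would come from the load tool's output.
THRESHOLD_MS=500
measured_ms=437
if [ "$measured_ms" -le "$THRESHOLD_MS" ]; then
  echo "latency: PASS (${measured_ms}ms <= ${THRESHOLD_MS}ms)"
  status=0
else
  echo "latency: FAIL (${measured_ms}ms > ${THRESHOLD_MS}ms)"
  status=1
fi
echo "scenario exit status would be: $status"
```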

#### Step 4 — Verify scripts

1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
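
The verification pass can be sketched as:

```shell
# Sketch: the verification pass over both generated scripts.
for s in scripts/run-tests.sh scripts/run-performance-tests.sh; do
  if [ -f "$s" ]; then
    bash -n "$s" && chmod +x "$s" && echo "verified + executable: $s"
  else
    echo "not generated yet: $s"
  fi
done
```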

**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` (paths relative to the project root).

---

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like
- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code
- **Tests without data**: every test scenario MUST have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed

## Trigger Conditions

When the user wants to:
- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria

**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"

## Methodology Quick Reference

```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (4-Phase)                                │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│   → verify AC, restrictions, input_data (incl. expected_results/)    │
│                                                                      │
│ Phase 1: Input Data & Expected Results Completeness Analysis         │
│   → assess input_data/ coverage vs AC scenarios (≥70%)               │
│   → verify every input has a quantifiable expected result            │
│   → present input→expected-result pairing assessment                 │
│   [BLOCKING: user confirms input data + expected results coverage]   │
│                                                                      │
│ Phase 2: Test Scenario Specification                                 │
│   → environment.md                                                   │
│   → test-data.md (with expected results mapping)                     │
│   → blackbox-tests.md (positive + negative)                          │
│   → performance-tests.md                                             │
│   → resilience-tests.md                                              │
│   → security-tests.md                                                │
│   → resource-limit-tests.md                                          │
│   → traceability-matrix.md                                           │
│   [BLOCKING: user confirms test coverage]                            │
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│   → build test-data + expected-result requirements checklist         │
│   → ask user: provide data+result (A) or remove test (B)             │
│   → validate input data (quality + quantity)                         │
│   → validate expected results (quantifiable + comparison method)     │
│   → remove tests without data or expected result, warn user          │
│   → final coverage check (≥70% or FAIL + loop back)                  │
│   [BLOCKING: coverage ≥ 70% required to pass]                        │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│   → detect test runner + docker-compose + load tool                  │
│   → scripts/run-tests.sh (unit + blackbox)                           │
│   → scripts/run-performance-tests.sh (load/perf scenarios)           │
│   → verify scripts are valid and executable                          │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│             Ask don't assume · Spec don't code                       │
│             No test without data · No test without expected result   │
└──────────────────────────────────────────────────────────────────────┘
```