# Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation is instead planned during decompose: the decomposer creates a task for generating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.

**Role**: DevOps engineer

**Goal**: Generate executable shell scripts that run the specified tests, so autodev and CI can invoke them consistently.

**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit non-zero on failure. Respect the Hardware-Dependency Assessment decision recorded in `environment.md`.

**Prerequisite**: `phases/hardware-assessment.md` must have completed and written the "Test Execution" section to `TESTS_OUTPUT_DIR/environment.md`.
## Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files:
- Python: `pytest` (`pyproject.toml`, `setup.cfg`, `pytest.ini`)
- .NET: `dotnet test` (`*.csproj`, `*.sln`)
- Rust: `cargo test` (`Cargo.toml`)
- Node: `npm test` or `vitest` / `jest` (`package.json`)
2. Check the Hardware-Dependency Assessment result recorded in `environment.md`:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
- If **both** was chosen → generate both
3. Identify performance/load testing tools from dependencies (`k6`, `locust`, `artillery`, `wrk`, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
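The manifest probing above can be sketched as a small shell function. This is a heuristic sketch, not the skill's prescribed implementation: the file names match the list above, but real projects may need extra checks (e.g. a `pyproject.toml` does not guarantee pytest).

```shell
#!/usr/bin/env bash
# Heuristic runner detection from manifest/config files (sketch only).
set -euo pipefail

detect_runner() {
  local dir="${1:-.}"
  if [ -f "$dir/pyproject.toml" ] || [ -f "$dir/setup.cfg" ] || [ -f "$dir/pytest.ini" ]; then
    echo "pytest"
  elif compgen -G "$dir/*.csproj" >/dev/null || compgen -G "$dir/*.sln" >/dev/null; then
    echo "dotnet test"
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$dir/package.json" ]; then
    echo "npm test"        # or vitest/jest, per package.json scripts
  else
    echo "unknown"
  fi
}
```

`compgen -G` is used for the .NET globs because a plain `[ -f *.csproj ]` fails when the glob matches more than one file.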
## Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
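Two of the requirements above (the `--unit-only` flag and the summary/exit contract) can be sketched as helper functions; the real script would also install dependencies and invoke the detected runner, and the counts below are placeholders fed by that runner:

```shell
#!/usr/bin/env bash
# Partial sketch of scripts/run-tests.sh: flag parsing and the summary/exit contract.
set -euo pipefail

parse_mode() {                       # --unit-only -> "unit" (skip blackbox), else "all"
  if [ "${1:-}" = "--unit-only" ]; then echo "unit"; else echo "all"; fi
}

print_summary() {                    # args: passed failed skipped
  printf 'passed=%s failed=%s skipped=%s\n' "$1" "$2" "$3"
  [ "$2" -eq 0 ]                     # non-zero exit if anything failed
}
```

The `[ "$2" -eq 0 ]` test makes the summary function double as the script's final exit status, satisfying the exit-0/exit-1 requirement without extra branching.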
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
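A minimal shape for `docker-compose.test.yml` might look as follows; the service name `tests` and the `Dockerfile.test` path are illustrative assumptions, not project facts:

```yaml
services:
  tests:
    build:
      context: .
      dockerfile: Dockerfile.test   # image installs project + test dependencies
    command: pytest -q              # container exits with the test runner's exit code
```

CI can then run `docker compose -f docker-compose.test.yml up --build --exit-code-from tests`, which propagates the test runner's exit code as the compose command's own.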
## Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Hardware-Dependency Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
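The threshold comparison in steps 5–7 can be sketched like this; the scenario name and millisecond values are hypothetical, and in the real script they would come from the performance spec and the load tool's parsed output:

```shell
#!/usr/bin/env bash
# Sketch of the per-scenario threshold check in scripts/run-performance-tests.sh.
set -euo pipefail

check_threshold() {        # args: scenario measured_ms threshold_ms
  if [ "$2" -le "$3" ]; then
    echo "PASS $1 (${2}ms <= ${3}ms)"
  else
    echo "FAIL $1 (${2}ms > ${3}ms)"
    return 1
  fi
}
```

Calling this once per scenario and accumulating any non-zero return gives the required exit-0-only-if-all-thresholds-met behavior.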
## Step 4 — Verify scripts
1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh` and `bash -n scripts/run-performance-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
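Steps 1 and 2 combine into a one-liner per script; `bash -n` parses without executing, so it is safe to run on a script with side effects:

```shell
#!/usr/bin/env bash
# Sketch: syntax-check a generated script, then mark it executable.
set -euo pipefail

verify_script() {
  bash -n "$1" || return   # parse only; fails on syntax errors without running anything
  chmod +x "$1"
}
```

Run it for each generated script, e.g. `verify_script scripts/run-tests.sh` and `verify_script scripts/run-performance-tests.sh`.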
## Save action
Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.