detections/.cursor/skills/test-run/SKILL.md
Oleksandr Bezdieniezhnykh 320683a1fd Update coding standards and testing guidelines
- Revised coding rules to emphasize readability, meaningful comments, and maintainability.
- Adjusted thresholds for test coverage on business logic from 80% to 75%, aligning with the new quality standards.
- Enhanced guidelines for handling skipped tests, introducing classifications for legitimate and illegitimate skips.
- Updated the requirements for test scenarios to ensure every test has a clear pass/fail criterion.

Made-with: Cursor
2026-04-17 16:19:32 +03:00


---
name: test-run
description: |
  Run the project's test suite, report results, and handle failures.
  Detects test runners automatically (pytest, dotnet test, cargo test, npm test)
  or uses scripts/run-tests.sh if available.
  Trigger phrases:
  - "run tests", "test suite", "verify tests"
category: build
tags: [testing, verification, test-suite]
disable-model-invocation: true
---
# Test Run
Run the project's test suite and report results. This skill is invoked by the autopilot at verification checkpoints — after implementing tests, after implementing features, or at any point where the test suite must pass before proceeding.
## Workflow
### 1. Detect Test Runner
Check in order — first match wins:
1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
2. `docker-compose.test.yml` exists → run the Execution Environment Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
3. Auto-detect from project files:
- `pytest.ini`, `pyproject.toml` with `[tool.pytest]`, or `conftest.py` → `pytest`
- `*.csproj` or `*.sln` → `dotnet test`
- `Cargo.toml` → `cargo test`
- `package.json` with test script → `npm test`
- `Makefile` with `test` target → `make test`
If no runner detected → report failure and ask user to specify.
#### Execution Environment Check
1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" section. If the test-spec skill already assessed hardware dependencies and recorded a decision (local / docker / both), **follow that decision**.
2. If the "Test Execution" section says **local** → run tests directly on host (no Docker).
3. If the "Test Execution" section says **docker** → use Docker (docker-compose).
4. If the "Test Execution" section says **both** → run local first, then Docker (or vice versa), and merge results.
5. If no prior decision exists → fall back to the hardware-dependency detection logic from the test-spec skill's "Hardware-Dependency & Execution Environment Assessment" section. Ask the user if hardware indicators are found.
### 2. Run Tests
1. Execute the detected test runner
2. Capture output: passed, failed, skipped, errors
3. If a test environment was spun up, tear it down after tests complete
### 3. Report Results
Present a summary:
```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
```
**Important**: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable. If a collection error is caused by a missing dependency, install it (add to the project's dependency file and install) before re-running. The test runner script (`run-tests.sh`) should install all dependencies automatically — if it doesn't, fix the script to do so.
### 4. Diagnose Failures and Skips
Before presenting choices, list every failing/erroring/skipped test with a one-line root cause:
```
Failures:
1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)
Skips:
1. test_x.py::test_pre_init — runtime skip: engine already initialized (unreachable in current test order)
2. test_y.py::test_docker_only — explicit @skip: requires Docker (dead code in local runs)
```
Categorize failures as: **missing dependency**, **broken import**, **logic/assertion error**, **possibly obsolete**, or **environment-specific**.
Categorize skips as: **explicit skip (dead code)**, **runtime skip (unreachable)**, **environment mismatch**, or **missing fixture/data**.
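The failure categories above could be assigned with a rough first-pass classifier. The keyword heuristics here are assumptions for illustration (a real diagnosis still requires reading the test and the error output, as step 5 demands):

```python
def categorize_failure(reason: str) -> str:
    """Map a one-line root cause onto the failure categories above."""
    r = reason.lower()
    if "not installed" in r or "modulenotfounderror" in r:
        return "missing dependency"
    if "obsolete" in r or "removed" in r:
        return "possibly obsolete"
    if "importerror" in r or "no module" in r:
        return "broken import"
    if "assertionerror" in r:
        return "logic/assertion error"
    return "environment-specific"
```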
### 5. Handle Outcome
**All tests pass, zero skipped** → return success to the autopilot for auto-chain.
**Any test fails or errors** → this is a **blocking gate**. Never silently ignore failures. **Always investigate the root cause before deciding on an action.** Read the failing test code, read the error output, check service logs if applicable, and determine whether the bug is in the test or in the production code.
After investigating, present:
```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
Failures:
1. test_X — root cause: [detailed reason] → action: [fix test / fix code / remove + justification]
══════════════════════════════════════
A) Apply recommended fixes, then re-run
B) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix root causes before proceeding
══════════════════════════════════════
```
- If user picks A → apply fixes, then re-run (loop back to step 2)
- If user picks B → return failure to the autopilot
**Any skipped test** → classify as legitimate or illegitimate before deciding whether to block.
#### Legitimate skips (accept and proceed)
The code path genuinely cannot execute on this runner. Acceptable reasons:
- Hardware not physically present (GPU, Apple Neural Engine, sensor, serial device)
- Operating system mismatch (Darwin-only test on Linux CI, Windows-only test on macOS)
- Feature-flag-gated test whose feature is intentionally disabled in this environment
- External service the project deliberately does not control (e.g., a third-party API with no sandbox, and the project has a documented contract test instead)
For legitimate skips: verify the skip condition is accurate (the test would run if the hardware/OS were present), verify it has a clear reason string, and proceed.
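A well-formed legitimate skip looks like this: the condition is accurate (the test would run if the hardware were present) and the reason string names exactly what is missing. The example uses stdlib `unittest` so it runs anywhere; the same shape applies to `@pytest.mark.skipif`, and the Apple Neural Engine test body is a hypothetical placeholder.

```python
import platform
import unittest

class TestANEInference(unittest.TestCase):
    # Condition is accurate: this WOULD run on an Apple Silicon Mac,
    # and the reason string says exactly which hardware is absent.
    @unittest.skipUnless(
        platform.system() == "Darwin" and platform.machine() == "arm64",
        "Apple Neural Engine not present on this host",
    )
    def test_ane_inference(self):
        self.assertTrue(True)  # placeholder for real ANE inference checks
```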
#### Illegitimate skips (BLOCKING — must resolve)
The skip is a workaround for something we can and should fix. NOT acceptable reasons:
- Required service not running (database, message broker, downstream API we control) → fix: bring the service up, add a docker-compose dependency, or add a mock
- Missing test fixture, seed data, or sample file → fix: provide the data, generate it, or ASK the user for it
- Missing environment variable or credential → fix: add to `.env.example`, document, ASK user for the value
- Flaky-test quarantine with no tracking ticket → fix: create the ticket (or replay via leftovers if tracker is down)
- Inherited skip from a prior refactor that was never cleaned up → fix: clean it up now
- Test ordering mutates shared state → fix: isolate the state
**Rule of thumb**: if the reason for skipping is "we didn't set something up," that's not a valid skip — set it up. If the reason is "this hardware/OS isn't here," that's valid.
#### Resolution steps for illegitimate skips
1. Classify the skip (read the skip reason and test body)
2. If the fix is **mechanical** — start a container, install a dep, add a mock, reorder fixtures — attempt it automatically and re-run
3. If the fix requires **user input** — credentials, sample data, a business decision — BLOCK and ASK
4. Never silently mark the skip as "accepted" — every illegitimate skip must either be fixed or escalated
5. Removal is a last resort and requires explicit user approval with documented reasoning
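Step 2's mechanical-fix attempt for a "required service not running" skip might look like the sketch below: try to bring the service up via docker compose instead of accepting the skip, and fall through to escalation if that is impossible. The service name and compose file are assumptions taken from the skill's own conventions.

```python
import shutil
import subprocess

def ensure_service(name: str, compose_file: str = "docker-compose.test.yml") -> bool:
    """Try to start the skipped test's service; True means re-run the tests."""
    if shutil.which("docker") is None:
        return False  # no mechanical fix possible -> escalate to the user
    proc = subprocess.run(
        ["docker", "compose", "-f", compose_file, "up", "-d", name],
        capture_output=True,
    )
    return proc.returncode == 0
```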
#### Categorization cheatsheet
- **explicit skip (e.g. `@pytest.mark.skip`)**: check whether the reason in the decorator is still valid
- **conditional skip (e.g. `@pytest.mark.skipif`)**: check whether the condition is accurate and whether we can change the environment to make it false
- **runtime skip (e.g. `pytest.skip()` in body)**: check why the condition fires — often an ordering or environment bug
- **missing fixture/data**: treated as illegitimate unless user confirms the data is unavailable
After investigating, present findings:
```
══════════════════════════════════════
SKIPPED TESTS: K tests skipped
══════════════════════════════════════
1. test_X — root cause: [detailed reason] → action: [fix / restructure / remove + justification]
2. test_Y — root cause: [detailed reason] → action: [fix / restructure / remove + justification]
══════════════════════════════════════
A) Apply recommended fixes, then re-run
B) Accept skips and proceed (requires user justification per skip)
══════════════════════════════════════
```
Only option B allows proceeding with skips, and it requires explicit user approval with documented justification for each skip.
## Trigger Conditions
This skill is invoked by the autopilot at test verification checkpoints. It is not typically invoked directly by the user.