---
name: test-run
description: |
  Run the project's test suite, report results, and handle failures.
  Detects test runners automatically (pytest, dotnet test, cargo test,
  npm test) or uses scripts/run-tests.sh if available.
  Trigger phrases:
  - "run tests", "test suite", "verify tests"
category: build
tags: [testing, verification, test-suite]
disable-model-invocation: true
---

# Test Run

Run the project's test suite and report results.

This skill is invoked by the autopilot at verification checkpoints — after implementing tests, after implementing features, or at any point where the test suite must pass before proceeding.

## Workflow

### 1. Detect Test Runner

Check in order — first match wins:

1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
2. `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
3. Auto-detect from project files:
   - `pytest.ini`, `pyproject.toml` with `[tool.pytest.ini_options]`, or `conftest.py` → `pytest`
   - `*.csproj` or `*.sln` → `dotnet test`
   - `Cargo.toml` → `cargo test`
   - `package.json` with a `test` script → `npm test`
   - `Makefile` with a `test` target → `make test`

If no runner is detected → report failure and ask the user to specify one.

#### Docker Suitability Check

Docker is the preferred test environment. Before using it, verify that no constraints prevent easy Docker execution:

1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
2. If no prior decision exists, check for disqualifying factors:
   - Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
   - Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
   - Data/volume constraints: large files (> 100 MB) impractical to copy into a container
   - Network/environment: host networking, VPN, specific DNS/firewall rules
   - Performance: Docker overhead would invalidate benchmarks or latency measurements
3. If any disqualifying factor is found → fall back to the local test runner. Present to the user using the Choose format:

   ```
   ══════════════════════════════════════
   DECISION REQUIRED: Docker is preferred, but factors preventing easy Docker execution were detected
   ══════════════════════════════════════
   Factors detected:
   - [list factors]
   ══════════════════════════════════════
   A) Run tests locally (recommended)
   B) Run tests in Docker anyway
   ══════════════════════════════════════
   Recommendation: A — detected constraints prevent easy Docker execution
   ══════════════════════════════════════
   ```

4. If no disqualifying factors are found → use Docker (preferred default)

### 2. Run Tests

1. Execute the detected test runner
2. Capture output: passed, failed, skipped, errors
3. If a test environment was spun up, tear it down after tests complete

### 3. Report Results

Present a summary:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
```

**Important**: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable.

### 4. Diagnose Failures

Before presenting choices, list every failing/erroring test with a one-line root cause:

```
Failures:
1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
3. test_old.py::test_legacy — ModuleNotFoundError: No module named 'removed_module' (possibly obsolete)
```

Categorize each as: **missing dependency**, **broken import**, **logic/assertion error**, **possibly obsolete**, or **environment-specific**.

### 5. Handle Outcome

**All tests pass** → return success to the autopilot for auto-chain.

**Any test fails or errors** → this is a **blocking gate**. Never silently ignore or skip failures. Present using the Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
A) Investigate and fix failing tests/code, then re-run
B) Remove obsolete tests (if diagnosis shows they are no longer relevant)
C) Leave as-is — acknowledged tech debt (not recommended)
D) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before proceeding
══════════════════════════════════════
```

- If the user picks A → investigate root causes, attempt fixes, then re-run (loop back to step 2)
- If the user picks B → confirm which tests to remove, delete them, then re-run (loop back to step 2)
- If the user picks C → require explicit user confirmation; log as acknowledged tech debt in the report, then return success with a warning to the autopilot
- If the user picks D → return failure to the autopilot

## Trigger Conditions

This skill is invoked by the autopilot at test verification checkpoints. It is not typically invoked directly by the user.
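## Appendix: Detection Sketch

The step 1 detection order ("first match wins") can be sketched as a small POSIX sh helper. This is a sketch under assumptions: the file names come from the skill's own list, while the `detect_runner` function, its directory argument, and the throwaway demo fixture are illustrative choices, not part of the skill contract.

```shell
# detect_runner DIR: print the test runner for DIR, mirroring the
# step 1 order; first match wins.
detect_runner() {
  d=$1
  if [ -x "$d/scripts/run-tests.sh" ]; then echo "scripts/run-tests.sh"; return; fi
  # Docker candidate found: the suitability check decides local vs. Docker.
  if [ -f "$d/docker-compose.test.yml" ]; then echo "docker"; return; fi
  if [ -f "$d/pytest.ini" ] || [ -f "$d/conftest.py" ] || \
     { [ -f "$d/pyproject.toml" ] && grep -q '\[tool\.pytest' "$d/pyproject.toml"; }; then
    echo "pytest"; return
  fi
  # Unmatched globs stay literal in POSIX sh, so test each candidate.
  for f in "$d"/*.csproj "$d"/*.sln; do
    [ -e "$f" ] && { echo "dotnet test"; return; }
  done
  if [ -f "$d/Cargo.toml" ]; then echo "cargo test"; return; fi
  if [ -f "$d/package.json" ] && grep -q '"test"' "$d/package.json"; then echo "npm test"; return; fi
  if [ -f "$d/Makefile" ] && grep -q '^test:' "$d/Makefile"; then echo "make test"; return; fi
  echo "none"   # no runner detected: report failure and ask the user
}

# Demo on a throwaway fixture: Cargo.toml alone selects cargo test.
demo=$(mktemp -d)
touch "$demo/Cargo.toml"
detect_runner "$demo"   # prints: cargo test
```

Note that `scripts/run-tests.sh` is tested with `-x`, not `-f`: a script that exists but is not executable is treated as absent rather than producing a confusing runtime failure.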
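For the pytest case, step 3's rule that collection errors count as failures can be enforced from the exit status alone. pytest documents exit codes 0 (all tests passed), 1 (some tests failed), 2 (interrupted, which is also what errors during collection produce), 3 (internal error), 4 (usage error), and 5 (no tests collected). `classify_exit` below is a hypothetical helper sketching that mapping; anything nonzero is treated as blocking.

```shell
# classify_exit CODE: map a pytest exit status to this skill's outcome.
# Any nonzero status, including collection errors, is a blocking FAIL.
classify_exit() {
  case $1 in
    0) echo "PASS" ;;
    1) echo "FAIL: test failures" ;;
    2) echo "FAIL: interrupted (includes collection errors)" ;;
    5) echo "FAIL: no tests collected" ;;
    *) echo "FAIL: internal or usage error (exit $1)" ;;
  esac
}

classify_exit 2   # prints: FAIL: interrupted (includes collection errors)
```

Treating exit code 5 (no tests collected) as a failure is a deliberate choice here: a checkpoint that silently "passes" because nothing ran would defeat the verification gate.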