---
name: test-run
description: >-
  Run the project's test suite, report results, and handle failures. Detects
  test runners automatically (pytest, dotnet test, cargo test, npm test) or
  uses scripts/run-tests.sh if available. Trigger phrases: "run tests",
  "test suite", "verify tests"
category: build
disable-model-invocation: true
---

# Test Run
Run the project's test suite and report results. This skill is invoked by the autopilot at verification checkpoints — after implementing tests, after implementing features, or at any point where the test suite must pass before proceeding.
## Workflow
### 1. Detect Test Runner
Check in order — first match wins:

- `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
- `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
- Auto-detect from project files:
  - `pytest.ini`, `pyproject.toml` with `[tool.pytest]`, or `conftest.py` → `pytest`
  - `*.csproj` or `*.sln` → `dotnet test`
  - `Cargo.toml` → `cargo test`
  - `package.json` with a `test` script → `npm test`
  - `Makefile` with a `test` target → `make test`
If no runner detected → report failure and ask user to specify.
#### Execution Environment Check
- Check `_docs/02_document/tests/environment.md` for a "Test Execution" section. If the test-spec skill already assessed hardware dependencies and recorded a decision (local / docker / both), follow that decision.
- If the "Test Execution" section says local → run tests directly on the host (no Docker).
- If the "Test Execution" section says docker → use Docker (docker-compose).
- If the "Test Execution" section says both → run local first, then Docker (or vice versa), and merge results.
- If no prior decision exists → fall back to the hardware-dependency detection logic from the test-spec skill's "Hardware-Dependency & Execution Environment Assessment" section. Ask the user if hardware indicators are found.
### 2. Run Tests
- Execute the detected test runner
- Capture output: passed, failed, skipped, errors
- If a test environment was spun up, tear it down after tests complete
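Capturing the counts can be sketched as below. The parser only understands a pytest-style summary line ("N passed, M failed, ..."); other runners (dotnet, cargo, npm) print different summaries and would need their own parsers.

```python
import re
import subprocess

def parse_summary(output: str) -> dict:
    """Heuristic: pull pass/fail/skip/error counts from a pytest summary line."""
    counts = {"passed": 0, "failed": 0, "skipped": 0, "errors": 0}
    for n, label in re.findall(r"(\d+) (passed|failed|skipped|error)s?\b", output):
        counts["errors" if label == "error" else label] = int(n)
    return counts

def run_and_summarize(cmd: list[str]) -> dict:
    """Run the detected test command and return parsed counts plus exit code."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    counts = parse_summary(proc.stdout + proc.stderr)
    counts["exit_code"] = proc.returncode
    return counts
```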
### 3. Report Results
Present a summary:
```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
```
Important: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable. If a collection error is caused by a missing dependency, install it (add to the project's dependency file and install) before re-running. The test runner script (run-tests.sh) should install all dependencies automatically — if it doesn't, fix the script to do so.
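A missing-dependency collection error can be recognized mechanically, since CPython's `ModuleNotFoundError` message has a fixed shape. A minimal sketch:

```python
import re

def missing_dependencies(output: str) -> list[str]:
    """Extract module names from ModuleNotFoundError lines in test output."""
    return sorted(set(re.findall(
        r"ModuleNotFoundError: No module named '([^']+)'", output)))
```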
### 4. Diagnose Failures and Skips
Before presenting choices, list every failing/erroring/skipped test with a one-line root cause:
```
Failures:
1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)
Skips:
1. test_x.py::test_pre_init — runtime skip: engine already initialized (unreachable in current test order)
2. test_y.py::test_docker_only — explicit @skip: requires Docker (dead code in local runs)
```
Categorize failures as: missing dependency, broken import, logic/assertion error, possibly obsolete, or environment-specific.
Categorize skips as: explicit skip (dead code), runtime skip (unreachable), environment mismatch, or missing fixture/data.
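A first-pass mapping of error text onto the failure categories above might look like this. It is a heuristic for triage only; the anomaly strings are common Python exception names, and anything unmatched still needs a human (or agent) to read the test.

```python
def categorize_failure(message: str) -> str:
    """Heuristically map an error/failure message to a failure category."""
    if "ModuleNotFoundError" in message or "No module named" in message:
        return "missing dependency"
    if "ImportError" in message:
        return "broken import"
    if "AssertionError" in message:
        return "logic/assertion error"
    env_markers = ("ConnectionRefusedError", "PermissionError",
                   "address already in use")
    if any(marker in message for marker in env_markers):
        return "environment-specific"
    return "needs manual review (possibly obsolete)"
```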
### 5. Handle Outcome
All tests pass, zero skipped → return success to the autopilot for auto-chain.
Any test fails or errors → this is a blocking gate. Never silently ignore failures. Always investigate the root cause before deciding on an action. Read the failing test code, read the error output, check service logs if applicable, and determine whether the bug is in the test or in the production code.
After investigating, present:
```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
Failures:
1. test_X — root cause: [detailed reason] → action: [fix test / fix code / remove + justification]
══════════════════════════════════════
A) Apply recommended fixes, then re-run
B) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix root causes before proceeding
══════════════════════════════════════
```
- If user picks A → apply fixes, then re-run (loop back to step 2)
- If user picks B → return failure to the autopilot
Any test skipped → this is also a blocking gate. Skipped tests mean something is wrong — either with the test, the environment, or the test design. Never blindly remove a skipped test. Always investigate the root cause first.
#### Investigation Protocol for Skipped Tests
For each skipped test:

1. Read the test code — understand what the test is supposed to verify and why it skips.
2. Determine the root cause — why did the skip condition fire?
   - Is the test environment misconfigured? (e.g., wrong ports, missing env vars, service not started correctly)
   - Is the test ordering wrong? (e.g., a fixture in an earlier test mutates shared state)
   - Is a dependency missing? (e.g., package not installed, fixture file absent)
   - Is the skip condition outdated? (e.g., code was refactored but the skip guard still checks the old behavior)
   - Is the test fundamentally untestable in the current setup? (e.g., requires Docker restart, a different OS, special hardware)
3. Try to fix the root cause first — the goal is to make the test run, not to delete it:
   - Fix the environment or configuration
   - Reorder tests or isolate shared state
   - Install the missing dependency
   - Update the skip condition to match current behavior
4. Remove only as a last resort — if the test truly cannot run in any realistic test environment (e.g., it requires unavailable hardware, or it duplicates another test with identical assertions), removal is justified. Document the reasoning.
#### Categorization
- explicit skip (dead code): has `@pytest.mark.skip` — investigate whether the reason in the decorator is still valid. Often these are temporary skips that became permanent by accident.
- runtime skip (unreachable): `pytest.skip()` fires inside the test body — investigate why the condition always triggers. Often fixable by adjusting test order, environment, or the condition itself.
- environment mismatch: the test assumes a different environment — investigate whether the test environment setup can be fixed.
- missing fixture/data: data or a service is not available — investigate whether it can be provided.
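The two pytest skip mechanisms named above look like this in practice. The tests and the `engine_already_initialized` stub are illustrative, not from any real suite:

```python
import pytest

def engine_already_initialized() -> bool:
    # Stub standing in for real shared-state inspection (illustrative).
    return True

# Explicit skip: the decorator makes this dead code until someone removes it.
# Check whether the stated reason is still valid before accepting the skip.
@pytest.mark.skip(reason="temporarily disabled during refactor")
def test_explicit_skip():
    assert 1 + 1 == 2  # never executes while the decorator is present

# Runtime skip: pytest.skip() fires inside the body. If the condition always
# triggers (as the stub guarantees here), the test is unreachable.
def test_runtime_skip():
    if engine_already_initialized():
        pytest.skip("engine already initialized")
    assert False  # unreachable in this configuration
```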
After investigating, present findings:
```
══════════════════════════════════════
SKIPPED TESTS: K tests skipped
══════════════════════════════════════
1. test_X — root cause: [detailed reason] → action: [fix / restructure / remove + justification]
2. test_Y — root cause: [detailed reason] → action: [fix / restructure / remove + justification]
══════════════════════════════════════
A) Apply recommended fixes, then re-run
B) Accept skips and proceed (requires user justification per skip)
══════════════════════════════════════
```
Only option B allows proceeding with skips, and it requires explicit user approval with documented justification for each skip.
## Trigger Conditions
This skill is invoked by the autopilot at test verification checkpoints. It is not typically invoked directly by the user.