mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-23 07:46:34 +00:00
8c665bd0a4
- Updated SKILL.md for test-run to clarify the use of Docker and added a Docker Suitability Check to ensure a proper execution environment.
- Revised SKILL.md for test-spec to include a detailed Docker Suitability Assessment, outlining disqualifying factors and decision-making steps for test execution.
- Adjusted the autopilot state to mark test execution as in_progress.

These changes improve the clarity and reliability of the testing process, ensuring that users are informed of potential constraints when using Docker.
124 lines
5.8 KiB
Markdown
---
name: test-run
description: |
  Run the project's test suite, report results, and handle failures.
  Detects test runners automatically (pytest, dotnet test, cargo test, npm test)
  or uses scripts/run-tests.sh if available.
  Trigger phrases:
  - "run tests", "test suite", "verify tests"
category: build
tags: [testing, verification, test-suite]
disable-model-invocation: true
---

# Test Run

Run the project's test suite and report results. This skill is invoked by the autopilot at verification checkpoints — after implementing tests, after implementing features, or at any point where the test suite must pass before proceeding.

## Workflow

### 1. Detect Test Runner

Check in order — first match wins:

1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
2. `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
3. Auto-detect from project files:
   - `pytest.ini`, `pyproject.toml` with `[tool.pytest]`, or `conftest.py` → `pytest`
   - `*.csproj` or `*.sln` → `dotnet test`
   - `Cargo.toml` → `cargo test`
   - `package.json` with a test script → `npm test`
   - `Makefile` with a `test` target → `make test`

If no runner is detected → report failure and ask the user to specify one.

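The detection order above can be sketched as a small POSIX-shell helper. File names are taken from the list; the `pyproject.toml` check is simplified to a plain grep, so treat this as illustrative rather than exhaustive.

```shell
#!/bin/sh
# Sketch of step 1: return the first matching runner for a project directory.
detect_runner() {
  d="$1"
  if [ -x "$d/scripts/run-tests.sh" ]; then echo "scripts/run-tests.sh"
  elif [ -f "$d/docker-compose.test.yml" ]; then echo "docker"  # then run the suitability check
  elif [ -f "$d/pytest.ini" ] || [ -f "$d/conftest.py" ] \
       || { [ -f "$d/pyproject.toml" ] && grep -q '\[tool.pytest' "$d/pyproject.toml"; }; then
    echo "pytest"
  elif ls "$d"/*.csproj "$d"/*.sln >/dev/null 2>&1; then echo "dotnet test"
  elif [ -f "$d/Cargo.toml" ]; then echo "cargo test"
  elif [ -f "$d/package.json" ] && grep -q '"test"[[:space:]]*:' "$d/package.json"; then echo "npm test"
  elif [ -f "$d/Makefile" ] && grep -q '^test:' "$d/Makefile"; then echo "make test"
  else echo "none"; return 1
  fi
}
```

Priority matters: a `Cargo.toml` workspace that also ships a `Makefile` should still resolve to `cargo test`, which the first-match order guarantees.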
#### Docker Suitability Check

Docker is the preferred test environment. Before using it, verify no constraints prevent easy Docker execution:

1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
2. If no prior decision exists, check for disqualifying factors:
   - Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
   - Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
   - Data/volume constraints: large files (> 100MB) impractical to copy into a container
   - Network/environment: host networking, VPN, specific DNS/firewall rules
   - Performance: Docker overhead would invalidate benchmarks or latency measurements
3. If any disqualifying factor is found → fall back to the local test runner. Present to the user using Choose format:

   ```
   ══════════════════════════════════════
   DECISION REQUIRED: Docker is preferred but factors
   preventing easy Docker execution detected
   ══════════════════════════════════════
   Factors detected:
   - [list factors]
   ══════════════════════════════════════
   A) Run tests locally (recommended)
   B) Run tests in Docker anyway
   ══════════════════════════════════════
   Recommendation: A — detected constraints prevent
   easy Docker execution
   ══════════════════════════════════════
   ```

4. If no disqualifying factors are found → use Docker (preferred default)

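Some disqualifying factors can be machine-checked. A minimal sketch of the data/volume probe, using the 100MB threshold from the list (other factors such as licensed software or VPN setups still need human judgment):

```shell
#!/bin/sh
# Sketch of one disqualifying-factor probe: files over 100MB in the project
# tree make copying into a container impractical.
large_file_factor() {
  if find "$1" -type f -size +100M 2>/dev/null | grep -q .; then
    echo "local"    # factor found -> fall back to the local runner
  else
    echo "docker"   # no factor from this probe -> Docker remains preferred
  fi
}
```

A real check would run one probe per factor category and collect the hits for the "Factors detected" list in the Choose block.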
### 2. Run Tests

1. Execute the detected test runner
2. Capture output: passed, failed, skipped, errors
3. If a test environment was spun up, tear it down after tests complete

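The three steps above can be sketched as a wrapper that captures output and guarantees teardown. `teardown` here is a hypothetical placeholder for whatever step 1 spun up (e.g. a compose stack):

```shell
#!/bin/sh
# Sketch of step 2: run the detected command, capture its output, and always
# tear the environment down afterwards, even if the runner fails.
teardown() { :; }   # placeholder, e.g.: docker compose -f docker-compose.test.yml down -v
run_tests() {
  log=$(mktemp)
  trap teardown EXIT        # teardown fires on normal exit and on failure
  "$@" >"$log" 2>&1         # capture passed/failed/skipped/error output
  status=$?
  echo "exit=$status log=$log"
  return "$status"
}
```

Preserving the runner's exit status is what lets the later steps treat a nonzero result as a blocking gate.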
|
### 3. Report Results
|
|
|
|
Present a summary:
|
|
|
|
```
|
|
══════════════════════════════════════
|
|
TEST RESULTS: [N passed, M failed, K skipped, E errors]
|
|
══════════════════════════════════════
|
|
```
|
|
|
|
**Important**: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable.
|
|
|
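The N/M/K/E counts can be pulled from a typical runner's summary line. A sketch assuming pytest-style wording ("3 passed, 1 failed, ..."); other runners would need their own patterns:

```shell
#!/bin/sh
# Sketch for step 3: extract one count (passed/failed/skipped/error) from a
# pytest-style summary line; a missing category counts as 0.
count() {  # usage: count <summary-line> <label>
  printf '%s\n' "$1" | grep -oE "[0-9]+ $2" | grep -oE '[0-9]+' || echo 0
}
```

Defaulting absent categories to 0 keeps the report template filled even when a runner omits, say, the skipped count.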
|
### 4. Diagnose Failures
|
|
|
|
Before presenting choices, list every failing/erroring test with a one-line root cause:
|
|
|
|
```
|
|
Failures:
|
|
1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
|
|
2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
|
|
3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)
|
|
```
|
|
|
|
Categorize each as: **missing dependency**, **broken import**, **logic/assertion error**, **possibly obsolete**, or **environment-specific**.
|
|
|
|
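A rough first pass at that categorization can key off the failure's error line. This hypothetical helper uses Python-flavoured patterns matching the examples above; real diagnosis still means reading the traceback:

```shell
#!/bin/sh
# Sketch for step 4: map a failure's error line to a likely category.
# Order matters: ModuleNotFoundError messages also say "No module named".
categorize() {
  case "$1" in
    *"No module named"*|*ModuleNotFoundError*) echo "missing dependency" ;;
    *ImportError*)                             echo "broken import" ;;
    *AssertionError*)                          echo "logic/assertion error" ;;
    *) echo "environment-specific or needs manual diagnosis" ;;
  esac
}
```

The fallback branch is deliberate: anything unmatched should be diagnosed by hand rather than auto-binned as obsolete.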
### 5. Handle Outcome

**All tests pass** → return success to the autopilot for auto-chain.

**Any test fails or errors** → this is a **blocking gate**. Never silently ignore or skip failures. Present using Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
A) Investigate and fix failing tests/code, then re-run
B) Remove obsolete tests (if diagnosis shows they are no longer relevant)
C) Leave as-is — acknowledged tech debt (not recommended)
D) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before proceeding
══════════════════════════════════════
```

- If the user picks A → investigate root causes, attempt fixes, then re-run (loop back to step 2)
- If the user picks B → confirm which tests to remove, delete them, then re-run (loop back to step 2)
- If the user picks C → require explicit user confirmation; log as acknowledged tech debt in the report, then return success with a warning to the autopilot
- If the user picks D → return failure to the autopilot

## Trigger Conditions

This skill is invoked by the autopilot at test verification checkpoints. It is not typically invoked directly by the user.