mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-23 08:06:35 +00:00
cdcd1f6ea7
- Rewrite autopilot flow resolution to 4 deterministic rules based on source code + docs + state file presence
- Replace all hard-coded Jira references with tracker-agnostic terminology across 30+ files
- Move project-management.mdc to _project.md (project-specific, not portable with .cursor)
- Rename FINAL_implementation_report.md to context-dependent names (implementation_report_tests/features/refactor)
- Remove "acknowledged tech debt" option from test-run — failing tests must be fixed or removed
- Add debug/error recovery protocol to protocols.md
- Align directory paths: metrics -> 06_metrics/, add 05_security/, reviews/, 02_task_plans/ to README
- Add missing skills (test-spec, test-run, new-task, ui-design) to README
- Use language-appropriate comment syntax for Arrange/Act/Assert in coderule + testing rules
- Copy updated coderule.mdc to parent suite/.cursor/rules/
- Raise max task complexity from 5 to 8 points in decompose
- Skip test-spec Phase 4 (script generation) during planning context
- Document per-batch vs post-implement test run as intentional
- Add skill-internal state cross-check rule to state.md
122 lines
5.6 KiB
Markdown
---
name: test-run
description: |
  Run the project's test suite, report results, and handle failures.
  Detects test runners automatically (pytest, dotnet test, cargo test, npm test)
  or uses scripts/run-tests.sh if available.
  Trigger phrases:
  - "run tests", "test suite", "verify tests"
category: build
tags: [testing, verification, test-suite]
disable-model-invocation: true
---

# Test Run

Run the project's test suite and report results. This skill is invoked by the autopilot at verification checkpoints — after implementing tests, after implementing features, or at any point where the test suite must pass before proceeding.

## Workflow

### 1. Detect Test Runner

Check in order — first match wins:

1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
2. `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
3. Auto-detect from project files:
   - `pytest.ini`, `pyproject.toml` with `[tool.pytest.ini_options]`, or `conftest.py` → `pytest`
   - `*.csproj` or `*.sln` → `dotnet test`
   - `package.json` with a `test` script → `npm test`
   - `Cargo.toml` → `cargo test`
   - `Makefile` with a `test` target → `make test`

If no runner is detected → report failure and ask the user to specify one.

#### Docker Suitability Check

Docker is the preferred test environment. Before using it, verify that no constraints prevent easy Docker execution:

1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
2. If no prior decision exists, check for disqualifying factors:
   - Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
   - Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
   - Data/volume constraints: large files (> 100 MB) impractical to copy into a container
   - Network/environment: host networking, VPN, specific DNS/firewall rules
   - Performance: Docker overhead would invalidate benchmarks or latency measurements
3. If any disqualifying factor is found → fall back to the local test runner. Present to the user using the Choose format:

   ```
   ══════════════════════════════════════
   DECISION REQUIRED: Docker is preferred, but factors
   preventing easy Docker execution were detected
   ══════════════════════════════════════
   Factors detected:
   - [list factors]
   ══════════════════════════════════════
   A) Run tests locally (recommended)
   B) Run tests in Docker anyway
   ══════════════════════════════════════
   Recommendation: A — detected constraints prevent
   easy Docker execution
   ══════════════════════════════════════
   ```

4. If no disqualifying factors are found → use Docker (preferred default)
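
One way to sketch the suitability decision, assuming the disqualifying factors are scanned as keywords in free-text project notes (the keyword lists below are illustrative assumptions, not a fixed policy):

```python
# Illustrative keyword lists per disqualifying category (assumptions).
DISQUALIFIERS = {
    "hardware": ["gpu", "cuda", "mps", "tpu", "fpga", "serial", "camera"],
    "host": ["kernel module", "licensed", "proprietary sdk"],
    "network": ["host network", "vpn", "firewall"],
    "performance": ["benchmark", "latency"],
}

def docker_suitability(project_notes: str) -> tuple[bool, list[str]]:
    """Return (use_docker, detected_factors) from free-text project notes."""
    notes = project_notes.lower()
    factors = [
        f"{category}: {kw}"
        for category, keywords in DISQUALIFIERS.items()
        for kw in keywords
        if kw in notes
    ]
    # No disqualifying factors found -> Docker is the preferred default.
    return (len(factors) == 0, factors)
```

If `factors` is non-empty, the skill would surface the list in the Choose box above rather than deciding silently.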

### 2. Run Tests

1. Execute the detected test runner
2. Capture output: passed, failed, skipped, errors
3. If a test environment was spun up, tear it down after tests complete
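
Steps 1–3 can be sketched with `subprocess`, modeling environment teardown as a callback so it runs even when the test command itself crashes (the command string and teardown hook are assumptions for illustration):

```python
import subprocess

def run_tests(command: str, teardown=None) -> subprocess.CompletedProcess:
    """Execute the detected runner and capture its output for reporting."""
    try:
        # capture_output retains stdout/stderr for the results summary
        return subprocess.run(command, shell=True, capture_output=True, text=True)
    finally:
        if teardown is not None:
            # e.g. `docker compose -f docker-compose.test.yml down`
            teardown()
```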

### 3. Report Results

Present a summary:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
```

**Important**: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable.
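
The gate this implies is simple to state in code: errors, including collection errors, block exactly like failures, while skips alone do not (the counts are assumed to be already parsed from the runner's output):

```python
def suite_green(passed: int, failed: int, skipped: int, errors: int) -> bool:
    """True only when nothing failed AND nothing errored; skips don't block."""
    return failed == 0 and errors == 0
```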

### 4. Diagnose Failures

Before presenting choices, list every failing/erroring test with a one-line root cause:

```
Failures:
1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)
```

Categorize each as: **missing dependency**, **broken import**, **logic/assertion error**, **possibly obsolete**, or **environment-specific**.
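
A hedged sketch of the categorization, using keyword heuristics over the one-line root causes (the patterns are assumptions based on common Python/pytest error text, and are not exhaustive):

```python
def categorize_failure(message: str) -> str:
    """Map a one-line root cause to one of the five categories above."""
    msg = message.lower()
    if "modulenotfounderror" in msg or "not installed" in msg:
        return "missing dependency"
    if "importerror" in msg or "no module" in msg:
        return "broken import"
    if "assertionerror" in msg:
        return "logic/assertion error"
    if "obsolete" in msg or "no longer" in msg:
        return "possibly obsolete"
    # Nothing recognized: assume it depends on the host environment.
    return "environment-specific"
```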

### 5. Handle Outcome

**All tests pass** → return success to the autopilot for auto-chain.

**Any test fails or errors** → this is a **blocking gate**. Never silently ignore or skip failures. Present using the Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
A) Investigate and fix failing tests/code, then re-run
B) Remove obsolete tests (if diagnosis shows they are no longer relevant)
C) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before proceeding
══════════════════════════════════════
```

- If the user picks A → investigate root causes, attempt fixes, then re-run (loop back to step 2)
- If the user picks B → confirm which tests to remove, delete them, then re-run (loop back to step 2)
- If the user picks C → return failure to the autopilot
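
The A/B/C handling with its re-run loop can be sketched as follows, modeling "fix" and "remove obsolete tests" as callbacks, and assuming the user's choice stays the same across rounds (the round cap is a safety assumption, not part of the skill):

```python
def handle_outcome(run, fix, remove_obsolete, choice: str, max_rounds: int = 3):
    """Loop back to step 2 after A/B; return 'failure' immediately on C."""
    for _ in range(max_rounds):
        result = run()
        if result["failed"] == 0 and result["errors"] == 0:
            return "success"          # auto-chain back to the autopilot
        if choice == "A":
            fix(result)               # investigate and fix, then re-run
        elif choice == "B":
            remove_obsolete(result)   # delete confirmed-obsolete tests, re-run
        else:
            return "failure"          # C) abort: fix manually
    return "failure"
```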

## Trigger Conditions

This skill is invoked by the autopilot at test verification checkpoints. It is not typically invoked directly by the user.