---
name: test-run
description: Run the project's test suite, report results, and handle failures. Detects test runners automatically (pytest, dotnet test, cargo test, npm test) or uses scripts/run-tests.sh if available. Trigger phrases: "run tests", "test suite", "verify tests"
category: build
tags: testing, verification, test-suite
disable-model-invocation: true
---

Test Run

Run the project's test suite and report results. This skill is invoked by the autopilot at verification checkpoints — after implementing tests, after implementing features, or at any point where the test suite must pass before proceeding.

Workflow

1. Detect Test Runner

Check in order — first match wins:

  1. scripts/run-tests.sh exists → use it (the script already encodes the correct execution strategy)
  2. docker-compose.test.yml exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless the check finds a disqualifying factor.
  3. Auto-detect from project files:
    • pytest.ini, pyproject.toml with [tool.pytest], or conftest.py → pytest
    • *.csproj or *.sln → dotnet test
    • Cargo.toml → cargo test
    • package.json with test script → npm test
    • Makefile with test target → make test

If no runner is detected → report failure and ask the user to specify one.
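
For illustration, a minimal shell sketch of this detection order might look like the following. It only checks the repository root, and the pyproject.toml grep is a rough heuristic rather than a real TOML parse; treat it as a sketch of the order, not a prescribed implementation.

```bash
#!/usr/bin/env bash
# Sketch of the detection order above; first match wins.
detect_test_runner() {
  if [ -x scripts/run-tests.sh ]; then
    echo "scripts/run-tests.sh"
  elif [ -f docker-compose.test.yml ]; then
    echo "docker"    # run the Docker Suitability Check before committing to this
  elif [ -f pytest.ini ] || [ -f conftest.py ] \
       || grep -q '\[tool\.pytest' pyproject.toml 2>/dev/null; then
    echo "pytest"
  elif compgen -G '*.sln' >/dev/null || compgen -G '*.csproj' >/dev/null; then
    echo "dotnet test"
  elif [ -f Cargo.toml ]; then
    echo "cargo test"
  elif [ -f package.json ] && grep -q '"test"' package.json; then
    echo "npm test"
  elif [ -f Makefile ] && grep -qE '^test:' Makefile; then
    echo "make test"
  else
    return 1    # no runner detected; report failure and ask the user
  fi
}
```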

Docker Suitability Check

Docker is the preferred test environment. Before using it, verify no constraints prevent easy Docker execution:

  1. Check _docs/02_document/tests/environment.md for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
  2. If no prior decision exists, check for disqualifying factors:
    • Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
    • Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
    • Data/volume constraints: large files (> 100MB) impractical to copy into a container
    • Network/environment: host networking, VPN, specific DNS/firewall rules
    • Performance: Docker overhead would invalidate benchmarks or latency measurements
  3. If any disqualifying factor is found → recommend falling back to the local test runner and present the choice to the user using the Choose format:
══════════════════════════════════════
 DECISION REQUIRED: Docker is preferred but factors
 preventing easy Docker execution detected
══════════════════════════════════════
 Factors detected:
 - [list factors]
══════════════════════════════════════
 A) Run tests locally (recommended)
 B) Run tests in Docker anyway
══════════════════════════════════════
 Recommendation: A — detected constraints prevent
 easy Docker execution
══════════════════════════════════════
  4. If no disqualifying factors → use Docker (preferred default)
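
As a rough first pass, the prior-decision lookup and an initial factor scan could be sketched like this. The document path comes from step 1 above; the grep pattern is only an illustrative hint, and the real assessment still requires reading the project.

```bash
# Sketch: honour a prior test-spec decision if one exists; otherwise do a
# first-pass scan for obvious hardware markers before the manual check.
ENV_DOC="_docs/02_document/tests/environment.md"
if [ -f "$ENV_DOC" ] && grep -qi 'Test Execution' "$ENV_DOC"; then
  echo "Prior decision found in $ENV_DOC; follow it."
elif grep -qiE 'cuda|nvidia|tpu|/dev/tty' \
       requirements*.txt pyproject.toml package.json Cargo.toml 2>/dev/null; then
  echo "Possible hardware binding; present the Choose prompt above."
else
  echo "No obvious disqualifying factors; default to Docker."
fi
```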

2. Run Tests

  1. Execute the detected test runner
  2. Capture output: passed, failed, skipped, errors
  3. If a test environment was spun up, tear it down after tests complete
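
For the Docker path, a minimal sketch of a run with guaranteed teardown follows. It assumes the compose file defines a service named "tests", which is an assumption to adjust per project.

```bash
# Sketch: run the suite from docker-compose.test.yml, then always tear down.
COMPOSE="docker compose -f docker-compose.test.yml"
$COMPOSE up --build --exit-code-from tests   # "tests" is an assumed service name
STATUS=$?
$COMPOSE down --volumes --remove-orphans     # tear down the test environment
exit "$STATUS"
```

A local runner follows the same shape; only the command changes and the teardown step falls away.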

3. Report Results

Present a summary:

══════════════════════════════════════
 TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════

Important: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable.
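
One way to make this concrete in a wrapper script is to treat any nonzero exit status as a blocking failure while keeping the full log, since collection and import errors also exit nonzero in pytest, dotnet test, cargo test, and npm test. In the sketch below, run_tests is a hypothetical stand-in for whatever runner was detected.

```bash
# Sketch: capture output and map any nonzero exit status to failure,
# so collection errors are never reported as passes or skips.
set -o pipefail
if run_tests 2>&1 | tee test-output.log; then
  echo "TEST RESULTS: pass (counts in test-output.log)"
else
  echo "TEST RESULTS: failure (details in test-output.log)" >&2
  exit 1
fi
```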

4. Diagnose Failures

Before presenting choices, list every failing/erroring test with a one-line root cause:

Failures:
 1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
 2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
 3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)

Categorize each as: missing dependency, broken import, logic/assertion error, possibly obsolete, or environment-specific.
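
When the runner is pytest, one hedged way to get this one-line-per-failure list is its short-summary output; the flags below are standard pytest options, but the exact summary format varies slightly between versions.

```bash
# Sketch: compact per-test failure lines for the diagnosis list.
# -ra       show a short summary entry for every non-passing test
# --tb=line one-line tracebacks instead of full stack traces
pytest -q -ra --tb=line 2>&1 | tee test-output.log
grep -E '^(FAILED|ERROR)' test-output.log   # e.g. "FAILED test_foo.py::test_bar - ..."
```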

5. Handle Outcome

All tests pass → return success to the autopilot for auto-chain.

Any test fails or errors → this is a blocking gate. Never silently ignore or skip failures. Present using Choose format:

══════════════════════════════════════
 TEST RESULTS: [N passed, M failed, K skipped, E errors]
══════════════════════════════════════
 A) Investigate and fix failing tests/code, then re-run
 B) Remove obsolete tests (if diagnosis shows they are no longer relevant)
 C) Leave as-is — acknowledged tech debt (not recommended)
 D) Abort — fix manually
══════════════════════════════════════
 Recommendation: A — fix failures before proceeding
══════════════════════════════════════
  • If the user picks A → investigate root causes, attempt fixes, then re-run (loop back to step 2)
  • If the user picks B → confirm which tests to remove, delete them, then re-run (loop back to step 2)
  • If the user picks C → require explicit confirmation; log the failures as acknowledged tech debt in the report, then return success with a warning to the autopilot
  • If the user picks D → return failure to the autopilot

Trigger Conditions

This skill is invoked by the autopilot at test verification checkpoints. It is not typically invoked directly by the user.