mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 11:36:36 +00:00
Enhance Docker suitability assessment and update test execution guidelines
- Updated SKILL.md for test-run to clarify the use of Docker and added a Docker Suitability Check to ensure proper execution environment.
- Revised SKILL.md for test-spec to include a detailed Docker Suitability Assessment, outlining disqualifying factors and decision-making steps for test execution.
- Adjusted the autopilot state to reflect the current status of test execution as in_progress.

These changes improve the clarity and reliability of the testing process, ensuring that users are informed of potential constraints when using Docker.
@@ -21,8 +21,8 @@ Run the project's test suite and report results. This skill is invoked by the au
 Check in order — first match wins:
 
-1. `scripts/run-tests.sh` exists → use it
+1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
-2. `docker-compose.test.yml` or equivalent test environment exists → spin it up first, then detect runner below
+2. `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
 3. Auto-detect from project files:
    - `pytest.ini`, `pyproject.toml` with `[tool.pytest]`, or `conftest.py` → `pytest`
    - `*.csproj` or `*.sln` → `dotnet test`
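The first-match-wins detection order above can be sketched as a small shell function. This is only an illustration: `detect_runner` is a hypothetical helper name, and parsing `pyproject.toml` for `[tool.pytest]` is omitted for brevity.

```shell
#!/usr/bin/env bash
# Sketch of the first-match-wins runner detection described above.
# detect_runner is a hypothetical name, not part of the skill.
detect_runner() {
  local dir="${1:-.}"
  if [ -x "$dir/scripts/run-tests.sh" ]; then
    echo "scripts/run-tests.sh"          # 1. project script wins outright
  elif [ -f "$dir/docker-compose.test.yml" ]; then
    echo "docker-compose"                # 2. run the Docker Suitability Check
  elif [ -f "$dir/pytest.ini" ] || [ -f "$dir/conftest.py" ]; then
    echo "pytest"                        # 3. auto-detect from project files
  elif ls "$dir"/*.csproj "$dir"/*.sln >/dev/null 2>&1; then
    echo "dotnet test"
  else
    echo "none"                          # report failure, ask the user
  fi
}
```

Because the checks are ordered `if`/`elif` branches, an existing `scripts/run-tests.sh` short-circuits all later detection, which matches the "first match wins" rule.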
@@ -32,6 +32,37 @@ Check in order — first match wins:
 If no runner detected → report failure and ask user to specify.
 
+#### Docker Suitability Check
+
+Docker is the preferred test environment. Before using it, verify no constraints prevent easy Docker execution:
+
+1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
+2. If no prior decision exists, check for disqualifying factors:
+   - Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
+   - Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
+   - Data/volume constraints: large files (> 100MB) impractical to copy into a container
+   - Network/environment: host networking, VPN, specific DNS/firewall rules
+   - Performance: Docker overhead would invalidate benchmarks or latency measurements
+3. If any disqualifying factor found → fall back to local test runner. Present to user using Choose format:
+
+```
+══════════════════════════════════════
+DECISION REQUIRED: Docker is preferred but factors
+preventing easy Docker execution detected
+══════════════════════════════════════
+Factors detected:
+- [list factors]
+══════════════════════════════════════
+A) Run tests locally (recommended)
+B) Run tests in Docker anyway
+══════════════════════════════════════
+Recommendation: A — detected constraints prevent
+easy Docker execution
+══════════════════════════════════════
+```
+
+4. If no disqualifying factors → use Docker (preferred default)
+
 ### 2. Run Tests
 
 1. Execute the detected test runner
@@ -209,7 +209,7 @@ Based on all acquired data, acceptance_criteria, and restrictions, form detailed
 - [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
 - [ ] Positive and negative scenarios are balanced
 - [ ] Consumer app has no direct access to system internals
-- [ ] Docker environment is self-contained (`docker compose up` sufficient)
+- [ ] Test environment matches project constraints (see Docker Suitability Assessment below)
 - [ ] External dependencies have mock/stub services defined
 - [ ] Traceability matrix has no uncovered AC or restrictions
@@ -337,11 +337,51 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab
 ---
 
+### Docker Suitability Assessment (BLOCKING — runs before Phase 4)
+
+Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). Before generating scripts, check whether the project has any constraints that prevent easy Docker usage.
+
+**Disqualifying factors** (any one is sufficient to fall back to local):
+- Hardware bindings: GPU, MPS, TPU, FPGA, accelerators, sensors, cameras, serial devices, host-level drivers (CUDA, Metal, OpenCL, etc.)
+- Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs not installable in a container
+- Data/volume constraints: large files (> 100MB) that would be impractical to copy into a container, databases that must run on the host
+- Network/environment: tests that require host networking, VPN access, or specific DNS/firewall rules
+- Performance: Docker overhead would invalidate benchmarks or latency-sensitive measurements
+
+**Assessment steps**:
+1. Scan project source, config files, and dependencies for indicators of the factors above
+2. Check `TESTS_OUTPUT_DIR/environment.md` for environment requirements
+3. Check `_docs/00_problem/restrictions.md` and `_docs/01_solution/solution.md` for constraints
+
+**Decision**:
+- If ANY disqualifying factor is found → recommend **local test execution** as fallback. Present to user using Choose format:
+
+```
+══════════════════════════════════════
+DECISION REQUIRED: Test execution environment
+══════════════════════════════════════
+Docker is preferred, but factors preventing easy
+Docker execution detected:
+- [list factors found]
+══════════════════════════════════════
+A) Local execution (recommended)
+B) Docker execution (constraints may cause issues)
+══════════════════════════════════════
+Recommendation: A — detected constraints prevent
+easy Docker execution
+══════════════════════════════════════
+```
+
+- If NO disqualifying factors → use Docker (preferred default)
+- Record the decision in `TESTS_OUTPUT_DIR/environment.md` under a "Test Execution" section
+
+---
+
 ### Phase 4: Test Runner Script Generation
 
 **Role**: DevOps engineer
 **Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
-**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.
+**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Docker Suitability Assessment decision above.
 
 #### Step 1 — Detect test infrastructure
@@ -350,7 +390,9 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab
 - .NET: `dotnet test` (*.csproj, *.sln)
 - Rust: `cargo test` (Cargo.toml)
 - Node: `npm test` or `vitest` / `jest` (package.json)
-2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
+2. Check Docker Suitability Assessment result:
+   - If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
+   - If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
 3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
 4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
@@ -360,10 +402,11 @@ Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spe
 1. Set `set -euo pipefail` and trap cleanup on EXIT
 2. Optionally accept a `--unit-only` flag to skip blackbox tests
-3. Run unit tests using the detected test runner
-4. If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down
-5. Print a summary of passed/failed/skipped tests
-6. Exit 0 on all pass, exit 1 on any failure
+3. Run unit/blackbox tests using the detected test runner:
+   - **Local mode**: activate virtualenv (if present), run test runner directly on host
+   - **Docker mode**: spin up docker-compose environment, wait for health checks, run test suite, tear down
+4. Print a summary of passed/failed/skipped tests
+5. Exit 0 on all pass, exit 1 on any failure
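The control flow that the new steps prescribe for `scripts/run-tests.sh` can be sketched as follows. Commands are echoed rather than executed so the branching is visible in isolation; `run_tests`, the compose file name, and `pytest` as the detected runner are illustrative assumptions.

```shell
# Sketch of the run-tests.sh local/Docker branching described above.
# run_tests is a hypothetical name; a real script would start with
# `set -euo pipefail` and `trap cleanup EXIT` (step 1), then execute
# these commands instead of echoing them.
run_tests() {
  local mode="${1:-local}" flag="${2:-}"
  if [ "$mode" = "docker" ] && [ "$flag" != "--unit-only" ]; then
    # Docker mode: spin up, wait for health checks, run, tear down
    echo "docker compose -f docker-compose.test.yml up -d --wait"
    echo "docker compose -f docker-compose.test.yml run --rm tests"
    echo "docker compose -f docker-compose.test.yml down -v"
  else
    # local mode (or --unit-only): detected runner directly on the host
    echo "pytest -q"
  fi
}
```

Driving both modes through one entry point keeps the autopilot's invocation identical regardless of which environment the Docker Suitability Assessment selected.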
@@ -371,7 +414,7 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:
 1. Set `set -euo pipefail` and trap cleanup on EXIT
 2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
-3. Spin up the system under test (docker-compose or local)
+3. Start the system under test (local or docker-compose, matching the Docker Suitability Assessment decision)
 4. Run load/performance scenarios using the detected tool
 5. Compare results against threshold values from the test spec
 6. Print a pass/fail summary per scenario
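Step 5 of the performance script, comparing measured values against spec thresholds, can be sketched like this. `check_threshold` and the metric name are hypothetical; `awk` is used because `[` cannot compare floating-point numbers.

```shell
# Sketch of a per-metric threshold comparison (step 5 above).
# check_threshold is a hypothetical helper; awk performs the
# floating-point comparison portably.
check_threshold() {
  local name="$1" measured="$2" max_allowed="$3"
  if awk -v m="$measured" -v t="$max_allowed" 'BEGIN { exit !(m <= t) }'; then
    echo "PASS $name ($measured <= $max_allowed)"
  else
    echo "FAIL $name ($measured > $max_allowed)"
    return 1
  fi
}
```

Returning non-zero from a failed comparison lets the surrounding `set -euo pipefail` script abort, satisfying the "exit 1 if any scenario exceeds its threshold" requirement.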
|
|||||||
@@ -4,6 +4,6 @@
|
|||||||
flow: existing-code
|
flow: existing-code
|
||||||
step: 6
|
step: 6
|
||||||
name: Run Tests
|
name: Run Tests
|
||||||
status: failed
|
status: in_progress
|
||||||
sub_step: 0
|
sub_step: 0
|
||||||
retry_count: 0
|
retry_count: 1
|
||||||
|
|||||||