diff --git a/.cursor/skills/test-run/SKILL.md b/.cursor/skills/test-run/SKILL.md
index 0bd22d6..adec4c6 100644
--- a/.cursor/skills/test-run/SKILL.md
+++ b/.cursor/skills/test-run/SKILL.md
@@ -21,8 +21,8 @@ Run the project's test suite and report results. This skill is invoked by the au
 
 Check in order — first match wins:
 
-1. `scripts/run-tests.sh` exists → use it
-2. `docker-compose.test.yml` or equivalent test environment exists → spin it up first, then detect runner below
+1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
+2. `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
 3. Auto-detect from project files:
    - `pytest.ini`, `pyproject.toml` with `[tool.pytest]`, or `conftest.py` → `pytest`
    - `*.csproj` or `*.sln` → `dotnet test`
@@ -32,6 +32,37 @@ Check in order — first match wins:
 
 If no runner detected → report failure and ask user to specify.
 
+#### Docker Suitability Check
+
+Docker is the preferred test environment. Before using it, verify no constraints prevent easy Docker execution:
+
+1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
+2. If no prior decision exists, check for disqualifying factors:
+   - Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
+   - Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
+   - Data/volume constraints: large files (> 100MB) impractical to copy into a container
+   - Network/environment: host networking, VPN, specific DNS/firewall rules
+   - Performance: Docker overhead would invalidate benchmarks or latency measurements
+3. If any disqualifying factor found → fall back to local test runner. Present to user using Choose format:
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Docker is preferred but factors
+ preventing easy Docker execution detected
+══════════════════════════════════════
+ Factors detected:
+ - [list factors]
+══════════════════════════════════════
+ A) Run tests locally (recommended)
+ B) Run tests in Docker anyway
+══════════════════════════════════════
+ Recommendation: A — detected constraints prevent
+ easy Docker execution
+══════════════════════════════════════
+```
+
+4. If no disqualifying factors → use Docker (preferred default)
+
 ### 2. Run Tests
 
 1. Execute the detected test runner
diff --git a/.cursor/skills/test-spec/SKILL.md b/.cursor/skills/test-spec/SKILL.md
index 73f3890..cdd6a7a 100644
--- a/.cursor/skills/test-spec/SKILL.md
+++ b/.cursor/skills/test-spec/SKILL.md
@@ -209,7 +209,7 @@ Based on all acquired data, acceptance_criteria, and restrictions, form detailed
 - [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
 - [ ] Positive and negative scenarios are balanced
 - [ ] Consumer app has no direct access to system internals
-- [ ] Docker environment is self-contained (`docker compose up` sufficient)
+- [ ] Test environment matches project constraints (see Docker Suitability Assessment below)
 - [ ] External dependencies have mock/stub services defined
 - [ ] Traceability matrix has no uncovered AC or restrictions
 
@@ -337,11 +337,51 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab
 
 ---
 
+### Docker Suitability Assessment (BLOCKING — runs before Phase 4)
+
+Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). Before generating scripts, check whether the project has any constraints that prevent easy Docker usage.
+
+**Disqualifying factors** (any one is sufficient to fall back to local):
+- Hardware bindings: GPU, MPS, TPU, FPGA, accelerators, sensors, cameras, serial devices, host-level drivers (CUDA, Metal, OpenCL, etc.)
+- Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs not installable in a container
+- Data/volume constraints: large files (> 100MB) that would be impractical to copy into a container, databases that must run on the host
+- Network/environment: tests that require host networking, VPN access, or specific DNS/firewall rules
+- Performance: Docker overhead would invalidate benchmarks or latency-sensitive measurements
+
+**Assessment steps**:
+1. Scan project source, config files, and dependencies for indicators of the factors above
+2. Check `TESTS_OUTPUT_DIR/environment.md` for environment requirements
+3. Check `_docs/00_problem/restrictions.md` and `_docs/01_solution/solution.md` for constraints
+
+**Decision**:
+- If ANY disqualifying factor is found → recommend **local test execution** as fallback. Present to user using Choose format:
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Test execution environment
+══════════════════════════════════════
+ Docker is preferred, but factors preventing easy
+ Docker execution detected:
+ - [list factors found]
+══════════════════════════════════════
+ A) Local execution (recommended)
+ B) Docker execution (constraints may cause issues)
+══════════════════════════════════════
+ Recommendation: A — detected constraints prevent
+ easy Docker execution
+══════════════════════════════════════
+```
+
+- If NO disqualifying factors → use Docker (preferred default)
+- Record the decision in `TESTS_OUTPUT_DIR/environment.md` under a "Test Execution" section
+
+---
+
 ### Phase 4: Test Runner Script Generation
 
 **Role**: DevOps engineer
 **Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
-**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.
+**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Docker Suitability Assessment decision above.
 
 #### Step 1 — Detect test infrastructure
 
@@ -350,7 +390,9 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab
    - .NET: `dotnet test` (*.csproj, *.sln)
    - Rust: `cargo test` (Cargo.toml)
    - Node: `npm test` or `vitest` / `jest` (package.json)
-2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
+2. Check Docker Suitability Assessment result:
+   - If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
+   - If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
 3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
 4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
 
@@ -360,10 +402,11 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab
 Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spe
 
 1. Set `set -euo pipefail` and trap cleanup on EXIT
 2. Optionally accept a `--unit-only` flag to skip blackbox tests
-3. Run unit tests using the detected test runner
-4. If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down
-5. Print a summary of passed/failed/skipped tests
-6. Exit 0 on all pass, exit 1 on any failure
+3. Run unit/blackbox tests using the detected test runner:
+   - **Local mode**: activate virtualenv (if present), run test runner directly on host
+   - **Docker mode**: spin up docker-compose environment, wait for health checks, run test suite, tear down
+4. Print a summary of passed/failed/skipped tests
+5. Exit 0 on all pass, exit 1 on any failure
 
 #### Step 3 — Generate `scripts/run-performance-tests.sh`
@@ -371,7 +414,7 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:
 
 1. Set `set -euo pipefail` and trap cleanup on EXIT
 2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
-3. Spin up the system under test (docker-compose or local)
+3. Start the system under test (local or docker-compose, matching the Docker Suitability Assessment decision)
 4. Run load/performance scenarios using the detected tool
 5. Compare results against threshold values from the test spec
 6. Print a pass/fail summary per scenario
diff --git a/_docs/_autopilot_state.md b/_docs/_autopilot_state.md
index 01e7b1d..5bd50d2 100644
--- a/_docs/_autopilot_state.md
+++ b/_docs/_autopilot_state.md
@@ -4,6 +4,6 @@
 flow: existing-code
 step: 6
 name: Run Tests
-status: failed
+status: in_progress
 sub_step: 0
-retry_count: 0
+retry_count: 1
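Reviewer note: step 1 of the Docker Suitability Assessment ("scan project source, config files, and dependencies for indicators") could be sketched as a grep heuristic. This is a minimal illustration, not part of either skill; the indicator pattern, the `--include` globs, and the function name `scan_for_docker_blockers` are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic for the disqualifying-factor scan.
# The indicator list is illustrative, not exhaustive or official.
set -euo pipefail

scan_for_docker_blockers() {
  # Case-insensitive indicators of hardware bindings / host dependencies.
  local indicators='cuda|nvidia|metal|opencl|tpu|fpga|/dev/tty|serial|gpu'
  # Prints matching files; empty output = no disqualifying indicator found.
  grep -rliE "$indicators" \
    --include='*.toml' --include='*.txt' --include='*.csproj' --include='*.json' \
    . 2>/dev/null || true
}
```

A non-empty result would trigger the Choose prompt above; an empty one keeps the Docker default.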
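Step 2 of Phase 4 (and the test-run skill's fallback logic) could compile into a script along these lines. A minimal sketch, not the template from `.cursor/skills/test-spec/templates/`: the `tests` compose service, `pytest` as the detected runner, and the exact wording matched in `environment.md` are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of a generated scripts/run-tests.sh that honors the
# Docker Suitability Assessment decision.
set -euo pipefail

# Honor a recorded "Test Execution: local" decision; otherwise prefer
# Docker whenever a test compose file exists.
detect_mode() {
  local env_file="_docs/02_document/tests/environment.md"
  if [ -f "$env_file" ] && grep -qiE 'test execution.*local' "$env_file"; then
    echo local
  elif [ -f docker-compose.test.yml ]; then
    echo docker
  else
    echo local
  fi
}

run_tests() {
  local mode; mode="$(detect_mode)"
  if [ "$mode" = docker ]; then
    # Docker mode: bring the environment up, wait for health checks, tear down on exit.
    trap 'docker compose -f docker-compose.test.yml down --volumes' EXIT
    docker compose -f docker-compose.test.yml up -d --wait
    docker compose -f docker-compose.test.yml run --rm tests pytest -q
  else
    # Local mode: activate a virtualenv if present, run the runner on the host.
    [ -f .venv/bin/activate ] && . .venv/bin/activate
    pytest -q
  fi
}
```

With `set -euo pipefail`, a non-zero exit from the runner already propagates, satisfying the "exit 1 on any failure" constraint without extra plumbing.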
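The threshold comparison in Step 3 (item 5) could be sketched as below. The metric name, values, and `compare_metric` helper are hypothetical; a real script would parse `_docs/02_document/tests/performance-tests.md` and the output of the detected load tool.

```shell
#!/usr/bin/env bash
# Sketch of the per-scenario threshold check for scripts/run-performance-tests.sh.
set -euo pipefail

# compare_metric <name> <measured> <threshold>
# Prints PASS/FAIL for the scenario; returns 1 on FAIL so the caller
# can aggregate into the script's final exit code.
compare_metric() {
  local name="$1" measured="$2" threshold="$3"
  # awk handles floating-point comparison portably.
  if awk -v m="$measured" -v t="$threshold" 'BEGIN { exit !(m <= t) }'; then
    echo "PASS $name: $measured <= $threshold"
  else
    echo "FAIL $name: $measured > $threshold"
    return 1
  fi
}
```

The script would call `compare_metric` once per scenario and exit non-zero if any call fails, matching the pass/fail summary requirement.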