mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 11:36:32 +00:00
Enhance autopilot documentation and workflows: Add assumptions regarding single project per workspace, update notification sound references, and introduce context budget heuristics for managing session limits. Revise various skill documents to reflect changes in task management, including ticketing and testing processes, ensuring clarity and consistency across the system.
@@ -5,8 +5,8 @@ description: |
  then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits)
  that treat the system as a black box. Every test pairs input data with quantifiable expected results
  so tests can verify correctness, not just execution.
  4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate,
  test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/.
  Trigger phrases:
  - "test spec", "test specification", "test scenarios"
  - "blackbox test spec", "black box tests", "blackbox tests"
@@ -133,6 +133,8 @@ TESTS_OUTPUT_DIR/

| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
| Phase 4 | Test runner script | `scripts/run-tests.sh` |
| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |

### Resumability
@@ -335,6 +337,56 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab

---

### Phase 4: Test Runner Script Generation

**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.

#### Step 1 — Detect test infrastructure

1. Identify the project's test runner from manifests and config files:
   - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
   - .NET: `dotnet test` (*.csproj, *.sln)
   - Rust: `cargo test` (Cargo.toml)
   - Node: `npm test` or `vitest` / `jest` (package.json)
2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
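The detection above can be sketched as a small shell helper. This is an illustrative sketch, not part of the skill: the function name `detect_test_runner` and the Python-first precedence order are assumptions to adapt per project.

```shell
#!/usr/bin/env bash
# Illustrative sketch of Step 1 runner detection. The function name and the
# precedence order (Python first) are assumptions, not part of the skill.
set -euo pipefail

detect_test_runner() {
  local dir="$1"
  if [ -f "$dir/pyproject.toml" ] || [ -f "$dir/setup.cfg" ] || [ -f "$dir/pytest.ini" ]; then
    echo "pytest"
  elif ls "$dir"/*.csproj "$dir"/*.sln >/dev/null 2>&1; then
    echo "dotnet test"
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$dir/package.json" ]; then
    echo "npm test"
  else
    echo "none"
  fi
}

# Report the runner for the given directory (defaults to the current one).
detect_test_runner "${1:-.}"
```

In a real Phase 4 run this result would then select the command used by both generated scripts.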

#### Step 2 — Generate `scripts/run-tests.sh`

Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:

1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Optionally accept a `--unit-only` flag to skip blackbox tests
3. Run unit tests using the detected test runner
4. If blackbox tests exist: spin up the docker-compose environment, wait for health checks, run the blackbox test suite, tear down
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
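The requirements above can be sketched as follows. This assumes pytest as the detected runner, `docker-compose.test.yml` as the compose file, and `tests/blackbox` as the blackbox suite location — all placeholders for what Step 1 actually found. The sketch degrades gracefully (skips) when no runner is present; the generated script may prefer to fail loudly instead.

```shell
#!/usr/bin/env bash
# Sketch of scripts/run-tests.sh (Steps 2.1-2.6). pytest, the compose file
# name, and tests/blackbox are assumptions; substitute the detected values.
set -euo pipefail

cleanup() {
  # 1. Tear down the blackbox environment if we started one.
  if [ "${COMPOSE_UP:-0}" = "1" ]; then
    docker compose -f docker-compose.test.yml down --volumes >/dev/null 2>&1 || true
  fi
}
trap cleanup EXIT

run_all_tests() {
  local unit_only=0 status=0
  # 2. Optional flag to skip blackbox tests.
  [ "${1:-}" = "--unit-only" ] && unit_only=1

  # 3. Unit tests via the detected runner (guarded so the sketch runs anywhere).
  if { [ -f pyproject.toml ] || [ -f pytest.ini ]; } && command -v pytest >/dev/null 2>&1; then
    pytest -q || status=1
  else
    echo "SKIP: no unit test runner detected"
  fi

  # 4. Blackbox tests: bring up the environment, wait on health checks
  #    (docker compose --wait), run the suite; teardown happens via the trap.
  if [ "$unit_only" -eq 0 ] && [ -f docker-compose.test.yml ]; then
    docker compose -f docker-compose.test.yml up -d --wait
    COMPOSE_UP=1
    pytest -q tests/blackbox || status=1
  fi

  # 5./6. Summary and exit code (a real script would also report
  # passed/failed/skipped counts parsed from the runner output).
  echo "run-tests: exit $status (0 = all passed, 1 = failures)"
  return "$status"
}

run_all_tests "$@"
```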

#### Step 3 — Generate `scripts/run-performance-tests.sh`

Create `scripts/run-performance-tests.sh` at the project root. The script must:

1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept them as CLI args)
3. Spin up the system under test (docker-compose or local)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
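A matching sketch for the performance runner, assuming k6 as the detected load tool and `tests/perf/scenario.js` as the scenario file (both placeholders); the threshold defaults are illustrative values, standing in for numbers read from the test spec:

```shell
#!/usr/bin/env bash
# Sketch of scripts/run-performance-tests.sh (Steps 3.1-3.7). k6, the compose
# file name, the scenario path, and the defaults are assumptions.
set -euo pipefail

cleanup() {
  if [ "${COMPOSE_UP:-0}" = "1" ]; then
    docker compose -f docker-compose.test.yml down --volumes >/dev/null 2>&1 || true
  fi
}
trap cleanup EXIT

run_perf_tests() {
  # 2. Thresholds normally come from the test spec; CLI args override them.
  local p95_ms="${1:-250}" target_rps="${2:-100}" status=0

  # Degrade gracefully when the tool or environment is missing so the sketch
  # runs anywhere; a generated runner may prefer to fail loudly here.
  if ! command -v k6 >/dev/null 2>&1 || [ ! -f docker-compose.test.yml ]; then
    echo "SKIP: load tool or compose file not found"
    echo "run-performance-tests: exit 0 (nothing to run)"
    return 0
  fi

  # 3./4./5. Spin up the system under test, run the scenario, and let the
  # scenario compare against the thresholds passed via environment variables.
  docker compose -f docker-compose.test.yml up -d --wait
  COMPOSE_UP=1
  k6 run -e P95_MS="$p95_ms" -e TARGET_RPS="$target_rps" tests/perf/scenario.js || status=1

  # 6./7. Pass/fail summary and exit code.
  echo "run-performance-tests: exit $status"
  return "$status"
}

run_perf_tests "$@"
```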

#### Step 4 — Verify scripts

1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present the user with a summary of what each script does
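A minimal sketch of the verification checks, exercised against a throwaway script so it is self-contained; in the skill these commands target the two generated files instead:

```shell
# Hypothetical stand-in for a generated script; the real targets are
# scripts/run-tests.sh and scripts/run-performance-tests.sh.
set -euo pipefail
tmp=$(mktemp)
printf '#!/usr/bin/env bash\necho ok\n' > "$tmp"

bash -n "$tmp"    # 1. syntax check only; exits non-zero on a parse error
chmod +x "$tmp"   # 2. mark executable
echo "verified: $tmp is syntactically valid and executable"
```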

**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.

---

## Escalation Rules

| Situation | Action |
@@ -373,7 +425,7 @@ When the user wants to:

```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (4-Phase)                                │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                         │
│ → verify AC, restrictions, input_data (incl. expected_results.md)    │
@@ -397,15 +449,21 @@ When the user wants to:
│                                                                      │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
│ → build test-data + expected-result requirements checklist           │
│ → ask user: provide data+result (A) or remove test (B)               │
│ → validate input data (quality + quantity)                          │
│ → validate expected results (quantifiable + comparison method)       │
│ → remove tests without data or expected result, warn user            │
│ → final coverage check (≥70% or FAIL + loop back)                    │
│ [BLOCKING: coverage ≥ 70% required to pass]                          │
│                                                                      │
│ Phase 4: Test Runner Script Generation                               │
│ → detect test runner + docker-compose + load tool                    │
│ → scripts/run-tests.sh (unit + blackbox)                             │
│ → scripts/run-performance-tests.sh (load/perf scenarios)             │
│ → verify scripts are valid and executable                            │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately         │
│ Ask don't assume · Spec don't code                                   │
│ No test without data · No test without expected result               │
└──────────────────────────────────────────────────────────────────────┘
```