Mirror of https://github.com/azaion/ui.git (synced 2026-04-22 11:16:34 +00:00)
Commit: Sync .cursor from suite (autodev orchestrator + monorepo skills)
# Test Run
Run the project's test suite and report results. This skill is invoked by the autodev at verification checkpoints — after implementing tests, after implementing features, before deploy — or at any point where the test suite must pass before proceeding.
## Modes
test-run has two modes. The caller passes the mode explicitly; if missing, default to `functional`.
| Mode | Scope | Typical caller | Input artifacts |
|------|-------|----------------|-----------------|
| `functional` (default) | Unit / integration / blackbox tests — correctness | autodev Steps that verify after Implement Tests or Implement | `scripts/run-tests.sh`, `_docs/02_document/tests/environment.md`, `_docs/02_document/tests/blackbox-tests.md` |
| `perf` | Performance / load / stress / soak tests — latency, throughput, error-rate thresholds | autodev greenfield Step 9, existing-code Step 15 (pre-deploy) | `scripts/run-performance-tests.sh`, `_docs/02_document/tests/performance-tests.md`, AC thresholds in `_docs/00_problem/acceptance_criteria.md` |
Direct user invocation (`/test-run`) defaults to `functional`. If the user says "perf tests", "load test", "performance", or passes a performance scenarios file, run `perf` mode.
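A minimal sketch of this phrasing-based selection — the keyword list here is an assumption for illustration, not an exhaustive spec:

```python
# Hypothetical keyword hints for perf mode; extend as needed.
PERF_HINTS = ("perf", "load test", "performance")

def resolve_mode(request: str = "") -> str:
    """Map the user's phrasing to a test-run mode, defaulting to functional."""
    text = request.lower()
    if any(hint in text for hint in PERF_HINTS):
        return "perf"
    return "functional"
```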

After selecting a mode, read its corresponding workflow below; do not mix them.

---

## Functional Mode

### 1. Detect Test Runner
### 5. Handle Outcome
**All tests pass, zero skipped** → return success to the autodev for auto-chain.
**Any test fails or errors** → this is a **blocking gate**. Never silently ignore failures. **Always investigate the root cause before deciding on an action.** Read the failing test code, read the error output, check service logs if applicable, and determine whether the bug is in the test or in the production code.

After investigating, present the findings and options, then:

- If user picks A → apply fixes, then re-run (loop back to step 2)
- If user picks B → return failure to the autodev
**Any skipped test** → classify as legitimate or illegitimate before deciding whether to block.

After investigating, present findings and options.

Only option B allows proceeding with skips, and it requires explicit user approval with documented justification for each skip.

---

## Perf Mode

Performance tests differ from functional tests in what they measure (latency / throughput / error-rate distributions, not pass/fail of a single assertion) and when they run (once before deploy, not per batch). The mode reuses the same orchestration shape (detect → run → report → gate on outcome) but with perf-specific tool detection and threshold comparison.

### 1. Detect Perf Runner

Check in order — first match wins:

1. `scripts/run-performance-tests.sh` exists (generated by `test-spec` Phase 4) → use it; the script already encodes the correct load profile and tool invocation.
2. `_docs/02_document/tests/performance-tests.md` exists → read the scenarios, then auto-detect a load-testing tool:
   - `k6` binary available → prefer k6 (scriptable, good default reporting)
   - `locust` in project deps or installed → locust
   - `artillery` in `package.json` or installed globally → artillery
   - `wrk` binary available → wrk (simple HTTP only; use only if scenarios are HTTP GET/POST)
   - Language-native benchmark harness (`cargo bench`, `go test -bench`, `pytest-benchmark`) → use when scenarios are CPU-bound or in-process
3. No runner and no scenarios spec → STOP and ask the user to either run `/test-spec` first (to produce `performance-tests.md` + the runner script) or supply a runner script manually. Do not improvise perf tests from scratch.

### 2. Run

Execute the detected runner against the target system. Capture per-scenario metrics:

- Latency percentiles: p50, p95, p99 (and p999 if load volume permits)
- Throughput: requests/sec or operations/sec
- Error rate: failed / total
- Duration: actual run time (for soak/ramp scenarios)
- Resource usage if the scenarios call for it (CPU%, RSS, GPU utilization)

Tear down any environment the runner spun up after metrics are captured.

### 3. Compare Against Thresholds

Load thresholds in this order:

1. Per-scenario expected results from `_docs/02_document/tests/performance-tests.md`
2. Project-wide thresholds from `_docs/00_problem/acceptance_criteria.md` (latency / throughput lines)
3. Fallback: no threshold → record the measurement but classify the scenario as **Unverified** (not pass/fail)

Classify each scenario:

- **Pass** — all thresholds met
- **Warn** — all thresholds met, but some metric is within 10% of its threshold (e.g., p95 = 460ms against a 500ms threshold)
- **Fail** — any threshold violated
- **Unverified** — no threshold to compare against

### 4. Report

```
══════════════════════════════════════
 PERF RESULTS
══════════════════════════════════════
 Scenarios: [pass N · warn M · fail K · unverified U]
──────────────────────────────────────
 1. [scenario_name] — [Pass/Warn/Fail/Unverified]
    p50 = [x]ms · p95 = [y]ms · p99 = [z]ms
    throughput = [r] rps · errors = [e]%
    threshold: [criterion and verdict detail]
 2. ...
══════════════════════════════════════
```

Persist the full report to `_docs/06_metrics/perf_<YYYY-MM-DD>_<run_label>.md` for trend tracking across cycles.

### 5. Handle Outcome

**All scenarios Pass (or Pass + Unverified only)** → return success to the caller.

**Any Warn or Fail** → this is a **blocking gate**. Investigate before deciding — read the runner output, check whether the warning metric was historically stable, and rule out transient infrastructure noise (always worth one re-run before declaring a regression).

After investigating, present:

```
══════════════════════════════════════
 PERF GATE: [summary]
══════════════════════════════════════
 Failing / warning scenarios:
 1. [scenario] — [metric] = [observed] vs threshold [threshold]
    likely cause: [1-line diagnosis]
══════════════════════════════════════
 A) Fix and re-run (investigate and address the regression)
 B) Proceed anyway (accept the regression — requires written
    justification recorded in the perf report)
 C) Abort — investigate offline
══════════════════════════════════════
 Recommendation: A — perf regressions caught pre-deploy are
 orders of magnitude cheaper to fix than post-deploy
══════════════════════════════════════
```

- User picks A → apply fixes, re-run (back to step 2).
- User picks B → append the justification to the perf report; return success to the caller.
- User picks C → return failure to the caller.

**Any Unverified scenarios with no Warn/Fail** → not blocking, but surface them in the report so the user knows coverage gaps exist. Suggest running `/test-spec` to add expected results next cycle.

## Trigger Conditions
This skill is invoked by the autodev at test verification checkpoints. It is not typically invoked directly by the user. When invoked directly, select the mode from the user's phrasing ("run tests" → functional; "load test" / "perf test" → perf).