From cfe3d357f45de5079b3664123422d2de63ebd393 Mon Sep 17 00:00:00 2001 From: Oleksandr Bezdieniezhnykh Date: Wed, 13 May 2026 22:51:48 +0300 Subject: [PATCH] [meta] Forbid per-batch full-suite test runs under implement skill Root cause: I ran the full unit suite at the end of every autodev batch despite implement/SKILL.md already saying that is forbidden (lines 33, 136, 145, 372). The skill's existing rules were buried mid-document; coderule.mdc's general "run full suite when done" overrode them in practice because each batch felt like a "done" point. Two targeted clarifications: - .cursor/rules/coderule.mdc: add an Iterative-Skill Exception bullet stating that when an iterative loop skill (autodev / implement batch loop, refactor batch loop) is active, the skill governs full-suite cadence and "done with changes" means done with the implementation phase, not done with one batch. - .cursor/skills/implement/SKILL.md: hoist the per-batch / per- task / Step-16 cadence rule into a top-of-file "READ FIRST, EVERY BATCH" banner with an explicit anti-pattern check ("if you catch yourself about to run pytest tests/ at end of batch, STOP"). Co-authored-by: Cursor --- .cursor/rules/coderule.mdc | 1 + .cursor/skills/implement/SKILL.md | 26 ++++++++++++++++++++++---- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/.cursor/rules/coderule.mdc b/.cursor/rules/coderule.mdc index 46542a5..2e7a6b8 100644 --- a/.cursor/rules/coderule.mdc +++ b/.cursor/rules/coderule.mdc @@ -39,6 +39,7 @@ alwaysApply: true - When you think you are done with changes, run the full test suite. Every failure in tests that cover code you modified or that depend on code you modified is a **blocking gate**. For pre-existing failures in unrelated areas, report them to the user but do not block on them. Never silently ignore or skip a failure without reporting it. On any blocking failure, stop and ask the user to choose one of: - **Investigate and fix** the failing test or source code - **Remove the test** if it is obsolete or no longer relevant +- **Iterative-skill exception**: when an iterative loop skill is active (e.g. autodev / `implement/SKILL.md` batch loop, `refactor/SKILL.md` batch loop), the skill governs full-suite cadence — typically focused tests per task/batch and a single full-suite gate at the very end of the implementation phase, NOT after each batch. "Done with changes" means done with the entire implementation phase the skill is running, not done with one batch. Do not run the full suite per batch unless the skill explicitly says to. - Do not rename any databases or tables or table columns without confirmation. Avoid such renaming if possible. - Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task diff --git a/.cursor/skills/implement/SKILL.md b/.cursor/skills/implement/SKILL.md index 93d7c67..6e36027 100644 --- a/.cursor/skills/implement/SKILL.md +++ b/.cursor/skills/implement/SKILL.md @@ -19,6 +19,20 @@ Implement all tasks produced by the `/decompose` skill. This skill reads task sp For each task the main agent receives a task spec, analyzes the codebase, implements the feature, writes tests, and verifies acceptance criteria — then moves on to the next task. +## Test-Run Cadence (READ FIRST, EVERY BATCH) + +Under this skill, full test suite runs happen **exactly once**, at Step 16 (Final Test Run), after every batch in the implementation phase has been committed. Per-batch and per-task full-suite runs are **forbidden**. + +What runs when: + +- **Per task** (Step 6.4): focused tests for the task's changed files + sibling tests in the same component directory. Nothing more. +- **Per batch** (Step 7): nothing. Aggregate per-task statuses; do not re-run anything. Cumulative cross-batch regressions are caught by Step 14.5 (Cumulative Code Review), not by a per-batch full-suite invocation. +- **End of implementation phase** (Step 16): full test suite, exactly once. + +This overrides the general `coderule.mdc` rule "run the full test suite when you think you are done with changes". Inside this skill, "done with changes" means the implementation phase is finished (Step 16), not "this batch is finished". + +If you catch yourself about to run `pytest tests/unit/` (or equivalent full-suite command) at the end of a batch, STOP — that is the failure mode this banner exists to prevent. Run only the focused subset that touches this batch's changed files. + ## Core Principles - **Sequential execution**: implement one task at a time. Do NOT spawn subagents and do NOT run tasks in parallel. (See `.cursor/rules/no-subagents.mdc`.) @@ -30,6 +44,7 @@ For each task the main agent receives a task spec, analyzes the codebase, implem - **Auto-start**: batches start immediately — no user confirmation before a batch - **Gate on failure**: user confirmation is required only when code review returns FAIL - **Commit per batch**: after each batch is confirmed, commit. Ask the user whether to push to remote unless the user previously opted into auto-push for this session. +- **Focused test runs per task, full-suite gate once at the end**: each task runs only the focused tests for its changed files (see Step 6.4). The full unit/test suite is invoked exactly once, at Step 16 (Final Test Run), after every batch in the implementation phase has been committed. Per-batch full-suite runs are forbidden — they burn time without catching anything the focused runs + cumulative review (Step 14.5) do not already cover. ## Context Resolution @@ -132,8 +147,8 @@ For each task in the batch **in topological order, one at a time**: 1. Read the task spec file. 2. Respect the file-ownership envelope computed in Step 4 (OWNED / READ-ONLY / FORBIDDEN). 3. Implement the feature and write/update tests for every acceptance criterion in the spec. Tests for internal product behavior must exercise the production implementation path. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still exist and skip/block with a clear prerequisite reason, but that skip does not make missing production code complete. -4. Run the relevant tests locally before moving on to the next task in the batch. If tests fail, fix in-place — do not defer. -5. Capture a short per-task status line (files changed, tests pass/fail, any blockers) for the batch report. +4. Run **focused tests only** — the new/changed test files for THIS task, plus any sibling test files in the same component directory that exercise files the task modified. Do **NOT** run the full unit suite here, and do **NOT** run it at the end of the batch. The full-suite gate runs once at Step 16 (end of the whole implementation phase). If focused tests fail, fix in-place — do not defer. +5. Capture a short per-task status line (files changed, focused-test pass/fail counts, any blockers) for the batch report. Do NOT spawn subagents and do NOT attempt to implement two tasks simultaneously, even if they touch disjoint files. See `.cursor/rules/no-subagents.mdc`. @@ -141,6 +156,7 @@ Do NOT spawn subagents and do NOT attempt to implement two tasks simultaneously, - After all tasks in the batch are finished, aggregate the per-task status lines into a structured batch status. - If any task reported "Blocked", log the blocker with the failing task's ID and continue — the batch report will surface it. +- **Do NOT run the full unit/test suite here.** The full-suite gate runs exactly once at Step 16 (end of implementation phase). Running it per-batch on a large project burns time without catching anything the focused per-task runs missed — cross-batch regressions are caught by Step 14.5 (Cumulative Code Review) and Step 16, not by repeated full-suite invocations. **Stuck detection** — while implementing a task, watch for these signals in your own progress: - The same file has been rewritten 3+ times without tests going green → stop, mark the task Blocked, and move to the next task in the batch (the user will be asked at the end of the batch). @@ -296,7 +312,9 @@ Remediation task creation: ### 16. Final Test Run -- After all batches are complete, run the full test suite once unless the invoking flow's immediate next step is `Run Tests`. +This is the **only** full-suite invocation in the implement skill. Steps 6 / 7 / 14 deliberately avoid full-suite runs — they would burn time per-batch without catching anything the focused per-task runs and the cumulative review (Step 14.5) do not already cover. + +- After all batches are complete, run the full test suite **once** unless the invoking flow's immediate next step is `Run Tests`. - If the next flow step is `Run Tests`, record a handoff in the final implementation report and let `.cursor/skills/test-run/SKILL.md` own the full-suite gate to avoid duplicate full runs. - When this step does run, read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices). - Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision. @@ -365,4 +383,4 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed: - Never run tasks in parallel and never spawn subagents — see `.cursor/rules/no-subagents.mdc` - If a task is flagged as stuck, stop working on it and report — do not let it loop indefinitely - Always run the Product Implementation Completeness Gate before final product reports -- Always run or hand off the full test suite after all batches complete (step 16) +- Always run or hand off the full test suite after all batches complete (step 16) — and only at step 16. Per-task runs are focused (changed files + sibling component tests); per-batch runs are forbidden.