Enhance coding guidelines and autopilot workflows

- Updated `.cursor/rules/coderule.mdc` to include new guidelines on maintaining test environments and avoiding hardcoded workarounds.
- Revised state file rules in `.cursor/skills/autopilot/state.md` to ensure comprehensive updates after every meaningful state transition.
- Improved existing-code workflow in `.cursor/skills/autopilot/flows/existing-code.md` to automate task re-entry without user confirmation.
- Added requirements for test coverage in the implementation process within `.cursor/skills/implement/SKILL.md`, ensuring all acceptance criteria are validated by tests.
- Enhanced new-task skill documentation to include test coverage gap analysis, ensuring all new requirements are covered by tests.

These changes aim to strengthen project maintainability, improve testing practices, and streamline workflows.
2026-06-22 10:21:08 +00:00 · 2026-03-29 05:30:00 +03:00
parent d10d542e0c
commit ad5530b9ef
8 changed files with 110 additions and 32 deletions
@@ -94,6 +94,7 @@ For each task in the batch, launch an `implementer` subagent with:
 - List of files OWNED (exclusive write access)
 - List of files READ-ONLY
 - List of files FORBIDDEN
+- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.

 Launch all subagents immediately — no user confirmation.
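
The skip-with-reason requirement in the explicit instruction above could look roughly like the following in a pytest suite. This is a minimal sketch, not the project's actual tests: the test names, the acceptance criteria they stand for, and the `EXAMPLE_SERVICE_URL` variable are hypothetical, and importability of the `tensorrt` package is assumed to be an adequate proxy for "TensorRT can run here".

```python
import importlib.util
import os

import pytest

# Assumption: if the `tensorrt` package is importable, the environment can run
# TensorRT tests. A real project may need an explicit GPU check instead.
TENSORRT_AVAILABLE = importlib.util.find_spec("tensorrt") is not None


@pytest.mark.skipif(
    not TENSORRT_AVAILABLE,
    reason="TensorRT is not installed; requires an NVIDIA GPU environment",
)
def test_tensorrt_reports_version():
    """Hypothetical AC: the TensorRT runtime is importable and reports a version."""
    import tensorrt as trt

    assert trt.__version__


def test_external_service_is_configured():
    """Hypothetical AC that depends on an external service."""
    url = os.environ.get("EXAMPLE_SERVICE_URL")  # hypothetical variable name
    if not url:
        # Skip at run time, stating the prerequisite that is missing.
        pytest.skip("EXAMPLE_SERVICE_URL is not set; external service required")
    assert url.startswith("http")
```

Both tests exist in the suite regardless of the environment, so they run automatically once the prerequisite is present and otherwise show up in the report as skipped with their reason.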

@@ -108,32 +109,44 @@ Launch all subagents immediately — no user confirmation.
 - Subagent has not produced new output for an extended period → flag as potentially hung
 - If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report

-### 8. Code Review
+### 8. AC Test Coverage Verification
+
+Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:
+
+1. Read the task spec's **Acceptance Criteria** section
+2. Search the test files (new and existing) for tests that cover each AC
+3. Classify each AC as:
+   - **Covered**: a test directly validates this AC (running or skipped-with-reason)
+   - **Not covered**: no test exists for this AC
+
+If any AC is **Not covered**:
+- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
+- Re-launch the implementer with the specific ACs that need tests
+- If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
+- A skipped test counts as **Covered** — the test exists and will run when the environment allows
+
+Only proceed to Step 9 when every AC has a corresponding test.
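
The classification in this step can be sketched as a small helper. This is an illustrative sketch only: the `AcCoverage` type, the function names, and the AC identifiers are hypothetical, and the mapping from ACs to tests is assumed to have been produced by the search in item 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcCoverage:
    ac_id: str
    status: str  # "Covered" or "Not covered"
    tests: tuple


def classify_acs(ac_ids, tests_by_ac):
    """Classify each AC. Any existing test counts, including skipped-with-reason."""
    results = []
    for ac_id in ac_ids:
        tests = tuple(tests_by_ac.get(ac_id, ()))
        status = "Covered" if tests else "Not covered"
        results.append(AcCoverage(ac_id, status, tests))
    return results


def uncovered(results):
    """ACs that block the batch and must be sent back to the implementer."""
    return [r.ac_id for r in results if r.status == "Not covered"]


def summary(results):
    """One-line summary in the 'All covered / X of Y covered' style."""
    covered = sum(1 for r in results if r.status == "Covered")
    total = len(results)
    return "All covered" if covered == total else f"{covered} of {total} covered"
```

For example, `classify_acs(["AC-1", "AC-2"], {"AC-1": ["test_login"]})` marks `AC-2` as not covered, so `uncovered(...)` returns the list of ACs to hand back when re-launching the implementer.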
+
+### 9. Code Review

 - Run `/code-review` skill on the batch's changed files + corresponding task specs
 - The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL

-### 9. Auto-Fix Gate
+### 10. Auto-Fix Gate

 Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:

-1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 10
+1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 11
 2. If verdict is **FAIL** (attempt 1 or 2):
   - Parse the code review findings (Critical and High severity items)
   - For each finding, attempt an automated fix using the finding's location, description, and suggestion
   - Re-run `/code-review` on the modified files
-   - If now PASS or PASS_WITH_WARNINGS → continue to step 10
+   - If now PASS or PASS_WITH_WARNINGS → continue to step 11
   - If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts
 3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.

 Track `auto_fix_attempts` count in the batch report for retrospective analysis.
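
The bounded retry loop described above can be sketched as follows. This is a hedged illustration, not the skill's implementation: `run_review` and `apply_fixes` are hypothetical callables standing in for the `/code-review` run and the automated fix step, and findings are assumed to be dictionaries with a `severity` key.

```python
MAX_AUTO_FIX_ATTEMPTS = 2


def auto_fix_gate(run_review, apply_fixes):
    """Return (final_verdict, auto_fix_attempts, needs_user).

    run_review() -> (verdict, findings); apply_fixes(findings) -> None.
    Both are hypothetical stand-ins supplied by the caller.
    """
    verdict, findings = run_review()
    attempts = 0
    while verdict == "FAIL" and attempts < MAX_AUTO_FIX_ATTEMPTS:
        # Only Critical and High severity findings are auto-fixed.
        blocking = [f for f in findings if f["severity"] in ("Critical", "High")]
        apply_fixes(blocking)
        attempts += 1
        verdict, findings = run_review()
    # A FAIL that survives both attempts is escalated to the user (blocking).
    return verdict, attempts, verdict == "FAIL"
```

The returned attempt count is the value that would be recorded as `auto_fix_attempts` in the batch report.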

-### 10. Test
-
- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
- Test failures are a **blocking gate** — do not proceed to commit until the test-run skill completes with a user decision
- Note: the autopilot also runs a separate full test suite after all implementation batches complete (greenfield Step 7, existing-code Steps 6/10). This is intentional — per-batch tests are regression checks, the post-implement run is final validation.
-
 ### 11. Commit and Push

 - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
@@ -152,7 +165,13 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
 ### 14. Loop

 - Go back to step 2 until all tasks in `todo/` are done
- When all tasks are complete, report final summary
+
+### 15. Final Test Run
+
+- After all batches are complete, run the full test suite once
+- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
+- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
+- When tests pass, report final summary

 ## Batch Report Persistence

@@ -177,10 +196,11 @@ After each batch, produce a structured report:

 ## Task Results

-| Task | Status | Files Modified | Tests | Issues |
-|------|--------|---------------|-------|--------|
-| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
+| Task | Status | Files Modified | Tests | AC Coverage | Issues |
+|------|--------|---------------|-------|-------------|--------|
+| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |

+## AC Test Coverage: [All covered / X of Y covered]
 ## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
 ## Auto-Fix Attempts: [0/1/2]
 ## Stuck Agents: [count or None]
@@ -195,7 +215,7 @@ After each batch, produce a structured report:
 | Implementer fails same approach 3+ times | Stop it, escalate to user |
 | Task blocked on external dependency (not in task list) | Report and skip |
 | File ownership conflict unresolvable | ASK user |
-| Any test failure after a batch | Delegate to test-run skill — blocking gate |
+| Test failure after final test run | Delegate to test-run skill — blocking gate |
 | All tasks complete | Report final summary, suggest final commit |
 | `_dependencies_table.md` missing | STOP — run `/decompose` first |

@@ -203,7 +223,7 @@ After each batch, produce a structured report:

 Each batch commit serves as a rollback checkpoint. If recovery is needed:

- **Tests fail after a batch commit**: `git revert <batch-commit-hash>` using the hash from the batch report in `_docs/03_implementation/`
+- **Tests fail after final test run**: `git revert <batch-commit-hash>` using hashes from the batch reports in `_docs/03_implementation/`
 - **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
 - **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes

@@ -212,4 +232,4 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed:
 - Never launch tasks whose dependencies are not yet completed
 - Never allow two parallel agents to write to the same file
 - If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
- Always run tests after each batch completes
+- Always run the full test suite after all batches complete (step 15)