Enhance coding guidelines and autopilot workflows

- Updated `.cursor/rules/coderule.mdc` to include new guidelines on maintaining test environments and avoiding hardcoded workarounds.
- Revised state file rules in `.cursor/skills/autopilot/state.md` to ensure comprehensive updates after every meaningful state transition.
- Improved existing-code workflow in `.cursor/skills/autopilot/flows/existing-code.md` to automate task re-entry without user confirmation.
- Added requirements for test coverage in the implementation process within `.cursor/skills/implement/SKILL.md`, ensuring all acceptance criteria are validated by tests.
- Enhanced new-task skill documentation to include test coverage gap analysis, ensuring all new requirements are covered by tests.

These changes aim to strengthen project maintainability, improve testing practices, and streamline workflows.
Oleksandr Bezdieniezhnykh
2026-03-29 05:30:00 +03:00
parent d10d542e0c
commit ad5530b9ef
8 changed files with 110 additions and 32 deletions
.cursor/skills/autopilot/flows/existing-code.md
@@ -217,22 +217,18 @@ After deployment completes, the existing-code workflow is done.
**Re-Entry After Completion**
Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed
Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:
```
══════════════════════════════════════
PROJECT CYCLE COMPLETE
══════════════════════════════════════
The previous cycle finished successfully.
You can now add new functionality.
══════════════════════════════════════
Starting new feature cycle…
══════════════════════════════════════
```
Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
## Auto-Chain Rules
.cursor/skills/autopilot/state.md (+1 -1)
@@ -41,7 +41,7 @@ retry_count: 3
### State File Rules
1. **Create** on the first autopilot invocation (after state detection determines Step 1)
2. **Update** after every change — this includes batch completion, sub-step progress, step completion, session boundary, failed retry, or any meaningful state transition. The state file must always reflect the current reality (see the sketch after this list).
3. **Read** as the first action on every invocation — before folder scanning
4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
5. **Never delete** the state file
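
To make rule 2 concrete, here is a minimal sketch of two consecutive snapshots; apart from `step`, `status`, and `retry_count`, which appear elsewhere in this skill, the field names are illustrative assumptions, not a prescribed schema:

```
# after batch 1 of step 9 completes (write immediately, not at session end)
step: 9
status: in_progress
current_batch: 2
retry_count: 0

# after a failed retry within the same batch
step: 9
status: in_progress
current_batch: 2
retry_count: 1
```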
.cursor/skills/implement/SKILL.md (+37 -17)
@@ -94,6 +94,7 @@ For each task in the batch, launch an `implementer` subagent with:
- List of files OWNED (exclusive write access)
- List of files READ-ONLY
- List of files FORBIDDEN
- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason (see the sketch below).
Launch all subagents immediately — no user confirmation.
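
As a sketch of what that explicit instruction produces in a pytest project (the module, function, and acceptance criterion below are hypothetical, and a real suite may detect TensorRT differently):

```python
import importlib.util

import pytest

# The test exists even where it cannot run: it skips with the prerequisite
# spelled out, and will execute once the environment provides a GPU runtime.
HAS_TENSORRT = importlib.util.find_spec("tensorrt") is not None


@pytest.mark.skipif(not HAS_TENSORRT, reason="TensorRT engine build requires a GPU runtime")
def test_engine_builds_from_onnx():
    # Hypothetical AC: the exporter produces a loadable TensorRT engine.
    from mypackage.export import build_engine  # hypothetical module

    engine = build_engine("model.onnx")
    assert engine is not None
```

The skip reason doubles as documentation of the environmental prerequisite, which is exactly what the coverage check in Step 8 looks for.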
@@ -108,32 +109,44 @@ Launch all subagents immediately — no user confirmation.
- Subagent has not produced new output for an extended period → flag as potentially hung
- If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report
### 8. AC Test Coverage Verification
Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:
1. Read the task spec's **Acceptance Criteria** section
2. Search the test files (new and existing) for tests that cover each AC
3. Classify each AC as:
- **Covered**: a test directly validates this AC (running or skipped-with-reason)
- **Not covered**: no test exists for this AC
If any AC is **Not covered**:
- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
- Re-launch the implementer with the specific ACs that need tests
- If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
- A skipped test counts as **Covered** — the test exists and will run when the environment allows
Only proceed to Step 9 when every AC has a corresponding test.
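
One way to mechanize the check above, assuming the hypothetical convention that tests reference acceptance criteria with an `AC-<n>` tag in their names or docstrings:

```python
import re
from pathlib import Path


def covered_acs(test_dir: str) -> set:
    """Collect every AC tag (e.g. 'AC-3') mentioned in the test files."""
    tags = set()
    for path in Path(test_dir).rglob("test_*.py"):
        tags |= set(re.findall(r"AC-\d+", path.read_text()))
    return tags


spec_acs = {"AC-1", "AC-2", "AC-3"}        # parsed from the task spec
missing = spec_acs - covered_acs("tests")  # a non-empty set blocks Step 9
```

A skipped test still counts as covered here because its tag is present; only a test that does not exist at all leaves an AC uncovered.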
### 9. Code Review
- Run `/code-review` skill on the batch's changed files + corresponding task specs
- The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL
### 10. Auto-Fix Gate
Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:
1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 11
2. If verdict is **FAIL** (attempt 1 or 2):
- Parse the code review findings (Critical and High severity items)
- For each finding, attempt an automated fix using the finding's location, description, and suggestion
- Re-run `/code-review` on the modified files
- If now PASS or PASS_WITH_WARNINGS → continue to step 11
- If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts
3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.
Track `auto_fix_attempts` count in the batch report for retrospective analysis.
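
The gate reduces to a small bounded loop. A minimal sketch, assuming `run_code_review` and `apply_fixes` wrap the corresponding skill invocations (both names are hypothetical):

```python
MAX_AUTO_FIX_ATTEMPTS = 2


def auto_fix_gate(files):
    attempts = 0
    verdict, findings = run_code_review(files)  # hypothetical wrapper
    while verdict == "FAIL" and attempts < MAX_AUTO_FIX_ATTEMPTS:
        attempts += 1
        apply_fixes(findings)                   # Critical/High items only
        verdict, findings = run_code_review(files)
    # PASS and PASS_WITH_WARNINGS proceed; a persistent FAIL blocks on the user.
    return verdict, attempts                    # recorded as auto_fix_attempts
```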
### 11. Commit and Push
- After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
@@ -152,7 +165,13 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
### 14. Loop
- Go back to step 2 until all tasks in `todo/` are done
- When all tasks are complete, report final summary
### 15. Final Test Run
- After all batches are complete, run the full test suite once
- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
- When tests pass, report final summary
## Batch Report Persistence
@@ -177,10 +196,11 @@ After each batch, produce a structured report:
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|---------------|-------|-------------|--------|
| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |
## AC Test Coverage: [All covered / X of Y covered]
## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
## Auto-Fix Attempts: [0/1/2]
## Stuck Agents: [count or None]
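
For example, a filled-in results row might read as follows (all values are illustrative):

| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|---------------|-------|-------------|--------|
| PROJ-42_export-engine | Done | 4 files | pass | 5/5 ACs covered | None |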
@@ -195,7 +215,7 @@ After each batch, produce a structured report:
| Implementer fails same approach 3+ times | Stop it, escalate to user |
| Task blocked on external dependency (not in task list) | Report and skip |
| File ownership conflict unresolvable | ASK user |
| Test failure after final test run | Delegate to test-run skill — blocking gate |
| All tasks complete | Report final summary, suggest final commit |
| `_dependencies_table.md` missing | STOP — run `/decompose` first |
@@ -203,7 +223,7 @@ After each batch, produce a structured report:
Each batch commit serves as a rollback checkpoint. If recovery is needed:
- **Tests fail after final test run**: `git revert <batch-commit-hash>` using hashes from the batch reports in `_docs/03_implementation/` (see the example below)
- **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
- **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes
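
For instance, rolling back the two most recent batches after a failed final test run might look like this, with placeholder hashes taken from the batch reports and listed newest first so each revert applies cleanly:

```
git revert <batch-3-commit-hash> <batch-2-commit-hash>
```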
@@ -212,4 +232,4 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed:
- Never launch tasks whose dependencies are not yet completed
- Never allow two parallel agents to write to the same file
- If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
- Always run the full test suite after all batches complete (step 15)
new-task skill (+18 -1)
@@ -129,7 +129,7 @@ The `<task_slug>` is a short kebab-case name derived from the feature descriptio
### Step 4: Codebase Analysis
**Role**: Software architect
**Goal**: Determine where and how to insert the new functionality, and whether existing tests cover the new requirements.
1. Read the codebase documentation from DOCUMENT_DIR:
- `architecture.md` — overall structure
@@ -144,6 +144,10 @@ The `<task_slug>` is a short kebab-case name derived from the feature descriptio
- What new interfaces or models are needed
- How data flows through the change
4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points
5. **Test coverage gap analysis**: Read existing test files that cover the affected components. For each acceptance criterion from Step 1, determine whether an existing test already validates it. Classify each AC as:
- **Covered**: an existing test directly validates this behavior
- **Partially covered**: an existing test exercises the code path but doesn't assert the new requirement
- **Not covered**: no existing test validates this behavior — a new test is required
Present the analysis:
@@ -156,9 +160,22 @@ Present the analysis:
Interface changes: [list or "None"]
New interfaces: [list or "None"]
Data flow impact: [summary]
─────────────────────────────────────
TEST COVERAGE GAP ANALYSIS
─────────────────────────────────────
AC-1: [Covered / Partially covered / Not covered]
[existing test name or "needs new test"]
AC-2: [Covered / Partially covered / Not covered]
[existing test name or "needs new test"]
...
─────────────────────────────────────
New tests needed: [count]
Existing tests to update: [count or "None"]
══════════════════════════════════════
```
When gaps are found, the task spec (Step 6) MUST include the missing tests in the Scope (Included) section and the Unit/Blackbox Tests tables. Tests are not optional — if an AC is not covered by an existing test, the task must deliver a test for it.
---
### Step 5: Validate Assumptions