mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 07:16:32 +00:00
Enhance coding guidelines and autopilot workflows
- Updated `.cursor/rules/coderule.mdc` to include new guidelines on maintaining test environments and avoiding hardcoded workarounds.
- Revised state file rules in `.cursor/skills/autopilot/state.md` to ensure comprehensive updates after every meaningful state transition.
- Improved the existing-code workflow in `.cursor/skills/autopilot/flows/existing-code.md` to automate task re-entry without user confirmation.
- Added test coverage requirements to the implementation process in `.cursor/skills/implement/SKILL.md`, ensuring all acceptance criteria are validated by tests.
- Enhanced the new-task skill documentation to include test coverage gap analysis, ensuring all new requirements are covered by tests.

These changes aim to strengthen project maintainability, improve testing practices, and streamline workflows.
@@ -11,8 +11,12 @@ alwaysApply: true
- Write code that takes into account the different environments: development, production
- Be careful to make only the changes that are requested, or changes you are confident are well understood and related to the change being requested
- Mock data is needed only for tests; never mock data for the dev or prod environments
- Make the test environment (files, DB, and so on) as close as possible to the production environment
- When you add new libraries or dependencies, make sure you are using the same version as other parts of the code
- When a test fails due to a missing dependency, install it — do not fake or stub the module system. For normal packages, add them to the project's dependency file (requirements-test.txt, package.json devDependencies, test csproj, etc.) and install. Only consider stubbing if the dependency is heavy (e.g. hardware-specific SDK, large native toolchain) — and even then, ask the user first before choosing to stub.
- Do not solve environment or infrastructure problems (dependency resolution, import paths, service discovery, connection config) by hardcoding workarounds in source code. Fix them at the environment/configuration level.
- Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns.
- If a file, class, or function has no remaining usages — delete it. Do not keep dead code "just in case"; git history preserves everything. Dead code rots: its dependencies drift, it misleads readers, and it breaks when the code it depends on evolves.

- Focus on the areas of code relevant to the task
- Do not touch code that is unrelated to the task
@@ -17,11 +17,5 @@ globs: [".cursor/**"]

## Agent Files (.cursor/agents/)
- Must have `name` and `description` in frontmatter

## User Interaction
- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.

## Execution Safety
- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first — unless it is explicitly stated in a skill or agent, or the user already asked to do so.

## Security
- All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)
@@ -0,0 +1,33 @@
---
description: "Execution safety, user interaction, and self-improvement protocols for the AI agent"
alwaysApply: true
---
# Agent Meta Rules

## Execution Safety
- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first — unless it is explicitly stated in a skill or agent, or the user already asked to do so.

## User Interaction
- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.

## Critical Thinking
- Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.

## Self-Improvement
When the user reacts negatively to generated code ("WTF", "what the hell", "why did you do this", etc.):

1. **Pause** — do not rush to fix. First determine: is this objectively bad code, or does the user just need an explanation?
2. **If the user doesn't understand** — explain the reasoning. That's it. No code change needed.
3. **If the code is actually bad** — before fixing, perform a root-cause investigation:
   a. **Why** did this bad code get produced? Identify the reasoning chain or implicit assumption that led to it.
   b. **Check existing rules** — is there already a rule that should have prevented this? If so, clarify or strengthen it.
   c. **Propose a new rule** if no existing rule covers the failure mode. Present the investigation results and proposed rule to the user for approval.
   d. **Only then** fix the code.
4. The rule goes into `coderule.mdc` for coding practices, `meta-rule.mdc` for agent behavior, or a new focused rule file — depending on context. Always check for duplicates or near-duplicates first.

### Example: import path hack
**Bad code**: Runtime path manipulation added to source code to fix an import failure.
**Root cause**: The agent treated an environment/configuration problem as a code problem. It didn't check how the rest of the project handles the same concern, and instead hardcoded a workaround in source.
**Preventive rules added to coderule.mdc**:
- "Do not solve environment or infrastructure problems by hardcoding workarounds in source code. Fix them at the environment/configuration level."
- "Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns."
@@ -0,0 +1,14 @@
---
alwaysApply: true
---

# Work Item Tracker

- Use **Jira** as the sole work item tracker (MCP server: `user-Jira-MCP-Server`)
- **NEVER** use Azure DevOps (ADO) MCP for any purpose — no reads, no writes, no queries
- Before interacting with any tracker, read this rule file first
- Jira cloud ID: `denyspopov.atlassian.net`
- Project key: `AZ`
- Project name: AZAION
- All task IDs follow the format `AZ-<number>`
- Issue types: Epic, Story, Task, Bug, Subtask
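The `AZ-<number>` format above is strict enough to validate mechanically. A minimal sketch, assuming Python; `is_valid_task_id` is an illustrative helper, not part of the repo:

```python
import re

# Matches the tracker rule above: project key "AZ", a hyphen, then digits.
TASK_ID_PATTERN = re.compile(r"AZ-\d+")

def is_valid_task_id(task_id: str) -> bool:
    """Return True if task_id follows the AZ-<number> format exactly."""
    return TASK_ID_PATTERN.fullmatch(task_id) is not None
```

Using `fullmatch` rather than `search` rejects IDs embedded in longer strings, such as leftover ADO-style identifiers.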
@@ -217,22 +217,18 @@ After deployment completes, the existing-code workflow is done.

**Re-Entry After Completion**
Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed

Action: The project completed a full cycle. Present status and loop back to New Task:
Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:

```
══════════════════════════════════════
PROJECT CYCLE COMPLETE
══════════════════════════════════════
The previous cycle finished successfully.
You can now add new functionality.
══════════════════════════════════════
A) Add new features (start New Task)
B) Done — no more changes needed
Starting new feature cycle…
══════════════════════════════════════
```

- If user picks A → set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
- If user picks B → report final project status and exit.
Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).

## Auto-Chain Rules

@@ -41,7 +41,7 @@ retry_count: 3

### State File Rules

1. **Create** on the first autopilot invocation (after state detection determines Step 1)
2. **Update** after every step completion, session boundary, or failed retry
2. **Update** after every change — this includes: batch completion, sub-step progress, step completion, session boundary, failed retry, or any meaningful state transition. The state file must always reflect the current reality.
3. **Read** as the first action on every invocation — before folder scanning
4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
5. **Never delete** the state file
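A state file consistent with these rules might look like the sketch below. Only `retry_count` (visible in the hunk header above) and the `step`/`status` fields (set by the re-entry flow) come from the source; the comments and exact layout are assumptions:

```yaml
# Illustrative sketch only — the real schema is defined in state.md
step: 8                # current workflow step number
status: not_started    # e.g. not_started | in_progress | done
retry_count: 3         # remaining bounded-retry budget
```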
@@ -94,6 +94,7 @@ For each task in the batch, launch an `implementer` subagent with:

- List of files OWNED (exclusive write access)
- List of files READ-ONLY
- List of files FORBIDDEN
- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.

Launch all subagents immediately — no user confirmation.

@@ -108,32 +109,44 @@ Launch all subagents immediately — no user confirmation.

- Subagent has not produced new output for an extended period → flag as potentially hung
- If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report

### 8. Code Review
### 8. AC Test Coverage Verification

Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:

1. Read the task spec's **Acceptance Criteria** section
2. Search the test files (new and existing) for tests that cover each AC
3. Classify each AC as:
   - **Covered**: a test directly validates this AC (running or skipped-with-reason)
   - **Not covered**: no test exists for this AC

If any AC is **Not covered**:
- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
- Re-launch the implementer with the specific ACs that need tests
- If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
- A skipped test counts as **Covered** — the test exists and will run when the environment allows

Only proceed to Step 9 when every AC has a corresponding test.
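The skip-with-reason pattern described above could look like this in pytest. A minimal sketch: `gpu_available` and the test name are illustrative, not real project code:

```python
import pytest

def gpu_available() -> bool:
    """Illustrative prerequisite check: TensorRT only imports on GPU hosts."""
    try:
        import tensorrt  # noqa: F401
        return True
    except ImportError:
        return False

# The test exists either way; it skips with a clear reason when the
# environment cannot run it, so its AC still counts as Covered.
@pytest.mark.skipif(not gpu_available(), reason="TensorRT requires a GPU")
def test_engine_builds():  # hypothetical AC: model compiles to an engine
    ...
```

On a CPU-only machine the test reports as skipped with the stated reason rather than failing or silently disappearing.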

### 9. Code Review

- Run `/code-review` skill on the batch's changed files + corresponding task specs
- The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL

### 9. Auto-Fix Gate
### 10. Auto-Fix Gate

Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:

1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 10
1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 11
2. If verdict is **FAIL** (attempt 1 or 2):
   - Parse the code review findings (Critical and High severity items)
   - For each finding, attempt an automated fix using the finding's location, description, and suggestion
   - Re-run `/code-review` on the modified files
   - If now PASS or PASS_WITH_WARNINGS → continue to step 10
   - If now PASS or PASS_WITH_WARNINGS → continue to step 11
   - If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts
3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.

Track `auto_fix_attempts` count in the batch report for retrospective analysis.
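The bounded-retry loop above can be sketched in Python; `run_code_review` and `auto_fix` are stand-ins for the skill invocations, not real APIs:

```python
MAX_AUTO_FIX_ATTEMPTS = 2  # matches the "max 2 attempts" budget above

def auto_fix_gate(files, run_code_review, auto_fix):
    """Run review, auto-fix FAIL verdicts up to the budget, then stop.

    Returns (verdict, attempts); the caller escalates to the user only
    when the verdict is still "FAIL" after the budget is spent.
    """
    verdict, findings = run_code_review(files)
    attempts = 0
    while verdict == "FAIL" and attempts < MAX_AUTO_FIX_ATTEMPTS:
        attempts += 1
        auto_fix(findings)  # attempt fixes for Critical/High findings
        verdict, findings = run_code_review(files)
    return verdict, attempts
```

Keeping the counter in the return value makes it trivial to persist `auto_fix_attempts` into the batch report.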

### 10. Test

- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
- Test failures are a **blocking gate** — do not proceed to commit until the test-run skill completes with a user decision
- Note: the autopilot also runs a separate full test suite after all implementation batches complete (greenfield Step 7, existing-code Steps 6/10). This is intentional — per-batch tests are regression checks, the post-implement run is final validation.

### 11. Commit and Push

- After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
@@ -152,7 +165,13 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.

### 14. Loop

- Go back to step 2 until all tasks in `todo/` are done
- When all tasks are complete, report final summary

### 15. Final Test Run

- After all batches are complete, run the full test suite once
- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
- When tests pass, report final summary

## Batch Report Persistence

@@ -177,10 +196,11 @@ After each batch, produce a structured report:

## Task Results

| Task | Status | Files Modified | Tests | Issues |
|------|--------|---------------|-------|--------|
| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|---------------|-------|-------------|--------|
| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |

## AC Test Coverage: [All covered / X of Y covered]
## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
## Auto-Fix Attempts: [0/1/2]
## Stuck Agents: [count or None]
@@ -195,7 +215,7 @@ After each batch, produce a structured report:

| Implementer fails same approach 3+ times | Stop it, escalate to user |
| Task blocked on external dependency (not in task list) | Report and skip |
| File ownership conflict unresolvable | ASK user |
| Any test failure after a batch | Delegate to test-run skill — blocking gate |
| Test failure after final test run | Delegate to test-run skill — blocking gate |
| All tasks complete | Report final summary, suggest final commit |
| `_dependencies_table.md` missing | STOP — run `/decompose` first |

@@ -203,7 +223,7 @@ After each batch, produce a structured report:

Each batch commit serves as a rollback checkpoint. If recovery is needed:

- **Tests fail after a batch commit**: `git revert <batch-commit-hash>` using the hash from the batch report in `_docs/03_implementation/`
- **Tests fail after final test run**: `git revert <batch-commit-hash>` using hashes from the batch reports in `_docs/03_implementation/`
- **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
- **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes

@@ -212,4 +232,4 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed:

- Never launch tasks whose dependencies are not yet completed
- Never allow two parallel agents to write to the same file
- If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
- Always run tests after each batch completes
- Always run the full test suite after all batches complete (step 15)

@@ -129,7 +129,7 @@ The `<task_slug>` is a short kebab-case name derived from the feature description

### Step 4: Codebase Analysis

**Role**: Software architect
**Goal**: Determine where and how to insert the new functionality.
**Goal**: Determine where and how to insert the new functionality, and whether existing tests cover the new requirements.

1. Read the codebase documentation from DOCUMENT_DIR:
   - `architecture.md` — overall structure
@@ -144,6 +144,10 @@ The `<task_slug>` is a short kebab-case name derived from the feature description

- What new interfaces or models are needed
- How data flows through the change
4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points
5. **Test coverage gap analysis**: Read existing test files that cover the affected components. For each acceptance criterion from Step 1, determine whether an existing test already validates it. Classify each AC as:
   - **Covered**: an existing test directly validates this behavior
   - **Partially covered**: an existing test exercises the code path but doesn't assert the new requirement
   - **Not covered**: no existing test validates this behavior — a new test is required

Present the analysis:

@@ -156,9 +160,22 @@ Present the analysis:

Interface changes: [list or "None"]
New interfaces: [list or "None"]
Data flow impact: [summary]
─────────────────────────────────────
TEST COVERAGE GAP ANALYSIS
─────────────────────────────────────
AC-1: [Covered / Partially covered / Not covered]
      [existing test name or "needs new test"]
AC-2: [Covered / Partially covered / Not covered]
      [existing test name or "needs new test"]
...
─────────────────────────────────────
New tests needed: [count]
Existing tests to update: [count or "None"]
══════════════════════════════════════
```

When gaps are found, the task spec (Step 6) MUST include the missing tests in the Scope (Included) section and the Unit/Blackbox Tests tables. Tests are not optional — if an AC is not covered by an existing test, the task must deliver a test for it.

---

### Step 5: Validate Assumptions
