From ad5530b9eff60517ed28c9ee4111d8cbc5323e84 Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh
Date: Sun, 29 Mar 2026 05:30:00 +0300
Subject: [PATCH] Enhance coding guidelines and autopilot workflows

- Updated `.cursor/rules/coderule.mdc` to include new guidelines on maintaining test environments and avoiding hardcoded workarounds.
- Revised state file rules in `.cursor/skills/autopilot/state.md` to ensure comprehensive updates after every meaningful state transition.
- Improved existing-code workflow in `.cursor/skills/autopilot/flows/existing-code.md` to automate task re-entry without user confirmation.
- Added requirements for test coverage in the implementation process within `.cursor/skills/implement/SKILL.md`, ensuring all acceptance criteria are validated by tests.
- Enhanced new-task skill documentation to include test coverage gap analysis, ensuring all new requirements are covered by tests.

These changes aim to strengthen project maintainability, improve testing practices, and streamline workflows.
---
 .cursor/rules/coderule.mdc                      |  4 ++
 .cursor/rules/cursor-meta.mdc                   |  6 ---
 .cursor/rules/meta-rule.mdc                     | 33 ++++++++++++
 .cursor/rules/tracker.mdc                       | 14 +++++
 .../skills/autopilot/flows/existing-code.md     | 10 ++--
 .cursor/skills/autopilot/state.md               |  2 +-
 .cursor/skills/implement/SKILL.md               | 54 +++++++++++++------
 .cursor/skills/new-task/SKILL.md                | 19 ++++++-
 8 files changed, 110 insertions(+), 32 deletions(-)
 create mode 100644 .cursor/rules/meta-rule.mdc
 create mode 100644 .cursor/rules/tracker.mdc

diff --git a/.cursor/rules/coderule.mdc b/.cursor/rules/coderule.mdc
index ecfb20e..dab2eaa 100644
--- a/.cursor/rules/coderule.mdc
+++ b/.cursor/rules/coderule.mdc
@@ -11,8 +11,12 @@ alwaysApply: true
 - Write code that takes into account the different environments: development, production
 - You are careful to make changes that are requested or you are confident the changes are well understood and related to the change being requested
 - Mocking data is needed only for tests, never mock data for dev or prod env
+- Make test environment (files, db and so on) as close as possible to the production environment
 - When you add new libraries or dependencies make sure you are using the same version of it as other parts of the code
 - When a test fails due to a missing dependency, install it — do not fake or stub the module system. For normal packages, add them to the project's dependency file (requirements-test.txt, package.json devDependencies, test csproj, etc.) and install. Only consider stubbing if the dependency is heavy (e.g. hardware-specific SDK, large native toolchain) — and even then, ask the user first before choosing to stub.
+- Do not solve environment or infrastructure problems (dependency resolution, import paths, service discovery, connection config) by hardcoding workarounds in source code. Fix them at the environment/configuration level.
+- Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns.
+- If a file, class, or function has no remaining usages — delete it. Do not keep dead code "just in case"; git history preserves everything. Dead code rots: its dependencies drift, it misleads readers, and it breaks when the code it depends on evolves.
 - Focus on the areas of code relevant to the task
 - Do not touch code that is unrelated to the task

diff --git a/.cursor/rules/cursor-meta.mdc b/.cursor/rules/cursor-meta.mdc
index 94cc6c5..8cc663a 100644
--- a/.cursor/rules/cursor-meta.mdc
+++ b/.cursor/rules/cursor-meta.mdc
@@ -17,11 +17,5 @@ globs: [".cursor/**"]
 ## Agent Files (.cursor/agents/)
 - Must have `name` and `description` in frontmatter
 
-## User Interaction
-- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.
-
-## Execution Safety
-- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first - unlsess it is explicilty stated in skill or agent, or user already asked to do so.
-
 ## Security
 - All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)

diff --git a/.cursor/rules/meta-rule.mdc b/.cursor/rules/meta-rule.mdc
new file mode 100644
index 0000000..4b44ad1
--- /dev/null
+++ b/.cursor/rules/meta-rule.mdc
@@ -0,0 +1,33 @@
+---
+description: "Execution safety, user interaction, and self-improvement protocols for the AI agent"
+alwaysApply: true
+---
+# Agent Meta Rules
+
+## Execution Safety
+- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first — unless it is explicitly stated in a skill or agent, or the user already asked to do so.
+
+## User Interaction
+- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.
+
+## Critical Thinking
+- Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.
+
+## Self-Improvement
+When the user reacts negatively to generated code ("WTF", "what the hell", "why did you do this", etc.):
+
+1. **Pause** — do not rush to fix. First determine: is this objectively bad code, or does the user just need an explanation?
+2. **If the user doesn't understand** — explain the reasoning. That's it. No code change needed.
+3. **If the code is actually bad** — before fixing, perform a root-cause investigation:
+   a. **Why** did this bad code get produced? Identify the reasoning chain or implicit assumption that led to it.
+   b. **Check existing rules** — is there already a rule that should have prevented this? If so, clarify or strengthen it.
+   c. **Propose a new rule** if no existing rule covers the failure mode. Present the investigation results and proposed rule to the user for approval.
+   d. **Only then** fix the code.
+4. The rule goes into `coderule.mdc` for coding practices, `meta-rule.mdc` for agent behavior, or a new focused rule file — depending on context. Always check for duplicates or near-duplicates first.
+
+### Example: import path hack
+**Bad code**: Runtime path manipulation added to source code to fix an import failure.
+**Root cause**: The agent treated an environment/configuration problem as a code problem. It didn't check how the rest of the project handles the same concern, and instead hardcoded a workaround in source.
+**Preventive rules added to coderule.mdc**:
+- "Do not solve environment or infrastructure problems by hardcoding workarounds in source code. Fix them at the environment/configuration level."
+- "Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns."

diff --git a/.cursor/rules/tracker.mdc b/.cursor/rules/tracker.mdc
new file mode 100644
index 0000000..375dbd9
--- /dev/null
+++ b/.cursor/rules/tracker.mdc
@@ -0,0 +1,14 @@
+---
+alwaysApply: true
+---
+
+# Work Item Tracker
+
+- Use **Jira** as the sole work item tracker (MCP server: `user-Jira-MCP-Server`)
+- **NEVER** use Azure DevOps (ADO) MCP for any purpose — no reads, no writes, no queries
+- Before interacting with any tracker, read this rule file first
+- Jira cloud ID: `denyspopov.atlassian.net`
+- Project key: `AZ`
+- Project name: AZAION
+- All task IDs follow the format `AZ-`
+- Issue types: Epic, Story, Task, Bug, Subtask

diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md
index 0e47f87..cbc6a96 100644
--- a/.cursor/skills/autopilot/flows/existing-code.md
+++ b/.cursor/skills/autopilot/flows/existing-code.md
@@ -217,22 +217,18 @@ After deployment completes, the existing-code workflow is done.
 **Re-Entry After Completion**
 
 Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed
-Action: The project completed a full cycle. Present status and loop back to New Task:
+Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:
 
 ```
 ══════════════════════════════════════
   PROJECT CYCLE COMPLETE
 ══════════════════════════════════════
   The previous cycle finished successfully.
-  You can now add new functionality.
-══════════════════════════════════════
-  A) Add new features (start New Task)
-  B) Done — no more changes needed
+  Starting new feature cycle…
 ══════════════════════════════════════
 ```
 
-- If user picks A → set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
-- If user picks B → report final project status and exit.
+Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
 
 ## Auto-Chain Rules

diff --git a/.cursor/skills/autopilot/state.md b/.cursor/skills/autopilot/state.md
index 022ecda..33dd76f 100644
--- a/.cursor/skills/autopilot/state.md
+++ b/.cursor/skills/autopilot/state.md
@@ -41,7 +41,7 @@ retry_count: 3
 ### State File Rules
 
 1. **Create** on the first autopilot invocation (after state detection determines Step 1)
-2. **Update** after every step completion, session boundary, or failed retry
+2. **Update** after every change — this includes: batch completion, sub-step progress, step completion, session boundary, failed retry, or any meaningful state transition. The state file must always reflect the current reality.
 3. **Read** as the first action on every invocation — before folder scanning
 4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
 5. **Never delete** the state file

diff --git a/.cursor/skills/implement/SKILL.md b/.cursor/skills/implement/SKILL.md
index 1039d01..9eb9554 100644
--- a/.cursor/skills/implement/SKILL.md
+++ b/.cursor/skills/implement/SKILL.md
@@ -94,6 +94,7 @@ For each task in the batch, launch an `implementer` subagent with:
 - List of files OWNED (exclusive write access)
 - List of files READ-ONLY
 - List of files FORBIDDEN
+- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.
 
 Launch all subagents immediately — no user confirmation.
 
@@ -108,32 +109,44 @@ Launch all subagents immediately — no user confirmation.
 - Subagent has not produced new output for an extended period → flag as potentially hung
 - If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report
 
-### 8. Code Review
+### 8. AC Test Coverage Verification
+
+Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:
+
+1. Read the task spec's **Acceptance Criteria** section
+2. Search the test files (new and existing) for tests that cover each AC
+3. Classify each AC as:
+   - **Covered**: a test directly validates this AC (running or skipped-with-reason)
+   - **Not covered**: no test exists for this AC
+
+If any AC is **Not covered**:
+- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
+- Re-launch the implementer with the specific ACs that need tests
+- If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
+- A skipped test counts as **Covered** — the test exists and will run when the environment allows
+
+Only proceed to Step 9 when every AC has a corresponding test.
+
+### 9. Code Review
 
 - Run `/code-review` skill on the batch's changed files + corresponding task specs
 - The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL
 
-### 9. Auto-Fix Gate
+### 10. Auto-Fix Gate
 
 Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:
 
-1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 10
+1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 11
 2. If verdict is **FAIL** (attempt 1 or 2):
    - Parse the code review findings (Critical and High severity items)
    - For each finding, attempt an automated fix using the finding's location, description, and suggestion
    - Re-run `/code-review` on the modified files
-   - If now PASS or PASS_WITH_WARNINGS → continue to step 10
+   - If now PASS or PASS_WITH_WARNINGS → continue to step 11
    - If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts
 3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.
 
 Track `auto_fix_attempts` count in the batch report for retrospective analysis.
 
-### 10. Test
-
-- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
-- Test failures are a **blocking gate** — do not proceed to commit until the test-run skill completes with a user decision
-- Note: the autopilot also runs a separate full test suite after all implementation batches complete (greenfield Step 7, existing-code Steps 6/10). This is intentional — per-batch tests are regression checks, the post-implement run is final validation.
-
 ### 11. Commit and Push
 
 - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
@@ -152,7 +165,13 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
 ### 14. Loop
 
 - Go back to step 2 until all tasks in `todo/` are done
-- When all tasks are complete, report final summary
+
+### 15. Final Test Run
+
+- After all batches are complete, run the full test suite once
+- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
+- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
+- When tests pass, report final summary
 
 ## Batch Report Persistence
 
@@ -177,10 +196,11 @@ After each batch, produce a structured report:
 ## Task Results
 
-| Task | Status | Files Modified | Tests | Issues |
-|------|--------|---------------|-------|--------|
-| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
+| Task | Status | Files Modified | Tests | AC Coverage | Issues |
+|------|--------|---------------|-------|-------------|--------|
+| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |
 
+## AC Test Coverage: [All covered / X of Y covered]
 ## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
 ## Auto-Fix Attempts: [0/1/2]
 ## Stuck Agents: [count or None]
 
@@ -195,7 +215,7 @@ After each batch, produce a structured report:
 | Implementer fails same approach 3+ times | Stop it, escalate to user |
 | Task blocked on external dependency (not in task list) | Report and skip |
 | File ownership conflict unresolvable | ASK user |
-| Any test failure after a batch | Delegate to test-run skill — blocking gate |
+| Test failure after final test run | Delegate to test-run skill — blocking gate |
 | All tasks complete | Report final summary, suggest final commit |
 | `_dependencies_table.md` missing | STOP — run `/decompose` first |
 
@@ -203,7 +223,7 @@
 Each batch commit serves as a rollback checkpoint. If recovery is needed:
 
-- **Tests fail after a batch commit**: `git revert <hash>` using the hash from the batch report in `_docs/03_implementation/`
+- **Tests fail after final test run**: `git revert <hash>` using hashes from the batch reports in `_docs/03_implementation/`
 - **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
 - **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes
 
@@ -212,4 +232,4 @@
 - Never launch tasks whose dependencies are not yet completed
 - Never allow two parallel agents to write to the same file
 - If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
-- Always run tests after each batch completes
+- Always run the full test suite after all batches complete (step 15)

diff --git a/.cursor/skills/new-task/SKILL.md b/.cursor/skills/new-task/SKILL.md
index 23483b8..90f451b 100644
--- a/.cursor/skills/new-task/SKILL.md
+++ b/.cursor/skills/new-task/SKILL.md
@@ -129,7 +129,7 @@ The `` is a short kebab-case name derived from the feature descriptio
 ### Step 4: Codebase Analysis
 
 **Role**: Software architect
-**Goal**: Determine where and how to insert the new functionality.
+**Goal**: Determine where and how to insert the new functionality, and whether existing tests cover the new requirements.
 
 1. Read the codebase documentation from DOCUMENT_DIR:
    - `architecture.md` — overall structure
@@ -144,6 +144,10 @@ The `` is a short kebab-case name derived from the feature descriptio
    - What new interfaces or models are needed
    - How data flows through the change
 4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points
+5. **Test coverage gap analysis**: Read existing test files that cover the affected components. For each acceptance criterion from Step 1, determine whether an existing test already validates it. Classify each AC as:
+   - **Covered**: an existing test directly validates this behavior
+   - **Partially covered**: an existing test exercises the code path but doesn't assert the new requirement
+   - **Not covered**: no existing test validates this behavior — a new test is required
 
 Present the analysis:
 
@@ -156,9 +160,22 @@ Present the analysis:
    Interface changes: [list or "None"]
    New interfaces: [list or "None"]
    Data flow impact: [summary]
+   ─────────────────────────────────────
+   TEST COVERAGE GAP ANALYSIS
+   ─────────────────────────────────────
+   AC-1: [Covered / Partially covered / Not covered]
+         [existing test name or "needs new test"]
+   AC-2: [Covered / Partially covered / Not covered]
+         [existing test name or "needs new test"]
+   ...
+   ─────────────────────────────────────
+   New tests needed: [count]
+   Existing tests to update: [count or "None"]
    ══════════════════════════════════════
    ```
 
+When gaps are found, the task spec (Step 6) MUST include the missing tests in the Scope (Included) section and the Unit/Blackbox Tests tables. Tests are not optional — if an AC is not covered by an existing test, the task must deliver a test for it.
+
 ---
 
 ### Step 5: Validate Assumptions
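The "skip with a clear reason" convention that this patch mandates (a test for every AC must exist even when the environment cannot run it) can be sketched as follows. The test names, the AC numbering, and the `HAS_GPU` probe are hypothetical illustrations, not part of the patch:

```python
import shutil

import pytest

# Hypothetical environment probe: treat the presence of nvidia-smi
# as a crude signal that a GPU (and thus TensorRT) may be available.
HAS_GPU = shutil.which("nvidia-smi") is not None


@pytest.mark.skipif(not HAS_GPU, reason="AC-3 needs TensorRT, which requires a GPU")
def test_ac3_engine_builds_from_onnx():
    # Written up front so AC-3 counts as Covered; the body only
    # executes once the GPU environment is available.
    assert True


def test_ac1_rejects_unknown_config_keys():
    # Environment-independent AC: runs everywhere, no skip needed.
    allowed = {"name", "retries"}
    config = {"name": "demo", "unknown_key": 1}
    unknown = set(config) - allowed
    assert unknown == {"unknown_key"}
```

Under the Step 8 gate described above, both tests count as Covered: the skipped one still exists, carries a reason explaining the prerequisite, and will run automatically when the environment allows.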