From ad5530b9eff60517ed28c9ee4111d8cbc5323e84 Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh
Date: Sun, 29 Mar 2026 05:30:00 +0300
Subject: [PATCH] Enhance coding guidelines and autopilot workflows

- Updated `.cursor/rules/coderule.mdc` to include new guidelines on maintaining test environments and avoiding hardcoded workarounds.
- Revised state file rules in `.cursor/skills/autopilot/state.md` to ensure comprehensive updates after every meaningful state transition.
- Improved existing-code workflow in `.cursor/skills/autopilot/flows/existing-code.md` to automate task re-entry without user confirmation.
- Added requirements for test coverage in the implementation process within `.cursor/skills/implement/SKILL.md`, ensuring all acceptance criteria are validated by tests.
- Enhanced new-task skill documentation to include test coverage gap analysis, ensuring all new requirements are covered by tests.

These changes aim to strengthen project maintainability, improve testing practices, and streamline workflows.
---
 .cursor/rules/coderule.mdc                      |  4 ++
 .cursor/rules/cursor-meta.mdc                   |  6 ---
 .cursor/rules/meta-rule.mdc                     | 33 ++++++++++++
 .cursor/rules/tracker.mdc                       | 14 +++++
 .../skills/autopilot/flows/existing-code.md     | 10 ++--
 .cursor/skills/autopilot/state.md               |  2 +-
 .cursor/skills/implement/SKILL.md               | 54 +++++++++++++------
 .cursor/skills/new-task/SKILL.md                | 19 ++++++-
 8 files changed, 110 insertions(+), 32 deletions(-)
 create mode 100644 .cursor/rules/meta-rule.mdc
 create mode 100644 .cursor/rules/tracker.mdc

diff --git a/.cursor/rules/coderule.mdc b/.cursor/rules/coderule.mdc
index ecfb20e..dab2eaa 100644
--- a/.cursor/rules/coderule.mdc
+++ b/.cursor/rules/coderule.mdc
@@ -11,8 +11,12 @@ alwaysApply: true
 - Write code that takes into account the different environments: development, production
 - You are careful to make changes that are requested or you are confident the changes are well understood and related to the change being requested
 - Mocking data is needed only for tests, never mock data for dev or prod env
+- Make test environment (files, db and so on) as close as possible to the production environment
 - When you add new libraries or dependencies make sure you are using the same version of it as other parts of the code
 - When a test fails due to a missing dependency, install it — do not fake or stub the module system. For normal packages, add them to the project's dependency file (requirements-test.txt, package.json devDependencies, test csproj, etc.) and install. Only consider stubbing if the dependency is heavy (e.g. hardware-specific SDK, large native toolchain) — and even then, ask the user first before choosing to stub.
+- Do not solve environment or infrastructure problems (dependency resolution, import paths, service discovery, connection config) by hardcoding workarounds in source code. Fix them at the environment/configuration level.
+- Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns.
+- If a file, class, or function has no remaining usages — delete it. Do not keep dead code "just in case"; git history preserves everything. Dead code rots: its dependencies drift, it misleads readers, and it breaks when the code it depends on evolves.
 - Focus on the areas of code relevant to the task
 - Do not touch code that is unrelated to the task

diff --git a/.cursor/rules/cursor-meta.mdc b/.cursor/rules/cursor-meta.mdc
index 94cc6c5..8cc663a 100644
--- a/.cursor/rules/cursor-meta.mdc
+++ b/.cursor/rules/cursor-meta.mdc
@@ -17,11 +17,5 @@ globs: [".cursor/**"]
 ## Agent Files (.cursor/agents/)
 - Must have `name` and `description` in frontmatter
 
-## User Interaction
-- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.
-
-## Execution Safety
-- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first - unlsess it is explicilty stated in skill or agent, or user already asked to do so.
-
 ## Security
 - All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)

diff --git a/.cursor/rules/meta-rule.mdc b/.cursor/rules/meta-rule.mdc
new file mode 100644
index 0000000..4b44ad1
--- /dev/null
+++ b/.cursor/rules/meta-rule.mdc
@@ -0,0 +1,33 @@
+---
+description: "Execution safety, user interaction, and self-improvement protocols for the AI agent"
+alwaysApply: true
+---
+# Agent Meta Rules
+
+## Execution Safety
+- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first — unless it is explicitly stated in a skill or agent, or the user already asked to do so.
+
+## User Interaction
+- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.
+
+## Critical Thinking
+- Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.
+
+## Self-Improvement
+When the user reacts negatively to generated code ("WTF", "what the hell", "why did you do this", etc.):
+
+1. **Pause** — do not rush to fix. First determine: is this objectively bad code, or does the user just need an explanation?
+2. **If the user doesn't understand** — explain the reasoning. That's it. No code change needed.
+3. **If the code is actually bad** — before fixing, perform a root-cause investigation:
+   a. **Why** did this bad code get produced? Identify the reasoning chain or implicit assumption that led to it.
+   b. **Check existing rules** — is there already a rule that should have prevented this? If so, clarify or strengthen it.
+   c. **Propose a new rule** if no existing rule covers the failure mode. Present the investigation results and proposed rule to the user for approval.
+   d. **Only then** fix the code.
+4. The rule goes into `coderule.mdc` for coding practices, `meta-rule.mdc` for agent behavior, or a new focused rule file — depending on context. Always check for duplicates or near-duplicates first.
+
+### Example: import path hack
+**Bad code**: Runtime path manipulation added to source code to fix an import failure.
+**Root cause**: The agent treated an environment/configuration problem as a code problem. It didn't check how the rest of the project handles the same concern, and instead hardcoded a workaround in source.
+**Preventive rules added to coderule.mdc**:
+- "Do not solve environment or infrastructure problems by hardcoding workarounds in source code. Fix them at the environment/configuration level."
+- "Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns."

diff --git a/.cursor/rules/tracker.mdc b/.cursor/rules/tracker.mdc
new file mode 100644
index 0000000..375dbd9
--- /dev/null
+++ b/.cursor/rules/tracker.mdc
@@ -0,0 +1,14 @@
+---
+alwaysApply: true
+---
+
+# Work Item Tracker
+
+- Use **Jira** as the sole work item tracker (MCP server: `user-Jira-MCP-Server`)
+- **NEVER** use Azure DevOps (ADO) MCP for any purpose — no reads, no writes, no queries
+- Before interacting with any tracker, read this rule file first
+- Jira cloud ID: `denyspopov.atlassian.net`
+- Project key: `AZ`
+- Project name: AZAION
+- All task IDs follow the format `AZ-`
+- Issue types: Epic, Story, Task, Bug, Subtask

diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md
index 0e47f87..cbc6a96 100644
--- a/.cursor/skills/autopilot/flows/existing-code.md
+++ b/.cursor/skills/autopilot/flows/existing-code.md
@@ -217,22 +217,18 @@ After deployment completes, the existing-code workflow is done.
 **Re-Entry After Completion**
 
 Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed
-Action: The project completed a full cycle. Present status and loop back to New Task:
+Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:
 
 ```
 ══════════════════════════════════════
   PROJECT CYCLE COMPLETE
 ══════════════════════════════════════
   The previous cycle finished successfully.
-  You can now add new functionality.
-══════════════════════════════════════
-  A) Add new features (start New Task)
-  B) Done — no more changes needed
+  Starting new feature cycle…
 ══════════════════════════════════════
 ```
 
-- If user picks A → set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
-- If user picks B → report final project status and exit.
+Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
 
 ## Auto-Chain Rules

diff --git a/.cursor/skills/autopilot/state.md b/.cursor/skills/autopilot/state.md
index 022ecda..33dd76f 100644
--- a/.cursor/skills/autopilot/state.md
+++ b/.cursor/skills/autopilot/state.md
@@ -41,7 +41,7 @@ retry_count: 3
 ### State File Rules
 
 1. **Create** on the first autopilot invocation (after state detection determines Step 1)
-2. **Update** after every step completion, session boundary, or failed retry
+2. **Update** after every change — this includes: batch completion, sub-step progress, step completion, session boundary, failed retry, or any meaningful state transition. The state file must always reflect the current reality.
 3. **Read** as the first action on every invocation — before folder scanning
 4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
 5. **Never delete** the state file

diff --git a/.cursor/skills/implement/SKILL.md b/.cursor/skills/implement/SKILL.md
index 1039d01..9eb9554 100644
--- a/.cursor/skills/implement/SKILL.md
+++ b/.cursor/skills/implement/SKILL.md
@@ -94,6 +94,7 @@ For each task in the batch, launch an `implementer` subagent with:
 - List of files OWNED (exclusive write access)
 - List of files READ-ONLY
 - List of files FORBIDDEN
+- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.
 
 Launch all subagents immediately — no user confirmation.
 
@@ -108,32 +109,44 @@ Launch all subagents immediately — no user confirmation.
 - Subagent has not produced new output for an extended period → flag as potentially hung
 - If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report
 
-### 8. Code Review
+### 8. AC Test Coverage Verification
+
+Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:
+
+1. Read the task spec's **Acceptance Criteria** section
+2. Search the test files (new and existing) for tests that cover each AC
+3. Classify each AC as:
+   - **Covered**: a test directly validates this AC (running or skipped-with-reason)
+   - **Not covered**: no test exists for this AC
+
+If any AC is **Not covered**:
+- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
+- Re-launch the implementer with the specific ACs that need tests
+- If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
+- A skipped test counts as **Covered** — the test exists and will run when the environment allows
+
+Only proceed to Step 9 when every AC has a corresponding test.
+
+### 9. Code Review
 
 - Run `/code-review` skill on the batch's changed files + corresponding task specs
 - The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL
 
-### 9. Auto-Fix Gate
+### 10. Auto-Fix Gate
 
 Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:
 
-1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 10
+1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 11
 2. If verdict is **FAIL** (attempt 1 or 2):
    - Parse the code review findings (Critical and High severity items)
    - For each finding, attempt an automated fix using the finding's location, description, and suggestion
    - Re-run `/code-review` on the modified files
-   - If now PASS or PASS_WITH_WARNINGS → continue to step 10
+   - If now PASS or PASS_WITH_WARNINGS → continue to step 11
    - If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts
 3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.
 
 Track `auto_fix_attempts` count in the batch report for retrospective analysis.
 
-### 10. Test
-
-- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
-- Test failures are a **blocking gate** — do not proceed to commit until the test-run skill completes with a user decision
-- Note: the autopilot also runs a separate full test suite after all implementation batches complete (greenfield Step 7, existing-code Steps 6/10). This is intentional — per-batch tests are regression checks, the post-implement run is final validation.
-
 ### 11. Commit and Push
 
 - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
@@ -152,7 +165,13 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
 ### 14. Loop
 
 - Go back to step 2 until all tasks in `todo/` are done
-- When all tasks are complete, report final summary
+
+### 15. Final Test Run
+
+- After all batches are complete, run the full test suite once
+- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
+- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
+- When tests pass, report final summary
 
 ## Batch Report Persistence
 
@@ -177,10 +196,11 @@ After each batch, produce a structured report:
 ## Task Results
 
-| Task | Status | Files Modified | Tests | Issues |
-|------|--------|---------------|-------|--------|
-| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
+| Task | Status | Files Modified | Tests | AC Coverage | Issues |
+|------|--------|---------------|-------|-------------|--------|
+| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |
 
+## AC Test Coverage: [All covered / X of Y covered]
 ## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
 ## Auto-Fix Attempts: [0/1/2]
 ## Stuck Agents: [count or None]
 
@@ -195,7 +215,7 @@ After each batch, produce a structured report:
 | Implementer fails same approach 3+ times | Stop it, escalate to user |
 | Task blocked on external dependency (not in task list) | Report and skip |
 | File ownership conflict unresolvable | ASK user |
-| Any test failure after a batch | Delegate to test-run skill — blocking gate |
+| Test failure after final test run | Delegate to test-run skill — blocking gate |
 | All tasks complete | Report final summary, suggest final commit |
 | `_dependencies_table.md` missing | STOP — run `/decompose` first |
 
@@ -203,7 +223,7 @@
 Each batch commit serves as a rollback checkpoint. If recovery is needed:
 
-- **Tests fail after a batch commit**: `git revert <hash>` using the hash from the batch report in `_docs/03_implementation/`
+- **Tests fail after final test run**: `git revert <hash>` using hashes from the batch reports in `_docs/03_implementation/`
 - **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
 - **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes
 
@@ -212,4 +232,4 @@
 - Never launch tasks whose dependencies are not yet completed
 - Never allow two parallel agents to write to the same file
 - If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
-- Always run tests after each batch completes
+- Always run the full test suite after all batches complete (step 15)

diff --git a/.cursor/skills/new-task/SKILL.md b/.cursor/skills/new-task/SKILL.md
index 23483b8..90f451b 100644
--- a/.cursor/skills/new-task/SKILL.md
+++ b/.cursor/skills/new-task/SKILL.md
@@ -129,7 +129,7 @@ The `` is a short kebab-case name derived from the feature descriptio
 ### Step 4: Codebase Analysis
 
 **Role**: Software architect
-**Goal**: Determine where and how to insert the new functionality.
+**Goal**: Determine where and how to insert the new functionality, and whether existing tests cover the new requirements.
 
 1. Read the codebase documentation from DOCUMENT_DIR:
    - `architecture.md` — overall structure
@@ -144,6 +144,10 @@ The `` is a short kebab-case name derived from the feature descriptio
    - What new interfaces or models are needed
    - How data flows through the change
 4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points
+5. **Test coverage gap analysis**: Read existing test files that cover the affected components. For each acceptance criterion from Step 1, determine whether an existing test already validates it. Classify each AC as:
+   - **Covered**: an existing test directly validates this behavior
+   - **Partially covered**: an existing test exercises the code path but doesn't assert the new requirement
+   - **Not covered**: no existing test validates this behavior — a new test is required
 
 Present the analysis:
 
@@ -156,9 +160,22 @@ Present the analysis:
    Interface changes: [list or "None"]
    New interfaces: [list or "None"]
    Data flow impact: [summary]
+   ─────────────────────────────────────
+   TEST COVERAGE GAP ANALYSIS
+   ─────────────────────────────────────
+   AC-1: [Covered / Partially covered / Not covered]
+         [existing test name or "needs new test"]
+   AC-2: [Covered / Partially covered / Not covered]
+         [existing test name or "needs new test"]
+   ...
+   ─────────────────────────────────────
+   New tests needed: [count]
+   Existing tests to update: [count or "None"]
    ══════════════════════════════════════
    ```
 
+When gaps are found, the task spec (Step 6) MUST include the missing tests in the Scope (Included) section and the Unit/Blackbox Tests tables. Tests are not optional — if an AC is not covered by an existing test, the task must deliver a test for it.
+
 ---
 
 ### Step 5: Validate Assumptions
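The "skip with a clear reason" convention that this patch mandates (a test for every AC must exist even when the environment cannot run it) can be sketched as follows. The test names, the AC numbering, and the `HAS_GPU` probe are hypothetical illustrations, not part of the patch:

```python
import shutil

import pytest

# Hypothetical environment probe: treat the presence of nvidia-smi
# as a crude signal that a GPU (and thus TensorRT) may be available.
HAS_GPU = shutil.which("nvidia-smi") is not None


@pytest.mark.skipif(not HAS_GPU, reason="AC-3 needs TensorRT, which requires a GPU")
def test_ac3_engine_builds_from_onnx():
    # Written up front so AC-3 counts as Covered; the body only
    # executes once the GPU environment is available.
    assert True


def test_ac1_rejects_unknown_config_keys():
    # Environment-independent AC: runs everywhere, no skip needed.
    allowed = {"name", "retries"}
    config = {"name": "demo", "unknown_key": 1}
    unknown = set(config) - allowed
    assert unknown == {"unknown_key"}
```

Under the Step 8 gate described above, both tests count as Covered: the skipped one still exists, carries a reason explaining the prerequisite, and will run automatically when the environment allows.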