Generalize tracker references, restructure refactor skill, and strengthen coding rules

- Replace all Jira-specific references with generic tracker/work-item terminology (TRACKER-ID, work item epics); delete project-management.mdc and mcp.json.example
- Restructure refactor skill: extract 8 phases (00–07) and templates into separate files; add guided mode for pre-built change lists
- Add Step 3 "Code Testability Revision" to existing-code workflow (renumber steps 3–12 → 3–13)
- Simplify autopilot state file to minimal current-step pointer
- Strengthen coding rules: AAA test comments per language, test failures as blocking gates, dependency install policy
- Add Docker Suitability Assessment to test-spec and test-run skills (local vs Docker execution)
- Narrow human-attention sound rule to human-input-needed only
- Add AskQuestion fallback to plain text across skills
- Rename FINAL_implementation_report to implementation_report_*
- Simplify cursor-meta (remove _docs numbering table, quality thresholds)
- Make techstackrule alwaysApply, add alwaysApply:false to openapi
2026-06-21 08:21:08 +00:00 · 2026-03-28 02:42:36 +02:00
parent 5be53739cd
commit d28b9584f2
47 changed files with 1248 additions and 884 deletions
@@ -1,13 +1,9 @@
-## Assumptions
-
- **Single project per workspace**: this system assumes one project per Cursor workspace. All `_docs/` paths are relative to the workspace root. For monorepos, open each service in its own Cursor workspace window.
-
 ## How to Use

 Type `/autopilot` to start or continue the full workflow. The orchestrator detects where your project is and picks up from there.

 ```
-/autopilot (or /auto)   — start a new project or continue where you left off
+/autopilot              — start a new project or continue where you left off
 ```

 If you want to run a specific skill directly (without the orchestrator), use the individual commands:
@@ -71,11 +67,11 @@ Interactive interview that builds `_docs/00_problem/`. Asks probing questions ac

 ### plan

-6-step planning workflow. Produces integration test specs, architecture, system flows, data model, deployment plan, component specs with interfaces, risk assessment, test specifications, and Jira epics. Heavy interaction at BLOCKING gates.
+6-step planning workflow. Produces integration test specs, architecture, system flows, data model, deployment plan, component specs with interfaces, risk assessment, test specifications, and work item epics. Heavy interaction at BLOCKING gates.

 ### decompose

-4-step task decomposition. Produces a bootstrap structure plan, atomic task specs per component, integration test tasks, and a cross-task dependency table. Each task gets a Jira ticket and is capped at 5 complexity points.
+4-step task decomposition. Produces a bootstrap structure plan, atomic task specs per component, integration test tasks, and a cross-task dependency table. Each task gets a work item ticket and is capped at 8 complexity points.

 ### implement

@@ -95,7 +91,7 @@ Multi-phase code review against task specs. Produces structured findings with ve

 ### security

-5-phase OWASP-based security audit: dependency scan, static analysis, OWASP Top 10 review, infrastructure review, consolidated report with severity-ranked findings. Integrated into autopilot as an optional step before deploy.
+OWASP-based security testing and audit.

 ### retrospective

@@ -120,7 +116,7 @@ Bottom-up codebase documentation. Analyzes existing code from modules through co
 1. /research                                     — solution drafts → _docs/01_solution/
   Run multiple times: Mode A → draft, Mode B → assess & revise

-2. /plan                                         — architecture, data model, deployment, components, risks, tests, Jira epics → _docs/02_document/
+2. /plan                                         — architecture, data model, deployment, components, risks, tests, epics → _docs/02_document/

 3. /decompose                                    — atomic task specs + dependency table → _docs/02_tasks/todo/

@@ -150,12 +146,15 @@ Or just use `/autopilot` to run steps 0-5 automatically.
 | **problem** | "problem", "define problem", "new project" | `_docs/00_problem/` |
 | **research** | "research", "investigate" | `_docs/01_solution/` |
 | **plan** | "plan", "decompose solution" | `_docs/02_document/` |
+| **test-spec** | "test spec", "blackbox tests", "test scenarios" | `_docs/02_document/tests/` + `scripts/` |
 | **decompose** | "decompose", "task decomposition" | `_docs/02_tasks/todo/` |
 | **implement** | "implement", "start implementation" | `_docs/03_implementation/` |
+| **test-run** | "run tests", "test suite", "verify tests" | Test results + verdict |
 | **code-review** | "code review", "review code" | Verdict: PASS / FAIL / PASS_WITH_WARNINGS |
-| **refactor** | "refactor", "improve code" | `_docs/04_refactoring/` |
-| **security** | "security audit", "OWASP", "vulnerability scan" | `_docs/05_security/` |
+| **new-task** | "new task", "add feature", "new functionality" | `_docs/02_tasks/todo/` |
 | **ui-design** | "design a UI", "mockup", "design system" | `_docs/02_document/ui_mockups/` |
+| **refactor** | "refactor", "improve code" | `_docs/04_refactoring/` |
+| **security** | "security audit", "OWASP" | `_docs/05_security/` |
 | **document** | "document", "document codebase", "reverse-engineer docs" | `_docs/02_document/` + `_docs/00_problem/` + `_docs/01_solution/` |
 | **deploy** | "deploy", "CI/CD", "observability" | `_docs/04_deploy/` |
 | **retrospective** | "retrospective", "retro" | `_docs/06_metrics/` |
@@ -169,6 +168,7 @@ Or just use `/autopilot` to run steps 0-5 automatically.
 ## Project Folder Structure

 ```
+_project.md                              — project-specific config (tracker type, project key, etc.)
 _docs/
 ├── _autopilot_state.md                  — autopilot orchestrator state (progress, decisions, session context)
 ├── 00_problem/                          — problem definition, restrictions, AC, input data
@@ -181,19 +181,22 @@ _docs/
 │   ├── risk_mitigations.md
 │   ├── components/[##]_[name]/          — description.md + tests.md per component
 │   ├── common-helpers/
-│   ├── integration_tests/               — environment, test data, functional, non-functional, traceability
+│   ├── tests/                           — environment, test data, blackbox, performance, resilience, security, traceability
 │   ├── deployment/                      — containerization, CI/CD, environments, observability, procedures
+│   ├── ui_mockups/                      — HTML+CSS mockups, DESIGN.md (ui-design skill)
 │   ├── diagrams/
 │   └── FINAL_report.md
-├── 02_tasks/                            — task workflow folders + _dependencies_table.md
-│   ├── _dependencies_table.md           — cross-task dependency graph (root level)
-│   ├── backlog/                         — parked tasks (not scheduled for implementation)
+├── 02_tasks/                            — task lifecycle folders + _dependencies_table.md
+│   ├── _dependencies_table.md
 │   ├── todo/                            — tasks ready for implementation
-│   └── done/                            — completed tasks (moved here by /implement)
-├── 03_implementation/                   — batch reports, FINAL report
+│   ├── backlog/                         — parked tasks (not scheduled yet)
+│   └── done/                            — completed/archived tasks
+├── 02_task_plans/                       — per-task research artifacts (new-task skill)
+├── 03_implementation/                   — batch reports, implementation_report_*.md
+│   └── reviews/                         — code review reports per batch
 ├── 04_deploy/                           — containerization, CI/CD, environments, observability, procedures, scripts
 ├── 04_refactoring/                      — baseline, discovery, analysis, execution, hardening
-├── 05_security/                         — dependency scan, static analysis, OWASP review, infrastructure, report
+├── 05_security/                         — dependency scan, SAST, OWASP review, security report
 └── 06_metrics/                          — retro_[YYYY-MM-DD].md
 ```

@@ -4,8 +4,6 @@ description: |
  Implements a single task from its spec file. Use when implementing tasks from _docs/02_tasks/todo/.
  Reads the task spec, analyzes the codebase, implements the feature with tests, and verifies acceptance criteria.
  Launched by the /implement skill as a subagent.
-category: build
-tags: [implementation, subagent, task-execution, testing, code-generation]
 ---

 You are a professional software developer implementing a single task.
@@ -13,7 +11,7 @@ You are a professional software developer implementing a single task.
 ## Input

 You receive from the `/implement` orchestrator:
- Path to a task spec file (e.g., `_docs/02_tasks/todo/[JIRA-ID]_[short_name].md`)
+- Path to a task spec file (e.g., `_docs/02_tasks/todo/[TRACKER-ID]_[short_name].md`)
 - Files OWNED (exclusive write access — only you may modify these)
 - Files READ-ONLY (shared interfaces, types — read but do not modify)
 - Files FORBIDDEN (other agents' owned files — do not touch)
@@ -58,7 +56,7 @@ Load context in this order, stopping when you have enough:
 4. If the task has a dependency on an unimplemented component, create a minimal interface mock
 5. Implement the feature following existing code conventions
 6. Implement error handling per the project's defined strategy
-7. Implement unit tests following Arrange/Act/Assert (AAA) pattern — mark each section with a comment in the current language's style (e.g., `// Arrange` in C#/Rust/JS, `# Arrange` in Python, `-- Arrange` in SQL)
+7. Implement unit tests (use Arrange / Act / Assert section comments in language-appropriate syntax)
 8. Implement integration tests — analyze existing tests, add to them or create new
 9. Run all tests, fix any failures
 10. Verify every acceptance criterion is satisfied — trace each AC with evidence
@@ -77,7 +75,7 @@ Report using this exact structure:
 ## Implementer Report: [task_name]

 **Status**: Done | Blocked | Partial
-**Task**: [JIRA-ID]_[short_name]
+**Task**: [TRACKER-ID]_[short_name]

 ### Acceptance Criteria
 | AC | Satisfied | Evidence |
@@ -1,30 +0,0 @@
-{
-  "_comment": "Copy to .cursor/mcp.json and fill in credentials. Do NOT commit the real mcp.json.",
-  "mcpServers": {
-    "Jira-MCP-Server": {
-      "url": "https://mcp.atlassian.com/v1/sse",
-      "_note": "Alternative to Azure DevOps. Used by /plan, /decompose, /implement, /new-task for work item tracking."
-    },
-    "AzureDevops": {
-      "command": "npx",
-      "args": ["-y", "@nicepkg/azure-devops-mcp@latest"],
-      "env": {
-        "AZURE_DEVOPS_ORG_URL": "https://dev.azure.com/YOUR_ORG",
-        "AZURE_DEVOPS_AUTH_METHOD": "pat",
-        "AZURE_DEVOPS_PAT": "YOUR_PAT_HERE",
-        "AZURE_DEVOPS_DEFAULT_PROJECT": "YOUR_PROJECT"
-      },
-      "_note": "Alternative to Jira. Used by /plan, /decompose, /implement, /new-task for work item tracking."
-    },
-    "playwright": {
-      "command": "npx",
-      "args": ["@anthropic/mcp-playwright"],
-      "_note": "Optional. Used by /ui-design for visual verification (screenshots, viewport testing)."
-    },
-    "context7": {
-      "command": "npx",
-      "args": ["-y", "@upstash/context7-mcp@latest"],
-      "_note": "Optional. Retrieves up-to-date library documentation."
-    }
-  }
-}
@@ -5,18 +5,21 @@ alwaysApply: true
 # Coding preferences
 - Always prefer simple solution
 - Generate concise code
- Do not put comments in the code
+- Do not put comments in the code, except in tests: every test must use the Arrange / Act / Assert pattern with language-appropriate comment syntax (`# Arrange` for Python, `// Arrange` for C#/Rust/JS/TS). Omit any section that is not needed (e.g. if there is no setup, skip Arrange; if act and assert are the same line, keep only Assert)
 - Do not put logs unless it is an exception, or was asked specifically
 - Do not put code annotations unless it was asked specifically 
 - Write code that takes into account the different environments: development, production
 - You are careful to make changes that are requested or you are confident the changes are well understood and related to the change being requested
 - Mocking data is needed only for tests, never mock data for dev or prod env
 - When you add new libraries or dependencies make sure you are using the same version of it as other parts of the code
+- When a test fails due to a missing dependency, install it — do not fake or stub the module system. For normal packages, add them to the project's dependency file (requirements-test.txt, package.json devDependencies, test csproj, etc.) and install. Only consider stubbing if the dependency is heavy (e.g. hardware-specific SDK, large native toolchain) — and even then, ask the user first before choosing to stub.

 - Focus on the areas of code relevant to the task
 - Do not touch code that is unrelated to the task
 - Always think about what other methods and areas of code might be affected by the code changes
- When you think you are done with changes, run tests and make sure they are not broken
+- When you think you are done with changes, run the full test suite. Every failure — including pre-existing ones, collection errors, and import errors — is a **blocking gate**. Never silently ignore, skip, or proceed past a failing test. On any failure, stop and ask the user to choose one of:
+  - **Investigate and fix** the failing test or source code
+  - **Remove the test** if it is obsolete or no longer relevant
 - Do not rename any databases or tables or table columns without confirmation. Avoid such renaming if possible.

 - Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
@@ -17,26 +17,11 @@ globs: [".cursor/**"]
 ## Agent Files (.cursor/agents/)
 - Must have `name` and `description` in frontmatter

-## _docs/ Directory Numbering Convention
-| Prefix | Content |
-|--------|---------|
-| `00_` | Problem definition, research artifacts |
-| `01_` | Solution drafts |
-| `02_` | Documentation, tasks |
-| `03_` | Implementation reports |
-| `04_` | Deploy, refactoring |
-| `05_` | Security audit |
-| `06_` | Metrics / retrospective |
+## User Interaction
+- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.

-## Quality Thresholds (project-wide)
-| Metric | Target | Used by |
-|--------|--------|---------|
-| Test coverage (black-box) | >= 70% of acceptance criteria | test-spec |
-| Test coverage (refactor safety net) | >= 75% line, 90% critical paths | refactor |
-| CI pipeline coverage gate | >= 75% | deploy |
-
-## Work Item Tracker
-Skills reference Jira MCP by default. Azure DevOps MCP is an equal alternative. The autopilot protocols handle authentication for whichever is configured.
+## Execution Safety
+- Never run test suites, builds, Docker commands, or other long-running/resource-heavy/security-risky operations without asking the user first, unless it is explicitly stated in the skill or agent, or the user has already asked to do so.

 ## Security
 - All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)
@@ -1,8 +1,8 @@
 ---
-description: "Git workflow: work on dev branch, commit message format with Jira IDs"
+description: "Git workflow: work on dev branch, commit message format with tracker IDs"
 alwaysApply: true
 ---
 # Git Workflow

 - Work on the `dev` branch
- Commit message format: `[JIRA-ID-1] [JIRA-ID-2] Summary of changes`
+- Commit message format: `[TRACKER-ID-1] [TRACKER-ID-2] Summary of changes`
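The commit message format above could be checked mechanically. The following is a minimal sketch, assuming tracker IDs look like an uppercase project key, a hyphen, and a number (for example `ABC-123`); the rule itself does not define the ID shape, and the names `COMMIT_RE` and `is_valid_commit_message` are hypothetical.

```python
import re

# Assumed ID shape: uppercase key, hyphen, digits (e.g. ABC-123).
# One or more bracketed IDs, each followed by a space, then a summary
# that does not itself start with a bracket.
COMMIT_RE = re.compile(r"^(?:\[[A-Z][A-Z0-9]*-\d+\] )+(?!\[)\S.*$")


def is_valid_commit_message(message: str) -> bool:
    """Check the first line against `[ID-1] [ID-2] Summary of changes`."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_RE.match(lines[0]) is not None
```

A check like this would fit a `commit-msg` hook, though the diff does not say whether the format is enforced anywhere.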
@@ -1,12 +1,10 @@
 ---
-description: "Play a notification sound when the AI agent needs human input or when AI generation is finished"
+description: "Play a notification sound whenever the AI agent needs human input, confirmation, or approval"
 alwaysApply: true
 ---
-# Sound Notification for Human Attention
+# Sound Notification on Human Input

-Play a notification sound whenever human attention is needed. This includes waiting for input AND completing generation.
-
-## Commands by OS
+Whenever you are about to ask the user a question, request confirmation, present options for a decision, or otherwise pause and wait for human input, you MUST first run the appropriate shell command for the current OS:

 - **macOS**: `afplay /System/Library/Sounds/Glass.aiff &`
 - **Linux**: `paplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null || aplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null || echo -e '\a' &`
@@ -14,15 +12,13 @@ Play a notification sound whenever human attention is needed. This includes wait

 Detect the OS from the user's system info or by running `uname -s` if unknown.

-## When to play the sound
-
+This applies to:
 - Asking clarifying questions
 - Presenting choices (e.g. via AskQuestion tool)
 - Requesting approval for destructive actions
 - Reporting that you are blocked and need guidance
 - Any situation where the conversation will stall without user response
- **When AI generation is complete** — play the sound as the very last action before ending your turn, so the user knows the response is ready
+- Completing a task (final answer / deliverable ready for review)

-## When NOT to play the sound
-
- In the middle of executing a multi-step task and just providing a status update (more tool calls will follow)
+Do NOT play the sound when:
+- You are in the middle of executing a multi-step task and just providing a status update
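The OS detection described in this rule can be sketched as a lookup keyed on the value `uname -s` would report. This is only an illustration: the function name `sound_command` is hypothetical, and the Windows command is not visible in this hunk, so it is left out rather than guessed.

```python
import platform

# Commands copied from the rule text. The Windows entry is not shown in
# this hunk, so it is deliberately omitted.
SOUND_COMMANDS = {
    "Darwin": "afplay /System/Library/Sounds/Glass.aiff &",
    "Linux": (
        "paplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null"
        " || aplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null"
        " || echo -e '\\a' &"
    ),
}


def sound_command(system=None):
    """Return the shell command for an OS name, or None if it is unknown."""
    name = system or platform.system()
    return SOUND_COMMANDS.get(name)
```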
@@ -1,6 +1,7 @@
 ---
 description: "OpenAPI/Swagger API documentation standards — applied when editing API spec files"
 globs: ["**/openapi*", "**/swagger*"]
+alwaysApply: false
 ---
 # OpenAPI

@@ -1,7 +0,0 @@
-# Project Management
-
- This project uses **Jira ONLY** for work item tracking (NOT Azure DevOps)
- Jira project key: `AZ` (AZAION)
- Jira cloud ID: `1598226f-845f-4705-bcd1-5ed0c82d6119`
- Use the `user-Jira-MCP-Server` MCP server for all Jira operations
- Never use Azure DevOps MCP for this project's work items
@@ -1,6 +1,6 @@
 ---
-description: "Technology stack preferences for new code: Postgres DB, .NET/Python/Rust backend, React/Tailwind frontend, OpenAPI for APIs. Apply when creating new projects, choosing frameworks, or making technology decisions."
-alwaysApply: false
+description: "Defines required technology choices: Postgres DB, .NET/Python/Rust backend, React/Tailwind frontend, OpenAPI for APIs"
+alwaysApply: true
 ---
 # Tech Stack
 - Prefer Postgres database, but ask user
@@ -4,7 +4,7 @@ globs: ["**/*test*", "**/*spec*", "**/*Test*", "**/tests/**", "**/test/**"]
 ---
 # Testing

- Structure every test with `//Arrange`, `//Act`, `//Assert` comments
+- Structure every test with Arrange / Act / Assert section comments using language-appropriate syntax (`# Arrange` for Python, `// Arrange` for C#/Rust/JS/TS)
 - One assertion per test when practical; name tests descriptively: `MethodName_Scenario_ExpectedResult`
 - Test boundary conditions, error paths, and happy paths
 - Use mocks only for external dependencies; prefer real implementations for internal code
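As an illustration of the Arrange / Act / Assert rule above, here is a minimal Python sketch. The function `apply_discount` and both tests are hypothetical, chosen only to show the section comments and a descriptive test name. The second test omits the Act section, on the assumption that a section can be dropped when the call and the check coincide.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_apply_discount_ten_percent_returns_reduced_price():
    # Arrange
    price = 200.0
    percent = 10.0

    # Act
    result = apply_discount(price, percent)

    # Assert
    assert result == 180.0


def test_apply_discount_negative_percent_raises_value_error():
    # Arrange
    invalid_percent = -5.0

    # Assert
    try:
        apply_discount(100.0, invalid_percent)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```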
@@ -24,7 +24,7 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det
 | `flows/greenfield.md` | Detection rules, step table, and auto-chain rules for new projects |
 | `flows/existing-code.md` | Detection rules, step table, and auto-chain rules for existing codebases |
 | `state.md` | State file format, rules, re-entry protocol, session boundaries |
-| `protocols.md` | User interaction, Jira MCP auth, choice format, error handling, status summary |
+| `protocols.md` | User interaction, tracker auth, choice format, error handling, status summary |

 **On every invocation**: read all four files above before executing any logic.

@@ -32,10 +32,10 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det

 - **Auto-chain**: when a skill completes, immediately start the next one — no pause between skills
 - **Only pause at decision points**: BLOCKING gates inside sub-skills are the natural pause points; do not add artificial stops between steps
- **State from disk**: all progress is persisted to `_docs/_autopilot_state.md` and cross-checked against `_docs/` folder structure
- **Rich re-entry**: on every invocation, read the state file for full context before continuing
+- **State from disk**: current step is persisted to `_docs/_autopilot_state.md` and cross-checked against `_docs/` folder structure
+- **Re-entry**: on every invocation, read the state file and cross-check against `_docs/` folders before continuing
 - **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
- **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input
+- **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input (AskQuestion tool preferred for structured choices; fall back to plain text if unavailable)
 - **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically
 - **Single project per workspace**: all `_docs/` paths are relative to workspace root; for monorepos, each service needs its own Cursor workspace

@@ -43,10 +43,10 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det

 Determine which flow to use:

-1. If workspace has source code files **and** `_docs/` does not exist → **existing-code flow** (Pre-Step detection)
-2. If `_docs/_autopilot_state.md` exists and records Document in `Completed Steps` → **existing-code flow**
-3. If `_docs/_autopilot_state.md` exists and `step: done` AND workspace contains source code → **existing-code flow** (completed project re-entry — loops to New Task)
-4. Otherwise → **greenfield flow**
+1. If workspace has **no source code files** → **greenfield flow**
+2. If workspace has source code files **and** `_docs/` does not exist → **existing-code flow**
+3. If workspace has source code files **and** `_docs/` exists **and** `_docs/_autopilot_state.md` does not exist → **existing-code flow**
+4. If workspace has source code files **and** `_docs/_autopilot_state.md` exists → read the `flow` field from the state file and use that flow

 After selecting the flow, apply its detection rules (first match wins) to determine the current step.
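The four flow-selection rules can be read as a small decision function. The sketch below is one possible reading, not the orchestrator's actual logic; `resolve_flow` and its parameters are hypothetical names, and `state_flow` stands for the `flow` field of the state file, or `None` when no state file exists.

```python
from typing import Optional


def resolve_flow(has_source: bool, docs_exists: bool, state_flow: Optional[str]) -> str:
    """Pick a flow following rules 1-4 in order (first match wins)."""
    # Rule 1: no source code at all means a new project.
    if not has_source:
        return "greenfield"
    # Rules 2 and 3: source code without _docs/ or without a state file.
    if not docs_exists or state_flow is None:
        return "existing-code"
    # Rule 4: trust the flow recorded in the state file.
    return state_flow
```

Under this reading, a project started as greenfield stays on the greenfield flow after source code appears, because rule 4 defers to the recorded `flow` value.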

@@ -65,7 +65,7 @@ Every invocation follows this sequence:
   a. Delegate to current skill (see Skill Delegation below)
   b. If skill returns FAILED → apply Skill Failure Retry Protocol (see protocols.md):
      - Auto-retry the same skill (failure may be caused by missing user input or environment issue)
-      - If 3 consecutive auto-retries fail → record in state file Blockers, warn user, stop auto-retry
+      - If 3 consecutive auto-retries fail → set status: failed, warn user, stop auto-retry
   c. When skill completes successfully → reset retry counter, update state file (rules in state.md)
   d. Re-detect next step from the active flow's detection rules
   e. If next skill is ready → auto-chain (go to 7a with next skill)
@@ -82,10 +82,26 @@ For each step, the delegation pattern is:
 3. Read the skill file: `.cursor/skills/[name]/SKILL.md`
 4. Execute the skill's workflow exactly as written, including all BLOCKING gates, self-verification checklists, save actions, and escalation rules. Update `sub_step` in state each time the sub-skill advances.
 5. If the skill **fails**: follow the Skill Failure Retry Protocol in `protocols.md` — increment `retry_count`, auto-retry up to 3 times, then escalate.
-6. When complete (success): reset `retry_count: 0`, mark step `completed`, record date + key outcome, add key decisions to state file, return to auto-chain rules (from active flow file)
+6. When complete (success): reset `retry_count: 0`, update state file to the next step with `status: not_started`, return to auto-chain rules (from active flow file)

 Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer.
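The retry handling in the delegation pattern might look like the following sketch. It assumes that "auto-retry up to 3 times" means one initial attempt plus three retries, that `step` is an integer, and that a skill can be modelled as a callable returning `True` on success; `run_with_retry` and `MAX_RETRIES` are hypothetical names.

```python
MAX_RETRIES = 3


def run_with_retry(skill, state: dict) -> dict:
    """Run a skill, retrying on failure and updating the state pointer."""
    state["status"] = "in_progress"
    while True:
        if skill():
            # Success: reset the counter and point at the next step.
            state["retry_count"] = 0
            state["step"] += 1
            state["status"] = "not_started"
            return state
        if state["retry_count"] >= MAX_RETRIES:
            # All retries used up: stop and leave the decision to the user.
            state["status"] = "failed"
            return state
        state["retry_count"] += 1
```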

+## State File Template
+
+The state file (`_docs/_autopilot_state.md`) is a minimal pointer — only the current step. Full format rules are in `state.md`.
+
+```markdown
+# Autopilot State
+
+## Current Step
+flow: [greenfield | existing-code]
+step: [number or "done"]
+name: [step name]
+status: [not_started / in_progress / completed / skipped / failed]
+sub_step: [0 or N — sub-skill phase name]
+retry_count: [0-3]
+```
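Because the template is a flat list of `key: value` lines, it can be read with very little code. The sketch below is an assumption about how a reader could parse it, not the format rules from `state.md`; `parse_state` is a hypothetical helper and the example values are made up.

```python
def parse_state(text: str) -> dict:
    """Collect `key: value` lines from the state file into a dict."""
    state = {}
    for line in text.splitlines():
        # Skip Markdown headings and lines without a key/value separator.
        if line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        state[key.strip()] = value.strip()
    return state


example = """# Autopilot State

## Current Step
flow: existing-code
step: 3
name: Code Testability Revision
status: in_progress
sub_step: 0
retry_count: 0
"""

state = parse_state(example)
```

All values come back as strings, so a caller would still need to convert `step` and `retry_count` before comparing them numerically.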
+
 ## Trigger Conditions

 This skill activates when the user wants to:
@@ -1,6 +1,6 @@
 # Existing Code Workflow

-Workflow for projects with an existing codebase. Starts with documentation, produces test specs, decomposes and implements tests, verifies them, refactors with that safety net, then adds new functionality and deploys.
+Workflow for projects with an existing codebase. Starts with documentation, produces test specs, checks code testability (refactoring if needed), decomposes and implements tests, verifies them, refactors with that safety net, then adds new functionality and deploys.

 ## Step Reference Table

@@ -8,18 +8,19 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
 |------|------|-----------|-------------------|
 | 1 | Document | document/SKILL.md | Steps 1–8 |
 | 2 | Test Spec | test-spec/SKILL.md | Phase 1a–1b |
-| 3 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
-| 4 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
-| 5 | Run Tests | test-run/SKILL.md | Steps 1–4 |
-| 6 | Refactor | refactor/SKILL.md | Phases 0–5 (6-phase method) |
-| 7 | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
-| 8 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
-| 9 | Run Tests | test-run/SKILL.md | Steps 1–4 |
-| 10 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
-| 11 | Performance Test | (autopilot-managed) | Load/stress tests (optional) |
-| 12 | Deploy | deploy/SKILL.md | Step 1–7 |
+| 3 | Code Testability Revision | refactor/SKILL.md (guided mode) | Phases 0–7 (conditional) |
+| 4 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
+| 5 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
+| 6 | Run Tests | test-run/SKILL.md | Steps 1–4 |
+| 7 | Refactor | refactor/SKILL.md | Phases 0–7 (optional) |
+| 8 | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
+| 9 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
+| 10 | Run Tests | test-run/SKILL.md | Steps 1–4 |
+| 11 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
+| 12 | Performance Test | (autopilot-managed) | Load/stress tests (optional) |
+| 13 | Deploy | deploy/SKILL.md | Step 1–7 |

-After Step 12, the existing-code workflow is complete.
+After Step 13, the existing-code workflow is complete.

 ## Detection Rules

@@ -35,7 +36,7 @@ Action: An existing codebase without documentation was detected. Read and execut
 ---

 **Step 2 — Test Spec**
-Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
+Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows `step >= 2` (Document already ran)

 Action: Read and execute `.cursor/skills/test-spec/SKILL.md`

@@ -43,20 +44,51 @@ This step applies when the codebase was documented via the `/document` skill. Te

 ---

-**Step 3 — Decompose Tests**
-Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/todo/` does not exist or has no task files)
+**Step 3 — Code Testability Revision**
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND the autopilot state shows Test Spec (Step 2) is completed AND the autopilot state does NOT show Code Testability Revision (Step 3) as completed or skipped
+
+Action: Analyze the codebase against the test specs to determine whether the code can be tested as-is.
+
+1. Read `_docs/02_document/tests/traceability-matrix.md` and all test scenario files in `_docs/02_document/tests/`
+2. For each test scenario, check whether the code under test can be exercised in isolation. Look for:
+   - Hardcoded file paths or directory references
+   - Hardcoded configuration values (URLs, credentials, magic numbers)
+   - Global mutable state that cannot be overridden
+   - Tight coupling to external services without abstraction
+   - Missing dependency injection or non-configurable parameters
+   - Direct file system operations without path configurability
+   - Inline construction of heavy dependencies (models, clients)
+3. If ALL scenarios are testable as-is:
+   - Mark Step 3 as `completed` with outcome "Code is testable — no changes needed"
+   - Auto-chain to Step 4 (Decompose Tests)
+4. If testability issues are found:
+   - Create `_docs/04_refactoring/01-testability-refactoring/`
+   - Write `list-of-changes.md` in that directory using the refactor skill template (`.cursor/skills/refactor/templates/list-of-changes.md`), with:
+     - **Mode**: `guided`
+     - **Source**: `autopilot-testability-analysis`
+     - One change entry per testability issue found (change ID, file paths, problem, proposed change, risk, dependencies)
+   - Invoke the refactor skill in **guided mode**: read and execute `.cursor/skills/refactor/SKILL.md` with the `list-of-changes.md` as input
+   - The refactor skill will create RUN_DIR (`01-testability-refactoring`), create tasks in `_docs/02_tasks/todo/`, delegate to implement skill, and verify results
+   - Phase 3 (Safety Net) is automatically skipped by the refactor skill for testability runs
+   - After refactoring completes, mark Step 3 as `completed`
+   - Auto-chain to Step 4 (Decompose Tests)
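Two of the checks in item 2, hardcoded file paths and hardcoded URLs, lend themselves to a rough textual scan. The sketch below is a hypothetical heuristic only: the patterns and the name `find_testability_issues` are assumptions, and the remaining checks (global state, coupling, missing dependency injection) would need language-aware analysis that a regex cannot provide.

```python
import re

# Hypothetical heuristics: a quoted absolute path, or a quoted http(s) URL.
PATTERNS = {
    "hardcoded path": re.compile(r"""["'](?:/|[A-Za-z]:\\)[^"']+["']"""),
    "hardcoded URL": re.compile(r"""["']https?://[^"']+["']"""),
}


def find_testability_issues(source: str) -> list:
    """Return (line_number, issue_kind) pairs for lines matching a heuristic."""
    issues = []
    for number, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                issues.append((number, kind))
    return issues
```

Each hit would still need human or agent review before it becomes a change entry, since a quoted path in a test fixture or docstring is not necessarily a testability problem.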
+
+---
+
+**Step 4 — Decompose Tests**
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Step 3 (Code Testability Revision) is completed or skipped AND (`_docs/02_tasks/todo/` does not exist or has no test task files)

 Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
 1. Run Step 1t (test infrastructure bootstrap)
 2. Run Step 3 (blackbox test task decomposition)
 3. Run Step 4 (cross-verification against test coverage)

-If `_docs/02_tasks/todo/` has some task files already, the decompose skill's resumability handles it.
+If `_docs/02_tasks/` subfolders have some task files already (e.g., refactoring tasks from Step 3), the decompose skill's resumability handles it — it appends test tasks alongside existing tasks.

 ---

-**Step 4 — Implement Tests**
-Condition: `_docs/02_tasks/todo/` contains task files AND `_docs/02_tasks/_dependencies_table.md` exists AND the autopilot state shows Step 3 (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
+**Step 5 — Implement Tests**
+Condition: `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 4 (Decompose Tests) is completed AND `_docs/03_implementation/implementation_report_tests.md` does not exist

 Action: Read and execute `.cursor/skills/implement/SKILL.md`

@@ -66,8 +98,8 @@ If `_docs/03_implementation/` has batch reports, the implement skill detects com

 ---

-**Step 5 — Run Tests**
-Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 4 (Implement Tests) is completed AND the autopilot state does NOT show Step 5 (Run Tests) as completed
+**Step 6 — Run Tests**
+Condition: `_docs/03_implementation/implementation_report_tests.md` exists AND the autopilot state shows Step 5 (Implement Tests) is completed AND the autopilot state does NOT show Step 6 (Run Tests) as completed

 Action: Read and execute `.cursor/skills/test-run/SKILL.md`

@@ -75,19 +107,31 @@ Verifies the implemented test suite passes before proceeding to refactoring. The

 ---

-**Step 6 — Refactor**
-Condition: the autopilot state shows Step 5 (Run Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist
+**Step 7 — Refactor (optional)**
+Condition: the autopilot state shows Step 6 (Run Tests) is completed AND the autopilot state does NOT show Step 7 (Refactor) as completed or skipped AND no `_docs/04_refactoring/` run folder contains a `FINAL_report.md` for a non-testability run

-Action: Read and execute `.cursor/skills/refactor/SKILL.md`
+Action: Present using Choose format:

-The refactor skill runs the full 6-phase method using the implemented tests as a safety net.
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Refactor codebase before adding new features?
+══════════════════════════════════════
+ A) Run refactoring (recommended if code quality issues were noted during documentation)
+ B) Skip — proceed directly to New Task
+══════════════════════════════════════
+ Recommendation: [A or B — base on whether documentation
+ flagged significant code smells, coupling issues, or
+ technical debt worth addressing before new development]
+══════════════════════════════════════
+```

-If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues.
+- If user picks A → Read and execute `.cursor/skills/refactor/SKILL.md` in automatic mode. The refactor skill creates a new run folder in `_docs/04_refactoring/` (e.g., `02-coupling-refactoring`) and runs the full method using the implemented tests as a safety net. After completion, auto-chain to Step 8 (New Task).
+- If user picks B → Mark Step 7 as `skipped` in the state file, auto-chain to Step 8 (New Task).

 ---

-**Step 7 — New Task**
-Condition: the autopilot state shows Step 6 (Refactor) is completed AND the autopilot state does NOT show Step 7 (New Task) as completed
+**Step 8 — New Task**
+Condition: the autopilot state shows Step 7 (Refactor) is completed or skipped AND the autopilot state does NOT show Step 8 (New Task) as completed

 Action: Read and execute `.cursor/skills/new-task/SKILL.md`

@@ -95,26 +139,26 @@ The new-task skill interactively guides the user through defining new functional

 ---

-**Step 8 — Implement**
-Condition: the autopilot state shows Step 7 (New Task) is completed AND `_docs/02_tasks/todo/` contains task files AND `_docs/03_implementation/` does not contain a FINAL report covering the new tasks (check state for distinction between test implementation and feature implementation)
+**Step 9 — Implement**
+Condition: the autopilot state shows Step 8 (New Task) is completed AND `_docs/03_implementation/` does not contain an `implementation_report_*.md` file other than `implementation_report_tests.md` (the tests report from Step 5 is excluded from this check)

 Action: Read and execute `.cursor/skills/implement/SKILL.md`

-The implement skill reads the new tasks from `_docs/02_tasks/todo/` and implements them. Tasks already implemented in Step 4 are in `_docs/02_tasks/done/`.
+The implement skill reads the new tasks from `_docs/02_tasks/todo/` and implements them. Tasks already implemented in Step 5 are skipped (completed tasks have been moved to `done/`).

 If `_docs/03_implementation/` has batch reports from this phase, the implement skill detects completed tasks and continues.
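
The Step 9 condition excludes the tests report from Step 5 when checking for a feature implementation report. A minimal sketch of that check, assuming the reports sit directly in the implementation folder; the function name and the `implementation_report_features.md` filename in the usage note are hypothetical, not defined by the skills:

```python
from pathlib import Path

# Report written by Step 5 (Implement Tests); excluded from the Step 9 check.
TESTS_REPORT = "implementation_report_tests.md"


def feature_report_exists(impl_dir: Path) -> bool:
    """Return True if any implementation_report_*.md other than the tests report exists."""
    return any(
        report.name != TESTS_REPORT
        for report in impl_dir.glob("implementation_report_*.md")
    )
```

With only `implementation_report_tests.md` present the function returns `False`, so Step 9 is still pending. Once a second report such as `implementation_report_features.md` appears, it returns `True`.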

 ---

-**Step 9 — Run Tests**
-Condition: the autopilot state shows Step 8 (Implement) is completed AND the autopilot state does NOT show Step 9 (Run Tests) as completed
+**Step 10 — Run Tests**
+Condition: the autopilot state shows Step 9 (Implement) is completed AND the autopilot state does NOT show Step 10 (Run Tests) as completed

 Action: Read and execute `.cursor/skills/test-run/SKILL.md`

 ---

-**Step 10 — Security Audit (optional)**
-Condition: the autopilot state shows Step 9 (Run Tests) is completed AND the autopilot state does NOT show Step 10 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 11 — Security Audit (optional)**
+Condition: the autopilot state shows Step 10 (Run Tests) is completed AND the autopilot state does NOT show Step 11 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Present using Choose format:

@@ -129,13 +173,13 @@ Action: Present using Choose format:
 ══════════════════════════════════════
 ```

- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 11 (Performance Test).
- If user picks B → Mark Step 10 as `skipped` in the state file, auto-chain to Step 11 (Performance Test).
+- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 12 (Performance Test).
+- If user picks B → Mark Step 11 as `skipped` in the state file, auto-chain to Step 12 (Performance Test).

 ---

-**Step 11 — Performance Test (optional)**
-Condition: the autopilot state shows Step 10 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 11 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 12 — Performance Test (optional)**
+Condition: the autopilot state shows Step 11 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 12 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Present using Choose format:

@@ -156,13 +200,13 @@ Action: Present using Choose format:
  2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system
  3. Present results vs acceptance criteria thresholds
  4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-  5. After completion, auto-chain to Step 12 (Deploy)
- If user picks B → Mark Step 11 as `skipped` in the state file, auto-chain to Step 12 (Deploy).
+  5. After completion, auto-chain to Step 13 (Deploy)
+- If user picks B → Mark Step 12 as `skipped` in the state file, auto-chain to Step 13 (Deploy).

 ---

-**Step 12 — Deploy**
-Condition: the autopilot state shows Step 9 (Run Tests) is completed AND (Step 10 is completed or skipped) AND (Step 11 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 13 — Deploy**
+Condition: the autopilot state shows Step 10 (Run Tests) is completed AND (Step 11 is completed or skipped) AND (Step 12 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Read and execute `.cursor/skills/deploy/SKILL.md`

@@ -171,7 +215,7 @@ After deployment completes, the existing-code workflow is done.
 ---

 **Re-Entry After Completion**
-Condition: the autopilot state shows `step: done` OR all steps through 12 (Deploy) are completed
+Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed

 Action: The project completed a full cycle. Present status and loop back to New Task:

@@ -187,7 +231,7 @@ Action: The project completed a full cycle. Present status and loop back to New
 ══════════════════════════════════════
 ```

- If user picks A → set `step: 7`, `status: not_started` in the state file, then auto-chain to Step 7 (New Task). Previous cycle history stays in Completed Steps.
+- If user picks A → set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
 - If user picks B → report final project status and exit.

 ## Auto-Chain Rules
@@ -195,17 +239,18 @@ Action: The project completed a full cycle. Present status and loop back to New
 | Completed Step | Next Action |
 |---------------|-------------|
 | Document (1) | Auto-chain → Test Spec (2) |
-| Test Spec (2) | Auto-chain → Decompose Tests (3) |
-| Decompose Tests (3) | **Session boundary** — suggest new conversation before Implement Tests |
-| Implement Tests (4) | Auto-chain → Run Tests (5) |
-| Run Tests (5, all pass) | Auto-chain → Refactor (6) |
-| Refactor (6) | Auto-chain → New Task (7) |
-| New Task (7) | **Session boundary** — suggest new conversation before Implement |
-| Implement (8) | Auto-chain → Run Tests (9) |
-| Run Tests (9, all pass) | Auto-chain → Security Audit choice (10) |
-| Security Audit (10, done or skipped) | Auto-chain → Performance Test choice (11) |
-| Performance Test (11, done or skipped) | Auto-chain → Deploy (12) |
-| Deploy (12) | **Workflow complete** — existing-code flow done |
+| Test Spec (2) | Auto-chain → Code Testability Revision (3) |
+| Code Testability Revision (3) | Auto-chain → Decompose Tests (4) |
+| Decompose Tests (4) | **Session boundary** — suggest new conversation before Implement Tests |
+| Implement Tests (5) | Auto-chain → Run Tests (6) |
+| Run Tests (6, all pass) | Auto-chain → Refactor choice (7) |
+| Refactor (7, done or skipped) | Auto-chain → New Task (8) |
+| New Task (8) | **Session boundary** — suggest new conversation before Implement |
+| Implement (9) | Auto-chain → Run Tests (10) |
+| Run Tests (10, all pass) | Auto-chain → Security Audit choice (11) |
+| Security Audit (11, done or skipped) | Auto-chain → Performance Test choice (12) |
+| Performance Test (12, done or skipped) | Auto-chain → Deploy (13) |
+| Deploy (13) | **Workflow complete** — existing-code flow done |
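
The table above can be read as a simple lookup from completed step to next action. The sketch below is an illustration only: the `AUTO_CHAIN` dict and `next_action` helper are hypothetical names, and the "all pass" precondition on the Run Tests rows is assumed to be checked by the caller.

```python
# Completed step -> (next step, session boundary before it?).
# Mirrors the existing-code Auto-Chain Rules table; steps 6 and 10 assume all tests passed.
AUTO_CHAIN = {
    1: (2, False),    # Document -> Test Spec
    2: (3, False),    # Test Spec -> Code Testability Revision
    3: (4, False),    # Code Testability Revision -> Decompose Tests
    4: (5, True),     # Decompose Tests -> session boundary before Implement Tests
    5: (6, False),    # Implement Tests -> Run Tests
    6: (7, False),    # Run Tests -> Refactor choice
    7: (8, False),    # Refactor (done or skipped) -> New Task
    8: (9, True),     # New Task -> session boundary before Implement
    9: (10, False),   # Implement -> Run Tests
    10: (11, False),  # Run Tests -> Security Audit choice
    11: (12, False),  # Security Audit (done or skipped) -> Performance Test choice
    12: (13, False),  # Performance Test (done or skipped) -> Deploy
    13: (None, False),  # Deploy -> workflow complete
}


def next_action(completed_step: int) -> str:
    """Describe what the orchestrator does after the given step completes."""
    next_step, boundary = AUTO_CHAIN[completed_step]
    if next_step is None:
        return "workflow complete"
    if boundary:
        return f"session boundary before step {next_step}"
    return f"auto-chain to step {next_step}"
```

For example, `next_action(4)` reports a session boundary before step 5, while `next_action(6)` chains straight to the Refactor choice at step 7.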

 ## Status Summary Template

@@ -215,16 +260,17 @@ Action: The project completed a full cycle. Present status and loop back to New
 ═══════════════════════════════════════════════════
 Step 1   Document                 [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 Step 2   Test Spec                [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 3   Decompose Tests     [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 4   Implement Tests     [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
- Step 5   Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 6   Refactor            [DONE / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
- Step 7   New Task            [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 8   Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
- Step 9   Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 10  Security Audit      [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 11  Performance Test    [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 12  Deploy              [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 3   Code Testability Rev.    [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 4   Decompose Tests          [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 5   Implement Tests          [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
+ Step 6   Run Tests                [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 7   Refactor                 [DONE / SKIPPED / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
+ Step 8   New Task                 [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 9   Implement                [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
+ Step 10  Run Tests                [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 11  Security Audit           [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 12  Performance Test         [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 13  Deploy                   [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 ═══════════════════════════════════════════════════
 Current: Step N — Name
 SubStep: M — [sub-skill internal step name]
@@ -114,21 +114,21 @@ Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_documen

 Action: Read and execute `.cursor/skills/decompose/SKILL.md`

-If `_docs/02_tasks/todo/` has some task files already, the decompose skill's resumability handles it.
+If `_docs/02_tasks/` subfolders have some task files already, the decompose skill's resumability handles it.

 ---

 **Step 6 — Implement**
-Condition: `_docs/02_tasks/todo/` contains task files AND `_docs/02_tasks/_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
+Condition: `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/` does not contain any `implementation_report_*.md` file

 Action: Read and execute `.cursor/skills/implement/SKILL.md`

-If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.
+If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues. The final report filename is context-dependent (`implementation_report_*.md`); see the implement skill documentation for the naming convention.

 ---

 **Step 7 — Run Tests**
-Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state does NOT show Step 7 (Run Tests) as completed AND (`_docs/04_deploy/` does not exist or is incomplete)
+Condition: `_docs/03_implementation/` contains an `implementation_report_*.md` file AND the autopilot state does NOT show Step 7 (Run Tests) as completed AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Read and execute `.cursor/skills/test-run/SKILL.md`

@@ -190,7 +190,7 @@ Action: Read and execute `.cursor/skills/deploy/SKILL.md`
 ---

 **Done**
-Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md)
+Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md, deploy_scripts.md)

 Action: Report project completion with summary. If the user runs autopilot again after greenfield completion, Flow Resolution rule 3 routes to the existing-code flow (re-entry after completion) so they can add new features.

@@ -46,18 +46,16 @@ Rules:
 2. Always include a recommendation with a brief justification
 3. Keep option descriptions to one line each
 4. If only 2 options make sense, use A/B only — do not pad with filler options
-5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice
-6. Record every user decision in the state file's `Key Decisions` section
-7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive
+5. Play the notification sound (per `.cursor/rules/human-attention-sound.mdc`) before presenting the choice
+6. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive

 ## Work Item Tracker Authentication

-Several workflow steps create work items (epics, tasks, links). The system supports **Jira MCP** and **Azure DevOps MCP** as interchangeable backends. Detect which is configured by listing available MCP servers.
+Several workflow steps create work items (epics, tasks, links). The system requires a task tracker MCP as its backend and treats trackers as interchangeable.

 ### Tracker Detection

-1. Check for available MCP servers: Jira MCP (`user-Jira-MCP-Server`) or Azure DevOps MCP (`user-AzureDevops`)
-2. If both are available, ask the user which to use (Choose format)
+1. If no task tracker MCP is configured, or it is not authorized, ask the user how to proceed
 3. Record the choice in the state file: `tracker: jira` or `tracker: ado`
 4. If neither is available, set `tracker: local` and proceed without external tracking

@@ -124,16 +122,12 @@ Skill execution → FAILED
  │
  ├─ retry_count < 3 ?
  │    YES → increment retry_count in state file
-  │         → log failure reason in state file (Retry Log section)
  │         → re-read the sub-skill's SKILL.md
  │         → re-execute from the current sub_step
  │         → (loop back to check result)
  │
  │    NO (retry_count = 3) →
  │         → set status: failed in Current Step
-  │         → add entry to Blockers section:
-  │             "[Skill Name] failed 3 consecutive times at sub_step [M].
-  │              Last failure: [reason]. Auto-retry exhausted."
  │         → present warning to user (see Escalation below)
  │         → do NOT auto-retry again until user intervenes
 ```
@@ -143,18 +137,14 @@ Skill execution → FAILED
 1. **Auto-retry immediately**: when a skill fails, retry it without asking the user — the failure is often transient (missing user confirmation in a prior step, docker not running, file lock, etc.)
 2. **Preserve sub_step**: retry from the last recorded `sub_step`, not from the beginning of the skill — unless the failure indicates corruption, in which case restart from sub_step 1
 3. **Increment `retry_count`**: update `retry_count` in the state file's `Current Step` section on each retry attempt
-4. **Log each failure**: append the failure reason and timestamp to the state file's `Retry Log` section
-5. **Reset on success**: when the skill eventually succeeds, reset `retry_count: 0` and clear the `Retry Log` for that step
+4. **Reset on success**: when the skill eventually succeeds, reset `retry_count: 0`
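
The retry rules can be sketched as a small loop. This is a hypothetical illustration, not the orchestrator's actual code: `execute` stands in for re-running the skill from its current `sub_step`, and `state` is a plain dict holding the `Current Step` fields.

```python
from typing import Callable

MAX_RETRIES = 3  # matches the "retry_count < 3" check in the flow above


def run_with_retries(execute: Callable[[], bool], state: dict) -> dict:
    """Run a skill, auto-retrying on failure until retry_count reaches MAX_RETRIES."""
    while True:
        if execute():
            # Reset on success.
            state["status"] = "completed"
            state["retry_count"] = 0
            return state
        if state["retry_count"] >= MAX_RETRIES:
            # Auto-retry exhausted: stop and leave escalation to the caller.
            state["status"] = "failed"
            return state
        # Increment, then loop back to re-execute the skill.
        state["retry_count"] += 1
```

Under these assumptions a skill that never succeeds is executed four times in total (one initial run plus three retries) before `status` becomes `failed` with `retry_count: 3`.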

 ### Escalation (after 3 consecutive failures)

 After 3 failed auto-retries of the same skill, the failure is likely not user-related. Stop retrying and escalate:

-1. Update the state file:
-   - Set `status: failed` in `Current Step`
-   - Set `retry_count: 3`
-   - Add a blocker entry describing the repeated failure
-2. Play notification sound (per `human-attention-sound.mdc`)
+1. Update the state file: set `status: failed` and `retry_count: 3` in `Current Step`
+2. Play notification sound (per `.cursor/rules/human-attention-sound.mdc`)
 3. Present using Choose format:

 ```
@@ -215,9 +205,8 @@ When executing a sub-skill, monitor for these signals:

 If the same autopilot step fails 3 consecutive times across conversations:

- Record the failure pattern in the state file's `Blockers` section
 - Do NOT auto-retry on next invocation
- Present the blocker and ask user for guidance before attempting again
+- Present the failure pattern and ask the user for guidance before attempting again

 ## Context Management Protocol

@@ -304,11 +293,73 @@ For steps that produce `_docs/` artifacts (problem, research, plan, decompose, d
 3. **Git safety net**: artifacts are committed with each autopilot step completion. To roll back: `git log --oneline _docs/` to find the commit, then `git checkout <commit> -- _docs/<folder>/`
 4. **State file rollback**: when rolling back artifacts, also update `_docs/_autopilot_state.md` to reflect the rolled-back step (set it to `in_progress`, clear completed date)

+## Debug / Error Recovery Protocol
+
+When the implement skill's auto-fix loop fails (code review FAIL after 2 auto-fix attempts) or an implementer subagent reports a blocker, the user is asked to intervene. This protocol guides the recovery process.
+
+### Structured Debugging Workflow
+
+When escalated to the user after implementation failure:
+
+1. **Classify the failure** — determine the category:
+   - **Missing dependency**: a package, service, or module the task needs but isn't available
+   - **Logic error**: code runs but produces wrong results (assertion failures, incorrect output)
+   - **Integration mismatch**: interfaces between components don't align (type errors, missing methods, wrong signatures)
+   - **Environment issue**: Docker, database, network, or configuration problem
+   - **Spec ambiguity**: the task spec is unclear or contradictory
+
+2. **Reproduce** — isolate the failing behavior:
+   - Run the specific failing test(s) in isolation
+   - Check whether the failure is deterministic or intermittent
+   - Capture the exact error message, stack trace, and relevant file:line
+
+3. **Narrow scope** — focus on the minimal reproduction:
+   - For logic errors: trace the data flow from input to the point of failure
+   - For integration mismatches: compare the caller's expectations against the callee's actual interface
+   - For environment issues: verify Docker services are running, DB is accessible, env vars are set
+
+4. **Fix and verify** — apply the fix and confirm:
+   - Make the minimal change that fixes the root cause
+   - Re-run the failing test(s) to confirm the fix
+   - Run the full test suite to check for regressions
+   - If the fix changes a shared interface, check all consumers
+
+5. **Report** — update the batch report with:
+   - Root cause category
+   - Fix applied (file:line, description)
+   - Tests that now pass
+
+### Common Recovery Patterns
+
+| Failure Pattern | Typical Root Cause | Recovery Action |
+|----------------|-------------------|----------------|
+| ImportError / ModuleNotFoundError | Missing dependency or wrong path | Install dependency or fix import path |
+| TypeError on method call | Interface mismatch between tasks | Align caller with callee's actual signature |
+| AssertionError in test | Logic bug or wrong expected value | Fix logic or update test expectations |
+| ConnectionRefused | Service not running | Start Docker services, check docker-compose |
+| Timeout | Blocking I/O or infinite loop | Add timeout, fix blocking call |
+| FileNotFoundError | Hardcoded path or missing fixture | Make path configurable, add fixture |
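
As a rough illustration, the recovery table can be turned into a lookup keyed on the error name found in a stack trace. The `RECOVERY_PATTERNS` mapping and `suggest_recovery` helper are hypothetical; matching by substring is an assumption made for brevity and would misfire on messages that merely mention an error name.

```python
# Error-name fragment -> (typical root cause, recovery action), copied from the table above.
RECOVERY_PATTERNS = {
    "ImportError": ("Missing dependency or wrong path", "Install dependency or fix import path"),
    "ModuleNotFoundError": ("Missing dependency or wrong path", "Install dependency or fix import path"),
    "TypeError": ("Interface mismatch between tasks", "Align caller with callee's actual signature"),
    "AssertionError": ("Logic bug or wrong expected value", "Fix logic or update test expectations"),
    "ConnectionRefused": ("Service not running", "Start Docker services, check docker-compose"),
    "Timeout": ("Blocking I/O or infinite loop", "Add timeout, fix blocking call"),
    "FileNotFoundError": ("Hardcoded path or missing fixture", "Make path configurable, add fixture"),
}


def suggest_recovery(error_text: str):
    """Return (root cause, recovery action) for the first known pattern in error_text, else None."""
    for fragment, suggestion in RECOVERY_PATTERNS.items():
        if fragment in error_text:
            return suggestion
    return None
```

An unmatched error returns `None`, which in this sketch would mean falling back to the manual classification in step 1 of the debugging workflow.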
+
+### Escalation
+
+If debugging does not resolve the issue after 2 focused attempts:
+
+```
+══════════════════════════════════════
+ DEBUG ESCALATION: [failure description]
+══════════════════════════════════════
+ Root cause category: [category]
+ Attempted fixes: [list]
+ Current state: [what works, what doesn't]
+══════════════════════════════════════
+ A) Continue debugging with more context
+ B) Revert this batch and skip the task (move to backlog)
+ C) Simplify the task scope and retry
+══════════════════════════════════════
+```
+
 ## Status Summary

 On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback). Use the Status Summary Template from the active flow file (`flows/greenfield.md` or `flows/existing-code.md`).

-For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section
+For re-entry (state file exists), cross-check the current step against `_docs/` folder structure and present any `status: failed` state to the user before continuing.
@@ -2,81 +2,52 @@

 ## State File: `_docs/_autopilot_state.md`

-The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.
+The autopilot persists its position to `_docs/_autopilot_state.md`. This is a lightweight pointer — only the current step. All history lives in `_docs/` artifacts and git log. Folder scanning is the fallback when the state file doesn't exist.

-### Format
+### Template

 ```markdown
 # Autopilot State

 ## Current Step
 flow: [greenfield | existing-code]
-step: [1-10 for greenfield, 1-12 for existing-code, or "done"]
+step: [1-10 for greenfield, 1-13 for existing-code, or "done"]
 name: [step name from the active flow's Step Reference Table]
 status: [not_started / in_progress / completed / skipped / failed]
-sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]
-retry_count: [0-3 — number of consecutive auto-retry attempts for current step, reset to 0 on success]
+sub_step: [0, or sub-skill internal step number + name if interrupted mid-step]
+retry_count: [0-3 — consecutive auto-retry attempts, reset to 0 on success]
+```

-When updating `Current Step`, always write it as:
-  flow: existing-code   ← active flow
-  step: N               ← autopilot step (sequential integer)
-  sub_step: M           ← sub-skill's own internal step/phase number + name
-  retry_count: 0        ← reset on new step or success; increment on each failed retry
-Example:
+### Examples
+
+```
 flow: greenfield
 step: 3
 name: Plan
 status: in_progress
 sub_step: 4 — Architecture Review & Risk Assessment
 retry_count: 0
-Example (failed after 3 retries):
+```
+
+```
 flow: existing-code
 step: 2
 name: Test Spec
 status: failed
 sub_step: 1b — Test Case Generation
 retry_count: 3
-
-## Completed Steps
-
-| Step | Name | Completed | Key Outcome |
-|------|------|-----------|-------------|
-| 1 | [name] | [date] | [one-line summary] |
-| 2 | [name] | [date] | [one-line summary] |
-| ... | ... | ... | ... |
-
-## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision N]
-
-## Last Session
-date: [date]
-ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
-reason: [completed step / session boundary / user paused / context limit]
-notes: [any context for next session]
-
-## Retry Log
-| Attempt | Step | Name | SubStep | Failure Reason | Timestamp |
-|---------|------|------|---------|----------------|-----------|
-| 1 | [step] | [name] | [sub_step] | [reason] | [date-time] |
-| ... | ... | ... | ... | ... | ... |
-
-(Clear this table when the step succeeds or user resets. Append a row on each failed auto-retry.)
-
-## Blockers
- [blocker 1, if any]
- [none]
 ```
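
Because the state file is now a flat list of `key: value` lines, reading it needs very little logic. A minimal sketch, assuming the file follows the template exactly; `parse_state` is a hypothetical helper, not part of the autopilot skill:

```python
def parse_state(text: str) -> dict:
    """Parse the `key: value` lines of the Current Step section into a dict."""
    state = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, Markdown headings, and code fences.
        if not line or line.startswith("#") or line.startswith("```"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    # retry_count is the only field the retry rules treat as a number.
    if "retry_count" in state:
        state["retry_count"] = int(state["retry_count"])
    return state
```

The `step` field is deliberately left as a string here, since the template allows the value `done` as well as a step number.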

 ### State File Rules

-1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 1)
-2. **Update** the state file after every step completion, every session boundary, every BLOCKING gate confirmation, and every failed retry attempt
-3. **Read** the state file as the first action on every invocation — before folder scanning
-4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 3 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
-5. **Never delete** the state file. It accumulates history across the entire project lifecycle
-6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` when the step succeeds or the user manually resets. If `retry_count` reaches 3, set `status: failed` and add an entry to `Blockers`
-7. **Failed state on re-entry**: if the state file shows `status: failed` with `retry_count: 3`, do NOT auto-retry — present the blocker to the user and wait for their decision before proceeding
+1. **Create** on the first autopilot invocation (after state detection determines Step 1)
+2. **Update** after every step completion, session boundary, or failed retry
+3. **Read** as the first action on every invocation — before folder scanning
+4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
+5. **Never delete** the state file
+6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` on success. If `retry_count` reaches 3, set `status: failed`
+7. **Failed state on re-entry**: if `status: failed` with `retry_count: 3`, do NOT auto-retry — present the issue to the user first
+8. **Skill-internal state**: when the active skill maintains its own state file (e.g., document skill's `_docs/02_document/state.json`), the autopilot's `sub_step` field should reflect the skill's internal progress. On re-entry, cross-check the skill's state file against the autopilot's `sub_step` for consistency.

 ## State Detection

@@ -92,8 +63,8 @@ When the user invokes `/autopilot` and work already exists:

 1. Read `_docs/_autopilot_state.md`
 2. Cross-check against `_docs/` folder structure
-3. Present Status Summary with context from state file (key decisions, last session, blockers)
-4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery
+3. Present Status Summary (use the active flow's Status Summary Template)
+4. If the detected step has a sub-skill with built-in resumability, the sub-skill handles mid-step recovery
 5. Continue execution from detected state

 ## Session Boundaries
@@ -101,12 +72,11 @@ When the user invokes `/autopilot` and work already exists:
 After any decompose/planning step completes, **do not auto-chain to implement**. Instead:

 1. Update state file: mark the step as completed, set current step to the next implement step with status `not_started`
-   - Existing-code flow: After Step 3 (Decompose Tests) → set current step to 4 (Implement Tests)
-   - Existing-code flow: After Step 7 (New Task) → set current step to 8 (Implement)
+   - Existing-code flow: After Step 4 (Decompose Tests) → set current step to 5 (Implement Tests)
+   - Existing-code flow: After Step 8 (New Task) → set current step to 9 (Implement)
   - Greenfield flow: After Step 5 (Decompose) → set current step to 6 (Implement)
-2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
-3. Present a summary: number of tasks, estimated batches, total complexity points
-4. Use Choose format:
+2. Present a summary: number of tasks, estimated batches, total complexity points
+3. Use Choose format:

 ```
 ══════════════════════════════════════
@@ -27,7 +27,7 @@ Multi-phase code review that verifies implementation against task specs, checks

 ## Input

- List of task spec files that were just implemented (paths to `[JIRA-ID]_[short_name].md`)
+- List of task spec files that were just implemented (paths to `[TRACKER-ID]_[short_name].md`)
 - Changed files (detected via `git diff` or provided by the `/implement` skill)
 - Project context: `_docs/00_problem/restrictions.md`, `_docs/01_solution/solution.md`

@@ -10,23 +10,23 @@ description: |
  - "prepare for implementation"
  - "decompose tests", "test decomposition"
 category: build
-tags: [decomposition, tasks, dependencies, jira, implementation-prep]
+tags: [decomposition, tasks, dependencies, work-items, implementation-prep]
 disable-model-invocation: true
 ---

 # Task Decomposition

-Decompose planned components into atomic, implementable task specs with a bootstrap structure plan through a systematic workflow. All tasks are named with their Jira ticket ID prefix in a flat directory.
+Decompose planned components into atomic, implementable task specs with a bootstrap structure plan through a systematic workflow. All tasks are named with their work item tracker ID prefix in a flat directory.

 ## Core Principles

- **Atomic tasks**: each task does one thing; if it exceeds 5 complexity points, split it
+- **Atomic tasks**: each task does one thing; if it exceeds 8 complexity points, split it
 - **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
- **Flat structure**: all tasks are Jira-ID-prefixed files in TASKS_DIR (`todo/`) — no component subdirectories within workflow folders
+- **Flat structure**: all tasks are tracker-ID-prefixed files in TASKS_DIR — no component subdirectories
 - **Save immediately**: write artifacts to disk after each task; never accumulate unsaved work
- **Jira inline**: create Jira ticket immediately after writing each task file
+- **Tracker inline**: create a work item ticket immediately after writing each task file
 - **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Plan, don't code**: this workflow produces documents and Jira tasks, never implementation code
+- **Plan, don't code**: this workflow produces documents and work item tickets, never implementation code

 ## Context Resolution

@@ -34,26 +34,23 @@ Determine the operating mode based on invocation before any other logic runs.

 **Default** (no explicit input file provided):
 - DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/todo/`
- TASKS_ROOT: `_docs/02_tasks/`
- DEPS_TABLE: `_docs/02_tasks/_dependencies_table.md`
+- TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
 - Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (blackbox tests) + Step 4 (cross-verification)

 **Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):
 - DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/todo/`
- TASKS_ROOT: `_docs/02_tasks/`
- DEPS_TABLE: `_docs/02_tasks/_dependencies_table.md`
+- TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - Derive component number and component name from the file path
 - Ask user for the parent Epic ID
 - Runs Step 2 (that component only, appending to existing task numbering)

 **Tests-only mode** (provided file/directory is within `tests/`, or `DOCUMENT_DIR/tests/` exists and input explicitly requests test decomposition):
 - DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/todo/`
- TASKS_ROOT: `_docs/02_tasks/`
- DEPS_TABLE: `_docs/02_tasks/_dependencies_table.md`
+- TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - TESTS_DIR: `DOCUMENT_DIR/tests/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
 - Runs Step 1t (test infrastructure bootstrap) + Step 3 (blackbox test decomposition) + Step 4 (cross-verification against test coverage)
@@ -105,8 +102,8 @@ Announce the detected mode and resolved paths to the user before proceeding.

 **Default:**
 1. DOCUMENT_DIR contains `architecture.md` and `components/` — **STOP if missing**
-2. Create TASKS_ROOT subfolders (`todo/`, `backlog/`, `done/`) if they do not exist
-3. If TASKS_DIR (`todo/`) already contains task files, ask user: **resume from last checkpoint or start fresh?**
+2. Create TASKS_DIR and TASKS_TODO if they do not exist
+3. If TASKS_DIR subfolders (`todo/`, `backlog/`, `done/`) already contain task files, ask user: **resume from last checkpoint or start fresh?**

 **Single component mode:**
 1. The provided component file exists and is non-empty — **STOP if missing**
@@ -114,44 +111,42 @@ Announce the detected mode and resolved paths to the user before proceeding.
 **Tests-only mode:**
 1. `TESTS_DIR/blackbox-tests.md` exists and is non-empty — **STOP if missing**
 2. `TESTS_DIR/environment.md` exists — **STOP if missing**
-3. Create TASKS_ROOT subfolders (`todo/`, `backlog/`, `done/`) if they do not exist
-4. If TASKS_DIR (`todo/`) already contains task files, ask user: **resume from last checkpoint or start fresh?**
+3. Create TASKS_DIR and TASKS_TODO if they do not exist
+4. If TASKS_DIR subfolders (`todo/`, `backlog/`, `done/`) already contain task files, ask user: **resume from last checkpoint or start fresh?**

 ## Artifact Management

 ### Directory Structure

 ```
-_docs/02_tasks/
+TASKS_DIR/
 ├── _dependencies_table.md
-├── backlog/
 ├── todo/
-│   ├── [JIRA-ID]_initial_structure.md
-│   ├── [JIRA-ID]_[short_name].md
+│   ├── [TRACKER-ID]_initial_structure.md
+│   ├── [TRACKER-ID]_[short_name].md
 │   └── ...
+├── backlog/
 └── done/
 ```

-New task files are written to `todo/`. The `/implement` skill moves them to `done/` after successful implementation. Users can move tasks to `backlog/` to defer them.
-
-**Naming convention**: Each task file is initially saved with a temporary numeric prefix (`[##]_[short_name].md`). After creating the Jira ticket, rename the file to use the Jira ticket ID as prefix (`[JIRA-ID]_[short_name].md`). For example: `01_initial_structure.md` → `AZ-42_initial_structure.md`.
+**Naming convention**: Each task file is initially saved in `TASKS_TODO/` with a temporary numeric prefix (`[##]_[short_name].md`). After creating the work item ticket, rename the file to use the work item ticket ID as prefix (`[TRACKER-ID]_[short_name].md`). For example: `todo/01_initial_structure.md` → `todo/AZ-42_initial_structure.md`.

 ### Save Timing

 | Step | Save immediately after | Filename |
 |------|------------------------|----------|
-| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
-| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
-| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
-| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
+| Step 1 | Bootstrap structure plan complete + work item ticket created + file renamed | `todo/[TRACKER-ID]_initial_structure.md` |
+| Step 1t | Test infrastructure bootstrap complete + work item ticket created + file renamed | `todo/[TRACKER-ID]_test_infrastructure.md` |
+| Step 2 | Each component task decomposed + work item ticket created + file renamed | `todo/[TRACKER-ID]_[short_name].md` |
+| Step 3 | Each blackbox test task decomposed + work item ticket created + file renamed | `todo/[TRACKER-ID]_[short_name].md` |
 | Step 4 | Cross-task verification complete | `_dependencies_table.md` |

 ### Resumability

-If task files already exist (in `todo/`, `backlog/`, or `done/`):
+If TASKS_DIR subfolders already contain task files:

-1. List existing `*_*.md` files across all three subfolders (excluding `_dependencies_table.md`) and count them
-2. Resume numbering from the next number (for temporary numeric prefix before Jira rename)
+1. List existing `*_*.md` files across `todo/`, `backlog/`, and `done/` (excluding `_dependencies_table.md`) and count them
+2. Resume numbering from the next number (for temporary numeric prefix before tracker rename)
 3. Inform the user which tasks already exist and are being skipped

 ## Progress Tracking
@@ -186,11 +181,11 @@ The test infrastructure bootstrap must include:
 - [ ] Test runner configuration matches the consumer app tech stack from environment.md
 - [ ] Data isolation strategy is defined

-**Save action**: Write `01_test_infrastructure.md` (temporary numeric name)
+**Save action**: Write `todo/01_test_infrastructure.md` (temporary numeric name)

-**Jira action**: Create a Jira ticket for this task under the "Blackbox Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
+**Tracker action**: Create a work item ticket for this task under the "Blackbox Tests" epic. Write the work item ticket ID and Epic ID back into the task header.

-**Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
+**Rename action**: Rename the file from `todo/01_test_infrastructure.md` to `todo/[TRACKER-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.

 **BLOCKING**: Present test infrastructure plan summary to user. Do NOT proceed until user confirms.

@@ -234,11 +229,11 @@ The bootstrap structure plan must include:
 - [ ] Environment strategy covers dev, staging, production
 - [ ] Test structure includes unit and blackbox test locations

-**Save action**: Write `01_initial_structure.md` (temporary numeric name)
+**Save action**: Write `todo/01_initial_structure.md` (temporary numeric name)

-**Jira action**: Create a Jira ticket for this task under the "Bootstrap & Initial Structure" epic. Write the Jira ticket ID and Epic ID back into the task header.
+**Tracker action**: Create a work item ticket for this task under the "Bootstrap & Initial Structure" epic. Write the work item ticket ID and Epic ID back into the task header.

-**Rename action**: Rename the file from `01_initial_structure.md` to `[JIRA-ID]_initial_structure.md` (e.g., `AZ-42_initial_structure.md`). Update the **Task** field inside the file to match the new filename.
+**Rename action**: Rename the file from `todo/01_initial_structure.md` to `todo/[TRACKER-ID]_initial_structure.md` (e.g., `todo/AZ-42_initial_structure.md`). Update the **Task** field inside the file to match the new filename.

 **BLOCKING**: Present structure plan summary to user. Do NOT proceed until user confirms.

@@ -262,19 +257,19 @@ For each component (or the single provided component):
 4. Do not create tasks for other components — only tasks for the current component
 5. Each task should be atomic, containing 0 APIs or a list of semantically connected APIs
 6. Write each task spec using `templates/task.md`
-7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
-8. Note task dependencies (referencing Jira IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
-9. **Immediately after writing each task file**: create a Jira ticket, link it to the component's epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
+7. Estimate complexity per task (1, 2, 3, 5, 8 points); no task should exceed 8 points — split if it does
+8. Note task dependencies (referencing tracker IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
+9. **Immediately after writing each task file**: create a work item ticket, link it to the component's epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.

 **Self-verification** (per component):
 - [ ] Every task is atomic (single concern)
- [ ] No task exceeds 5 complexity points
- [ ] Task dependencies reference correct Jira IDs
+- [ ] No task exceeds 8 complexity points
+- [ ] Task dependencies reference correct tracker IDs
 - [ ] Tasks cover all interfaces defined in the component spec
 - [ ] No tasks duplicate work from other components
- [ ] Every task has a Jira ticket linked to the correct epic
+- [ ] Every task has a work item ticket linked to the correct epic

-**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename the file to `[JIRA-ID]_[short_name].md`. Update the **Task** field inside the file to match the new filename. Update **Dependencies** references in the file to use Jira IDs of the dependency tasks.
+**Save action**: Write each `todo/[##]_[short_name].md` (temporary numeric name), create work item ticket inline, then rename to `todo/[TRACKER-ID]_[short_name].md`. Update the **Task** field inside the file to match the new filename. Update **Dependencies** references in the file to use tracker IDs of the dependency tasks.

 ---
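
The write, create-ticket, rename sequence described for Step 2 can be sketched roughly as follows. This is a minimal illustration, not part of the skill: `tracker_filename` and `rename_task` are hypothetical helper names, the tracker ID is assumed to be already known (ticket creation itself is out of scope), and updating the **Task** field inside the file is left out.

```python
import re
from pathlib import Path

# Temporary names look like "01_initial_structure.md" (numeric prefix, then short name).
NUMERIC_PREFIX = re.compile(r"^\d+_(?P<short_name>.+)\.md$")


def tracker_filename(temp_name: str, tracker_id: str) -> str:
    """Map a temporary numeric name to its tracker-ID-prefixed name."""
    match = NUMERIC_PREFIX.match(temp_name)
    if match is None:
        raise ValueError(f"not a temporary numeric task name: {temp_name}")
    return f"{tracker_id}_{match.group('short_name')}.md"


def rename_task(todo_dir: Path, temp_name: str, tracker_id: str) -> Path:
    """Rename the task file inside todo_dir and return the new path."""
    target = todo_dir / tracker_filename(temp_name, tracker_id)
    (todo_dir / temp_name).rename(target)
    return target
```

For example, `tracker_filename("01_initial_structure.md", "AZ-42")` yields `AZ-42_initial_structure.md`, matching the naming convention example in the skill.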

@@ -295,18 +290,18 @@ For each component (or the single provided component):
   - In default mode: blackbox test tasks depend on the component implementation tasks they exercise
   - In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
 5. Write each task spec using `templates/task.md`
-6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
-7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
-8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
+6. Estimate complexity per task (1, 2, 3, 5, 8 points); no task should exceed 8 points — split if it does
+7. Note task dependencies (referencing tracker IDs of already-created dependency tasks)
+8. **Immediately after writing each task file**: create a work item ticket under the "Blackbox Tests" epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.

 **Self-verification**:
 - [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
 - [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task
- [ ] No task exceeds 5 complexity points
+- [ ] No task exceeds 8 complexity points
 - [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
- [ ] Every task has a Jira ticket linked to the "Blackbox Tests" epic
+- [ ] Every task has a work item ticket linked to the "Blackbox Tests" epic

-**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
+**Save action**: Write each `todo/[##]_[short_name].md` (temporary numeric name), create work item ticket inline, then rename to `todo/[TRACKER-ID]_[short_name].md`.

 ---

@@ -348,23 +343,23 @@ Tests-only mode:

 - **Coding during decomposition**: this workflow produces specs, never code
 - **Over-splitting**: don't create many tasks if the component is simple — 1 task is fine
- **Tasks exceeding 5 points**: split them; no task should be too complex for a single implementer
+- **Tasks exceeding 8 points**: split them; no task should be too complex for a single implementer
 - **Cross-component tasks**: each task belongs to exactly one component
 - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
 - **Creating git branches**: branch creation is an implementation concern, not a decomposition one
- **Creating component subdirectories**: all tasks go flat in TASKS_DIR (`todo/`)
- **Forgetting Jira**: every task must have a Jira ticket created inline — do not defer to a separate step
- **Forgetting to rename**: after Jira ticket creation, always rename the file from numeric prefix to Jira ID prefix
+- **Creating component subdirectories**: all tasks go flat in `TASKS_TODO/`
+- **Forgetting tracker**: every task must have a work item ticket created inline — do not defer to a separate step
+- **Forgetting to rename**: after work item ticket creation, always rename the file from numeric prefix to tracker ID prefix

 ## Escalation Rules

 | Situation | Action |
 |-----------|--------|
 | Ambiguous component boundaries | ASK user |
-| Task complexity exceeds 5 points after splitting | ASK user |
+| Task complexity exceeds 8 points after splitting | ASK user |
 | Missing component specs in DOCUMENT_DIR | ASK user |
 | Cross-component dependency conflict | ASK user |
-| Jira epic not found for a component | ASK user for Epic ID |
+| Tracker epic not found for a component | ASK user for Epic ID |
 | Task naming | PROCEED, confirm at next BLOCKING gate |

 ## Methodology Quick Reference
@@ -376,24 +371,24 @@ Tests-only mode:
 │ CONTEXT: Resolve mode (default / single component / tests-only)│
 │                                                                │
 │ DEFAULT MODE:                                                   │
-│  1.  Bootstrap Structure  → [JIRA-ID]_initial_structure.md     │
+│  1.  Bootstrap Structure  → [TRACKER-ID]_initial_structure.md     │
 │      [BLOCKING: user confirms structure]                       │
-│  2.  Component Tasks      → [JIRA-ID]_[short_name].md each    │
-│  3.  Blackbox Tests       → [JIRA-ID]_[short_name].md each    │
+│  2.  Component Tasks      → [TRACKER-ID]_[short_name].md each    │
+│  3.  Blackbox Tests       → [TRACKER-ID]_[short_name].md each    │
 │  4.  Cross-Verification   → _dependencies_table.md            │
 │      [BLOCKING: user confirms dependencies]                    │
 │                                                                │
 │ TESTS-ONLY MODE:                                                │
-│  1t. Test Infrastructure  → [JIRA-ID]_test_infrastructure.md   │
+│  1t. Test Infrastructure  → [TRACKER-ID]_test_infrastructure.md   │
 │      [BLOCKING: user confirms test scaffold]                   │
-│  3.  Blackbox Tests       → [JIRA-ID]_[short_name].md each    │
+│  3.  Blackbox Tests       → [TRACKER-ID]_[short_name].md each    │
 │  4.  Cross-Verification   → _dependencies_table.md            │
 │      [BLOCKING: user confirms dependencies]                    │
 │                                                                │
 │ SINGLE COMPONENT MODE:                                          │
-│  2.  Component Tasks      → [JIRA-ID]_[short_name].md each    │
+│  2.  Component Tasks      → [TRACKER-ID]_[short_name].md each    │
 ├────────────────────────────────────────────────────────────────┤
 │ Principles: Atomic tasks · Behavioral specs · Flat structure   │
-│   Jira inline · Rename to Jira ID · Save now · Ask don't assume│
+│   Tracker inline · Rename to tracker ID · Save now · Ask don't assume│
 └────────────────────────────────────────────────────────────────┘
 ```
@@ -13,10 +13,10 @@ Use this template after cross-task verification. Save as `TASKS_DIR/_dependencie

 | Task | Name | Complexity | Dependencies | Epic |
 |------|------|-----------|-------------|------|
-| [JIRA-ID] | initial_structure | [points] | None | [EPIC-ID] |
-| [JIRA-ID] | [short_name] | [points] | [JIRA-ID] | [EPIC-ID] |
-| [JIRA-ID] | [short_name] | [points] | [JIRA-ID] | [EPIC-ID] |
-| [JIRA-ID] | [short_name] | [points] | [JIRA-ID], [JIRA-ID] | [EPIC-ID] |
+| [TRACKER-ID] | initial_structure | [points] | None | [EPIC-ID] |
+| [TRACKER-ID] | [short_name] | [points] | [TRACKER-ID] | [EPIC-ID] |
+| [TRACKER-ID] | [short_name] | [points] | [TRACKER-ID] | [EPIC-ID] |
+| [TRACKER-ID] | [short_name] | [points] | [TRACKER-ID], [TRACKER-ID] | [EPIC-ID] |
 | ... | ... | ... | ... | ... |
 ```

@@ -25,7 +25,7 @@ Use this template after cross-task verification. Save as `TASKS_DIR/_dependencie
 ## Guidelines

 - Every task from TASKS_DIR must appear in this table
- Dependencies column lists Jira IDs (e.g., "AZ-43, AZ-44") or "None"
+- Dependencies column lists tracker IDs (e.g., "AZ-43, AZ-44") or "None"
 - No circular dependencies allowed
 - Tasks should be listed in recommended execution order
 - The `/implement` skill reads this table to compute parallel batches
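
The guidelines above (no circular dependencies, parallel batches) can be illustrated with a small sketch. It assumes the table has already been read into a mapping from task ID to dependency IDs. `parse_dependencies` and `parallel_batches` are hypothetical names, and this is one possible way to compute batches, not necessarily how `/implement` does it.

```python
def parse_dependencies(cell: str) -> set[str]:
    """Parse a Dependencies cell such as "AZ-43, AZ-44" or "None"."""
    cell = cell.strip()
    return set() if cell == "None" else {dep.strip() for dep in cell.split(",")}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch depends only on earlier batches."""
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    remaining = {task: set(ds) for task, ds in deps.items()}
    batches: list[list[str]] = []
    while remaining:
        # Tasks with no unmet dependencies can run in parallel.
        ready = sorted(task for task, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError(f"circular dependency among: {sorted(remaining)}")
        batches.append(ready)
        for task in ready:
            del remaining[task]
        for ds in remaining.values():
            ds.difference_update(ready)
    return batches
```

A task that lists only the bootstrap task as a dependency lands in the second batch, and a cycle raises an error instead of looping forever.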
@@ -1,19 +1,19 @@
 # Initial Structure Task Template

-Use this template for the bootstrap structure plan. Save as `TASKS_DIR/01_initial_structure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_initial_structure.md` after Jira ticket creation.
+Use this template for the bootstrap structure plan. Save as `TASKS_DIR/01_initial_structure.md` initially, then rename to `TASKS_DIR/[TRACKER-ID]_initial_structure.md` after work item ticket creation.

 ---

 ```markdown
 # Initial Project Structure

-**Task**: [JIRA-ID]_initial_structure
+**Task**: [TRACKER-ID]_initial_structure
 **Name**: Initial Structure
 **Description**: Scaffold the project skeleton — folders, shared models, interfaces, stubs, CI/CD, DB migrations, test structure
 **Complexity**: [3|5] points
 **Dependencies**: None
 **Component**: Bootstrap
-**Jira**: [TASK-ID]
+**Tracker**: [TASK-ID]
 **Epic**: [EPIC-ID]

 ## Project Folder Layout
@@ -1,20 +1,20 @@
 # Task Specification Template

 Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
-Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[JIRA-ID]_[short_name].md` after Jira ticket creation.
+Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[TRACKER-ID]_[short_name].md` after work item ticket creation.

 ---

 ```markdown
 # [Feature Name]

-**Task**: [JIRA-ID]_[short_name]
+**Task**: [TRACKER-ID]_[short_name]
 **Name**: [short human name]
 **Description**: [one-line description of what this task delivers]
-**Complexity**: [1|2|3|5] points
+**Complexity**: [1|2|3|5|8] points
 **Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
 **Component**: [component name for context]
-**Jira**: [TASK-ID]
+**Tracker**: [TASK-ID]
 **Epic**: [EPIC-ID]

 ## Problem
@@ -91,7 +91,8 @@ Then [expected result]
 - 2 points: Non-trivial, low complexity, minimal coordination
 - 3 points: Multi-step, moderate complexity, potential alignment needed
 - 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: Too complex — split into smaller tasks
+- 8 points: High difficulty, high ambiguity or coordination, multiple components
+- 13 points: Too complex — split into smaller tasks

 ## Output Guidelines

@@ -102,7 +103,7 @@ Then [expected result]
 - Include realistic scope boundaries
 - Write from the user's perspective
 - Include complexity estimation
- Reference dependencies by Jira ID (e.g., AZ-43_shared_models)
+- Reference dependencies by tracker ID (e.g., AZ-43_shared_models)

 **DON'T:**
 - Include implementation details (file paths, classes, methods)
@@ -1,19 +1,19 @@
 # Test Infrastructure Task Template

-Use this template for the test infrastructure bootstrap (Step 1t in tests-only mode). Save as `TASKS_DIR/01_test_infrastructure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_test_infrastructure.md` after Jira ticket creation.
+Use this template for the test infrastructure bootstrap (Step 1t in tests-only mode). Save as `TASKS_DIR/01_test_infrastructure.md` initially, then rename to `TASKS_DIR/[TRACKER-ID]_test_infrastructure.md` after work item ticket creation.

 ---

 ```markdown
 # Test Infrastructure

-**Task**: [JIRA-ID]_test_infrastructure
+**Task**: [TRACKER-ID]_test_infrastructure
 **Name**: Test Infrastructure
 **Description**: Scaffold the Blackbox test project — test runner, mock services, Docker test environment, test data fixtures, reporting
 **Complexity**: [3|5] points
 **Dependencies**: None
 **Component**: Blackbox Tests
-**Jira**: [TASK-ID]
+**Tracker**: [TASK-ID]
 **Epic**: [EPIC-ID]

 ## Test Project Folder Layout
@@ -177,7 +177,7 @@ Re-entry is seamless: `state.json` tracks exactly which modules are done.
   - By directory structure (most common)
   - By shared data models or common purpose
   - By dependency clusters (tightly coupled modules)
-2. For each identified component, synthesize its module docs into a single component specification using `templates/component-spec.md` as structure:
+2. For each identified component, synthesize its module docs into a single component specification using `.cursor/skills/plan/templates/component-spec.md` as structure:
   - High-level overview: purpose, pattern, upstream/downstream
   - Internal interfaces: method signatures, DTOs (from actual module code)
   - External API specification (if the component exposes HTTP/gRPC endpoints)
@@ -214,7 +214,7 @@ All documents here are derived from component docs (Step 2) + module docs (Step

 #### 3a. Architecture

-Using `templates/architecture.md` as structure:
+Using `.cursor/skills/plan/templates/architecture.md` as structure:

 - System context and boundaries from entry points and external integrations
 - Tech stack table from discovery (Step 0) + component specs
@@ -229,7 +229,7 @@ Using `templates/architecture.md` as structure:

 #### 3b. System Flows

-Using `templates/system-flows.md` as structure:
+Using `.cursor/skills/plan/templates/system-flows.md` as structure:

 - Trace main flows through the component interaction graph
 - Entry point -> component chain -> output for each major flow
@@ -370,7 +370,7 @@ This is the inverse of normal workflow: instead of problem -> solution -> code,
 **Role**: Technical writer
 **Goal**: Produce `FINAL_report.md` integrating all generated documentation.

-Using `templates/final-report.md` as structure:
+Using `.cursor/skills/plan/templates/final-report.md` as structure:

 - Executive summary from architecture + problem docs
 - Problem statement (transformed from problem.md, not copy-pasted)
@@ -32,31 +32,37 @@ The `implementer` agent is the specialist that writes all the code — it receiv

 ## Context Resolution

- TASKS_DIR: `_docs/02_tasks/todo/`
- DONE_DIR: `_docs/02_tasks/done/`
- BACKLOG_DIR: `_docs/02_tasks/backlog/`
- TASKS_ROOT: `_docs/02_tasks/`
- Task files: all `*.md` files in TASKS_DIR (excluding files starting with `_`)
- Dependency table: `TASKS_ROOT/_dependencies_table.md`
+- TASKS_DIR: `_docs/02_tasks/`
+- Task files: all `*.md` files in `TASKS_DIR/todo/` (excluding files starting with `_`)
+- Dependency table: `TASKS_DIR/_dependencies_table.md`
+
+### Task Lifecycle Folders
+
+```
+TASKS_DIR/
+├── _dependencies_table.md
+├── todo/        ← tasks ready for implementation (this skill reads from here)
+├── backlog/     ← parked tasks (not scheduled yet, ignored by this skill)
+└── done/        ← completed tasks (moved here after implementation)
+```

 ## Prerequisite Checks (BLOCKING)

-1. TASKS_DIR (`todo/`) exists and contains at least one task file — **STOP if missing**
-2. `TASKS_ROOT/_dependencies_table.md` exists — **STOP if missing**
-3. At least one task in TASKS_DIR is not yet completed — **STOP if all done** (already-completed tasks live in DONE_DIR)
+1. `TASKS_DIR/todo/` exists and contains at least one task file — **STOP if missing**
+2. `_dependencies_table.md` exists — **STOP if missing**
+3. At least one task is not yet completed — **STOP if all done**

 ## Algorithm

 ### 1. Parse

- Read all task `*.md` files from TASKS_DIR (excluding files starting with `_`)
+- Read all task `*.md` files from `TASKS_DIR/todo/` (excluding files starting with `_`)
 - Read `_dependencies_table.md` — parse into a dependency graph (DAG)
 - Validate: no circular dependencies, all referenced dependencies exist

 ### 2. Detect Progress

- Scan DONE_DIR to identify tasks that were already completed in previous runs
- Scan the codebase to determine which TASKS_DIR tasks are already completed
+- Scan the codebase to determine which tasks are already completed
 - Match implemented code against task acceptance criteria
 - Mark completed tasks as done in the DAG
 - Report progress to user: "X of Y tasks completed"
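
As a rough illustration of the task-file rule in Context Resolution and the Parse step above (all `*.md` files in `TASKS_DIR/todo/`, excluding names that start with `_`), a minimal sketch might look like this. `list_task_files` is a hypothetical helper, and raising an exception stands in for the "STOP if missing" prerequisite.

```python
from pathlib import Path


def list_task_files(tasks_dir: Path) -> list[Path]:
    """Return task spec files in tasks_dir/todo/, skipping names that start with '_'."""
    todo = tasks_dir / "todo"
    if not todo.is_dir():
        raise FileNotFoundError(f"missing task folder: {todo}")
    return sorted(p for p in todo.glob("*.md") if not p.name.startswith("_"))
```

Files in `backlog/` and `done/` are never returned, which mirrors the lifecycle folders: only `todo/` feeds the implementation loop.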
@@ -79,7 +85,7 @@ For each task in the batch:

 ### 5. Update Tracker Status → In Progress

-For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `protocols.md` for detection) before launching the implementer. If `tracker: local`, skip this step.
+For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (see `protocols.md` for tracker detection) before launching the implementer. If `tracker: local`, skip this step.

 ### 6. Launch Implementer Subagents

@@ -124,33 +130,39 @@ Track `auto_fix_attempts` count in the batch report for retrospective analysis.

 ### 10. Test

- Run the full test suite
- If failures: report to user with details
+- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
+- Test failures are a **blocking gate** — do not proceed to commit until the test-run skill completes with a user decision
+- Note: the autopilot also runs a separate full test suite after all implementation batches complete (greenfield Step 7, existing-code Steps 6/10). This is intentional: per-batch tests are regression checks, while the post-implementation run is the final validation.

 ### 11. Commit and Push

 - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
  - `git add` all changed files from the batch
-  - `git commit` with a message that includes ALL task IDs (Jira IDs, ADO IDs, or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes`
+  - `git commit` with a message that includes ALL task IDs (tracker IDs or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes`
  - `git push` to the remote branch

-### 11b. Move Completed Tasks to Done
-
- For each task in the batch that completed successfully, move its task spec file from TASKS_DIR (`todo/`) to DONE_DIR (`done/`)
- `git add` the moved files and amend the batch commit, or create a follow-up commit
-
 ### 12. Update Tracker Status → In Testing

 After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step.

-### 13. Loop
+### 13. Archive Completed Tasks

- Go back to step 2 until TASKS_DIR (`todo/`) is empty (all tasks moved to DONE_DIR)
+Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
+
+### 14. Loop
+
+- Go back to step 2 and repeat until no task files remain in `todo/`
 - When all tasks are complete, report final summary

 ## Batch Report Persistence

-After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_report.md`. Create the directory if it doesn't exist. When all tasks are complete, produce `_docs/03_implementation/FINAL_implementation_report.md` with a summary of all batches.
+After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_report.md`. Create the directory if it doesn't exist. When all tasks are complete, produce a final implementation report that summarizes all batches. The filename depends on context:
+
+- **Test implementation** (tasks from test decomposition): `_docs/03_implementation/implementation_report_tests.md`
+- **Feature implementation**: `_docs/03_implementation/implementation_report_{feature_slug}.md` where `{feature_slug}` is derived from the batch task names (e.g., `implementation_report_core_api.md`)
+- **Refactoring**: `_docs/03_implementation/implementation_report_refactor_{run_name}.md`
+
+Determine the context from the task files being implemented: if all tasks have test-related names or belong to a test epic, use the tests filename; otherwise derive the feature slug from the component names.

 ## Batch Report

@@ -167,7 +179,7 @@ After each batch, produce a structured report:

 | Task | Status | Files Modified | Tests | Issues |
 |------|--------|---------------|-------|--------|
-| [JIRA-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
+| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |

 ## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
 ## Auto-Fix Attempts: [0/1/2]
@@ -183,7 +195,7 @@ After each batch, produce a structured report:
 | Implementer fails same approach 3+ times | Stop it, escalate to user |
 | Task blocked on external dependency (not in task list) | Report and skip |
 | File ownership conflict unresolvable | ASK user |
-| Test failures exceed 50% of suite after a batch | Stop and escalate |
+| Any test failure after a batch | Delegate to test-run skill — blocking gate |
 | All tasks complete | Report final summary, suggest final commit |
 | `_dependencies_table.md` missing | STOP — run `/decompose` first |

@@ -15,7 +15,7 @@ Use this template after each implementation batch completes.

 | Task | Status | Files Modified | Tests | Issues |
 |------|--------|---------------|-------|--------|
-| [JIRA-ID]_[name] | Done/Blocked/Partial | [count] files | [X/Y pass] | [count or None] |
+| [TRACKER-ID]_[name] | Done/Blocked/Partial | [count] files | [X/Y pass] | [count or None] |

 ## Code Review Verdict: [PASS / FAIL / PASS_WITH_WARNINGS]

@@ -4,19 +4,19 @@ description: |
  Interactive skill for adding new functionality to an existing codebase.
  Guides the user through describing the feature, assessing complexity,
  optionally running research, analyzing the codebase for insertion points,
-  validating assumptions with the user, and producing a task spec with Jira ticket.
+  validating assumptions with the user, and producing a task spec with a work item ticket.
  Supports a loop — the user can add multiple tasks in one session.
  Trigger phrases:
  - "new task", "add feature", "new functionality"
  - "I want to add", "new component", "extend"
 category: build
-tags: [task, feature, interactive, planning, jira]
+tags: [task, feature, interactive, planning, work-items]
 disable-model-invocation: true
 ---

 # New Task (Interactive Feature Planning)

-Guide the user through defining new functionality for an existing codebase. Produces one or more task specifications with Jira tickets, optionally running deep research for complex features.
+Guide the user through defining new functionality for an existing codebase. Produces one or more task specifications with work item tickets, optionally running deep research for complex features.

 ## Core Principles

@@ -30,17 +30,15 @@ Guide the user through defining new functionality for an existing codebase. Prod

 Fixed paths:

- TASKS_DIR: `_docs/02_tasks/todo/`
- TASKS_ROOT: `_docs/02_tasks/`
- DONE_DIR: `_docs/02_tasks/done/`
- BACKLOG_DIR: `_docs/02_tasks/backlog/`
+- TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - PLANS_DIR: `_docs/02_task_plans/`
 - DOCUMENT_DIR: `_docs/02_document/`
 - DEPENDENCIES_TABLE: `_docs/02_tasks/_dependencies_table.md`

-Create TASKS_ROOT subfolders (`todo/`, `backlog/`, `done/`) and PLANS_DIR if they don't exist.
+Create TASKS_DIR, TASKS_TODO, and PLANS_DIR if they don't exist.

-Scan all three subfolders (`todo/`, `backlog/`, `done/`) for existing task files to determine the next numeric prefix for temporary file naming.
+If TASKS_DIR already contains task files (scan `todo/`, `backlog/`, and `done/`), use them to determine the next numeric prefix for temporary file naming.
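
A minimal sketch of the prefix scan, assuming temporary task files are named `[##]_[short_name].md` with a zero-padded two-digit prefix. The helper name `next_task_prefix` is hypothetical. Files already renamed to a tracker ID such as `PROJ-12_name.md` are ignored because they do not start with digits; a tracker that uses purely numeric IDs would need a stricter pattern.

```python
import re
from pathlib import Path


def next_task_prefix(tasks_dir: Path) -> str:
    """Scan todo/, backlog/ and done/ and return the next numeric prefix."""
    highest = 0
    for subfolder in ("todo", "backlog", "done"):
        folder = tasks_dir / subfolder
        if not folder.is_dir():
            continue  # a missing subfolder simply contributes no files
        for task_file in folder.glob("*.md"):
            match = re.match(r"(\d+)_", task_file.name)
            if match:
                highest = max(highest, int(match.group(1)))
    return f"{highest + 1:02d}"
```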

 ## Workflow

@@ -121,7 +119,7 @@ This step only runs if Step 2 determined research is needed.
 2. Invoke `.cursor/skills/research/SKILL.md` in standalone mode:
   - INPUT_FILE: `PLANS_DIR/<task_slug>/problem.md`
   - BASE_DIR: `PLANS_DIR/<task_slug>/`
-3. After research completes, read the solution draft from `PLANS_DIR/<task_slug>/01_solution/solution_draft01.md`
+3. After research completes, read the latest solution draft from `PLANS_DIR/<task_slug>/01_solution/` (highest-numbered `solution_draft*.md`)
 4. Extract the key findings relevant to the task specification

 The `<task_slug>` is a short kebab-case name derived from the feature description (e.g., `auth-provider-integration`, `real-time-notifications`).
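
Picking the highest-numbered draft in step 3 could look like the sketch below. It assumes drafts follow the `solution_draftNN.md` pattern seen in the previous wording; the helper name `latest_solution_draft` is hypothetical. The number is compared as an integer so that `solution_draft10.md` sorts after `solution_draft02.md`.

```python
import re
from pathlib import Path
from typing import Optional


def latest_solution_draft(solution_dir: Path) -> Optional[Path]:
    """Return the highest-numbered solution_draft*.md, or None if there is none."""
    if not solution_dir.is_dir():
        return None
    best_path: Optional[Path] = None
    best_number = -1
    for candidate in solution_dir.glob("solution_draft*.md"):
        match = re.fullmatch(r"solution_draft(\d+)\.md", candidate.name)
        if match and int(match.group(1)) > best_number:
            best_number = int(match.group(1))
            best_path = candidate
    return best_path
```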
@@ -198,20 +196,21 @@ Present using the Choose format for each decision that has meaningful alternativ
 **Role**: Technical writer
 **Goal**: Produce the task specification file.

-1. Determine the next numeric prefix by scanning TASKS_DIR for existing files
-2. Write the task file using `.cursor/skills/decompose/templates/task.md`:
+1. Determine the next numeric prefix by scanning all TASKS_DIR subfolders (`todo/`, `backlog/`, `done/`) for existing files
+2. If research was performed (Step 3), the research artifacts live in `PLANS_DIR/<task_slug>/` — reference them from the task spec where relevant
+3. Write the task file using `.cursor/skills/decompose/templates/task.md`:
   - Fill all fields from the gathered information
   - Set **Complexity** based on the assessment from Step 2
-   - Set **Dependencies** by cross-referencing existing tasks in TASKS_DIR
-   - Set **Jira** and **Epic** to `pending` (filled in Step 7)
-3. Save as `TASKS_DIR/[##]_[short_name].md`
+   - Set **Dependencies** by cross-referencing existing tasks in TASKS_DIR subfolders
+   - Set **Tracker** and **Epic** to `pending` (filled in Step 7)
+4. Save as `TASKS_TODO/[##]_[short_name].md`

 **Self-verification**:
 - [ ] Problem section clearly describes the user need
 - [ ] Acceptance criteria are testable (Gherkin format)
 - [ ] Scope boundaries are explicit
 - [ ] Complexity points match the assessment
- [ ] Dependencies reference existing task Jira IDs where applicable
+- [ ] Dependencies reference existing task tracker IDs where applicable
 - [ ] No implementation details leaked into the spec

 ---
@@ -221,20 +220,20 @@ Present using the Choose format for each decision that has meaningful alternativ
 **Role**: Project coordinator
 **Goal**: Create a work item ticket and link it to the task file.

-1. Create a ticket via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md` for detection):
+1. Create a ticket via the configured work item tracker (see `autopilot/protocols.md` for tracker detection):
   - Summary: the task's **Name** field
   - Description: the task's **Problem** and **Acceptance Criteria** sections
   - Story points: the task's **Complexity** value
   - Link to the appropriate epic (ask user if unclear which epic)
 2. Write the ticket ID and Epic ID back into the task file header:
   - Update **Task** field: `[TICKET-ID]_[short_name]`
-   - Update **Jira** field: `[TICKET-ID]`
+   - Update **Tracker** field: `[TICKET-ID]`
   - Update **Epic** field: `[EPIC-ID]`
 3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md`

 If the work item tracker is not authenticated or unavailable (`tracker: local`):
 - Keep the numeric prefix
- Set **Jira** to `pending`
+- Set **Tracker** to `pending`
 - Set **Epic** to `pending`
 - The task is still valid and can be implemented; tracker sync happens later

@@ -246,7 +245,7 @@ Ask the user:

 ```
 ══════════════════════════════════════
- Task created: [JIRA-ID or ##] — [task name]
+ Task created: [TRACKER-ID or ##] — [task name]
 ══════════════════════════════════════
 A) Add another task
 B) Done — finish and update dependencies
@@ -262,7 +261,7 @@ Ask the user:

 After the user chooses **Done**:

-1. Update (or create) `TASKS_DIR/_dependencies_table.md` — add all newly created tasks to the dependencies table
+1. Update (or create) `DEPENDENCIES_TABLE` — add all newly created tasks to the dependencies table
 2. Present a summary of all tasks created in this session:

 ```
@@ -272,8 +271,8 @@ After the user chooses **Done**:
 Tasks created: N
 Total complexity: M points
 ─────────────────────────────────────
- [JIRA-ID] [name] ([complexity] pts)
- [JIRA-ID] [name] ([complexity] pts)
+ [TRACKER-ID] [name] ([complexity] pts)
+ [TRACKER-ID] [name] ([complexity] pts)
 ...
 ══════════════════════════════════════
 ```
@@ -287,7 +286,7 @@ After the user chooses **Done**:
 | Research skill hits a blocker | Follow research skill's own escalation rules |
 | Codebase analysis reveals conflicting architectures | **ASK** user which pattern to follow |
 | Complexity exceeds 5 points | **WARN** user and suggest splitting into multiple tasks |
-| Jira MCP unavailable | **WARN**, continue with local-only task files |
+| Work item tracker MCP unavailable | **WARN**, continue with local-only task files |

 ## Trigger Conditions

@@ -1,21 +1,21 @@
 ---
 name: plan
 description: |
-  Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics.
-  Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
+  Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and work item epics.
+  Systematic planning workflow with BLOCKING gates, self-verification, and structured artifact management.
  Uses _docs/ + _docs/02_document/ structure.
  Trigger phrases:
  - "plan", "decompose solution", "architecture planning"
  - "break down the solution", "create planning documents"
  - "component decomposition", "solution analysis"
 category: build
-tags: [planning, architecture, components, testing, jira, epics]
+tags: [planning, architecture, components, testing, work-items, epics]
 disable-model-invocation: true
 ---

 # Solution Planning

-Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
+Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and work item epics through a systematic 6-step workflow.

 ## Core Principles

@@ -61,7 +61,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 6 plus F

 ### Step 1: Blackbox Tests

-Read and execute `.cursor/skills/test-spec/SKILL.md`.
+Read and execute `.cursor/skills/test-spec/SKILL.md`. This is a planning context — no source code exists yet, so test-spec Phase 4 (script generation) is skipped. Script creation is handled later by the decompose skill as a task.

 Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.

@@ -91,9 +91,9 @@ Read and follow `steps/05_test-specifications.md`.

 ---

-### Step 6: Jira Epics
+### Step 6: Work Item Epics

-Read and follow `steps/06_jira-epics.md`.
+Read and follow `steps/06_work-item-epics.md`.

 ---

@@ -144,7 +144,7 @@ Read and follow `steps/07_quality-checklist.md`.
 │ 4. Review & Risk       → risk register, iterations              │
 │    [BLOCKING: user confirms mitigations]                       │
 │ 5. Test Specifications → per-component test specs               │
-│ 6. Jira Epics          → epic per component + bootstrap         │
+│ 6. Work Item Epics     → epic per component + bootstrap         │
 │    ─────────────────────────────────────────────────           │
 │ Final: Quality Checklist → FINAL_report.md                      │
 ├────────────────────────────────────────────────────────────────┤
@@ -67,7 +67,7 @@ DOCUMENT_DIR/
 | Step 3 | Diagrams generated | `diagrams/` |
 | Step 4 | Risk assessment complete | `risk_mitigations.md` |
 | Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
-| Step 6 | Epics created in Jira | Jira via MCP |
+| Step 6 | Epics created in work item tracker | Tracker via MCP |
 | Final | All steps complete | `FINAL_report.md` |

 ### Save Principles
@@ -7,7 +7,7 @@
 **Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the epic should understand the full context without needing to open separate files.

 1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
-2. Generate epics for each component using the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md`), structured per `templates/epic-spec.md`
+2. Generate epics for each component using the configured work item tracker (see `autopilot/protocols.md` for tracker detection), structured per `templates/epic-spec.md`
 3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
 4. Include effort estimation per epic (T-shirt size or story points range)
 5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
@@ -22,7 +22,7 @@ Each epic description MUST include ALL of the following sections with substantia
 - **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
 - **Interface specification**: full method signatures, input/output types, error types (from component description.md)
 - **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
+- **Dependencies**: epic dependencies (with tracker IDs) and external dependencies (libraries, hardware, services)
 - **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
 - **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
 - **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
@@ -1,6 +1,6 @@
 # Epic Template

-Use this template for each epic. Create epics via the configured work item tracker (Jira MCP or Azure DevOps MCP).
+Use this template for each epic. Create epics via the configured work item tracker (see `autopilot/protocols.md` for tracker detection).

 ---

@@ -27,8 +27,8 @@ Use this template after completing all 6 steps and the quality checklist. Save a

 | # | Component | Purpose | Dependencies | Epic |
 |---|-----------|---------|-------------|------|
-| 01 | [name] | [one-line purpose] | — | [Jira ID] |
-| 02 | [name] | [one-line purpose] | 01 | [Jira ID] |
+| 01 | [name] | [one-line purpose] | — | [Tracker ID] |
+| 02 | [name] | [one-line purpose] | 01 | [Tracker ID] |
 | ... | | | | |

 **Implementation order** (based on dependency graph):
@@ -71,8 +71,8 @@ Use this template after completing all 6 steps and the quality checklist. Save a

 | Order | Epic | Component | Effort | Dependencies |
 |-------|------|-----------|--------|-------------|
-| 1 | [Jira ID]: [name] | [component] | [S/M/L/XL] | — |
-| 2 | [Jira ID]: [name] | [component] | [S/M/L/XL] | Epic 1 |
+| 1 | [Tracker ID]: [name] | [component] | [S/M/L/XL] | — |
+| 2 | [Tracker ID]: [name] | [component] | [S/M/L/XL] | Epic 1 |
 | ... | | | | |

 **Total estimated effort**: [sum or range]
@@ -1,471 +1,126 @@
 ---
 name: refactor
 description: |
-  Structured refactoring workflow (6-phase method) with three execution modes:
-  - Full Refactoring: all 6 phases — baseline, discovery, analysis, safety net, execution, hardening
-  - Targeted Refactoring: skip discovery if docs exist, focus on a specific component/area
-  - Quick Assessment: phases 0-2 only, outputs a refactoring plan without execution
-  Supports project mode (_docs/ structure) and standalone mode (@file.md).
-  Trigger phrases:
-  - "refactor", "refactoring", "improve code"
-  - "analyze coupling", "decoupling", "technical debt"
-  - "refactoring assessment", "code quality improvement"
+  Structured 8-phase refactoring workflow with two input modes:
+  Automatic (skill discovers issues) and Guided (input file with change list).
+  Each run gets its own subfolder in _docs/04_refactoring/.
+  Delegates code execution to the implement skill via task files in _docs/02_tasks/.
+  Additional workflow modes: Targeted (skip discovery), Quick Assessment (phases 0-2 only).
 category: evolve
-tags: [refactoring, coupling, technical-debt, performance, hardening]
+tags: [refactoring, coupling, technical-debt, performance, testability]
+trigger_phrases: ["refactor", "refactoring", "improve code", "analyze coupling", "decoupling", "technical debt", "code quality"]
 disable-model-invocation: true
 ---

-# Structured Refactoring (6-Phase Method)
+# Structured Refactoring

-Transform existing codebases through a systematic refactoring workflow: capture baseline, document current state, research improvements, build safety net, execute changes, and harden.
+Phase details live in `phases/` — read the relevant file before executing each phase.

 ## Core Principles

- **Preserve behavior first**: never refactor without a passing test suite
+- **Preserve behavior first**: never refactor without a passing test suite (exception: testability runs, where the goal is making code testable)
 - **Measure before and after**: every change must be justified by metrics
 - **Small incremental changes**: commit frequently, never break tests
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
+- **Save immediately**: write artifacts to disk after each phase
+- **Delegate execution**: all code changes go through the implement skill via task files
 - **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user

 ## Context Resolution

-Determine the operating mode based on invocation before any other logic runs.
+Announce detected paths and input mode to user before proceeding.

-**Project mode** (no explicit input file provided):
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- COMPONENTS_DIR: `_docs/02_document/components/`
- DOCUMENT_DIR: `_docs/02_document/`
- REFACTOR_DIR: `_docs/04_refactoring/`
- All existing guardrails apply.
+**Fixed paths:**

-**Standalone mode** (explicit input file provided, e.g. `/refactor @some_component.md`):
- INPUT_FILE: the provided file (treated as component/area description)
- REFACTOR_DIR: `_standalone/refactoring/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `acceptance_criteria.md` is optional — warn if absent
+| Path | Location |
+|------|----------|
+| PROBLEM_DIR | `_docs/00_problem/` |
+| SOLUTION_DIR | `_docs/01_solution/` |
+| COMPONENTS_DIR | `_docs/02_document/components/` |
+| DOCUMENT_DIR | `_docs/02_document/` |
+| TASKS_DIR | `_docs/02_tasks/` |
+| TASKS_TODO | `_docs/02_tasks/todo/` |
+| REFACTOR_DIR | `_docs/04_refactoring/` |
+| RUN_DIR | `REFACTOR_DIR/NN-[run-name]/` |

-Announce the detected mode and resolved paths to the user before proceeding.
+**Prereqs**: `problem.md` is required; if `acceptance_criteria.md` is absent, warn the user.

-## Mode Detection
+**RUN_DIR resolution**: on start, scan REFACTOR_DIR for existing `NN-*` folders. Auto-increment the numeric prefix for the new run. The run name is derived from the invocation context (e.g., `01-testability-refactoring`, `02-coupling-refactoring`). If invoked with a guided input file, derive the name from the input file name or ask the user.

-After context resolution, determine the execution mode:
+Create REFACTOR_DIR and RUN_DIR if missing. If a RUN_DIR with the same name already exists, ask user: **resume or start fresh?**
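
The RUN_DIR resolution described above might be sketched as follows. The function name `resolve_run_dir` and its return shape are assumptions made for illustration. The sketch treats "same name" as any existing `NN-<run_name>` folder and leaves the resume-or-start-fresh question to the caller.

```python
import re
from pathlib import Path
from typing import Tuple


def resolve_run_dir(refactor_dir: Path, run_name: str) -> Tuple[Path, bool]:
    """Return (run_dir, already_existed) for a refactoring run.

    If a folder named NN-<run_name> exists, it is returned with True so the
    caller can ask the user whether to resume or start fresh. Otherwise a new
    folder is created with the next numeric prefix.
    """
    refactor_dir.mkdir(parents=True, exist_ok=True)
    highest = 0
    for entry in refactor_dir.iterdir():
        match = re.fullmatch(r"(\d+)-(.+)", entry.name)
        if not entry.is_dir() or match is None:
            continue
        if match.group(2) == run_name:
            return entry, True
        highest = max(highest, int(match.group(1)))
    run_dir = refactor_dir / f"{highest + 1:02d}-{run_name}"
    run_dir.mkdir()
    return run_dir, False
```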

-1. **User explicitly says** "quick assessment" or "just assess" → **Quick Assessment**
-2. **User explicitly says** "refactor [component/file/area]" with a specific target → **Targeted Refactoring**
-3. **Default** → **Full Refactoring**
+## Input Modes

-| Mode | Phases Executed | When to Use |
-|------|----------------|-------------|
-| **Full Refactoring** | 0 → 1 → 2 → 3 → 4 → 5 | Complete refactoring of a system or major area |
-| **Targeted Refactoring** | 0 → (skip 1 if docs exist) → 2 → 3 → 4 → 5 | Refactor a specific component; docs already exist |
-| **Quick Assessment** | 0 → 1 → 2 | Produce a refactoring roadmap without executing changes |
+| Mode | Trigger | Discovery source |
+|------|---------|-----------------|
+| Automatic | Default, no input file | Skill discovers issues from code analysis |
+| Guided | Input file provided (e.g., `/refactor @list-of-changes.md`) | Reads input file + scans code to form validated change list |

-Inform the user which mode was detected and confirm before proceeding.
+Both modes produce `RUN_DIR/list-of-changes.md` (template: `templates/list-of-changes.md`). Both modes then convert that file into task files in TASKS_DIR during Phase 2.

-## Prerequisite Checks (BLOCKING)
-
-**Project mode:**
-1. PROBLEM_DIR exists with `problem.md` (or `problem_description.md`) — **STOP if missing**, ask user to create it
-2. If `acceptance_criteria.md` is missing: **warn** and ask whether to proceed
-3. Create REFACTOR_DIR if it does not exist
-4. If REFACTOR_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
-
-**Standalone mode:**
-1. INPUT_FILE exists and is non-empty — **STOP if missing**
-2. Warn if no `acceptance_criteria.md` provided
-3. Create REFACTOR_DIR if it does not exist
-
-## Artifact Management
-
-### Directory Structure
-
-```
-REFACTOR_DIR/
-├── baseline_metrics.md          (Phase 0)
-├── discovery/
-│   ├── components/
-│   │   └── [##]_[name].md       (Phase 1)
-│   ├── solution.md              (Phase 1)
-│   └── system_flows.md          (Phase 1)
-├── analysis/
-│   ├── research_findings.md     (Phase 2)
-│   └── refactoring_roadmap.md   (Phase 2)
-├── test_specs/
-│   └── [##]_[test_name].md      (Phase 3)
-├── coupling_analysis.md         (Phase 4)
-├── execution_log.md             (Phase 4)
-├── hardening/
-│   ├── technical_debt.md        (Phase 5)
-│   ├── performance.md           (Phase 5)
-│   └── security.md              (Phase 5)
-└── FINAL_report.md              (after all phases)
-```
-
-### Save Timing
-
-| Phase | Save immediately after | Filename |
-|-------|------------------------|----------|
-| Phase 0 | Baseline captured | `baseline_metrics.md` |
-| Phase 1 | Each component documented | `discovery/components/[##]_[name].md` |
-| Phase 1 | Solution synthesized | `discovery/solution.md`, `discovery/system_flows.md` |
-| Phase 2 | Research complete | `analysis/research_findings.md` |
-| Phase 2 | Roadmap produced | `analysis/refactoring_roadmap.md` |
-| Phase 3 | Test specs written | `test_specs/[##]_[test_name].md` |
-| Phase 4 | Coupling analyzed | `coupling_analysis.md` |
-| Phase 4 | Execution complete | `execution_log.md` |
-| Phase 5 | Each hardening track | `hardening/<track>.md` |
-| Final | All phases done | `FINAL_report.md` |
-
-### Resumability
-
-If REFACTOR_DIR already contains artifacts:
-
-1. List existing files and match to the save timing table
-2. Identify the last completed phase based on which artifacts exist
-3. Resume from the next incomplete phase
-4. Inform the user which phases are being skipped
-
-## Progress Tracking
-
-At the start of execution, create a TodoWrite with all applicable phases. Update status as each phase completes.
+**Guided mode cleanup**: after `RUN_DIR/list-of-changes.md` is created from the input file, delete the original input file to avoid duplication.

 ## Workflow

-### Phase 0: Context & Baseline
-
-**Role**: Software engineer preparing for refactoring
-**Goal**: Collect refactoring goals and capture baseline metrics
-**Constraints**: Measurement only — no code changes
-
-#### 0a. Collect Goals
-
-If PROBLEM_DIR files do not yet exist, help the user create them:
-
-1. `problem.md` — what the system currently does, what changes are needed, pain points
-2. `acceptance_criteria.md` — success criteria for the refactoring
-3. `security_approach.md` — security requirements (if applicable)
-
-Store in PROBLEM_DIR.
-
-#### 0b. Capture Baseline
-
-1. Read problem description and acceptance criteria
-2. Measure current system metrics using project-appropriate tools:
-
-| Metric Category | What to Capture |
-|----------------|-----------------|
-| **Coverage** | Overall, unit, blackbox, critical paths |
-| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
-| **Code Smells** | Total, critical, major |
-| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
-| **Dependencies** | Total count, outdated, security vulnerabilities |
-| **Build** | Build time, test execution time, deployment time |
-
-3. Create functionality inventory: all features/endpoints with status and coverage
-
-**Self-verification**:
- [ ] All metric categories measured (or noted as N/A with reason)
- [ ] Functionality inventory is complete
- [ ] Measurements are reproducible
-
-**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
-
-**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
-
---
-
-### Phase 1: Discovery
-
-**Role**: Principal software architect
-**Goal**: Generate documentation from existing code and form solution description
-**Constraints**: Document what exists, not what should be. No code changes.
-
-**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
-
-#### 1a. Document Components
-
-For each component in the codebase:
-
-1. Analyze project structure, directories, files
-2. Go file by file, analyze each method
-3. Analyze connections between components
-
-Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
- Purpose and architectural patterns
- Mermaid diagrams for logic flows
- API reference table (name, description, input, output)
- Implementation details: algorithmic complexity, state management, dependencies
- Caveats, edge cases, known limitations
-
-#### 1b. Synthesize Solution & Flows
-
-1. Review all generated component documentation
-2. Synthesize into a cohesive solution description
-3. Create flow diagrams showing component interactions
-
-Write:
- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
-
-Also copy to project standard locations if in project mode:
- `SOLUTION_DIR/solution.md`
- `DOCUMENT_DIR/system_flows.md`
-
-**Self-verification**:
- [ ] Every component in the codebase is documented
- [ ] Solution description covers all components
- [ ] Flow diagrams cover all major use cases
- [ ] Mermaid diagrams are syntactically correct
-
-**Save action**: Write discovery artifacts
-
-**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
-
---
-
-### Phase 2: Analysis
-
-**Role**: Researcher and software architect
-**Goal**: Research improvements and produce a refactoring roadmap
-**Constraints**: Analysis only — no code changes
-
-#### 2a. Deep Research
-
-1. Analyze current implementation patterns
-2. Research modern approaches for similar systems
-3. Identify what could be done differently
-4. Suggest improvements based on state-of-the-art practices
-
-Write `REFACTOR_DIR/analysis/research_findings.md`:
- Current state analysis: patterns used, strengths, weaknesses
- Alternative approaches per component: current vs alternative, pros/cons, migration effort
- Prioritized recommendations: quick wins + strategic improvements
-
-#### 2b. Solution Assessment
-
-1. Assess current implementation against acceptance criteria
-2. Identify weak points in codebase, map to specific code areas
-3. Perform gap analysis: acceptance criteria vs current state
-4. Prioritize changes by impact and effort
-
-Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
- Weak points assessment: location, description, impact, proposed solution
- Gap analysis: what's missing, what needs improvement
- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
-
-**Self-verification**:
- [ ] All acceptance criteria are addressed in gap analysis
- [ ] Recommendations are grounded in actual code, not abstract
- [ ] Roadmap phases are prioritized by impact
- [ ] Quick wins are identified separately
-
-**Save action**: Write analysis artifacts
-
-**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
-
-**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
-
---
-
-### Phase 3: Safety Net
-
-**Role**: QA engineer and developer
-**Goal**: Design and implement tests that capture current behavior before refactoring
-**Constraints**: Tests must all pass on the current codebase before proceeding
-
-#### 3a. Design Test Specs
-
-Coverage requirements (must meet before refactoring — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds):
- Minimum overall coverage: 75%
- Critical path coverage: 90%
- All public APIs must have blackbox tests
- All error handling paths must be tested
-
-For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
- Blackbox tests: summary, current behavior, input data, expected result, max expected time
- Acceptance tests: summary, preconditions, steps with expected results
- Coverage analysis: current %, target %, uncovered critical paths
-
-#### 3b. Implement Tests
-
-1. Set up test environment and infrastructure if not exists
-2. Implement each test from specs
-3. Run tests, verify all pass on current codebase
-4. Document any discovered issues
-
-**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths)
- [ ] All tests pass on current codebase
- [ ] All public APIs have blackbox tests
- [ ] Test data fixtures are configured
-
-**Save action**: Write test specs; implemented tests go into the project's test folder
-
-**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
-
---
-
-### Phase 4: Execution
-
-**Role**: Software architect and developer
-**Goal**: Analyze coupling and execute decoupling changes
-**Constraints**: Small incremental changes; tests must stay green after every change
-
-#### 4a. Analyze Coupling
-
-1. Analyze coupling between components/modules
-2. Map dependencies (direct and transitive)
-3. Identify circular dependencies
-4. Form decoupling strategy
-
-Write `REFACTOR_DIR/coupling_analysis.md`:
- Dependency graph (Mermaid)
- Coupling metrics per component
- Problem areas: components involved, coupling type, severity, impact
- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates
-
-**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.
-
-#### 4b. Execute Decoupling
-
-For each change in the decoupling strategy:
-
-1. Implement the change
-2. Run blackbox tests
-3. Fix any failures
-4. Commit with descriptive message
-
-Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.
-
-Write `REFACTOR_DIR/execution_log.md`:
- Change description, files affected, test status per change
- Before/after metrics comparison against baseline
-
-**Self-verification**:
- [ ] All tests still pass after execution
- [ ] No circular dependencies remain (or reduced per plan)
- [ ] Code smells addressed
- [ ] Metrics improved compared to baseline
-
-**Save action**: Write execution artifacts
-
-**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.
-
---
-
-### Phase 5: Hardening (Optional, Parallel Tracks)
-
-**Role**: Varies per track
-**Goal**: Address technical debt, performance, and security
-**Constraints**: Each track is optional; user picks which to run
-
-Present the three tracks and let user choose which to execute:
-
-#### Track A: Technical Debt
-
-**Role**: Technical debt analyst
-
-1. Identify and categorize debt items: design, code, test, documentation
-2. Assess each: location, description, impact, effort, interest (cost of not fixing)
-3. Prioritize: quick wins → strategic debt → tolerable debt
-4. Create actionable plan with prevention measures
-
-Write `REFACTOR_DIR/hardening/technical_debt.md`
-
-#### Track B: Performance Optimization
-
-**Role**: Performance engineer
-
-1. Profile current performance, identify bottlenecks
-2. For each bottleneck: location, symptom, root cause, impact
-3. Propose optimizations with expected improvement and risk
-4. Implement one at a time, benchmark after each change
-5. Verify tests still pass
-
-Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks
-
-#### Track C: Security Review
-
-**Role**: Security engineer
-
-1. Review code against OWASP Top 10
-2. Verify security requirements from `security_approach.md` are met
-3. Check: authentication, authorization, input validation, output encoding, encryption, logging
-
-Write `REFACTOR_DIR/hardening/security.md`:
- Vulnerability assessment: location, type, severity, exploit scenario, fix
- Security controls review
- Compliance check against `security_approach.md`
- Recommendations: critical fixes, improvements, hardening
-
-**Self-verification** (per track):
- [ ] All findings are grounded in actual code
- [ ] Recommendations are actionable with effort estimates
- [ ] All tests still pass after any changes
-
-**Save action**: Write hardening artifacts
-
---
+| Phase | File | Summary | Gate |
+|-------|------|---------|------|
+| 0 | `phases/00-baseline.md` | Collect goals, create RUN_DIR, capture baseline metrics | BLOCKING: user confirms |
+| 1 | `phases/01-discovery.md` | Document components (scoped for guided mode), produce list-of-changes.md | BLOCKING: user confirms |
+| 2 | `phases/02-analysis.md` | Research improvements, produce roadmap, create epic, decompose into tasks in TASKS_DIR | BLOCKING: user confirms |
+| | | *Quick Assessment stops here* | |
+| 3 | `phases/03-safety-net.md` | Check existing tests or implement pre-refactoring tests (skip for testability runs) | GATE: all tests pass |
+| 4 | `phases/04-execution.md` | Delegate task execution to implement skill | GATE: implement completes |
+| 5 | `phases/05-test-sync.md` | Remove obsolete, update broken, add new tests | GATE: all tests pass |
+| 6 | `phases/06-verification.md` | Run full suite, compare metrics vs baseline | GATE: all pass, no regressions |
+| 7 | `phases/07-documentation.md` | Update `_docs/` to reflect refactored state | Skip if `_docs/02_document/` absent |
+
+**Workflow mode detection:**
+- "quick assessment" / "just assess" → phases 0–2
+- "refactor [specific target]" → skip phase 1 if docs exist
+- Default → all phases
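
The mode detection rules above can be sketched as a small selector. This is only an illustration: the skill is interpreted by an agent reading natural language, so the keyword matching, the `select_phases` name, and the `docs_exist` flag are assumptions rather than part of the skill.

```python
import re

ALL_PHASES = list(range(8))  # phases 0 through 7


def select_phases(request: str, docs_exist: bool = False) -> list[int]:
    """Return the phase numbers to run for a user request (hypothetical helper)."""
    text = request.lower()
    # Quick assessment stops after Phase 2.
    if "quick assessment" in text or "just assess" in text:
        return [0, 1, 2]
    # "refactor <target>" is treated as a targeted run: Phase 1 is skipped
    # only when documentation for the target already exists.
    if re.search(r"\brefactor\s+\S+", text) and docs_exist:
        return [p for p in ALL_PHASES if p != 1]
    # Default: all phases.
    return list(ALL_PHASES)
```

In practice the skip of Phase 1 still requires user confirmation, which this sketch does not model.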
+
+At the start of execution, create a TodoWrite with all applicable phases.
+
+## Artifact Structure
+
+All artifacts are written to RUN_DIR:
+
+```
+baseline_metrics.md                      Phase 0
+discovery/components/[##]_[name].md      Phase 1
+discovery/solution.md                    Phase 1
+discovery/system_flows.md                Phase 1
+list-of-changes.md                       Phase 1
+analysis/research_findings.md            Phase 2
+analysis/refactoring_roadmap.md          Phase 2
+test_specs/[##]_[test_name].md           Phase 3
+execution_log.md                         Phase 4
+test_sync/{obsolete_tests,updated_tests,new_tests}.md  Phase 5
+verification_report.md                   Phase 6
+doc_update_log.md                        Phase 7
+FINAL_report.md                          after all phases
+```
+
+Task files produced during Phase 2 go to TASKS_TODO (not RUN_DIR):
+```
+TASKS_TODO/[TRACKER-ID]_refactor_[short_name].md
+TASKS_DIR/_dependencies_table.md (appended)
+```
+
+**Resumability**: match existing artifacts to phases above, resume from next incomplete phase.
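
A minimal sketch of the resumability check, assuming one marker artifact per phase taken from the listing above. The `PHASE_ARTIFACTS` mapping and the `next_incomplete_phase` helper are hypothetical names chosen for illustration.

```python
from pathlib import Path

# One marker artifact per phase, relative to RUN_DIR (assumed from the artifact listing).
PHASE_ARTIFACTS = {
    0: "baseline_metrics.md",
    1: "list-of-changes.md",
    2: "analysis/refactoring_roadmap.md",
    3: "test_specs",
    4: "execution_log.md",
    5: "test_sync",
    6: "verification_report.md",
    7: "doc_update_log.md",
}


def next_incomplete_phase(run_dir, phases=range(8)):
    """Return the first phase whose marker artifact is missing, or None if all exist."""
    for phase in phases:
        artifact = Path(run_dir) / PHASE_ARTIFACTS[phase]
        if artifact.is_dir():
            done = any(artifact.iterdir())  # a directory counts only if it is non-empty
        else:
            done = artifact.is_file()
        if not done:
            return phase
    return None
```

Phases that are skipped by design leave no artifact, so pass only the applicable phases through the `phases` argument when resuming such a run.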

 ## Final Report

-After all executed phases complete, write `REFACTOR_DIR/FINAL_report.md`:
-
- Refactoring mode used and phases executed
- Baseline metrics vs final metrics comparison
- Changes made summary
- Remaining items (deferred to future)
- Lessons learned
+After all phases complete, write `RUN_DIR/FINAL_report.md`:
+mode used (automatic/guided), input mode, phases executed, baseline vs final metrics, changes summary, remaining items, lessons learned.

 ## Escalation Rules

 | Situation | Action |
 |-----------|--------|
-| Unclear refactoring scope | **ASK user** |
-| Ambiguous acceptance criteria | **ASK user** |
+| Unclear scope or ambiguous criteria | **ASK user** |
 | Tests failing before refactoring | **ASK user** — fix tests or fix code? |
-| Coupling change risks breaking external contracts | **ASK user** |
-| Performance optimization vs readability trade-off | **ASK user** |
-| Missing baseline metrics (no test suite, no CI) | **WARN user**, suggest building safety net first |
-| Security vulnerability found during refactoring | **WARN user** immediately, don't defer |
-
-## Trigger Conditions
-
-When the user wants to:
- Improve existing code structure or quality
- Reduce technical debt or coupling
- Prepare codebase for new features
- Assess code health before major changes
-
-**Keywords**: "refactor", "refactoring", "improve code", "reduce coupling", "technical debt", "code quality", "decoupling"
-
-## Methodology Quick Reference
-
-```
-┌────────────────────────────────────────────────────────────────┐
-│           Structured Refactoring (6-Phase Method)              │
-├────────────────────────────────────────────────────────────────┤
-│ CONTEXT: Resolve mode (project vs standalone) + set paths      │
-│ MODE: Full / Targeted / Quick Assessment                       │
-│                                                                │
-│ 0. Context & Baseline  → baseline_metrics.md                   │
-│    [BLOCKING: user confirms baseline]                          │
-│ 1. Discovery           → discovery/ (components, solution)     │
-│    [BLOCKING: user confirms documentation]                     │
-│ 2. Analysis            → analysis/ (research, roadmap)         │
-│    [BLOCKING: user confirms roadmap]                           │
-│    ── Quick Assessment stops here ──                           │
-│ 3. Safety Net          → test_specs/ + implemented tests       │
-│    [GATE: all tests must pass]                                 │
-│ 4. Execution           → coupling_analysis, execution_log      │
-│    [BLOCKING: user confirms changes]                           │
-│ 5. Hardening           → hardening/ (debt, perf, security)     │
-│    [optional, user picks tracks]                               │
-│    ─────────────────────────────────────────────────           │
-│    FINAL_report.md                                             │
-├────────────────────────────────────────────────────────────────┤
-│ Principles: Preserve behavior · Measure before/after           │
-│             Small changes · Save immediately · Ask don't assume│
-└────────────────────────────────────────────────────────────────┘
-```
+| Risk of breaking external contracts | **ASK user** |
+| Performance vs readability trade-off | **ASK user** |
+| No test suite or CI exists | **WARN user**, suggest safety net first |
+| Security vulnerability found | **WARN user** immediately |
+| Implement skill reports failures | **ASK user** — review batch reports |
@@ -0,0 +1,52 @@
+# Phase 0: Context & Baseline
+
+**Role**: Software engineer preparing for refactoring
+**Goal**: Collect refactoring goals, create run directory, capture baseline metrics
+**Constraints**: Measurement only — no code changes
+
+## 0a. Collect Goals
+
+If PROBLEM_DIR files do not yet exist, help the user create them:
+
+1. `problem.md` — what the system currently does, what changes are needed, pain points
+2. `acceptance_criteria.md` — success criteria for the refactoring
+3. `security_approach.md` — security requirements (if applicable)
+
+Store in PROBLEM_DIR.
+
+## 0b. Create RUN_DIR
+
+1. Scan REFACTOR_DIR for existing `NN-*` folders
+2. Auto-increment the numeric prefix (e.g., if `01-testability-refactoring` exists, next is `02-...`)
+3. Determine the run name:
+   - If guided mode with input file: derive from input file name or context (e.g., `01-testability-refactoring`)
+   - If automatic mode: ask user for a short run name, or derive from goals (e.g., `01-coupling-refactoring`)
+4. Create `REFACTOR_DIR/NN-[run-name]/` — this is RUN_DIR for the rest of the workflow
+
+Announce RUN_DIR path to user.
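
The auto-increment in steps 1 to 4 above can be sketched as follows. The `next_run_dir` helper is hypothetical, and the two-digit zero padding is inferred from the `01-...` and `02-...` examples.

```python
import re
from pathlib import Path


def next_run_dir(refactor_dir: str, run_name: str) -> Path:
    """Create and return REFACTOR_DIR/NN-[run-name] with the next free numeric prefix."""
    root = Path(refactor_dir)
    root.mkdir(parents=True, exist_ok=True)
    highest = 0
    for entry in root.iterdir():
        match = re.match(r"(\d+)-", entry.name)
        if entry.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    run_dir = root / f"{highest + 1:02d}-{run_name}"
    run_dir.mkdir()
    return run_dir
```

Taking the highest existing prefix rather than counting folders keeps numbering stable when an earlier run folder has been deleted.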
+
+## 0c. Capture Baseline
+
+1. Read problem description and acceptance criteria
+2. Measure current system metrics using project-appropriate tools:
+
+| Metric Category | What to Capture |
+|----------------|-----------------|
+| **Coverage** | Overall, unit, blackbox, critical paths |
+| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
+| **Code Smells** | Total, critical, major |
+| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
+| **Dependencies** | Total count, outdated, security vulnerabilities |
+| **Build** | Build time, test execution time, deployment time |
+
+3. Create functionality inventory: all features/endpoints with status and coverage
+
+**Self-verification**:
+- [ ] RUN_DIR created with correct auto-incremented prefix
+- [ ] All metric categories measured (or noted as N/A with reason)
+- [ ] Functionality inventory is complete
+- [ ] Measurements are reproducible
+
+**Save action**: Write `RUN_DIR/baseline_metrics.md`
+
+**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
@@ -0,0 +1,119 @@
+# Phase 1: Discovery
+
+**Role**: Principal software architect
+**Goal**: Analyze existing code and produce `RUN_DIR/list-of-changes.md`
+**Constraints**: Document what exists, identify what needs to change. No code changes.
+
+**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
+
+## Mode Branch
+
+Determine the input mode set during Context Resolution (see SKILL.md):
+
+- **Guided mode**: input file provided → start with 1g below
+- **Automatic mode**: no input file → start with 1a below
+
+---
+
+## Guided Mode
+
+### 1g. Read and Validate Input File
+
+1. Read the provided input file (e.g., `list-of-changes.md` from the autopilot testability revision step or user-provided file)
+2. Extract file paths, problem descriptions, and proposed changes from each entry
+3. For each entry, verify against actual codebase:
+   - Referenced files exist
+   - Described problems are accurate (read the code, confirm the issue)
+   - Proposed changes are feasible
+4. Flag any entries that reference nonexistent files or describe inaccurate problems — ASK user
+
+### 1h. Scoped Component Analysis
+
+For each file/area referenced in the input file:
+
+1. Analyze the specific modules and their immediate dependencies
+2. Document component structure, interfaces, and coupling points relevant to the proposed changes
+3. Identify additional issues not in the input file but discovered during analysis of the same areas
+
+Write per-component to `RUN_DIR/discovery/components/[##]_[name].md` (same format as automatic mode, but scoped to affected areas only).
+
+### 1i. Produce List of Changes
+
+1. Start from the validated input file entries
+2. Enrich each entry with:
+   - Exact file paths confirmed from code
+   - Risk assessment (low/medium/high)
+   - Dependencies between changes
+3. Add any additional issues discovered during scoped analysis (1h)
+4. Write `RUN_DIR/list-of-changes.md` using `templates/list-of-changes.md` format
+   - Set **Mode**: `guided`
+   - Set **Source**: path to the original input file
+
+Skip to **Save action** below.
+
+---
+
+## Automatic Mode
+
+### 1a. Document Components
+
+For each component in the codebase:
+
+1. Analyze project structure, directories, files
+2. Go file by file and analyze each method
+3. Analyze connections between components
+
+Write per component to `RUN_DIR/discovery/components/[##]_[name].md`:
+- Purpose and architectural patterns
+- Mermaid diagrams for logic flows
+- API reference table (name, description, input, output)
+- Implementation details: algorithmic complexity, state management, dependencies
+- Caveats, edge cases, known limitations
+
+### 1b. Synthesize Solution & Flows
+
+1. Review all generated component documentation
+2. Synthesize into a cohesive solution description
+3. Create flow diagrams showing component interactions
+
+Write:
+- `RUN_DIR/discovery/solution.md` — product description, component overview, interaction diagram
+- `RUN_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
+
+Also copy to project standard locations:
+- `SOLUTION_DIR/solution.md`
+- `DOCUMENT_DIR/system_flows.md`
+
+### 1c. Produce List of Changes
+
+From the component analysis and solution synthesis, identify all issues that need refactoring:
+
+1. Hardcoded values (paths, config, magic numbers)
+2. Tight coupling between components
+3. Missing dependency injection / non-configurable parameters
+4. Global mutable state
+5. Code duplication
+6. Missing error handling
+7. Testability blockers (code that cannot be exercised in isolation)
+8. Security concerns
+9. Performance bottlenecks
+
+Write `RUN_DIR/list-of-changes.md` using `templates/list-of-changes.md` format:
+- Set **Mode**: `automatic`
+- Set **Source**: `self-discovered`
+
+---
+
+## Save action (both modes)
+
+Write all discovery artifacts to RUN_DIR.
+
+**Self-verification**:
+- [ ] Every referenced file in list-of-changes.md exists in the codebase
+- [ ] Each change entry has file paths, problem, change description, risk, and dependencies
+- [ ] Component documentation covers all areas affected by the changes
+- [ ] In guided mode: all input file entries are validated or flagged
+- [ ] In automatic mode: solution description covers all components
+- [ ] Mermaid diagrams are syntactically correct
+
+**BLOCKING**: Present discovery summary and list-of-changes.md to user. Do NOT proceed until user confirms documentation accuracy and change list completeness.
@@ -0,0 +1,94 @@
+# Phase 2: Analysis & Task Decomposition
+
+**Role**: Researcher, software architect, and task planner
+**Goal**: Research improvements, produce a refactoring roadmap, and decompose into implementable tasks
+**Constraints**: Analysis and planning only — no code changes
+
+## 2a. Deep Research
+
+1. Analyze current implementation patterns
+2. Research modern approaches for similar systems
+3. Identify what could be done differently
+4. Suggest improvements based on state-of-the-art practices
+
+Write `RUN_DIR/analysis/research_findings.md`:
+- Current state analysis: patterns used, strengths, weaknesses
+- Alternative approaches per component: current vs alternative, pros/cons, migration effort
+- Prioritized recommendations: quick wins + strategic improvements
+
+## 2b. Solution Assessment & Hardening Tracks
+
+1. Assess current implementation against acceptance criteria
+2. Identify weak points in codebase, map to specific code areas
+3. Perform gap analysis: acceptance criteria vs current state
+4. Prioritize changes by impact and effort
+
+Present optional hardening tracks for user to include in the roadmap:
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Include hardening tracks?
+══════════════════════════════════════
+ A) Technical Debt — identify and address design/code/test debt
+ B) Performance Optimization — profile, identify bottlenecks, optimize
+ C) Security Review — OWASP Top 10, auth, encryption, input validation
+ D) All of the above
+ E) None — proceed with structural refactoring only
+══════════════════════════════════════
+```
+
+For each selected track, add entries to `RUN_DIR/list-of-changes.md` (append to the file produced in Phase 1):
+- **Track A**: tech debt items with location, impact, effort
+- **Track B**: performance bottlenecks with profiling data
+- **Track C**: security findings with severity and fix description
+
+Write `RUN_DIR/analysis/refactoring_roadmap.md`:
+- Weak points assessment: location, description, impact, proposed solution
+- Gap analysis: what's missing, what needs improvement
+- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
+- Selected hardening tracks and their items
+
+## 2c. Create Epic
+
+Create a work item tracker epic for this refactoring run:
+
+1. Epic name: the RUN_DIR name (e.g., `01-testability-refactoring`)
+2. Create the epic via configured tracker MCP
+3. Record the Epic ID — all tasks in 2d will be linked under this epic
+4. If tracker unavailable, use `PENDING` placeholder and note for later
+
+## 2d. Task Decomposition
+
+Convert the finalized `RUN_DIR/list-of-changes.md` into implementable task files.
+
+1. Read `RUN_DIR/list-of-changes.md`
+2. For each change entry (or group of related entries), create an atomic task file in TASKS_DIR:
+   - Use the standard task template format (`.cursor/skills/decompose/templates/task.md`)
+   - File naming: `[##]_refactor_[short_name].md` (temporary numeric prefix)
+   - **Task**: `PENDING_refactor_[short_name]`
+   - **Description**: derived from the change entry's Problem + Change fields
+   - **Complexity**: estimate 1-5 points; split into multiple tasks if >5
+   - **Dependencies**: map change-level dependencies (C01, C02) to task-level tracker IDs
+   - **Component**: from the change entry's File(s) field
+   - **Epic**: the epic created in 2c
+   - **Acceptance Criteria**: derived from the change entry — verify the problem is resolved
+3. Create work item ticket for each task under the epic from 2c
+4. Rename each file to `[TRACKER-ID]_refactor_[short_name].md` after ticket creation
+5. Update or append to `TASKS_DIR/_dependencies_table.md` with the refactoring tasks
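
One way to check that the decomposed tasks have no circular dependencies is a topological sort over the dependency map. This sketch uses Python's standard `graphlib` module (3.9+); the `execution_order` helper and the use of change IDs as keys are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return IDs in an order that respects dependencies.

    `dependencies` maps each ID to the IDs it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"circular dependency: {cycle}") from err
```

The same map can be built from the **Dependencies** field of each task file once change IDs have been replaced by tracker IDs.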
+
+**Self-verification**:
+- [ ] All acceptance criteria are addressed in gap analysis
+- [ ] Recommendations are grounded in actual code, not abstract
+- [ ] Roadmap phases are prioritized by impact
+- [ ] Epic created and all tasks linked to it
+- [ ] Every entry in list-of-changes.md has a corresponding task file in TASKS_DIR
+- [ ] No task exceeds 5 complexity points
+- [ ] Task dependencies are consistent (no circular dependencies)
+- [ ] `_dependencies_table.md` includes all refactoring tasks
+- [ ] Every task has a work item ticket (or PENDING placeholder)
+
+**Save action**: Write analysis artifacts to RUN_DIR, task files to TASKS_DIR
+
+**BLOCKING**: Present refactoring roadmap and task list to user. Do NOT proceed until user confirms.
+
+**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
@@ -0,0 +1,57 @@
+# Phase 3: Safety Net
+
+**Role**: QA engineer and developer
+**Goal**: Ensure tests exist that capture current behavior before refactoring
+**Constraints**: Tests must all pass on the current codebase before proceeding
+
+## Skip Condition: Testability Refactoring
+
+If the current run name contains `testability` (e.g., `01-testability-refactoring`), **skip Phase 3 entirely**. The purpose of a testability run is to make the code testable so that tests can be written afterward. Announce the skip and proceed to Phase 4.
+
+## 3a. Check Existing Tests
+
+Before designing or implementing any new tests, check what already exists:
+
+1. Scan the project for existing test files (unit tests, integration tests, blackbox tests)
+2. Run the existing test suite — record pass/fail counts
+3. Measure current coverage against the areas being refactored (from `RUN_DIR/list-of-changes.md` file paths)
+4. Assess coverage against thresholds:
+   - Minimum overall coverage: 75%
+   - Critical path coverage: 90%
+   - All public APIs must have blackbox tests
+   - All error handling paths must be tested
+
+If existing tests meet all thresholds for the refactoring areas:
+- Document the existing coverage in `RUN_DIR/test_specs/existing_coverage.md`
+- Skip to the GATE check below
+
+If existing tests partially cover the refactoring areas:
+- Document what is covered and what gaps remain
+- Proceed to 3b only for the uncovered areas
+
+If no relevant tests exist:
+- Proceed to 3b for full test design
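
The three-way branch above can be sketched as a single decision function. The `Coverage` fields and the returned labels are hypothetical; how the numbers are measured depends on the project's coverage tooling.

```python
from dataclasses import dataclass

MIN_OVERALL = 75.0   # minimum overall coverage, percent
MIN_CRITICAL = 90.0  # minimum critical path coverage, percent


@dataclass
class Coverage:
    relevant_tests: int        # tests that touch the refactoring areas
    overall: float             # percent
    critical_path: float       # percent
    public_apis_untested: int  # public APIs without a blackbox test
    error_paths_untested: int  # error handling paths without a test


def next_step(cov: Coverage) -> str:
    """Decide where Phase 3 continues after checking existing tests."""
    if cov.relevant_tests == 0:
        return "3b: full test design"
    meets_all = (
        cov.overall >= MIN_OVERALL
        and cov.critical_path >= MIN_CRITICAL
        and cov.public_apis_untested == 0
        and cov.error_paths_untested == 0
    )
    return "gate" if meets_all else "3b: uncovered areas only"
```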
+
+## 3b. Design Test Specs (for uncovered areas only)
+
+For each uncovered critical area, write test specs to `RUN_DIR/test_specs/[##]_[test_name].md`:
+- Blackbox tests: summary, current behavior, input data, expected result, max expected time
+- Acceptance tests: summary, preconditions, steps with expected results
+- Coverage analysis: current %, target %, uncovered critical paths
+
+## 3c. Implement Tests (for uncovered areas only)
+
+1. Set up the test environment and infrastructure if they do not already exist
+2. Implement each test from specs
+3. Run tests, verify all pass on current codebase
+4. Document any discovered issues
+
+**Self-verification**:
+- [ ] Coverage requirements met (75% overall, 90% critical paths) across existing + new tests
+- [ ] All tests pass on current codebase
+- [ ] All public APIs in refactoring scope have blackbox tests
+- [ ] Test data fixtures are configured
+
+**Save action**: Write test specs to RUN_DIR; implemented tests go into the project's test folder
+
+**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
@@ -0,0 +1,63 @@
+# Phase 4: Execution
+
+**Role**: Orchestrator
+**Goal**: Execute all refactoring tasks by delegating to the implement skill
+**Constraints**: No inline code changes — all implementation goes through the implement skill's batching and review pipeline
+
+## 4a. Pre-Flight Checks
+
+1. Verify refactoring task files exist in TASKS_DIR (created during Phase 2d):
+   - All `[TRACKER-ID]_refactor_*.md` files are present
+   - Each task file has valid header fields (Task, Name, Description, Complexity, Dependencies)
+2. Verify `TASKS_DIR/_dependencies_table.md` includes the refactoring tasks
+3. Verify all tests pass (safety net from Phase 3 is green)
+4. If any check fails, go back to the relevant phase to fix
+
+## 4b. Delegate to Implement Skill
+
+Read and execute `.cursor/skills/implement/SKILL.md`.
+
+The implement skill will:
+1. Parse task files and dependency graph from TASKS_DIR
+2. Detect already-completed tasks (skip non-refactoring tasks from prior workflow steps)
+3. Compute execution batches for the refactoring tasks
+4. Launch implementer subagents (up to 4 in parallel)
+5. Run code review after each batch
+6. Commit and push per batch
+7. Update work item ticket status
+
+Do NOT modify, skip, or abbreviate any part of the implement skill's workflow. The refactor skill is delegating execution, not optimizing it.
+
+## 4c. Capture Results
+
+After the implement skill completes:
+
+1. Read batch reports from `_docs/03_implementation/batch_*_report.md`
+2. Read the latest `_docs/03_implementation/implementation_report_*.md` file
+3. Write `RUN_DIR/execution_log.md` summarizing:
+   - Total tasks executed
+   - Batches completed
+   - Code review verdicts per batch
+   - Files modified (aggregate list)
+   - Any blocked or failed tasks
+   - Links to batch reports
+
+## 4d. Update Task Statuses
+
+For each successfully completed refactoring task:
+
+1. Transition the work item ticket status to **Done** via the configured tracker MCP
+2. If tracker unavailable, note the pending status transitions in `RUN_DIR/execution_log.md`
+
+For any failed or blocked tasks, leave their status as-is (the implement skill already set them to In Testing or blocked).
+
+**Self-verification**:
+- [ ] All refactoring tasks show as completed in batch reports
+- [ ] All completed tasks have work item tracker status set to Done
+- [ ] All tests still pass after execution
+- [ ] No tasks remain in blocked or failed state (or user has acknowledged them)
+- [ ] `RUN_DIR/execution_log.md` written with links to batch reports
+
+**Save action**: Write `RUN_DIR/execution_log.md`
+
+**GATE**: All refactoring tasks must be implemented. If any tasks failed, present the failures to the user and ask for guidance before proceeding to Phase 5.
@@ -0,0 +1,53 @@
+# Phase 5: Test Synchronization
+
+**Role**: QA engineer and developer
+**Goal**: Reconcile the test suite with the refactored codebase — remove obsolete tests, update broken tests, add tests for new code
+**Constraints**: All tests must pass at the end of this phase. Do not change production code here — only tests.
+
+**Skip condition**: If the run name contains `testability`, skip Phase 5 entirely — no test suite exists yet to synchronize. Proceed directly to Phase 6.
+
+## 5a. Identify Obsolete Tests
+
+1. Compare the pre-refactoring codebase structure (from Phase 0 inventory) with the current state
+2. Find tests that reference removed functions, classes, modules, or endpoints
+3. Find tests that duplicate coverage due to merged/consolidated code
+4. Decide per test: **delete** (functionality removed) or **merge** (duplicates)
+
+Write `RUN_DIR/test_sync/obsolete_tests.md`:
+- Test file, test name, reason (target removed / target merged / duplicate coverage), action taken (deleted / merged into)
+
+## 5b. Update Existing Tests
+
+1. Run the full test suite — collect failures and errors
+2. For each failing test, determine the cause:
+   - Renamed/moved function or module → update import paths and references
+   - Changed function signature → update call sites and assertions
+   - Changed behavior (intentional per refactoring plan) → update expected values
+   - Changed data structures → update fixtures and assertions
+3. Fix each test, re-run to confirm it passes
+
+Write `RUN_DIR/test_sync/updated_tests.md`:
+- Test file, test name, change type (import path / signature / assertion / fixture), description of update
+
+## 5c. Add New Tests
+
+1. Identify new code introduced during Phase 4 that lacks test coverage:
+   - New public functions, classes, or modules
+   - New interfaces or abstractions introduced during decoupling
+   - New error handling paths
+2. Write tests following the same patterns and conventions as the existing test suite
+3. Ensure coverage targets from Phase 3 are maintained or improved
+
+Write `RUN_DIR/test_sync/new_tests.md`:
+- Test file, test name, target function/module, coverage type (unit / integration / blackbox)
+
+**Self-verification**:
+- [ ] All obsolete tests removed or merged
+- [ ] All pre-existing tests pass after updates
+- [ ] New code from Phase 4 has test coverage
+- [ ] Overall coverage meets or exceeds the Phase 3 thresholds (75% overall, 90% critical paths)
+- [ ] No tests reference removed or renamed code
+
+**Save action**: Write test_sync artifacts; implemented tests go into the project's test folder
+
+**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 6. If tests fail, fix the tests or ask user for guidance.
@@ -0,0 +1,53 @@
+# Phase 6: Final Verification
+
+**Role**: QA engineer
+**Goal**: Run all tests end-to-end, compare final metrics against baseline, and confirm the refactoring succeeded
+**Constraints**: No code changes. If failures are found, go back to the appropriate phase (4/5) to fix before retrying.
+
+**Skip condition**: If the run name contains `testability`, skip Phase 6 entirely — no test suite exists yet to verify against. Proceed directly to Phase 7.
+
+## 6a. Run Full Test Suite
+
+1. Run unit tests, integration tests, and blackbox tests
+2. Run acceptance tests derived from `acceptance_criteria.md`
+3. Record pass/fail counts and any failures
+
+If any test fails:
+- Determine whether the failure is a test issue (→ return to Phase 5) or a code issue (→ return to Phase 4)
+- Do NOT proceed until all tests pass
+
+## 6b. Capture Final Metrics
+
+Re-measure all metrics from Phase 0 baseline using the same tools:
+
+| Metric Category | What to Capture |
+|----------------|-----------------|
+| **Coverage** | Overall, unit, blackbox, critical paths |
+| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
+| **Code Smells** | Total, critical, major |
+| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
+| **Dependencies** | Total count, outdated, security vulnerabilities |
+| **Build** | Build time, test execution time, deployment time |
+
+## 6c. Compare Against Baseline
+
+1. Read `RUN_DIR/baseline_metrics.md`
+2. Produce a side-by-side comparison: baseline vs final for every metric
+3. Flag any regressions (metrics that got worse)
+4. Verify acceptance criteria are met
+
+Write `RUN_DIR/verification_report.md`:
+- Test results summary: total, passed, failed, skipped
+- Metric comparison table: metric, baseline value, final value, delta, status (improved / unchanged / regressed)
+- Acceptance criteria checklist: criterion, status (met / not met), evidence
+- Regressions (if any): metric, severity, explanation
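
The metric comparison table can be derived mechanically once it is known which direction counts as better for each metric. In this sketch the metric names and the `LOWER_IS_BETTER` set are illustrative assumptions; every metric outside that set is treated as higher-is-better.

```python
# Metrics for which a smaller value is an improvement (assumed names).
LOWER_IS_BETTER = {"cyclomatic_complexity_avg", "code_smells_total", "build_time_s"}


def compare(baseline: dict, final: dict, lower_is_better=LOWER_IS_BETTER) -> list[tuple]:
    """Return rows of (metric, baseline, final, delta, status), sorted by metric name."""
    rows = []
    for name in sorted(baseline):
        before, after = baseline[name], final[name]
        delta = after - before
        if delta == 0:
            status = "unchanged"
        elif (delta < 0) == (name in lower_is_better):
            status = "improved"
        else:
            status = "regressed"
        rows.append((name, before, after, delta, status))
    return rows
```

Any row with status `regressed` belongs in the Regressions section of the report, together with a severity and an explanation.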
+
+**Self-verification**:
+- [ ] All tests pass (zero failures)
+- [ ] All acceptance criteria are met
+- [ ] No critical metric regressions
+- [ ] Metrics are captured with the same tools/methodology as Phase 0
+
+**Save action**: Write `RUN_DIR/verification_report.md`
+
+**GATE (BLOCKING)**: All tests must pass and no critical regressions. Present verification report to user. Do NOT proceed to Phase 7 until user confirms.
@@ -0,0 +1,45 @@
+# Phase 7: Documentation Update
+
+**Role**: Technical writer
+**Goal**: Update existing `_docs/` artifacts to reflect all changes made during refactoring
+**Constraints**: Documentation only — no code changes. Only update docs that are affected by refactoring changes.
+
+**Skip condition**: If no `_docs/02_document/` directory exists, skip this phase entirely.
+
+## 7a. Identify Affected Documentation
+
+1. Review `RUN_DIR/execution_log.md` to list all files changed during Phase 4
+2. Review test changes from Phase 5
+3. Map changed files to their corresponding module docs in `_docs/02_document/modules/`
+4. Map changed modules to their parent component docs in `_docs/02_document/components/`
+5. Determine if system-level docs need updates (`architecture.md`, `system-flows.md`, `data_model.md`)
+6. Determine if test documentation needs updates (`_docs/02_document/tests/`)
+
+## 7b. Update Module Documentation
+
+For each module doc affected by refactoring changes:
+1. Re-read the current source file
+2. Update the module doc to reflect new/changed interfaces, dependencies, internal logic
+3. Remove documentation for deleted code; add documentation for new code
+
+## 7c. Update Component Documentation
+
+For each component doc affected:
+1. Re-read the updated module docs within the component
+2. Update inter-module interfaces, dependency graphs, caveats
+3. Update the component relationship diagram if component boundaries changed
+
+## 7d. Update System-Level Documentation
+
+If structural changes were made (new modules, removed modules, changed interfaces):
+1. Update `_docs/02_document/architecture.md` if architecture changed
+2. Update `_docs/02_document/system-flows.md` if flow sequences changed
+3. Update `_docs/02_document/diagrams/components.md` if component relationships changed
+
+**Self-verification**:
+- [ ] Every changed source file has an up-to-date module doc
+- [ ] Component docs reflect the refactored structure
+- [ ] No stale references to removed code in any doc
+- [ ] Dependency graphs in docs match actual imports
+
+**Save action**: Updated docs written in-place to `_docs/02_document/`
@@ -0,0 +1,49 @@
+# List of Changes Template
+
+Save as `RUN_DIR/list-of-changes.md`. Produced during Phase 1 (Discovery).
+
+---
+
+```markdown
+# List of Changes
+
+**Run**: [NN-run-name]
+**Mode**: [automatic | guided]
+**Source**: [self-discovered | path/to/input-file.md]
+**Date**: [YYYY-MM-DD]
+
+## Summary
+
+[1-2 sentence overview of what this refactoring run addresses]
+
+## Changes
+
+### C01: [Short Title]
+- **File(s)**: [file paths, comma-separated]
+- **Problem**: [what makes this problematic / untestable / coupled]
+- **Change**: [what to do — behavioral description, not implementation steps]
+- **Rationale**: [why this change is needed]
+- **Risk**: [low | medium | high]
+- **Dependencies**: [other change IDs this depends on, or "None"]
+
+### C02: [Short Title]
+- **File(s)**: [file paths]
+- **Problem**: [description]
+- **Change**: [description]
+- **Rationale**: [description]
+- **Risk**: [low | medium | high]
+- **Dependencies**: [C01, or "None"]
+```
+
+---
+
+## Guidelines
+
+- **Change IDs** use format `C##` (C01, C02, ...) — sequential within the run
+- Each change should map to one atomic task (1-5 complexity points); split if larger
+- **File(s)** must reference actual files verified to exist in the codebase
+- **Problem** describes the current state, not the desired state
+- **Change** describes what the system should do differently — behavioral, not prescriptive
+- **Dependencies** reference other change IDs within this list; cross-run dependencies use tracker IDs
+- In guided mode, the input file entries are validated against actual code and enriched with file paths, risk, and dependencies before writing
+- In automatic mode, entries are derived from Phase 1 component analysis and Phase 2 research findings
@@ -112,9 +112,6 @@ When the user wants to:
 - Assess or improve an existing solution draft

 **Differentiation from other Skills**:
- Needs a **visual knowledge graph** → use `research-to-diagram`
- Needs **written output** (articles/tutorials) → use `wsy-writer`
- Needs **material organization** → use `material-to-markdown`
 - Needs **research + solution draft** → use this Skill

 ## Stakeholder Perspectives
@@ -32,10 +32,7 @@ Fixed paths:

 - IMPL_DIR: `_docs/03_implementation/`
 - METRICS_DIR: `_docs/06_metrics/`
- TASKS_DIR: `_docs/02_tasks/done/`
- TASKS_ROOT: `_docs/02_tasks/`
-
-TASKS_DIR points to `done/` because retrospective analyzes completed work. To compute broader metrics (e.g., backlog size), also scan `TASKS_ROOT/backlog/` and `TASKS_ROOT/todo/`.
+- TASKS_DIR: `_docs/02_tasks/` (scan all subfolders: `todo/`, `backlog/`, `done/`)

 Announce the resolved paths to the user before proceeding.

@@ -75,7 +72,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 3). Upda
 | `batch_*_report.md` | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) |
 | Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category |
 | Task spec files in TASKS_DIR | Complexity points per task, dependency count |
-| `FINAL_implementation_report.md` | Total tasks, total batches, overall duration |
+| `implementation_report_*.md` | Total tasks, total batches, overall duration |
 | Git log (if available) | Commits per batch, files changed per batch |

 #### Metrics to Compute
@@ -1,6 +1,6 @@
 # Retrospective Report Template

-Save as `_docs/05_metrics/retro_[YYYY-MM-DD].md`.
+Save as `_docs/06_metrics/retro_[YYYY-MM-DD].md`.

 ---

@@ -21,8 +21,8 @@ Run the project's test suite and report results. This skill is invoked by the au

 Check in order — first match wins:

-1. `scripts/run-tests.sh` exists → use it
-2. `docker-compose.test.yml` or equivalent test environment exists → spin it up first, then detect runner below
+1. `scripts/run-tests.sh` exists → use it (the script already encodes the correct execution strategy)
+2. `docker-compose.test.yml` exists → run the Docker Suitability Check (see below). Docker is preferred; use it unless hardware constraints prevent it.
 3. Auto-detect from project files:
   - `pytest.ini`, `pyproject.toml` with `[tool.pytest]`, or `conftest.py` → `pytest`
   - `*.csproj` or `*.sln` → `dotnet test`
@@ -32,6 +32,37 @@ Check in order — first match wins:

 If no runner detected → report failure and ask user to specify.

+#### Docker Suitability Check
+
+Docker is the preferred test environment. Before using it, verify no constraints prevent easy Docker execution:
+
+1. Check `_docs/02_document/tests/environment.md` for a "Test Execution" decision (if the test-spec skill already assessed this, follow that decision)
+2. If no prior decision exists, check for disqualifying factors:
+   - Hardware bindings: GPU, MPS, CUDA, TPU, FPGA, sensors, cameras, serial devices, host-level drivers
+   - Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs
+   - Data/volume constraints: large files (> 100MB) impractical to copy into a container
+   - Network/environment: host networking, VPN, specific DNS/firewall rules
+   - Performance: Docker overhead would invalidate benchmarks or latency measurements
+3. If any disqualifying factor is found → recommend falling back to the local test runner. Present the decision to the user using Choose format:
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Docker is preferred but factors
+ preventing easy Docker execution detected
+══════════════════════════════════════
+ Factors detected:
+ - [list factors]
+══════════════════════════════════════
+ A) Run tests locally (recommended)
+ B) Run tests in Docker anyway
+══════════════════════════════════════
+ Recommendation: A — detected constraints prevent
+ easy Docker execution
+══════════════════════════════════════
+```
+
+4. If no disqualifying factors → use Docker (preferred default)
+
 ### 2. Run Tests

 1. Execute the detected test runner
@@ -44,30 +75,45 @@ Present a summary:

 ```
 ══════════════════════════════════════
- TEST RESULTS: [N passed, M failed, K skipped]
+ TEST RESULTS: [N passed, M failed, K skipped, E errors]
 ══════════════════════════════════════
 ```

-### 4. Handle Outcome
+**Important**: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable.
+
+### 4. Diagnose Failures
+
+Before presenting choices, list every failing/erroring test with a one-line root cause:
+
+```
+Failures:
+ 1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
+ 2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
+ 3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)
+```
+
+Categorize each as: **missing dependency**, **broken import**, **logic/assertion error**, **possibly obsolete**, or **environment-specific**.
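
A first-pass categorization can be sketched as an ordered rule table. This is an illustrative heuristic only: the patterns in `RULES` are assumptions, and the **possibly obsolete** category is deliberately not assigned here because it depends on whether the imported code was removed on purpose, which a message alone cannot show.

```python
import re

# Ordered (pattern, category) pairs; the first matching pattern wins.
# The patterns are illustrative guesses, not an exhaustive rule set.
RULES = [
    (re.compile(r"ModuleNotFoundError|missing dependency", re.I), "missing dependency"),
    (re.compile(r"ImportError"), "broken import"),
    (re.compile(r"AssertionError"), "logic/assertion error"),
    (re.compile(r"PermissionError|ConnectionRefusedError"), "environment-specific"),
]


def categorize(message: str) -> str:
    """Map a one-line failure message to a category, or 'uncategorized'."""
    for pattern, category in RULES:
        if pattern.search(message):
            return category
    return "uncategorized"
```

Anything returned as `uncategorized` or `broken import` still needs a manual look before it is reported as obsolete.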
+
+### 5. Handle Outcome

 **All tests pass** → return success to the autopilot for auto-chain.

-**Tests fail** → present using Choose format:
+**Any test fails or errors** → this is a **blocking gate**. Never silently ignore or skip failures. Present using Choose format:

 ```
 ══════════════════════════════════════
- TEST RESULTS: [N passed, M failed, K skipped]
+ TEST RESULTS: [N passed, M failed, K skipped, E errors]
 ══════════════════════════════════════
- A) Fix failing tests and re-run
- B) Proceed anyway (not recommended)
+ A) Investigate and fix failing tests/code, then re-run
+ B) Remove obsolete tests (if diagnosis shows they are no longer relevant)
 C) Abort — fix manually
 ══════════════════════════════════════
 Recommendation: A — fix failures before proceeding
 ══════════════════════════════════════
 ```

- If user picks A → attempt to fix failures, then re-run (loop back to step 2)
- If user picks B → return success with warning to the autopilot
+- If user picks A → investigate root causes, attempt fixes, then re-run (loop back to step 2)
+- If user picks B → confirm which tests to remove, delete them, then re-run (loop back to step 2)
 - If user picks C → return failure to the autopilot
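
The blocking-gate rule can be expressed in a few lines. This is a sketch under the assumption that the runner's output has already been reduced to four counts; `RunCounts`, `summary_line`, and `gate_passes` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCounts:
    passed: int
    failed: int
    skipped: int
    errors: int  # collection errors: import failures, missing deps, syntax errors


def summary_line(counts: RunCounts) -> str:
    """Format the summary in the same shape as the TEST RESULTS banner."""
    return (
        f"TEST RESULTS: [{counts.passed} passed, {counts.failed} failed, "
        f"{counts.skipped} skipped, {counts.errors} errors]"
    )


def gate_passes(counts: RunCounts) -> bool:
    """The gate opens only when nothing failed and nothing errored."""
    return counts.failed == 0 and counts.errors == 0
```

Skipped tests do not block the gate in this sketch, while a single collection error does, which matches the rule that errors count as failures.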

 ## Trigger Conditions
@@ -147,7 +147,7 @@ If TESTS_OUTPUT_DIR already contains files:

 ## Progress Tracking

-At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes.
+At the start of execution, create a TodoWrite with all four phases. Update status as each phase completes.

 ## Workflow

@@ -209,7 +209,7 @@ Based on all acquired data, acceptance_criteria, and restrictions, form detailed
 - [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
 - [ ] Positive and negative scenarios are balanced
 - [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
+- [ ] Test environment matches project constraints (see Docker Suitability Assessment below)
 - [ ] External dependencies have mock/stub services defined
 - [ ] Traceability matrix has no uncovered AC or restrictions

@@ -337,11 +337,53 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab

 ---

+### Docker Suitability Assessment (BLOCKING — runs before Phase 4)
+
+Docker is the **preferred** test execution environment (reproducibility, isolation, CI parity). Before generating scripts, check whether the project has any constraints that prevent easy Docker usage.
+
+**Disqualifying factors** (any one is sufficient to fall back to local):
+- Hardware bindings: GPU, MPS, TPU, FPGA, accelerators, sensors, cameras, serial devices, host-level drivers (CUDA, Metal, OpenCL, etc.)
+- Host dependencies: licensed software, OS-specific services, kernel modules, proprietary SDKs not installable in a container
+- Data/volume constraints: large files (> 100MB) that would be impractical to copy into a container, databases that must run on the host
+- Network/environment: tests that require host networking, VPN access, or specific DNS/firewall rules
+- Performance: Docker overhead would invalidate benchmarks or latency-sensitive measurements
+
+**Assessment steps**:
+1. Scan project source, config files, and dependencies for indicators of the factors above
+2. Check `TESTS_OUTPUT_DIR/environment.md` for environment requirements
+3. Check `_docs/00_problem/restrictions.md` and `_docs/01_solution/solution.md` for constraints
+
+**Decision**:
+- If ANY disqualifying factor is found → recommend **local test execution** as fallback. Present to user using Choose format:
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Test execution environment
+══════════════════════════════════════
+ Docker is preferred, but factors preventing easy
+ Docker execution detected:
+ - [list factors found]
+══════════════════════════════════════
+ A) Local execution (recommended)
+ B) Docker execution (constraints may cause issues)
+══════════════════════════════════════
+ Recommendation: A — detected constraints prevent
+ easy Docker execution
+══════════════════════════════════════
+```
+
+- If NO disqualifying factors → use Docker (preferred default)
+- Record the decision in `TESTS_OUTPUT_DIR/environment.md` under a "Test Execution" section
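
The scan-then-recommend flow above can be sketched as a keyword pass over project text (dependency manifests, config files, `restrictions.md`). The keyword lists in `INDICATORS` are assumptions chosen for illustration and cover only some of the factor groups; the function produces a recommendation only, and the user still makes the final choice through the prompt.

```python
import re

# Hypothetical indicator keywords per disqualifying factor; extend as needed.
INDICATORS = {
    "hardware bindings": ("cuda", "mps", "tpu", "fpga", "opencl", "metal"),
    "host dependencies": ("kernel module", "proprietary sdk", "license server"),
    "network/environment": ("host networking", "vpn"),
}


def find_factors(text: str) -> list[str]:
    """Return one entry per factor group that has at least one keyword hit."""
    lowered = text.lower()
    found = []
    for factor, keywords in INDICATORS.items():
        # Whole-word matching avoids false hits such as "tpu" inside "output".
        hits = [k for k in keywords if re.search(rf"\b{re.escape(k)}\b", lowered)]
        if hits:
            found.append(f"{factor}: {', '.join(hits)}")
    return found


def recommend(text: str) -> tuple[str, list[str]]:
    """Recommend 'local' if any factor is detected, otherwise 'docker'."""
    factors = find_factors(text)
    return ("local" if factors else "docker"), factors
```

Keyword matching is coarse: it cannot judge data volume or benchmark sensitivity, so those factors still need a manual review of the environment and restriction documents.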
+
+---
+
 ### Phase 4: Test Runner Script Generation

+**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. In that case the decomposer creates a task for writing these scripts during decompose. Phase 4 runs only when the skill is invoked from the existing-code flow (where source code already exists) or standalone.
+
 **Role**: DevOps engineer
 **Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
-**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.
+**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Docker Suitability Assessment decision above.

 #### Step 1 — Detect test infrastructure

@@ -350,7 +392,9 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab
   - .NET: `dotnet test` (*.csproj, *.sln)
   - Rust: `cargo test` (Cargo.toml)
   - Node: `npm test` or `vitest` / `jest` (package.json)
-2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
+2. Check Docker Suitability Assessment result:
+   - If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
+   - If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
 3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
 4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements

@@ -360,10 +404,11 @@ Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spe

 1. Set `set -euo pipefail` and trap cleanup on EXIT
 2. Optionally accept a `--unit-only` flag to skip blackbox tests
-3. Run unit tests using the detected test runner
-4. If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down
-5. Print a summary of passed/failed/skipped tests
-6. Exit 0 on all pass, exit 1 on any failure
+3. Run unit/blackbox tests using the detected test runner:
+   - **Local mode**: activate virtualenv (if present), run test runner directly on host
+   - **Docker mode**: spin up docker-compose environment, wait for health checks, run test suite, tear down
+4. Print a summary of passed/failed/skipped tests
+5. Exit 0 on all pass, exit 1 on any failure

 #### Step 3 — Generate `scripts/run-performance-tests.sh`

@@ -371,7 +416,7 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:

 1. Set `set -euo pipefail` and trap cleanup on EXIT
 2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
-3. Spin up the system under test (docker-compose or local)
+3. Start the system under test (local or docker-compose, matching the Docker Suitability Assessment decision)
 4. Run load/performance scenarios using the detected tool
 5. Compare results against threshold values from the test spec
 6. Print a pass/fail summary per scenario
@@ -85,7 +85,7 @@ Announce the detected mode to the user.

 ## Phase 2: Requirements Gathering

-Use the AskQuestion tool for structured input. Adapt based on what Phase 1 found — only ask for what's missing.
+Use the AskQuestion tool for structured input (fall back to plain-text questions if the tool is unavailable). Adapt based on what Phase 1 found — only ask for what's missing.

 **Round 1 — Structural:**