mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 22:56:31 +00:00
Enhance autopilot documentation and workflows: Add assumptions regarding single project per workspace, update notification sound references, and introduce context budget heuristics for managing session limits. Revise various skill documents to reflect changes in task management, including ticketing and testing processes, ensuring clarity and consistency across the system.
@@ -1,3 +1,7 @@
+## Assumptions
+
+- **Single project per workspace**: this system assumes one project per Cursor workspace. All `_docs/` paths are relative to the workspace root. For monorepos, open each service in its own Cursor workspace window.
+
 ## How to Use

 Type `/autopilot` to start or continue the full workflow. The orchestrator detects where your project is and picks up from there.

@@ -37,6 +37,7 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det
 - **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
 - **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input
 - **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically
+- **Single project per workspace**: all `_docs/` paths are relative to workspace root; for monorepos, each service needs its own Cursor workspace

 ## Flow Resolution

@@ -83,13 +83,13 @@ If `_docs/03_implementation/` has batch reports, the implement skill detects com
 ---

 **Step 2e — Refactor**
-Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactor/FINAL_refactor_report.md` does not exist
+Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist

 Action: Read and execute `.cursor/skills/refactor/SKILL.md`

 The refactor skill runs the full 6-phase method using the implemented tests as a safety net.

-If `_docs/04_refactor/` has phase reports, the refactor skill detects completed phases and continues.
+If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues.

 ---

@@ -147,8 +147,8 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au

 Action: Run the full test suite to verify the implementation before deployment.

-1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
+1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests

 If all tests pass → auto-chain to Step 2hb (Security Audit).

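The fallback order in the new steps 1–2 can be sketched as follows. This is a minimal illustration, not part of the skill itself; the `pick_test_command` name and the detection heuristics are assumptions:

```python
from pathlib import Path

def pick_test_command(root: Path) -> str:
    """Pick a test command per the fallback order above (heuristics illustrative)."""
    if (root / "scripts" / "run-tests.sh").exists():
        return "./scripts/run-tests.sh"   # preferred: generated by the test-spec skill
    if (root / "Cargo.toml").exists():
        return "cargo test"
    if (root / "package.json").exists():
        return "npm test"
    if any(root.glob("*.csproj")) or any(root.glob("*.sln")):
        return "dotnet test"
    return "pytest"                       # common default for Python projects
```

A real run would also need the blackbox branch (`docker-compose.test.yml`), omitted here for brevity.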
@@ -208,12 +208,11 @@ Action: Present using Choose format:
 ```

 - If user picks A → Run performance tests:
-  1. Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios
-  2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks)
-  3. Execute performance test scenarios against the running system
-  4. Present results vs acceptance criteria thresholds
-  5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-  6. After completion, auto-chain to Step 2i (Deploy)
+  1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+  2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system
+  3. Present results vs acceptance criteria thresholds
+  4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
+  5. After completion, auto-chain to Step 2i (Deploy)
 - If user picks B → Mark Step 2hc as `skipped` in the state file, auto-chain to Step 2i (Deploy).

 ---

@@ -132,8 +132,8 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t

 Action: Run the full test suite to verify the implementation before deployment.

-1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
+1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests

 If all tests pass → auto-chain to Step 5b (Security Audit).

@@ -193,12 +193,11 @@ Action: Present using Choose format:
 ```

 - If user picks A → Run performance tests:
-  1. Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios
-  2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks)
-  3. Execute performance test scenarios against the running system
-  4. Present results vs acceptance criteria thresholds
-  5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-  6. After completion, auto-chain to Step 6 (Deploy)
+  1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+  2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system
+  3. Present results vs acceptance criteria thresholds
+  4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
+  5. After completion, auto-chain to Step 6 (Deploy)
 - If user picks B → Mark Step 5c as `skipped` in the state file, auto-chain to Step 6 (Deploy).

 ---

@@ -46,7 +46,7 @@ Rules:
 2. Always include a recommendation with a brief justification
 3. Keep option descriptions to one line each
 4. If only 2 options make sense, use A/B only — do not pad with filler options
-5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
+5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice
 6. Record every user decision in the state file's `Key Decisions` section
 7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive

@@ -154,7 +154,7 @@ After 3 failed auto-retries of the same skill, the failure is likely not user-re
   - Set `status: failed` in `Current Step`
   - Set `retry_count: 3`
   - Add a blocker entry describing the repeated failure
-2. Play notification sound (per `human-input-sound.mdc`)
+2. Play notification sound (per `human-attention-sound.mdc`)
 3. Present using Choose format:

 ```

@@ -251,6 +251,32 @@ When a skill needs to read large files (e.g., full solution.md, architecture.md)
 - Use search tools (Grep, SemanticSearch) to find specific sections rather than reading entire files
 - Summarize key decisions from prior steps in the state file so they don't need to be re-read

+### Context Budget Heuristic
+
+Agents cannot programmatically query context window usage. Use these heuristics to avoid degradation:
+
+| Zone | Indicators | Action |
+|------|-----------|--------|
+| **Safe** | State file + SKILL.md + 2–3 focused artifacts loaded | Continue normally |
+| **Caution** | 5+ artifacts loaded, or 3+ large files (architecture, solution, discovery), or conversation has 20+ tool calls | Complete current sub-step, then suggest session break |
+| **Danger** | Repeated truncation in tool output, tool calls failing unexpectedly, responses becoming shallow or repetitive | Save immediately, update state file, force session boundary |
+
+**Skill-specific guidelines**:
+
+| Skill | Recommended session breaks |
+|-------|---------------------------|
+| **document** | After every ~5 modules in Step 1; between Step 4 (Verification) and Step 5 (Solution Extraction) |
+| **implement** | Each batch is a natural checkpoint; if more than 2 batches completed in one session, suggest break |
+| **plan** | Between Step 5 (Test Specifications) and Step 6 (Epics) for projects with many components |
+| **research** | Between Mode A rounds; between Mode A and Mode B |
+
+**How to detect caution/danger zone without API**:
+
+1. Count tool calls made so far — if approaching 20+, context is likely filling up
+2. If reading a file returns truncated content, context is under pressure
+3. If the agent starts producing shorter or less detailed responses than earlier in the conversation, context quality is degrading
+4. When in doubt, save and suggest a new conversation — re-entry is cheap thanks to the state file
+
 ## Rollback Protocol

 ### Implementation Steps (git-based)

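The zone table added above can be approximated in code. A minimal sketch with the table's illustrative thresholds; the function name and signature are assumptions, since the skill applies this as agent self-assessment rather than an API:

```python
def context_zone(tool_calls: int, artifacts_loaded: int, truncation_seen: bool) -> str:
    """Classify context pressure per the heuristic table above (thresholds illustrative)."""
    if truncation_seen:                                # Danger indicators trump everything
        return "danger"
    if tool_calls >= 20 or artifacts_loaded >= 5:      # Caution: finish sub-step, suggest break
        return "caution"
    return "safe"
```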
@@ -20,7 +20,7 @@ retry_count: [0-3 — number of consecutive auto-retry attempts for current step
 (include the step reference table from the active flow file)

 When updating `Current Step`, always write it as:
-step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2hb/2i or 5b)
+step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2ea/2f/2g/2h/2hb/2hc/2i or 5b/5c)
 sub_step: M ← sub-skill's own internal step/phase number + name
 retry_count: 0 ← reset on new step or success; increment on each failed retry
 Example:

@@ -152,3 +152,42 @@ The `/implement` skill invokes this skill after each batch completes:
 2. Passes task spec paths + changed files to this skill
 3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
 4. If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)
+
+## Integration Contract
+
+### Inputs (provided by the implement skill)
+
+| Input | Type | Source | Required |
+|-------|------|--------|----------|
+| `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/` for the current batch | Yes |
+| `changed_files` | list of file paths | Files modified by implementer agents (from `git diff` or agent reports) | Yes |
+| `batch_number` | integer | Current batch number (for report naming) | Yes |
+| `project_restrictions` | file path | `_docs/00_problem/restrictions.md` | If exists |
+| `solution_overview` | file path | `_docs/01_solution/solution.md` | If exists |
+
+### Invocation Pattern
+
+The implement skill invokes code-review by:
+
+1. Reading `.cursor/skills/code-review/SKILL.md`
+2. Providing the inputs above as context (read the files, pass content to the review phases)
+3. Executing all 6 phases sequentially
+4. Consuming the verdict from the output
+
+### Outputs (returned to the implement skill)
+
+| Output | Type | Description |
+|--------|------|-------------|
+| `verdict` | `PASS` / `PASS_WITH_WARNINGS` / `FAIL` | Drives the implement skill's auto-fix gate |
+| `findings` | structured list | Each finding has: severity, category, file:line, title, description, suggestion, task reference |
+| `critical_count` | integer | Number of Critical findings |
+| `high_count` | integer | Number of High findings |
+| `report_path` | file path | `_docs/03_implementation/reviews/batch_[NN]_review.md` |
+
+### Report Persistence
+
+Save the review report to `_docs/03_implementation/reviews/batch_[NN]_review.md` (create the `reviews/` directory if it does not exist). The report uses the Output Format defined above.
+
+The implement skill uses `verdict` to decide:
+- `PASS` / `PASS_WITH_WARNINGS` → proceed to commit
+- `FAIL` → enter auto-fix loop (up to 2 attempts), then escalate to user

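The verdict gate at the end of the contract can be sketched as follows — a hypothetical helper, not part of the skill; only the verdict values and the 2-attempt limit come from the text above:

```python
MAX_AUTO_FIX_ATTEMPTS = 2  # from the contract: "auto-fix loop (up to 2 attempts)"

def handle_verdict(verdict: str, attempts: int = 0) -> str:
    """Map a code-review verdict to the implement skill's next action."""
    if verdict in ("PASS", "PASS_WITH_WARNINGS"):
        return "commit"                # findings shown as info only
    if verdict == "FAIL" and attempts < MAX_AUTO_FIX_ATTEMPTS:
        return "auto_fix"              # fix, then re-run the review
    return "escalate_to_user"          # FAIL after attempts are exhausted
```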
@@ -36,13 +36,30 @@ Fixed paths:
 - SOLUTION_DIR: `_docs/01_solution/`
 - PROBLEM_DIR: `_docs/00_problem/`

-Announce resolved paths to user before proceeding.
+Optional input:
+
+- FOCUS_DIR: a specific directory subtree provided by the user (e.g., `/document @src/api/`). When set, only this subtree and its transitive dependencies are analyzed.
+
+Announce resolved paths (and FOCUS_DIR if set) to user before proceeding.
+
+## Mode Detection
+
+Determine the execution mode before any other logic:
+
+| Mode | Trigger | Scope |
+|------|---------|-------|
+| **Full** | No input file, no existing state | Entire codebase |
+| **Focus Area** | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies |
+| **Resume** | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint |
+
+Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas.

 ## Prerequisite Checks

-1. If `_docs/` already exists and contains files, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?**
+1. If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?**
 2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
 3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh**
+4. If FOCUS_DIR is set, verify the directory exists and contains source files — **STOP if missing**

 ## Progress Tracking

@@ -53,7 +70,9 @@ Create a TodoWrite with all steps (0 through 7). Update status as each step comp
 ### Step 0: Codebase Discovery

 **Role**: Code analyst
-**Goal**: Build a complete map of the codebase before analyzing any code.
+**Goal**: Build a complete map of the codebase (or targeted subtree) before analyzing any code.
+
+**Focus Area scoping**: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.

 Scan and catalog:

@@ -69,6 +88,7 @@ Scan and catalog:
 - Entry points (no internal dependents)
 - Cycles (mark for grouped analysis)
 - Topological processing order
+- If FOCUS_DIR: mark which modules are in-scope vs dependency-only

 **Save**: `DOCUMENT_DIR/00_discovery.md` containing:
 - Directory tree (concise, relevant directories only)

@@ -82,14 +102,18 @@ Scan and catalog:
 {
   "current_step": "module-analysis",
   "completed_steps": ["discovery"],
+  "focus_dir": null,
   "modules_total": 0,
   "modules_documented": [],
   "modules_remaining": [],
+  "module_batch": 0,
   "components_written": [],
   "last_updated": ""
 }
 ```

+Set `focus_dir` to the FOCUS_DIR path if in Focus Area mode, or `null` for Full mode.
+
 ---

 ### Step 1: Module-Level Documentation

@@ -97,6 +121,8 @@ Scan and catalog:
 **Role**: Code analyst
 **Goal**: Document every identified module individually, processing in topological order (leaves first).
+
+**Batched processing**: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update `state.json`, present a progress summary. Between batches, evaluate whether to suggest a session break.

 For each module in topological order:

 1. **Read**: read the module's source code. Assess complexity and what context is needed.

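The batching rule above amounts to slicing the topological order into fixed-size groups — a trivial but exact sketch; the ~5 batch size is the skill's heuristic, the function name is illustrative:

```python
def batch_modules(topo_order: list[str], batch_size: int = 5) -> list[list[str]]:
    """Split topologically ordered modules (leaves first) into fixed-size batches."""
    return [topo_order[i:i + batch_size] for i in range(0, len(topo_order), batch_size)]
```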
@@ -119,7 +145,26 @@ For each module in topological order:
 **Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.

 **Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module.
-**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`).
+**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5.
+
+**Session break heuristic**: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:
+
+```
+══════════════════════════════════════
+ SESSION BREAK SUGGESTED
+══════════════════════════════════════
+ Modules documented: [X] of [Y]
+ Batches completed this session: [N]
+══════════════════════════════════════
+ A) Continue in this conversation
+ B) Save and continue in a fresh conversation (recommended)
+══════════════════════════════════════
+ Recommendation: B — fresh context improves
+ analysis quality for remaining modules
+══════════════════════════════════════
+```
+
+Re-entry is seamless: `state.json` tracks exactly which modules are done.

 ---

@@ -238,6 +283,23 @@ Apply corrections inline to the documents that need them.

 **BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.
+
+**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context:
+
+```
+══════════════════════════════════════
+ VERIFICATION COMPLETE — session break?
+══════════════════════════════════════
+ Steps 0–4 (analysis + verification) are done.
+ Steps 5–7 (solution + problem extraction + report)
+ can run in a fresh conversation.
+══════════════════════════════════════
+ A) Continue in this conversation
+ B) Save and continue in a new conversation (recommended)
+══════════════════════════════════════
+```
+
+If **Focus Area mode**: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run `/document` again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.

 ---

 ### Step 5: Solution Extraction (Retrospective)

@@ -370,9 +432,11 @@ Maintain `DOCUMENT_DIR/state.json`:
 {
   "current_step": "module-analysis",
   "completed_steps": ["discovery"],
+  "focus_dir": null,
   "modules_total": 12,
   "modules_documented": ["utils/helpers", "models/user"],
   "modules_remaining": ["services/auth", "api/endpoints"],
+  "module_batch": 1,
   "components_written": [],
   "last_updated": "2026-03-21T14:00:00Z"
 }

@@ -423,16 +487,21 @@ When resuming:
 ┌──────────────────────────────────────────────────────────────────┐
 │ Bottom-Up Codebase Documentation (8-Step)                        │
 ├──────────────────────────────────────────────────────────────────┤
-│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
-│ PREREQ: Check state.json for resume                              │
+│ MODE: Full / Focus Area (@dir) / Resume (state.json)             │
+│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
+│ PREREQ: Check state.json for resume                              │
 │                                                                  │
 │ 0. Discovery → dependency graph, tech stack, topo order          │
+│      (Focus Area: scoped to FOCUS_DIR + transitive deps)         │
 │ 1. Module Docs → per-module analysis (leaves first)              │
+│      (batched ~5 modules; session break between batches)         │
 │ 2. Component Assembly → group modules, write component specs     │
 │      [BLOCKING: user confirms components]                        │
 │ 3. System Synthesis → architecture, flows, data model, deploy    │
 │ 4. Verification → compare all docs vs code, fix errors           │
 │      [BLOCKING: user reviews corrections]                        │
+│      [SESSION BREAK suggested before Steps 5–7]                  │
+│      ── Focus Area mode stops here ──                            │
 │ 5. Solution Extraction → retrospective solution.md               │
 │ 6. Problem Extraction → retrospective problem, restrictions, AC  │
 │      [BLOCKING: user confirms problem docs]                      │

@@ -441,5 +510,6 @@ When resuming:
 │ Principles: Bottom-up always · Dependencies first                │
 │             Incremental context · Verify against code            │
 │             Save immediately · Resume from checkpoint            │
+│             Batch modules · Session breaks for large codebases   │
 └──────────────────────────────────────────────────────────────────┘
 ```

@@ -73,9 +73,9 @@ For each task in the batch:
 - Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files)
 - If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel

-### 5. Update Jira Status → In Progress
+### 5. Update Tracker Status → In Progress

-For each task in the batch, transition its Jira ticket status to **In Progress** via Jira MCP before launching the implementer.
+For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `protocols.md` for detection) before launching the implementer. If `tracker: local`, skip this step.

 ### 6. Launch Implementer Subagents

@@ -127,12 +127,12 @@ Track `auto_fix_attempts` count in the batch report for retrospective analysis.

 - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
   - `git add` all changed files from the batch
-  - `git commit` with a message that includes ALL JIRA-IDs of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[JIRA-ID-1] [JIRA-ID-2] ... Summary of changes`
+  - `git commit` with a message that includes ALL task IDs (Jira IDs, ADO IDs, or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes`
   - `git push` to the remote branch

-### 12. Update Jira Status → In Testing
+### 12. Update Tracker Status → In Testing

-After the batch is committed and pushed, transition the Jira ticket status of each task in the batch to **In Testing** via Jira MCP.
+After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step.

 ### 13. Loop

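The new commit-message format can be assembled mechanically — a sketch with placeholder task IDs; the helper name is illustrative:

```python
def batch_commit_message(task_ids: list[str], summary: str) -> str:
    """Build `[TASK-ID-1] [TASK-ID-2] ... Summary of changes` per the format above."""
    tags = " ".join(f"[{tid}]" for tid in task_ids)
    return f"{tags} {summary}"
```

For example, a batch of two tasks would pass the message to `git commit -m`.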
@@ -213,27 +213,27 @@ Present using the Choose format for each decision that has meaningful alternativ

 ---

-### Step 7: Jira Ticket
+### Step 7: Work Item Ticket

 **Role**: Project coordinator
-**Goal**: Create a Jira ticket and link it to the task file.
+**Goal**: Create a work item ticket and link it to the task file.

-1. Create a Jira ticket for the task:
+1. Create a ticket via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md` for detection):
   - Summary: the task's **Name** field
   - Description: the task's **Problem** and **Acceptance Criteria** sections
   - Story points: the task's **Complexity** value
   - Link to the appropriate epic (ask user if unclear which epic)
-2. Write the Jira ticket ID and Epic ID back into the task file header:
-  - Update **Task** field: `[JIRA-ID]_[short_name]`
-  - Update **Jira** field: `[JIRA-ID]`
+2. Write the ticket ID and Epic ID back into the task file header:
+  - Update **Task** field: `[TICKET-ID]_[short_name]`
+  - Update **Jira** field: `[TICKET-ID]`
   - Update **Epic** field: `[EPIC-ID]`
-3. Rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`
+3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md`

-If Jira MCP is not authenticated or unavailable:
+If the work item tracker is not authenticated or unavailable (`tracker: local`):
 - Keep the numeric prefix
 - Set **Jira** to `pending`
 - Set **Epic** to `pending`
 - The task is still valid and can be implemented; Jira sync happens later
|
- The task is still valid and can be implemented; tracker sync happens later
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
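Step 3's rename can be sketched like this (a hypothetical helper; the file path and ticket ID are illustrative):

```shell
# Hypothetical sketch of step 3: rename a numbered task file to its ticket ID.
# rename_task FILE TICKET_ID
rename_task() {
  task_file="$1"; ticket_id="$2"
  dir="$(dirname "$task_file")"
  base="$(basename "$task_file")"
  short_name="${base#*_}"   # "03_add_login.md" -> "add_login.md"
  mv "$task_file" "$dir/${ticket_id}_${short_name}"
}

# Usage (illustrative): rename_task _docs/tasks/03_add_login.md PROJ-42
```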
@@ -1,13 +1,13 @@
-## Step 6: Jira Epics
+## Step 6: Work Item Epics
 
 **Role**: Professional product manager
 
-**Goal**: Create Jira epics from components, ordered by dependency
+**Goal**: Create epics from components, ordered by dependency
 
-**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.
+**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the epic should understand the full context without needing to open separate files.
 
 1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
-2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
+2. Generate epics for each component using the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md`), structured per `templates/epic-spec.md`
 3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
 4. Include effort estimation per epic (T-shirt size or story points range)
 5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
@@ -15,7 +15,7 @@
 
 **CRITICAL — Epic description richness requirements**:
 
-Each epic description in Jira MUST include ALL of the following sections with substantial content:
+Each epic description MUST include ALL of the following sections with substantial content:
 - **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
 - **Problem / Context**: what problem this component solves, why it exists, current pain points
 - **Scope**: detailed in-scope and out-of-scope lists
@@ -31,7 +31,7 @@ Each epic description in Jira MUST include ALL of the following sections with substantial content:
 - **Key constraints**: from restrictions.md that affect this component
 - **Testing strategy**: summary of test types and coverage from tests.md
 
-Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.
+Do NOT create minimal epics with just a summary and short description. The epic is the primary reference document for the implementation team.
 
 **Self-verification**:
 - [ ] "Bootstrap & Initial Structure" epic exists and is first in order
@@ -45,4 +45,4 @@ Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.
 
 7. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`.
 
-**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
+**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If `tracker: local`, save locally only.
@@ -1,6 +1,6 @@
-# Jira Epic Template
+# Epic Template
 
-Use this template for each Jira epic. Create epics via Jira MCP.
+Use this template for each epic. Create epics via the configured work item tracker (Jira MCP or Azure DevOps MCP).
 
 ---
 
@@ -5,8 +5,8 @@ description: |
   then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits)
   that treat the system as a black box. Every test pairs input data with quantifiable expected results
   so tests can verify correctness, not just execution.
-  3-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate.
-  Produces 8 artifacts under tests/.
+  4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate,
+  test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/.
   Trigger phrases:
   - "test spec", "test specification", "test scenarios"
   - "blackbox test spec", "black box tests", "blackbox tests"
@@ -133,6 +133,8 @@ TESTS_OUTPUT_DIR/
 | Phase 3 | Updated test data spec (if data added) | `test-data.md` |
 | Phase 3 | Updated test files (if tests removed) | respective test file |
 | Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
+| Phase 4 | Test runner script | `scripts/run-tests.sh` |
+| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` |
 
 ### Resumability
 
@@ -335,6 +337,56 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiable
 
 ---
 
+### Phase 4: Test Runner Script Generation
+
+**Role**: DevOps engineer
+**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently.
+**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure.
+
+#### Step 1 — Detect test infrastructure
+
+1. Identify the project's test runner from manifests and config files:
+   - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini)
+   - .NET: `dotnet test` (*.csproj, *.sln)
+   - Rust: `cargo test` (Cargo.toml)
+   - Node: `npm test` or `vitest` / `jest` (package.json)
+2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`)
+3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks)
+4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
+
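Step 1's manifest-based detection can be sketched roughly as follows (a hypothetical helper; it checks only the manifest files listed above, in a fixed order):

```shell
# Hypothetical sketch of test-runner detection (Step 1).
# Prints the runner command for the project at $1, or "unknown".
detect_test_runner() {
  dir="$1"
  if [ -f "$dir/pyproject.toml" ] || [ -f "$dir/setup.cfg" ] || [ -f "$dir/pytest.ini" ]; then
    echo "pytest"
  elif ls "$dir"/*.sln "$dir"/*.csproj >/dev/null 2>&1; then
    echo "dotnet test"
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$dir/package.json" ]; then
    echo "npm test"
  else
    echo "unknown"
  fi
}
```

A real implementation would also inspect package.json scripts to prefer `vitest`/`jest` over plain `npm test`.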
+#### Step 2 — Generate `scripts/run-tests.sh`
+
+Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:
+
+1. Set `set -euo pipefail` and trap cleanup on EXIT
+2. Optionally accept a `--unit-only` flag to skip blackbox tests
+3. Run unit tests using the detected test runner
+4. If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down
+5. Print a summary of passed/failed/skipped tests
+6. Exit 0 on all pass, exit 1 on any failure
+
+#### Step 3 — Generate `scripts/run-performance-tests.sh`
+
+Create `scripts/run-performance-tests.sh` at the project root. The script must:
+
+1. Set `set -euo pipefail` and trap cleanup on EXIT
+2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
+3. Spin up the system under test (docker-compose or local)
+4. Run load/performance scenarios using the detected tool
+5. Compare results against threshold values from the test spec
+6. Print a pass/fail summary per scenario
+7. Exit 0 if all thresholds met, exit 1 otherwise
+
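The threshold comparison in item 5 can be sketched like this (illustrative only; the metric name and values are assumptions, not part of the spec):

```shell
# Hypothetical sketch of a per-scenario threshold check (Step 3, item 5).
# check_threshold NAME MEASURED MAX_ALLOWED — prints PASS/FAIL, returns 1 on FAIL.
check_threshold() {
  name="$1" measured="$2" max_allowed="$3"
  # awk handles the floating-point comparison portably
  if awk -v m="$measured" -v t="$max_allowed" 'BEGIN { exit !(m <= t) }'; then
    echo "PASS $name: ${measured} <= ${max_allowed}"
  else
    echo "FAIL $name: ${measured} > ${max_allowed}"
    return 1
  fi
}

# Usage (illustrative): check_threshold p95_latency_ms 180 250
```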
+#### Step 4 — Verify scripts
+
+1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
+2. Mark both scripts as executable (`chmod +x`)
+3. Present a summary of what each script does to the user
+
+**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.
+
+---
 
 ## Escalation Rules
 
 | Situation | Action |
@@ -373,7 +425,7 @@ When the user wants to:
 
 ```
 ┌──────────────────────────────────────────────────────────────────────┐
-│ Test Scenario Specification (3-Phase) │
+│ Test Scenario Specification (4-Phase) │
 ├──────────────────────────────────────────────────────────────────────┤
 │ PREREQ: Data Gate (BLOCKING) │
 │ → verify AC, restrictions, input_data (incl. expected_results.md) │
@@ -397,15 +449,21 @@ When the user wants to:
 │ │
 │ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │
 │ → build test-data + expected-result requirements checklist │
 │ → ask user: provide data+result (A) or remove test (B) │
 │ → validate input data (quality + quantity) │
 │ → validate expected results (quantifiable + comparison method) │
 │ → remove tests without data or expected result, warn user │
 │ → final coverage check (≥70% or FAIL + loop back) │
 │ [BLOCKING: coverage ≥ 70% required to pass] │
+│ │
+│ Phase 4: Test Runner Script Generation │
+│ → detect test runner + docker-compose + load tool │
+│ → scripts/run-tests.sh (unit + blackbox) │
+│ → scripts/run-performance-tests.sh (load/perf scenarios) │
+│ → verify scripts are valid and executable │
 ├──────────────────────────────────────────────────────────────────────┤
 │ Principles: Black-box only · Traceability · Save immediately │
 │ Ask don't assume · Spec don't code │
 │ No test without data · No test without expected result │
 └──────────────────────────────────────────────────────────────────────┘
 ```
@@ -0,0 +1,88 @@
+# Test Runner Script Structure
+
+Reference for generating `scripts/run-tests.sh` and `scripts/run-performance-tests.sh`.
+
+## `scripts/run-tests.sh`
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+UNIT_ONLY=false
+RESULTS_DIR="$PROJECT_ROOT/test-results"
+
+for arg in "$@"; do
+  case $arg in
+    --unit-only) UNIT_ONLY=true ;;
+  esac
+done
+
+cleanup() {
+  # tear down docker-compose if it was started
+  :  # no-op placeholder so the function body is valid bash
+}
+trap cleanup EXIT
+
+mkdir -p "$RESULTS_DIR"
+
+# --- Unit Tests ---
+# [detect runner: pytest / dotnet test / cargo test / npm test]
+# [run and capture exit code]
+# [save results to $RESULTS_DIR/unit-results.*]
+
+# --- Blackbox Tests (skip if --unit-only) ---
+# if ! $UNIT_ONLY; then
+#   [docker compose -f <compose-file> up -d]
+#   [wait for health checks]
+#   [run blackbox test suite]
+#   [save results to $RESULTS_DIR/blackbox-results.*]
+# fi
+
+# --- Summary ---
+# [print passed / failed / skipped counts]
+# [exit 0 if all passed, exit 1 otherwise]
+```
+
+## `scripts/run-performance-tests.sh`
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+RESULTS_DIR="$PROJECT_ROOT/test-results"
+
+cleanup() {
+  # tear down test environment if started
+  :  # no-op placeholder so the function body is valid bash
+}
+trap cleanup EXIT
+
+mkdir -p "$RESULTS_DIR"
+
+# --- Start System Under Test ---
+# [docker compose up -d or start local server]
+# [wait for health checks]
+
+# --- Run Performance Scenarios ---
+# [detect tool: k6 / locust / artillery / wrk / built-in]
+# [run each scenario from performance-tests.md]
+# [capture metrics: latency P50/P95/P99, throughput, error rate]
+
+# --- Compare Against Thresholds ---
+# [read thresholds from test spec or CLI args]
+# [print per-scenario pass/fail]
+
+# --- Summary ---
+# [exit 0 if all thresholds met, exit 1 otherwise]
+```
+
+## Key Requirements
+
+- Both scripts must be idempotent (safe to run multiple times)
+- Both scripts must work in CI (no interactive prompts, no GUI)
+- Use `trap cleanup EXIT` to ensure teardown even on failure
+- Exit codes: 0 = all pass, 1 = failures detected
+- Write results to `test-results/` directory (add to `.gitignore` if not already present)
+- The actual commands depend on the detected tech stack — fill them in during Phase 4 of the test-spec skill
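The `[wait for health checks]` placeholders in the templates above can be filled in with a polling loop along these lines (a sketch; the endpoint URL and timeout are assumptions to be replaced per project):

```shell
# Hypothetical health-check wait loop for the compose-based blackbox phase.
# wait_for_health URL [TIMEOUT_SECONDS] — polls until the endpoint answers
# or the timeout elapses (returns 1 on timeout).
wait_for_health() {
  url="$1" timeout_s="${2:-60}"
  elapsed=0
  until curl -fsS "$url" >/dev/null 2>&1; do
    sleep 2
    elapsed=$((elapsed + 2))
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "health check timed out after ${timeout_s}s: $url" >&2
      return 1
    fi
  done
}

# Usage (illustrative): wait_for_health "http://localhost:8080/health" 60
```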