diff --git a/.cursor/README.md b/.cursor/README.md index 4740c79..d9522b4 100644 --- a/.cursor/README.md +++ b/.cursor/README.md @@ -1,3 +1,7 @@ +## Assumptions + +- **Single project per workspace**: this system assumes one project per Cursor workspace. All `_docs/` paths are relative to the workspace root. For monorepos, open each service in its own Cursor workspace window. + ## How to Use Type `/autopilot` to start or continue the full workflow. The orchestrator detects where your project is and picks up from there. diff --git a/.cursor/skills/autopilot/SKILL.md b/.cursor/skills/autopilot/SKILL.md index 5e0178d..57d39a1 100644 --- a/.cursor/skills/autopilot/SKILL.md +++ b/.cursor/skills/autopilot/SKILL.md @@ -37,6 +37,7 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det - **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here - **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input - **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically +- **Single project per workspace**: all `_docs/` paths are relative to workspace root; for monorepos, each service needs its own Cursor workspace ## Flow Resolution diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md index d20a25e..91e120f 100644 --- a/.cursor/skills/autopilot/flows/existing-code.md +++ b/.cursor/skills/autopilot/flows/existing-code.md @@ -83,13 +83,13 @@ If `_docs/03_implementation/` has batch reports, the implement skill detects com --- **Step 2e — Refactor** -Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactor/FINAL_refactor_report.md` does not exist +Condition: `_docs/03_implementation/FINAL_implementation_report.md` 
exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist Action: Read and execute `.cursor/skills/refactor/SKILL.md` The refactor skill runs the full 6-phase method using the implemented tests as a safety net. -If `_docs/04_refactor/` has phase reports, the refactor skill detects completed phases and continues. +If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues. --- @@ -147,8 +147,8 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au Action: Run the full test suite to verify the implementation before deployment. -1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests -2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite +1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it +2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite 3. **Report results**: present a summary of passed/failed/skipped tests If all tests pass → auto-chain to Step 2hb (Security Audit). @@ -208,12 +208,11 @@ Action: Present using Choose format: ``` - If user picks A → Run performance tests: - 1. Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios - 2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks) - 3. Execute performance test scenarios against the running system - 4. Present results vs acceptance criteria thresholds - 5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort - 6. After completion, auto-chain to Step 2i (Deploy) + 1. 
If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it + 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system + 3. Present results vs acceptance criteria thresholds + 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort + 5. After completion, auto-chain to Step 2i (Deploy) - If user picks B → Mark Step 2hc as `skipped` in the state file, auto-chain to Step 2i (Deploy). --- diff --git a/.cursor/skills/autopilot/flows/greenfield.md b/.cursor/skills/autopilot/flows/greenfield.md index e7158bc..859094d 100644 --- a/.cursor/skills/autopilot/flows/greenfield.md +++ b/.cursor/skills/autopilot/flows/greenfield.md @@ -132,8 +132,8 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t Action: Run the full test suite to verify the implementation before deployment. -1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests -2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite +1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it +2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite 3. **Report results**: present a summary of passed/failed/skipped tests If all tests pass → auto-chain to Step 5b (Security Audit). @@ -193,12 +193,11 @@ Action: Present using Choose format: ``` - If user picks A → Run performance tests: - 1. 
Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios - 2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks) - 3. Execute performance test scenarios against the running system - 4. Present results vs acceptance criteria thresholds - 5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort - 6. After completion, auto-chain to Step 6 (Deploy) + 1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it + 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system + 3. Present results vs acceptance criteria thresholds + 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort + 5. After completion, auto-chain to Step 6 (Deploy) - If user picks B → Mark Step 5c as `skipped` in the state file, auto-chain to Step 6 (Deploy). --- diff --git a/.cursor/skills/autopilot/protocols.md b/.cursor/skills/autopilot/protocols.md index fe118ee..18eb731 100644 --- a/.cursor/skills/autopilot/protocols.md +++ b/.cursor/skills/autopilot/protocols.md @@ -46,7 +46,7 @@ Rules: 2. Always include a recommendation with a brief justification 3. Keep option descriptions to one line each 4. If only 2 options make sense, use A/B only — do not pad with filler options -5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice +5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice 6. Record every user decision in the state file's `Key Decisions` section 7. 
After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive @@ -154,7 +154,7 @@ After 3 failed auto-retries of the same skill, the failure is likely not user-re - Set `status: failed` in `Current Step` - Set `retry_count: 3` - Add a blocker entry describing the repeated failure -2. Play notification sound (per `human-input-sound.mdc`) +2. Play notification sound (per `human-attention-sound.mdc`) 3. Present using Choose format: ``` @@ -251,6 +251,32 @@ When a skill needs to read large files (e.g., full solution.md, architecture.md) - Use search tools (Grep, SemanticSearch) to find specific sections rather than reading entire files - Summarize key decisions from prior steps in the state file so they don't need to be re-read +### Context Budget Heuristic + +Agents cannot programmatically query context window usage. Use these heuristics to avoid degradation: + +| Zone | Indicators | Action | +|------|-----------|--------| +| **Safe** | State file + SKILL.md + 2–3 focused artifacts loaded | Continue normally | +| **Caution** | 5+ artifacts loaded, or 3+ large files (architecture, solution, discovery), or conversation has 20+ tool calls | Complete current sub-step, then suggest session break | +| **Danger** | Repeated truncation in tool output, tool calls failing unexpectedly, responses becoming shallow or repetitive | Save immediately, update state file, force session boundary | + +**Skill-specific guidelines**: + +| Skill | Recommended session breaks | +|-------|---------------------------| +| **document** | After every ~5 modules in Step 1; between Step 4 (Verification) and Step 5 (Solution Extraction) | +| **implement** | Each batch is a natural checkpoint; if more than 2 batches completed in one session, suggest break | +| **plan** | Between Step 5 (Test Specifications) and Step 6 (Epics) for projects with many components | +| **research** | Between Mode A rounds; between Mode A and Mode B | + +**How to detect 
caution/danger zone without API**: + +1. Count tool calls made so far — if approaching 20+, context is likely filling up +2. If reading a file returns truncated content, context is under pressure +3. If the agent starts producing shorter or less detailed responses than earlier in the conversation, context quality is degrading +4. When in doubt, save and suggest a new conversation — re-entry is cheap thanks to the state file + ## Rollback Protocol ### Implementation Steps (git-based) diff --git a/.cursor/skills/autopilot/state.md b/.cursor/skills/autopilot/state.md index d8250e2..50650aa 100644 --- a/.cursor/skills/autopilot/state.md +++ b/.cursor/skills/autopilot/state.md @@ -20,7 +20,7 @@ retry_count: [0-3 — number of consecutive auto-retry attempts for current step (include the step reference table from the active flow file) When updating `Current Step`, always write it as: - step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2hb/2i or 5b) + step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2ea/2f/2g/2h/2hb/2hc/2i or 5b/5c) sub_step: M ← sub-skill's own internal step/phase number + name retry_count: 0 ← reset on new step or success; increment on each failed retry Example: diff --git a/.cursor/skills/code-review/SKILL.md b/.cursor/skills/code-review/SKILL.md index 44c190c..041013a 100644 --- a/.cursor/skills/code-review/SKILL.md +++ b/.cursor/skills/code-review/SKILL.md @@ -152,3 +152,42 @@ The `/implement` skill invokes this skill after each batch completes: 2. Passes task spec paths + changed files to this skill 3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms 4. 
If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info) + +## Integration Contract + +### Inputs (provided by the implement skill) + +| Input | Type | Source | Required | +|-------|------|--------|----------| +| `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/` for the current batch | Yes | +| `changed_files` | list of file paths | Files modified by implementer agents (from `git diff` or agent reports) | Yes | +| `batch_number` | integer | Current batch number (for report naming) | Yes | +| `project_restrictions` | file path | `_docs/00_problem/restrictions.md` | If exists | +| `solution_overview` | file path | `_docs/01_solution/solution.md` | If exists | + +### Invocation Pattern + +The implement skill invokes code-review by: + +1. Reading `.cursor/skills/code-review/SKILL.md` +2. Providing the inputs above as context (read the files, pass content to the review phases) +3. Executing all 6 phases sequentially +4. Consuming the verdict from the output + +### Outputs (returned to the implement skill) + +| Output | Type | Description | +|--------|------|-------------| +| `verdict` | `PASS` / `PASS_WITH_WARNINGS` / `FAIL` | Drives the implement skill's auto-fix gate | +| `findings` | structured list | Each finding has: severity, category, file:line, title, description, suggestion, task reference | +| `critical_count` | integer | Number of Critical findings | +| `high_count` | integer | Number of High findings | +| `report_path` | file path | `_docs/03_implementation/reviews/batch_[NN]_review.md` | + +### Report Persistence + +Save the review report to `_docs/03_implementation/reviews/batch_[NN]_review.md` (create the `reviews/` directory if it does not exist). The report uses the Output Format defined above. 
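The report path convention and the verdict values from the outputs table can be sketched as small shell helpers. This is an illustrative sketch only — the skill performs these steps as agent actions, not via a checked-in script, and the helper names are hypothetical:

```shell
# Hypothetical helpers mirroring the integration contract above.

# batch_[NN]_review.md with a zero-padded batch number
review_report_path() {
  printf '_docs/03_implementation/reviews/batch_%02d_review.md' "$1"
}

# Map the review verdict to the implement skill's next action
review_gate() {
  case "$1" in
    PASS|PASS_WITH_WARNINGS) echo "proceed-to-commit" ;;
    FAIL)                    echo "auto-fix-then-escalate" ;;
    *)                       echo "unknown-verdict" ;;
  esac
}
```

For example, `review_report_path 3` yields `_docs/03_implementation/reviews/batch_03_review.md`, and `review_gate PASS_WITH_WARNINGS` maps to the proceed-to-commit branch.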
+ +The implement skill uses `verdict` to decide: +- `PASS` / `PASS_WITH_WARNINGS` → proceed to commit +- `FAIL` → enter auto-fix loop (up to 2 attempts), then escalate to user diff --git a/.cursor/skills/document/SKILL.md b/.cursor/skills/document/SKILL.md index 46b47aa..c920555 100644 --- a/.cursor/skills/document/SKILL.md +++ b/.cursor/skills/document/SKILL.md @@ -36,13 +36,30 @@ Fixed paths: - SOLUTION_DIR: `_docs/01_solution/` - PROBLEM_DIR: `_docs/00_problem/` -Announce resolved paths to user before proceeding. +Optional input: + +- FOCUS_DIR: a specific directory subtree provided by the user (e.g., `/document @src/api/`). When set, only this subtree and its transitive dependencies are analyzed. + +Announce resolved paths (and FOCUS_DIR if set) to user before proceeding. + +## Mode Detection + +Determine the execution mode before any other logic: + +| Mode | Trigger | Scope | +|------|---------|-------| +| **Full** | No input file, no existing state | Entire codebase | +| **Focus Area** | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies | +| **Resume** | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint | + +Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas. ## Prerequisite Checks -1. If `_docs/` already exists and contains files, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?** +1. If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?** 2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist 3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh** +4. 
If FOCUS_DIR is set, verify the directory exists and contains source files — **STOP if missing** ## Progress Tracking @@ -53,7 +70,9 @@ Create a TodoWrite with all steps (0 through 7). Update status as each step comp ### Step 0: Codebase Discovery **Role**: Code analyst -**Goal**: Build a complete map of the codebase before analyzing any code. +**Goal**: Build a complete map of the codebase (or targeted subtree) before analyzing any code. + +**Focus Area scoping**: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it. Scan and catalog: @@ -69,6 +88,7 @@ Scan and catalog: - Entry points (no internal dependents) - Cycles (mark for grouped analysis) - Topological processing order + - If FOCUS_DIR: mark which modules are in-scope vs dependency-only **Save**: `DOCUMENT_DIR/00_discovery.md` containing: - Directory tree (concise, relevant directories only) @@ -82,14 +102,18 @@ Scan and catalog: { "current_step": "module-analysis", "completed_steps": ["discovery"], + "focus_dir": null, "modules_total": 0, "modules_documented": [], "modules_remaining": [], + "module_batch": 0, "components_written": [], "last_updated": "" } ``` +Set `focus_dir` to the FOCUS_DIR path if in Focus Area mode, or `null` for Full mode. + --- ### Step 1: Module-Level Documentation @@ -97,6 +121,8 @@ Scan and catalog: **Role**: Code analyst **Goal**: Document every identified module individually, processing in topological order (leaves first). +**Batched processing**: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update `state.json`, present a progress summary. Between batches, evaluate whether to suggest a session break. + For each module in topological order: 1. **Read**: read the module's source code. 
Assess complexity and what context is needed. @@ -119,7 +145,26 @@ For each module in topological order: **Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine. **Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module. -**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). +**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5. + +**Session break heuristic**: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break: + +``` +══════════════════════════════════════ + SESSION BREAK SUGGESTED +══════════════════════════════════════ + Modules documented: [X] of [Y] + Batches completed this session: [N] +══════════════════════════════════════ + A) Continue in this conversation + B) Save and continue in a fresh conversation (recommended) +══════════════════════════════════════ + Recommendation: B — fresh context improves + analysis quality for remaining modules +══════════════════════════════════════ +``` + +Re-entry is seamless: `state.json` tracks exactly which modules are done. --- @@ -238,6 +283,23 @@ Apply corrections inline to the documents that need them. **BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes. +**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context: + +``` +══════════════════════════════════════ + VERIFICATION COMPLETE — session break? +══════════════════════════════════════ + Steps 0–4 (analysis + verification) are done. 
+ Steps 5–7 (solution + problem extraction + report) + can run in a fresh conversation. +══════════════════════════════════════ + A) Continue in this conversation + B) Save and continue in a new conversation (recommended) +══════════════════════════════════════ +``` + +If **Focus Area mode**: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run `/document` again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis. + --- ### Step 5: Solution Extraction (Retrospective) @@ -370,9 +432,11 @@ Maintain `DOCUMENT_DIR/state.json`: { "current_step": "module-analysis", "completed_steps": ["discovery"], + "focus_dir": null, "modules_total": 12, "modules_documented": ["utils/helpers", "models/user"], "modules_remaining": ["services/auth", "api/endpoints"], + "module_batch": 1, "components_written": [], "last_updated": "2026-03-21T14:00:00Z" } @@ -423,16 +487,21 @@ When resuming: ┌──────────────────────────────────────────────────────────────────┐ │ Bottom-Up Codebase Documentation (8-Step) │ ├──────────────────────────────────────────────────────────────────┤ -│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │ -│ PREREQ: Check state.json for resume │ +│ MODE: Full / Focus Area (@dir) / Resume (state.json) │ +│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │ +│ PREREQ: Check state.json for resume │ │ │ │ 0. Discovery → dependency graph, tech stack, topo order │ +│ (Focus Area: scoped to FOCUS_DIR + transitive deps) │ │ 1. Module Docs → per-module analysis (leaves first) │ +│ (batched ~5 modules; session break between batches) │ │ 2. Component Assembly → group modules, write component specs │ │ [BLOCKING: user confirms components] │ │ 3. System Synthesis → architecture, flows, data model, deploy │ │ 4. 
Verification → compare all docs vs code, fix errors │ │ [BLOCKING: user reviews corrections] │ +│ [SESSION BREAK suggested before Steps 5–7] │ +│ ── Focus Area mode stops here ── │ │ 5. Solution Extraction → retrospective solution.md │ │ 6. Problem Extraction → retrospective problem, restrictions, AC │ │ [BLOCKING: user confirms problem docs] │ @@ -441,5 +510,6 @@ When resuming: │ Principles: Bottom-up always · Dependencies first │ │ Incremental context · Verify against code │ │ Save immediately · Resume from checkpoint │ +│ Batch modules · Session breaks for large codebases │ └──────────────────────────────────────────────────────────────────┘ ``` diff --git a/.cursor/skills/implement/SKILL.md b/.cursor/skills/implement/SKILL.md index e1b5a83..cf44a57 100644 --- a/.cursor/skills/implement/SKILL.md +++ b/.cursor/skills/implement/SKILL.md @@ -73,9 +73,9 @@ For each task in the batch: - Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files) - If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel -### 5. Update Jira Status → In Progress +### 5. Update Tracker Status → In Progress -For each task in the batch, transition its Jira ticket status to **In Progress** via Jira MCP before launching the implementer. +For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `protocols.md` for detection) before launching the implementer. If `tracker: local`, skip this step. ### 6. Launch Implementer Subagents @@ -127,12 +127,12 @@ Track `auto_fix_attempts` count in the batch report for retrospective analysis. 
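The bounded auto-fix loop whose `auto_fix_attempts` count is tracked above can be sketched as a shell helper. This is a hypothetical illustration — the real loop is driven by the agent, and `run_fix` stands in for whatever command applies fixes and re-runs the review:

```shell
# Hypothetical sketch of a bounded auto-fix loop: retry a fix command
# up to max_attempts times, then signal escalation to the user.
auto_fix() {
  local fix_cmd="$1" max_attempts="${2:-2}" attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    if "$fix_cmd"; then
      echo "fixed-after-$attempt"
      return 0
    fi
  done
  echo "escalate-to-user"
  return 1
}
```

Invoked as `auto_fix run_fix 2`, this mirrors the "up to 2 attempts, then escalate" gate: the attempt count it reports is what gets recorded for retrospective analysis.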
- After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS): - `git add` all changed files from the batch - - `git commit` with a message that includes ALL JIRA-IDs of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[JIRA-ID-1] [JIRA-ID-2] ... Summary of changes` + - `git commit` with a message that includes ALL task IDs (Jira IDs, ADO IDs, or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes` - `git push` to the remote branch -### 12. Update Jira Status → In Testing +### 12. Update Tracker Status → In Testing -After the batch is committed and pushed, transition the Jira ticket status of each task in the batch to **In Testing** via Jira MCP. +After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step. ### 13. Loop diff --git a/.cursor/skills/new-task/SKILL.md b/.cursor/skills/new-task/SKILL.md index 69b0d87..e68ff4c 100644 --- a/.cursor/skills/new-task/SKILL.md +++ b/.cursor/skills/new-task/SKILL.md @@ -213,27 +213,27 @@ Present using the Choose format for each decision that has meaningful alternativ --- -### Step 7: Jira Ticket +### Step 7: Work Item Ticket **Role**: Project coordinator -**Goal**: Create a Jira ticket and link it to the task file. +**Goal**: Create a work item ticket and link it to the task file. -1. Create a Jira ticket for the task: +1. Create a ticket via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md` for detection): - Summary: the task's **Name** field - Description: the task's **Problem** and **Acceptance Criteria** sections - Story points: the task's **Complexity** value - Link to the appropriate epic (ask user if unclear which epic) -2. 
Write the Jira ticket ID and Epic ID back into the task file header: - - Update **Task** field: `[JIRA-ID]_[short_name]` - - Update **Jira** field: `[JIRA-ID]` +2. Write the ticket ID and Epic ID back into the task file header: + - Update **Task** field: `[TICKET-ID]_[short_name]` + - Update **Jira** field: `[TICKET-ID]` - Update **Epic** field: `[EPIC-ID]` -3. Rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md` +3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md` -If Jira MCP is not authenticated or unavailable: +If the work item tracker is not authenticated or unavailable (`tracker: local`): - Keep the numeric prefix - Set **Jira** to `pending` - Set **Epic** to `pending` -- The task is still valid and can be implemented; Jira sync happens later +- The task is still valid and can be implemented; tracker sync happens later --- diff --git a/.cursor/skills/plan/steps/06_jira-epics.md b/.cursor/skills/plan/steps/06_jira-epics.md index b9a1ecd..e93d95e 100644 --- a/.cursor/skills/plan/steps/06_jira-epics.md +++ b/.cursor/skills/plan/steps/06_jira-epics.md @@ -1,13 +1,13 @@ -## Step 6: Jira Epics +## Step 6: Work Item Epics **Role**: Professional product manager -**Goal**: Create Jira epics from components, ordered by dependency +**Goal**: Create epics from components, ordered by dependency -**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files. +**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the epic should understand the full context without needing to open separate files. 1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. 
It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure. -2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md` +2. Generate epics for each component using the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md`), structured per `templates/epic-spec.md` 3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph) 4. Include effort estimation per epic (T-shirt size or story points range) 5. Ensure each epic has clear acceptance criteria cross-referenced with component specs @@ -15,7 +15,7 @@ **CRITICAL — Epic description richness requirements**: -Each epic description in Jira MUST include ALL of the following sections with substantial content: +Each epic description MUST include ALL of the following sections with substantial content: - **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections) - **Problem / Context**: what problem this component solves, why it exists, current pain points - **Scope**: detailed in-scope and out-of-scope lists @@ -31,7 +31,7 @@ Each epic description in Jira MUST include ALL of the following sections with su - **Key constraints**: from restrictions.md that affect this component - **Testing strategy**: summary of test types and coverage from tests.md -Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team. +Do NOT create minimal epics with just a summary and short description. The epic is the primary reference document for the implementation team. **Self-verification**: - [ ] "Bootstrap & Initial Structure" epic exists and is first in order @@ -45,4 +45,4 @@ Do NOT create minimal epics with just a summary and short description. The Jira 7. 
**Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`. -**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs. +**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If `tracker: local`, save locally only. diff --git a/.cursor/skills/plan/templates/epic-spec.md b/.cursor/skills/plan/templates/epic-spec.md index 3157a84..6cb60e6 100644 --- a/.cursor/skills/plan/templates/epic-spec.md +++ b/.cursor/skills/plan/templates/epic-spec.md @@ -1,6 +1,6 @@ -# Jira Epic Template +# Epic Template -Use this template for each Jira epic. Create epics via Jira MCP. +Use this template for each epic. Create epics via the configured work item tracker (Jira MCP or Azure DevOps MCP). --- diff --git a/.cursor/skills/test-spec/SKILL.md b/.cursor/skills/test-spec/SKILL.md index 9985407..54a056d 100644 --- a/.cursor/skills/test-spec/SKILL.md +++ b/.cursor/skills/test-spec/SKILL.md @@ -5,8 +5,8 @@ description: | then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits) that treat the system as a black box. Every test pairs input data with quantifiable expected results so tests can verify correctness, not just execution. - 3-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate. - Produces 8 artifacts under tests/. + 4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate, + test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/. 
Trigger phrases: - "test spec", "test specification", "test scenarios" - "blackbox test spec", "black box tests", "blackbox tests" @@ -133,6 +133,8 @@ TESTS_OUTPUT_DIR/ | Phase 3 | Updated test data spec (if data added) | `test-data.md` | | Phase 3 | Updated test files (if tests removed) | respective test file | | Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` | +| Phase 4 | Test runner script | `scripts/run-tests.sh` | +| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` | ### Resumability @@ -335,6 +337,56 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab --- +### Phase 4: Test Runner Script Generation + +**Role**: DevOps engineer +**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently. +**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. + +#### Step 1 — Detect test infrastructure + +1. Identify the project's test runner from manifests and config files: + - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini) + - .NET: `dotnet test` (*.csproj, *.sln) + - Rust: `cargo test` (Cargo.toml) + - Node: `npm test` or `vitest` / `jest` (package.json) +2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`) +3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks) +4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements + +#### Step 2 — Generate `scripts/run-tests.sh` + +Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must: + +1. Set `set -euo pipefail` and trap cleanup on EXIT +2. Optionally accept a `--unit-only` flag to skip blackbox tests +3. Run unit tests using the detected test runner +4. 
If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down +5. Print a summary of passed/failed/skipped tests +6. Exit 0 on all pass, exit 1 on any failure + +#### Step 3 — Generate `scripts/run-performance-tests.sh` + +Create `scripts/run-performance-tests.sh` at the project root. The script must: + +1. Set `set -euo pipefail` and trap cleanup on EXIT +2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args) +3. Spin up the system under test (docker-compose or local) +4. Run load/performance scenarios using the detected tool +5. Compare results against threshold values from the test spec +6. Print a pass/fail summary per scenario +7. Exit 0 if all thresholds met, exit 1 otherwise + +#### Step 4 — Verify scripts + +1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`) +2. Mark both scripts as executable (`chmod +x`) +3. Present a summary of what each script does to the user + +**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root. + +--- + ## Escalation Rules | Situation | Action | @@ -373,7 +425,7 @@ When the user wants to: ``` ┌──────────────────────────────────────────────────────────────────────┐ -│ Test Scenario Specification (3-Phase) │ +│ Test Scenario Specification (4-Phase) │ ├──────────────────────────────────────────────────────────────────────┤ │ PREREQ: Data Gate (BLOCKING) │ │ → verify AC, restrictions, input_data (incl. 
expected_results.md) │ @@ -397,15 +449,21 @@ When the user wants to: │ │ │ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │ │ → build test-data + expected-result requirements checklist │ -│ → ask user: provide data+result (A) or remove test (B) │ +│ → ask user: provide data+result (A) or remove test (B) │ │ → validate input data (quality + quantity) │ │ → validate expected results (quantifiable + comparison method) │ │ → remove tests without data or expected result, warn user │ -│ → final coverage check (≥70% or FAIL + loop back) │ -│ [BLOCKING: coverage ≥ 70% required to pass] │ +│ → final coverage check (≥70% or FAIL + loop back) │ +│ [BLOCKING: coverage ≥ 70% required to pass] │ +│ │ +│ Phase 4: Test Runner Script Generation │ +│ → detect test runner + docker-compose + load tool │ +│ → scripts/run-tests.sh (unit + blackbox) │ +│ → scripts/run-performance-tests.sh (load/perf scenarios) │ +│ → verify scripts are valid and executable │ ├──────────────────────────────────────────────────────────────────────┤ │ Principles: Black-box only · Traceability · Save immediately │ │ Ask don't assume · Spec don't code │ -│ No test without data · No test without expected result │ +│ No test without data · No test without expected result │ └──────────────────────────────────────────────────────────────────────┘ ``` diff --git a/.cursor/skills/test-spec/templates/run-tests-script.md b/.cursor/skills/test-spec/templates/run-tests-script.md new file mode 100644 index 0000000..e5c41ff --- /dev/null +++ b/.cursor/skills/test-spec/templates/run-tests-script.md @@ -0,0 +1,88 @@ +# Test Runner Script Structure + +Reference for generating `scripts/run-tests.sh` and `scripts/run-performance-tests.sh`. + +## `scripts/run-tests.sh` + +```bash +#!/usr/bin/env bash +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." 
&& pwd)"
+UNIT_ONLY=false
+RESULTS_DIR="$PROJECT_ROOT/test-results"
+
+for arg in "$@"; do
+  case $arg in
+    --unit-only) UNIT_ONLY=true ;;
+  esac
+done
+
+cleanup() {
+  # tear down docker-compose if it was started
+  :  # no-op placeholder: a comment-only function body is a bash syntax error
+}
+trap cleanup EXIT
+
+mkdir -p "$RESULTS_DIR"
+
+# --- Unit Tests ---
+# [detect runner: pytest / dotnet test / cargo test / npm test]
+# [run and capture exit code]
+# [save results to $RESULTS_DIR/unit-results.*]
+
+# --- Blackbox Tests (skip if --unit-only) ---
+# if ! $UNIT_ONLY; then
+#   [docker compose -f docker-compose.test.yml up -d]
+#   [wait for health checks]
+#   [run blackbox test suite]
+#   [save results to $RESULTS_DIR/blackbox-results.*]
+# fi
+
+# --- Summary ---
+# [print passed / failed / skipped counts]
+# [exit 0 if all passed, exit 1 otherwise]
+```
+
+## `scripts/run-performance-tests.sh`
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+RESULTS_DIR="$PROJECT_ROOT/test-results"
+
+cleanup() {
+  # tear down test environment if started
+  :  # no-op placeholder: a comment-only function body is a bash syntax error
+}
+trap cleanup EXIT
+
+mkdir -p "$RESULTS_DIR"
+
+# --- Start System Under Test ---
+# [docker compose up -d or start local server]
+# [wait for health checks]
+
+# --- Run Performance Scenarios ---
+# [detect tool: k6 / locust / artillery / wrk / built-in]
+# [run each scenario from performance-tests.md]
+# [capture metrics: latency P50/P95/P99, throughput, error rate]
+
+# --- Compare Against Thresholds ---
+# [read thresholds from test spec or CLI args]
+# [print per-scenario pass/fail]
+
+# --- Summary ---
+# [exit 0 if all thresholds met, exit 1 otherwise]
+```
+
+## Key Requirements
+
+- Both scripts must be idempotent (safe to run multiple times)
+- Both scripts must work in CI (no interactive prompts, no GUI)
+- Use `trap cleanup EXIT` to ensure teardown even on failure
+- Exit codes: 0 = all pass, 1 = failures detected
+- Write results to `test-results/` directory (add to `.gitignore` if not already present)
+- The actual commands depend on the detected tech stack — fill them in during Phase 4 of the test-spec skill
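The manifest-to-runner mapping from Phase 4 Step 1 can be sketched as a small shell helper. This is an illustrative sketch only: the function name `detect_test_runner` and the check order are assumptions, not part of the skill, and a real implementation should consult every manifest listed in Step 1.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of Phase 4 Step 1: map project manifests to a unit-test runner command.
# Function name and precedence order are illustrative assumptions.
detect_test_runner() {
  local root="${1:-.}"
  if [ -f "$root/pyproject.toml" ] || [ -f "$root/setup.cfg" ] || [ -f "$root/pytest.ini" ]; then
    echo "pytest"
  elif ls "$root"/*.sln "$root"/*.csproj >/dev/null 2>&1; then
    echo "dotnet test"
  elif [ -f "$root/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$root/package.json" ]; then
    echo "npm test"
  else
    echo "unknown"
    return 1
  fi
}

# Example: a directory containing only Cargo.toml resolves to "cargo test"
tmp="$(mktemp -d)"
touch "$tmp/Cargo.toml"
detect_test_runner "$tmp"   # prints: cargo test
rm -rf "$tmp"
```

A real version would also distinguish `vitest`/`jest` by inspecting the `scripts` section of `package.json` rather than defaulting to `npm test`.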
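The "Compare Against Thresholds" step in `scripts/run-performance-tests.sh` needs floating-point comparison, which bash's integer tests (`[ -le ]`) cannot do. A minimal sketch using `awk`; the scenario names, measured values, and millisecond unit are illustrative stand-ins for numbers read from `performance-tests.md` and the load tool's output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the threshold-comparison step; all names and values are illustrative.
check_threshold() {
  local name="$1" measured="$2" limit="$3"
  # awk performs the float comparison that bash integer tests cannot
  if awk -v m="$measured" -v l="$limit" 'BEGIN { exit !(m <= l) }'; then
    echo "PASS  $name: ${measured}ms <= ${limit}ms"
  else
    echo "FAIL  $name: ${measured}ms > ${limit}ms"
    return 1
  fi
}

failures=0
check_threshold "checkout_p95" 182.4 250 || failures=$((failures + 1))
check_threshold "search_p95" 275.0 300 || failures=$((failures + 1))
echo "scenarios failed: $failures"
# a full script would finish with: exit "$(( failures > 0 ? 1 : 0 ))"
```

Some tools can enforce thresholds themselves (k6's `thresholds` option exits non-zero when a threshold is crossed), in which case this step reduces to propagating the tool's exit code.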