diff --git a/.cursor/README.md b/.cursor/README.md index 4740c79..d9522b4 100644 --- a/.cursor/README.md +++ b/.cursor/README.md @@ -1,3 +1,7 @@ +## Assumptions + +- **Single project per workspace**: this system assumes one project per Cursor workspace. All `_docs/` paths are relative to the workspace root. For monorepos, open each service in its own Cursor workspace window. + ## How to Use Type `/autopilot` to start or continue the full workflow. The orchestrator detects where your project is and picks up from there. diff --git a/.cursor/skills/autopilot/SKILL.md b/.cursor/skills/autopilot/SKILL.md index 5e0178d..57d39a1 100644 --- a/.cursor/skills/autopilot/SKILL.md +++ b/.cursor/skills/autopilot/SKILL.md @@ -37,6 +37,7 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det - **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here - **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input - **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically +- **Single project per workspace**: all `_docs/` paths are relative to workspace root; for monorepos, each service needs its own Cursor workspace ## Flow Resolution diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md index d20a25e..91e120f 100644 --- a/.cursor/skills/autopilot/flows/existing-code.md +++ b/.cursor/skills/autopilot/flows/existing-code.md @@ -83,13 +83,13 @@ If `_docs/03_implementation/` has batch reports, the implement skill detects com --- **Step 2e — Refactor** -Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactor/FINAL_refactor_report.md` does not exist +Condition: `_docs/03_implementation/FINAL_implementation_report.md` 
exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist Action: Read and execute `.cursor/skills/refactor/SKILL.md` The refactor skill runs the full 6-phase method using the implemented tests as a safety net. -If `_docs/04_refactor/` has phase reports, the refactor skill detects completed phases and continues. +If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues. --- @@ -147,8 +147,8 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au Action: Run the full test suite to verify the implementation before deployment. -1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests -2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite +1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it +2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite 3. **Report results**: present a summary of passed/failed/skipped tests If all tests pass → auto-chain to Step 2hb (Security Audit). @@ -208,12 +208,11 @@ Action: Present using Choose format: ``` - If user picks A → Run performance tests: - 1. Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios - 2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks) - 3. Execute performance test scenarios against the running system - 4. Present results vs acceptance criteria thresholds - 5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort - 6. After completion, auto-chain to Step 2i (Deploy) + 1. 
If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it + 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system + 3. Present results vs acceptance criteria thresholds + 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort + 5. After completion, auto-chain to Step 2i (Deploy) - If user picks B → Mark Step 2hc as `skipped` in the state file, auto-chain to Step 2i (Deploy). --- diff --git a/.cursor/skills/autopilot/flows/greenfield.md b/.cursor/skills/autopilot/flows/greenfield.md index e7158bc..859094d 100644 --- a/.cursor/skills/autopilot/flows/greenfield.md +++ b/.cursor/skills/autopilot/flows/greenfield.md @@ -132,8 +132,8 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t Action: Run the full test suite to verify the implementation before deployment. -1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests -2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite +1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it +2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite 3. **Report results**: present a summary of passed/failed/skipped tests If all tests pass → auto-chain to Step 5b (Security Audit). @@ -193,12 +193,11 @@ Action: Present using Choose format: ``` - If user picks A → Run performance tests: - 1. 
Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios - 2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks) - 3. Execute performance test scenarios against the running system - 4. Present results vs acceptance criteria thresholds - 5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort - 6. After completion, auto-chain to Step 6 (Deploy) + 1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it + 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system + 3. Present results vs acceptance criteria thresholds + 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort + 5. After completion, auto-chain to Step 6 (Deploy) - If user picks B → Mark Step 5c as `skipped` in the state file, auto-chain to Step 6 (Deploy). --- diff --git a/.cursor/skills/autopilot/protocols.md b/.cursor/skills/autopilot/protocols.md index fe118ee..18eb731 100644 --- a/.cursor/skills/autopilot/protocols.md +++ b/.cursor/skills/autopilot/protocols.md @@ -46,7 +46,7 @@ Rules: 2. Always include a recommendation with a brief justification 3. Keep option descriptions to one line each 4. If only 2 options make sense, use A/B only — do not pad with filler options -5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice +5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice 6. Record every user decision in the state file's `Key Decisions` section 7. 
After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive @@ -154,7 +154,7 @@ After 3 failed auto-retries of the same skill, the failure is likely not user-re - Set `status: failed` in `Current Step` - Set `retry_count: 3` - Add a blocker entry describing the repeated failure -2. Play notification sound (per `human-input-sound.mdc`) +2. Play notification sound (per `human-attention-sound.mdc`) 3. Present using Choose format: ``` @@ -251,6 +251,32 @@ When a skill needs to read large files (e.g., full solution.md, architecture.md) - Use search tools (Grep, SemanticSearch) to find specific sections rather than reading entire files - Summarize key decisions from prior steps in the state file so they don't need to be re-read +### Context Budget Heuristic + +Agents cannot programmatically query context window usage. Use these heuristics to avoid degradation: + +| Zone | Indicators | Action | +|------|-----------|--------| +| **Safe** | State file + SKILL.md + 2–3 focused artifacts loaded | Continue normally | +| **Caution** | 5+ artifacts loaded, or 3+ large files (architecture, solution, discovery), or conversation has 20+ tool calls | Complete current sub-step, then suggest session break | +| **Danger** | Repeated truncation in tool output, tool calls failing unexpectedly, responses becoming shallow or repetitive | Save immediately, update state file, force session boundary | + +**Skill-specific guidelines**: + +| Skill | Recommended session breaks | +|-------|---------------------------| +| **document** | After every ~5 modules in Step 1; between Step 4 (Verification) and Step 5 (Solution Extraction) | +| **implement** | Each batch is a natural checkpoint; if more than 2 batches completed in one session, suggest break | +| **plan** | Between Step 5 (Test Specifications) and Step 6 (Epics) for projects with many components | +| **research** | Between Mode A rounds; between Mode A and Mode B | + +**How to detect 
caution/danger zone without API**: + +1. Count tool calls made so far — if approaching 20+, context is likely filling up +2. If reading a file returns truncated content, context is under pressure +3. If the agent starts producing shorter or less detailed responses than earlier in the conversation, context quality is degrading +4. When in doubt, save and suggest a new conversation — re-entry is cheap thanks to the state file + ## Rollback Protocol ### Implementation Steps (git-based) diff --git a/.cursor/skills/autopilot/state.md b/.cursor/skills/autopilot/state.md index d8250e2..50650aa 100644 --- a/.cursor/skills/autopilot/state.md +++ b/.cursor/skills/autopilot/state.md @@ -20,7 +20,7 @@ retry_count: [0-3 — number of consecutive auto-retry attempts for current step (include the step reference table from the active flow file) When updating `Current Step`, always write it as: - step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2hb/2i or 5b) + step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2ea/2f/2g/2h/2hb/2hc/2i or 5b/5c) sub_step: M ← sub-skill's own internal step/phase number + name retry_count: 0 ← reset on new step or success; increment on each failed retry Example: diff --git a/.cursor/skills/code-review/SKILL.md b/.cursor/skills/code-review/SKILL.md index 44c190c..041013a 100644 --- a/.cursor/skills/code-review/SKILL.md +++ b/.cursor/skills/code-review/SKILL.md @@ -152,3 +152,42 @@ The `/implement` skill invokes this skill after each batch completes: 2. Passes task spec paths + changed files to this skill 3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms 4. 
If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info) + +## Integration Contract + +### Inputs (provided by the implement skill) + +| Input | Type | Source | Required | +|-------|------|--------|----------| +| `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/` for the current batch | Yes | +| `changed_files` | list of file paths | Files modified by implementer agents (from `git diff` or agent reports) | Yes | +| `batch_number` | integer | Current batch number (for report naming) | Yes | +| `project_restrictions` | file path | `_docs/00_problem/restrictions.md` | If exists | +| `solution_overview` | file path | `_docs/01_solution/solution.md` | If exists | + +### Invocation Pattern + +The implement skill invokes code-review by: + +1. Reading `.cursor/skills/code-review/SKILL.md` +2. Providing the inputs above as context (read the files, pass content to the review phases) +3. Executing all 6 phases sequentially +4. Consuming the verdict from the output + +### Outputs (returned to the implement skill) + +| Output | Type | Description | +|--------|------|-------------| +| `verdict` | `PASS` / `PASS_WITH_WARNINGS` / `FAIL` | Drives the implement skill's auto-fix gate | +| `findings` | structured list | Each finding has: severity, category, file:line, title, description, suggestion, task reference | +| `critical_count` | integer | Number of Critical findings | +| `high_count` | integer | Number of High findings | +| `report_path` | file path | `_docs/03_implementation/reviews/batch_[NN]_review.md` | + +### Report Persistence + +Save the review report to `_docs/03_implementation/reviews/batch_[NN]_review.md` (create the `reviews/` directory if it does not exist). The report uses the Output Format defined above. 
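The report path convention and the verdict values from the outputs table can be sketched as small shell helpers. This is an illustrative sketch only — the skill performs these steps as agent actions, not via a checked-in script, and the helper names are hypothetical:

```shell
# Hypothetical helpers mirroring the integration contract above.

# batch_[NN]_review.md with a zero-padded batch number
review_report_path() {
  printf '_docs/03_implementation/reviews/batch_%02d_review.md' "$1"
}

# Map the review verdict to the implement skill's next action
review_gate() {
  case "$1" in
    PASS|PASS_WITH_WARNINGS) echo "proceed-to-commit" ;;
    FAIL)                    echo "auto-fix-then-escalate" ;;
    *)                       echo "unknown-verdict" ;;
  esac
}
```

For example, `review_report_path 3` yields `_docs/03_implementation/reviews/batch_03_review.md`, and `review_gate PASS_WITH_WARNINGS` maps to the proceed-to-commit branch.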
+ +The implement skill uses `verdict` to decide: +- `PASS` / `PASS_WITH_WARNINGS` → proceed to commit +- `FAIL` → enter auto-fix loop (up to 2 attempts), then escalate to user diff --git a/.cursor/skills/document/SKILL.md b/.cursor/skills/document/SKILL.md index 46b47aa..c920555 100644 --- a/.cursor/skills/document/SKILL.md +++ b/.cursor/skills/document/SKILL.md @@ -36,13 +36,30 @@ Fixed paths: - SOLUTION_DIR: `_docs/01_solution/` - PROBLEM_DIR: `_docs/00_problem/` -Announce resolved paths to user before proceeding. +Optional input: + +- FOCUS_DIR: a specific directory subtree provided by the user (e.g., `/document @src/api/`). When set, only this subtree and its transitive dependencies are analyzed. + +Announce resolved paths (and FOCUS_DIR if set) to user before proceeding. + +## Mode Detection + +Determine the execution mode before any other logic: + +| Mode | Trigger | Scope | +|------|---------|-------| +| **Full** | No input file, no existing state | Entire codebase | +| **Focus Area** | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies | +| **Resume** | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint | + +Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas. ## Prerequisite Checks -1. If `_docs/` already exists and contains files, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?** +1. If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?** 2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist 3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh** +4. 
If FOCUS_DIR is set, verify the directory exists and contains source files — **STOP if missing** ## Progress Tracking @@ -53,7 +70,9 @@ Create a TodoWrite with all steps (0 through 7). Update status as each step comp ### Step 0: Codebase Discovery **Role**: Code analyst -**Goal**: Build a complete map of the codebase before analyzing any code. +**Goal**: Build a complete map of the codebase (or targeted subtree) before analyzing any code. + +**Focus Area scoping**: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it. Scan and catalog: @@ -69,6 +88,7 @@ Scan and catalog: - Entry points (no internal dependents) - Cycles (mark for grouped analysis) - Topological processing order + - If FOCUS_DIR: mark which modules are in-scope vs dependency-only **Save**: `DOCUMENT_DIR/00_discovery.md` containing: - Directory tree (concise, relevant directories only) @@ -82,14 +102,18 @@ Scan and catalog: { "current_step": "module-analysis", "completed_steps": ["discovery"], + "focus_dir": null, "modules_total": 0, "modules_documented": [], "modules_remaining": [], + "module_batch": 0, "components_written": [], "last_updated": "" } ``` +Set `focus_dir` to the FOCUS_DIR path if in Focus Area mode, or `null` for Full mode. + --- ### Step 1: Module-Level Documentation @@ -97,6 +121,8 @@ Scan and catalog: **Role**: Code analyst **Goal**: Document every identified module individually, processing in topological order (leaves first). +**Batched processing**: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update `state.json`, present a progress summary. Between batches, evaluate whether to suggest a session break. + For each module in topological order: 1. **Read**: read the module's source code. 
Assess complexity and what context is needed. @@ -119,7 +145,26 @@ For each module in topological order: **Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine. **Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module. -**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). +**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5. + +**Session break heuristic**: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break: + +``` +══════════════════════════════════════ + SESSION BREAK SUGGESTED +══════════════════════════════════════ + Modules documented: [X] of [Y] + Batches completed this session: [N] +══════════════════════════════════════ + A) Continue in this conversation + B) Save and continue in a fresh conversation (recommended) +══════════════════════════════════════ + Recommendation: B — fresh context improves + analysis quality for remaining modules +══════════════════════════════════════ +``` + +Re-entry is seamless: `state.json` tracks exactly which modules are done. --- @@ -238,6 +283,23 @@ Apply corrections inline to the documents that need them. **BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes. +**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context: + +``` +══════════════════════════════════════ + VERIFICATION COMPLETE — session break? +══════════════════════════════════════ + Steps 0–4 (analysis + verification) are done. 
+ Steps 5–7 (solution + problem extraction + report) + can run in a fresh conversation. +══════════════════════════════════════ + A) Continue in this conversation + B) Save and continue in a new conversation (recommended) +══════════════════════════════════════ +``` + +If **Focus Area mode**: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run `/document` again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis. + --- ### Step 5: Solution Extraction (Retrospective) @@ -370,9 +432,11 @@ Maintain `DOCUMENT_DIR/state.json`: { "current_step": "module-analysis", "completed_steps": ["discovery"], + "focus_dir": null, "modules_total": 12, "modules_documented": ["utils/helpers", "models/user"], "modules_remaining": ["services/auth", "api/endpoints"], + "module_batch": 1, "components_written": [], "last_updated": "2026-03-21T14:00:00Z" } @@ -423,16 +487,21 @@ When resuming: ┌──────────────────────────────────────────────────────────────────┐ │ Bottom-Up Codebase Documentation (8-Step) │ ├──────────────────────────────────────────────────────────────────┤ -│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │ -│ PREREQ: Check state.json for resume │ +│ MODE: Full / Focus Area (@dir) / Resume (state.json) │ +│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │ +│ PREREQ: Check state.json for resume │ │ │ │ 0. Discovery → dependency graph, tech stack, topo order │ +│ (Focus Area: scoped to FOCUS_DIR + transitive deps) │ │ 1. Module Docs → per-module analysis (leaves first) │ +│ (batched ~5 modules; session break between batches) │ │ 2. Component Assembly → group modules, write component specs │ │ [BLOCKING: user confirms components] │ │ 3. System Synthesis → architecture, flows, data model, deploy │ │ 4. 
Verification → compare all docs vs code, fix errors │ │ [BLOCKING: user reviews corrections] │ +│ [SESSION BREAK suggested before Steps 5–7] │ +│ ── Focus Area mode stops here ── │ │ 5. Solution Extraction → retrospective solution.md │ │ 6. Problem Extraction → retrospective problem, restrictions, AC │ │ [BLOCKING: user confirms problem docs] │ @@ -441,5 +510,6 @@ When resuming: │ Principles: Bottom-up always · Dependencies first │ │ Incremental context · Verify against code │ │ Save immediately · Resume from checkpoint │ +│ Batch modules · Session breaks for large codebases │ └──────────────────────────────────────────────────────────────────┘ ``` diff --git a/.cursor/skills/implement/SKILL.md b/.cursor/skills/implement/SKILL.md index e1b5a83..cf44a57 100644 --- a/.cursor/skills/implement/SKILL.md +++ b/.cursor/skills/implement/SKILL.md @@ -73,9 +73,9 @@ For each task in the batch: - Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files) - If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel -### 5. Update Jira Status → In Progress +### 5. Update Tracker Status → In Progress -For each task in the batch, transition its Jira ticket status to **In Progress** via Jira MCP before launching the implementer. +For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `protocols.md` for detection) before launching the implementer. If `tracker: local`, skip this step. ### 6. Launch Implementer Subagents @@ -127,12 +127,12 @@ Track `auto_fix_attempts` count in the batch report for retrospective analysis. 
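The bounded auto-fix loop whose `auto_fix_attempts` count is tracked above can be sketched as a shell helper. This is a hypothetical illustration — the real loop is driven by the agent, and `run_fix` stands in for whatever command applies fixes and re-runs the review:

```shell
# Hypothetical sketch of a bounded auto-fix loop: retry a fix command
# up to max_attempts times, then signal escalation to the user.
auto_fix() {
  local fix_cmd="$1" max_attempts="${2:-2}" attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    if "$fix_cmd"; then
      echo "fixed-after-$attempt"
      return 0
    fi
  done
  echo "escalate-to-user"
  return 1
}
```

Invoked as `auto_fix run_fix 2`, this mirrors the "up to 2 attempts, then escalate" gate: the attempt count it reports is what gets recorded for retrospective analysis.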
- After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS): - `git add` all changed files from the batch - - `git commit` with a message that includes ALL JIRA-IDs of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[JIRA-ID-1] [JIRA-ID-2] ... Summary of changes` + - `git commit` with a message that includes ALL task IDs (Jira IDs, ADO IDs, or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes` - `git push` to the remote branch -### 12. Update Jira Status → In Testing +### 12. Update Tracker Status → In Testing -After the batch is committed and pushed, transition the Jira ticket status of each task in the batch to **In Testing** via Jira MCP. +After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step. ### 13. Loop diff --git a/.cursor/skills/new-task/SKILL.md b/.cursor/skills/new-task/SKILL.md index 69b0d87..e68ff4c 100644 --- a/.cursor/skills/new-task/SKILL.md +++ b/.cursor/skills/new-task/SKILL.md @@ -213,27 +213,27 @@ Present using the Choose format for each decision that has meaningful alternativ --- -### Step 7: Jira Ticket +### Step 7: Work Item Ticket **Role**: Project coordinator -**Goal**: Create a Jira ticket and link it to the task file. +**Goal**: Create a work item ticket and link it to the task file. -1. Create a Jira ticket for the task: +1. Create a ticket via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md` for detection): - Summary: the task's **Name** field - Description: the task's **Problem** and **Acceptance Criteria** sections - Story points: the task's **Complexity** value - Link to the appropriate epic (ask user if unclear which epic) -2. 
Write the Jira ticket ID and Epic ID back into the task file header: - - Update **Task** field: `[JIRA-ID]_[short_name]` - - Update **Jira** field: `[JIRA-ID]` +2. Write the ticket ID and Epic ID back into the task file header: + - Update **Task** field: `[TICKET-ID]_[short_name]` + - Update **Jira** field: `[TICKET-ID]` - Update **Epic** field: `[EPIC-ID]` -3. Rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md` +3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md` -If Jira MCP is not authenticated or unavailable: +If the work item tracker is not authenticated or unavailable (`tracker: local`): - Keep the numeric prefix - Set **Jira** to `pending` - Set **Epic** to `pending` -- The task is still valid and can be implemented; Jira sync happens later +- The task is still valid and can be implemented; tracker sync happens later --- diff --git a/.cursor/skills/plan/steps/06_jira-epics.md b/.cursor/skills/plan/steps/06_jira-epics.md index b9a1ecd..e93d95e 100644 --- a/.cursor/skills/plan/steps/06_jira-epics.md +++ b/.cursor/skills/plan/steps/06_jira-epics.md @@ -1,13 +1,13 @@ -## Step 6: Jira Epics +## Step 6: Work Item Epics **Role**: Professional product manager -**Goal**: Create Jira epics from components, ordered by dependency +**Goal**: Create epics from components, ordered by dependency -**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files. +**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the epic should understand the full context without needing to open separate files. 1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. 
It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure. -2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md` +2. Generate epics for each component using the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md`), structured per `templates/epic-spec.md` 3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph) 4. Include effort estimation per epic (T-shirt size or story points range) 5. Ensure each epic has clear acceptance criteria cross-referenced with component specs @@ -15,7 +15,7 @@ **CRITICAL — Epic description richness requirements**: -Each epic description in Jira MUST include ALL of the following sections with substantial content: +Each epic description MUST include ALL of the following sections with substantial content: - **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections) - **Problem / Context**: what problem this component solves, why it exists, current pain points - **Scope**: detailed in-scope and out-of-scope lists @@ -31,7 +31,7 @@ Each epic description in Jira MUST include ALL of the following sections with su - **Key constraints**: from restrictions.md that affect this component - **Testing strategy**: summary of test types and coverage from tests.md -Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team. +Do NOT create minimal epics with just a summary and short description. The epic is the primary reference document for the implementation team. **Self-verification**: - [ ] "Bootstrap & Initial Structure" epic exists and is first in order @@ -45,4 +45,4 @@ Do NOT create minimal epics with just a summary and short description. The Jira 7. 
**Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`. -**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs. +**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If `tracker: local`, save locally only. diff --git a/.cursor/skills/plan/templates/epic-spec.md b/.cursor/skills/plan/templates/epic-spec.md index 3157a84..6cb60e6 100644 --- a/.cursor/skills/plan/templates/epic-spec.md +++ b/.cursor/skills/plan/templates/epic-spec.md @@ -1,6 +1,6 @@ -# Jira Epic Template +# Epic Template -Use this template for each Jira epic. Create epics via Jira MCP. +Use this template for each epic. Create epics via the configured work item tracker (Jira MCP or Azure DevOps MCP). --- diff --git a/.cursor/skills/test-spec/SKILL.md b/.cursor/skills/test-spec/SKILL.md index 9985407..54a056d 100644 --- a/.cursor/skills/test-spec/SKILL.md +++ b/.cursor/skills/test-spec/SKILL.md @@ -5,8 +5,8 @@ description: | then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits) that treat the system as a black box. Every test pairs input data with quantifiable expected results so tests can verify correctness, not just execution. - 3-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate. - Produces 8 artifacts under tests/. + 4-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate, + test runner script generation. Produces 8 artifacts under tests/ and 2 shell scripts under scripts/. 
Trigger phrases: - "test spec", "test specification", "test scenarios" - "blackbox test spec", "black box tests", "blackbox tests" @@ -133,6 +133,8 @@ TESTS_OUTPUT_DIR/ | Phase 3 | Updated test data spec (if data added) | `test-data.md` | | Phase 3 | Updated test files (if tests removed) | respective test file | | Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` | +| Phase 4 | Test runner script | `scripts/run-tests.sh` | +| Phase 4 | Performance test runner script | `scripts/run-performance-tests.sh` | ### Resumability @@ -335,6 +337,56 @@ When coverage ≥ 70% and all remaining tests have validated data AND quantifiab --- +### Phase 4: Test Runner Script Generation + +**Role**: DevOps engineer +**Goal**: Generate executable shell scripts that run the specified tests, so the autopilot and CI can invoke them consistently. +**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. + +#### Step 1 — Detect test infrastructure + +1. Identify the project's test runner from manifests and config files: + - Python: `pytest` (pyproject.toml, setup.cfg, pytest.ini) + - .NET: `dotnet test` (*.csproj, *.sln) + - Rust: `cargo test` (Cargo.toml) + - Node: `npm test` or `vitest` / `jest` (package.json) +2. Identify docker-compose files for integration/blackbox tests (`docker-compose.test.yml`, `e2e/docker-compose*.yml`) +3. Identify performance/load testing tools from dependencies (k6, locust, artillery, wrk, or built-in benchmarks) +4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements + +#### Step 2 — Generate `scripts/run-tests.sh` + +Create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must: + +1. Set `set -euo pipefail` and trap cleanup on EXIT +2. Optionally accept a `--unit-only` flag to skip blackbox tests +3. Run unit tests using the detected test runner +4. 
If blackbox tests exist: spin up docker-compose environment, wait for health checks, run blackbox test suite, tear down +5. Print a summary of passed/failed/skipped tests +6. Exit 0 on all pass, exit 1 on any failure + +#### Step 3 — Generate `scripts/run-performance-tests.sh` + +Create `scripts/run-performance-tests.sh` at the project root. The script must: + +1. Set `set -euo pipefail` and trap cleanup on EXIT +2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args) +3. Spin up the system under test (docker-compose or local) +4. Run load/performance scenarios using the detected tool +5. Compare results against threshold values from the test spec +6. Print a pass/fail summary per scenario +7. Exit 0 if all thresholds met, exit 1 otherwise + +#### Step 4 — Verify scripts + +1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`) +2. Mark both scripts as executable (`chmod +x`) +3. Present a summary of what each script does to the user + +**Save action**: Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root. + +--- + ## Escalation Rules | Situation | Action | @@ -373,7 +425,7 @@ When the user wants to: ``` ┌──────────────────────────────────────────────────────────────────────┐ -│ Test Scenario Specification (3-Phase) │ +│ Test Scenario Specification (4-Phase) │ ├──────────────────────────────────────────────────────────────────────┤ │ PREREQ: Data Gate (BLOCKING) │ │ → verify AC, restrictions, input_data (incl. 
expected_results.md) │ @@ -397,15 +449,21 @@ When the user wants to: │ │ │ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │ │ → build test-data + expected-result requirements checklist │ -│ → ask user: provide data+result (A) or remove test (B) │ +│ → ask user: provide data+result (A) or remove test (B) │ │ → validate input data (quality + quantity) │ │ → validate expected results (quantifiable + comparison method) │ │ → remove tests without data or expected result, warn user │ -│ → final coverage check (≥70% or FAIL + loop back) │ -│ [BLOCKING: coverage ≥ 70% required to pass] │ +│ → final coverage check (≥70% or FAIL + loop back) │ +│ [BLOCKING: coverage ≥ 70% required to pass] │ +│ │ +│ Phase 4: Test Runner Script Generation │ +│ → detect test runner + docker-compose + load tool │ +│ → scripts/run-tests.sh (unit + blackbox) │ +│ → scripts/run-performance-tests.sh (load/perf scenarios) │ +│ → verify scripts are valid and executable │ ├──────────────────────────────────────────────────────────────────────┤ │ Principles: Black-box only · Traceability · Save immediately │ │ Ask don't assume · Spec don't code │ -│ No test without data · No test without expected result │ +│ No test without data · No test without expected result │ └──────────────────────────────────────────────────────────────────────┘ ``` diff --git a/.cursor/skills/test-spec/templates/run-tests-script.md b/.cursor/skills/test-spec/templates/run-tests-script.md new file mode 100644 index 0000000..e5c41ff --- /dev/null +++ b/.cursor/skills/test-spec/templates/run-tests-script.md @@ -0,0 +1,88 @@ +# Test Runner Script Structure + +Reference for generating `scripts/run-tests.sh` and `scripts/run-performance-tests.sh`. + +## `scripts/run-tests.sh` + +```bash +#!/usr/bin/env bash +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." 
&& pwd)"
+UNIT_ONLY=false
+RESULTS_DIR="$PROJECT_ROOT/test-results"
+
+for arg in "$@"; do
+  case $arg in
+    --unit-only) UNIT_ONLY=true ;;
+  esac
+done
+
+cleanup() {
+  # tear down docker-compose if it was started
+  :  # no-op placeholder: a comment-only function body is a bash syntax error
+}
+trap cleanup EXIT
+
+mkdir -p "$RESULTS_DIR"
+
+# --- Unit Tests ---
+# [detect runner: pytest / dotnet test / cargo test / npm test]
+# [run and capture exit code]
+# [save results to $RESULTS_DIR/unit-results.*]
+
+# --- Blackbox Tests (skip if --unit-only) ---
+# if ! $UNIT_ONLY; then
+#   [docker compose -f docker-compose.test.yml up -d]
+#   [wait for health checks]
+#   [run blackbox test suite]
+#   [save results to $RESULTS_DIR/blackbox-results.*]
+# fi
+
+# --- Summary ---
+# [print passed / failed / skipped counts]
+# [exit 0 if all passed, exit 1 otherwise]
+```
+
+## `scripts/run-performance-tests.sh`
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+RESULTS_DIR="$PROJECT_ROOT/test-results"
+
+cleanup() {
+  # tear down test environment if started
+  :  # no-op placeholder: a comment-only function body is a bash syntax error
+}
+trap cleanup EXIT
+
+mkdir -p "$RESULTS_DIR"
+
+# --- Start System Under Test ---
+# [docker compose up -d or start local server]
+# [wait for health checks]
+
+# --- Run Performance Scenarios ---
+# [detect tool: k6 / locust / artillery / wrk / built-in]
+# [run each scenario from performance-tests.md]
+# [capture metrics: latency P50/P95/P99, throughput, error rate]
+
+# --- Compare Against Thresholds ---
+# [read thresholds from test spec or CLI args]
+# [print per-scenario pass/fail]
+
+# --- Summary ---
+# [exit 0 if all thresholds met, exit 1 otherwise]
+```
+
+## Key Requirements
+
+- Both scripts must be idempotent (safe to run multiple times)
+- Both scripts must work in CI (no interactive prompts, no GUI)
+- Use `trap cleanup EXIT` to ensure teardown even on failure
+- Exit codes: 0 = all pass, 1 = failures detected
+- Write results to `test-results/` directory (add to `.gitignore` if not already present)
+- The actual commands depend on the detected tech stack — fill them in during Phase 4 of the test-spec skill
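The manifest-to-runner mapping from Phase 4 Step 1 can be sketched as a small shell helper. This is an illustrative sketch only: the function name `detect_test_runner` and the check order are assumptions, not part of the skill, and a real implementation should consult every manifest listed in Step 1.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of Phase 4 Step 1: map project manifests to a unit-test runner command.
# Function name and precedence order are illustrative assumptions.
detect_test_runner() {
  local root="${1:-.}"
  if [ -f "$root/pyproject.toml" ] || [ -f "$root/setup.cfg" ] || [ -f "$root/pytest.ini" ]; then
    echo "pytest"
  elif ls "$root"/*.sln "$root"/*.csproj >/dev/null 2>&1; then
    echo "dotnet test"
  elif [ -f "$root/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$root/package.json" ]; then
    echo "npm test"
  else
    echo "unknown"
    return 1
  fi
}

# Example: a directory containing only Cargo.toml resolves to "cargo test"
tmp="$(mktemp -d)"
touch "$tmp/Cargo.toml"
detect_test_runner "$tmp"   # prints: cargo test
rm -rf "$tmp"
```

A real version would also distinguish `vitest`/`jest` by inspecting the `scripts` section of `package.json` rather than defaulting to `npm test`.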
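The "Compare Against Thresholds" step in `scripts/run-performance-tests.sh` needs floating-point comparison, which bash's integer tests (`[ -le ]`) cannot do. A minimal sketch using `awk`; the scenario names, measured values, and millisecond unit are illustrative stand-ins for numbers read from `performance-tests.md` and the load tool's output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the threshold-comparison step; all names and values are illustrative.
check_threshold() {
  local name="$1" measured="$2" limit="$3"
  # awk performs the float comparison that bash integer tests cannot
  if awk -v m="$measured" -v l="$limit" 'BEGIN { exit !(m <= l) }'; then
    echo "PASS  $name: ${measured}ms <= ${limit}ms"
  else
    echo "FAIL  $name: ${measured}ms > ${limit}ms"
    return 1
  fi
}

failures=0
check_threshold "checkout_p95" 182.4 250 || failures=$((failures + 1))
check_threshold "search_p95" 275.0 300 || failures=$((failures + 1))
echo "scenarios failed: $failures"
# a full script would finish with: exit "$(( failures > 0 ? 1 : 0 ))"
```

Some tools can enforce thresholds themselves (k6's `thresholds` option exits non-zero when a threshold is crossed), in which case this step reduces to propagating the tool's exit code.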