mirror of https://github.com/azaion/detections.git
synced 2026-04-22 11:16:31 +00:00
Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.
Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autopilot` once — the engine handles sequencing, transitions, and re-entry.

## File Index

| File | Purpose |
|------|---------|
| `flows/greenfield.md` | Detection rules, step table, and auto-chain rules for new projects |
| `flows/existing-code.md` | Detection rules, step table, and auto-chain rules for existing codebases |
| `state.md` | State file format, rules, re-entry protocol, session boundaries |
| `protocols.md` | User interaction, Jira MCP auth, choice format, error handling, status summary |

**On every invocation**: read all four files above before executing any logic.

## Core Principles

- **Auto-chain**: when a skill completes, immediately start the next one — no pause between skills
- **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
- **Sound on pause**: follow `.cursor/rules/human-input-sound.mdc` — play a notification sound before every pause that requires human input
- **Minimize interruptions**: only ask the user when a decision genuinely cannot be resolved automatically
- **Jira MCP recommended**: steps that create Jira artifacts (Plan Step 6, Decompose) should have an authenticated Jira MCP — if unavailable, offer the user the choice to continue with local-only task tracking

## Flow Resolution

Determine which flow to use:

1. If the workspace has source code files **and** `_docs/` does not exist → **existing-code flow** (Pre-Step detection)
2. If `_docs/_autopilot_state.md` exists and records Document in `Completed Steps` → **existing-code flow**
3. If `_docs/_autopilot_state.md` exists with `step: done` AND the workspace contains source code → **existing-code flow** (completed-project re-entry — loops to New Task)
4. Otherwise → **greenfield flow**

After selecting the flow, apply its detection rules (first match wins) to determine the current step.

## Jira MCP Authentication

Several workflow steps create Jira artifacts (epics, tasks, links). The Jira MCP server must be authenticated **before** any step that writes to Jira.

### Steps That Require Jira MCP

| Step | Sub-Step | Jira Action |
|------|----------|-------------|
| 2 (Plan) | Step 6 — Jira Epics | Create epics for each component |
| 3 (Decompose) | Steps 1–3 — All tasks | Create a Jira ticket per task, linked to its epic |

### Authentication Gate

Before entering **Step 2 (Plan)** or **Step 3 (Decompose)** for the first time, the autopilot must:
1. Call `mcp_auth` on the Jira MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** or authentication fails → present using the Choose format:

```
══════════════════════════════════════
Jira MCP authentication failed
══════════════════════════════════════
A) Retry authentication (retry mcp_auth)
B) Continue without Jira (tasks saved locally only)
══════════════════════════════════════
Recommendation: A — Jira IDs drive task referencing,
dependency tracking, and implementation batching.
Without Jira, task files use numeric prefixes instead.
══════════════════════════════════════
```

If the user picks **B** (continue without Jira):

- Set a flag in the state file: `jira_enabled: false`
- All skills that would create Jira tickets instead save metadata locally in the task/epic files with `Jira: pending` status
- Task files keep numeric prefixes (e.g., `01_initial_structure.md`) instead of Jira ID prefixes
- The workflow proceeds normally in all other respects

### Re-Authentication

If Jira MCP was already authenticated in a previous invocation (verify by listing available Jira tools beyond `mcp_auth`), skip the auth gate.

## User Interaction Protocol

Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:

- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)

### When to Ask (MUST ask)

- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs

### When NOT to Ask (auto-transition)

- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer

### Choice Format

Always present decisions in this format:

```
══════════════════════════════════════
DECISION REQUIRED: [brief context]
══════════════════════════════════════
A) [Option A — short description]
B) [Option B — short description]
C) [Option C — short description, if applicable]
D) [Option D — short description, if applicable]
══════════════════════════════════════
Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```

Rules:

1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive

## State File: `_docs/_autopilot_state.md`

The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry; folder scanning is the fallback when the state file doesn't exist.

### Format

```markdown
# Autopilot State

## Current Step
step: [0-5 or "done"]
name: [Problem / Research / Plan / Decompose / Implement / Deploy / Done]
status: [not_started / in_progress / completed]
sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]

## Step ↔ SubStep Reference
| Step | Name | Sub-Skill | Internal SubSteps |
|------|------------|------------------------|------------------------------------------|
| 0 | Problem | problem/SKILL.md | Phase 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
| 2 | Plan | plan/SKILL.md | Step 1–6 |
| 3 | Decompose | decompose/SKILL.md | Step 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Deploy | deploy/SKILL.md | Step 1–7 |

When updating `Current Step`, always write it as:
step: N ← autopilot step (0–5)
sub_step: M ← sub-skill's own internal step/phase number + name
Example:
step: 2
name: Plan
status: in_progress
sub_step: 4 — Architecture Review & Risk Assessment

## Completed Steps

| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| 0 | Problem | [date] | [one-line summary] |
| 1 | Research | [date] | [N drafts, final approach summary] |
| 2 | Plan | [date] | [N components, architecture summary] |
| 3 | Decompose | [date] | [N tasks, total complexity points] |
| 4 | Implement | [date] | [N batches, pass/fail summary] |
| 5 | Deploy | [date] | [artifacts produced] |

## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision 2: e.g. "6 research rounds, final draft: solution_draft06.md"]
- [decision N]

## Last Session
date: [date]
ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
reason: [completed step / session boundary / user paused / context limit]
notes: [any context for next session, e.g. "User asked to revisit risk assessment"]

## Blockers
- [blocker 1, if any]
- [none]
```

### State File Rules

1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
3. **Read** the state file as the first action on every invocation — before folder scanning
4. **Cross-check**: after reading the state file, verify it against the actual `_docs/` folder contents. If they disagree (e.g., the state file says Step 2 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
5. **Never delete** the state file. It accumulates history across the entire project lifecycle

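The cross-check in rule 4 can be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper names and the artifact-to-step mapping are assumptions, and a real implementation would cover every step's artifacts.

```python
from pathlib import Path

def recorded_step(docs: Path):
    """Read the `step:` value from _autopilot_state.md, or None if absent."""
    state = docs / "_autopilot_state.md"
    if not state.exists():
        return None
    for line in state.read_text(encoding="utf-8").splitlines():
        if line.startswith("step:"):
            return line.split(":", 1)[1].strip()
    return None

def implied_step(docs: Path) -> int:
    """Derive the step implied by _docs/ contents; later artifacts win."""
    step = 0
    if any((docs / "01_solution").glob("solution_draft*.md")):
        step = 2   # research output exists, so at least Plan
    if (docs / "02_document" / "architecture.md").exists():
        step = 3   # plan output exists, so at least Decompose
    if (docs / "03_implementation" / "FINAL_implementation_report.md").exists():
        step = 5   # implementation report exists, so at least Deploy
    return step

def cross_check(docs: Path) -> str:
    """Rule 4: if the folders are further along than the state file, trust the folders."""
    rec, imp = recorded_step(docs), implied_step(docs)
    if rec is None or (rec.isdigit() and int(rec) < imp):
        return str(imp)   # caller should also rewrite the state file to match
    return rec
```

Note the asymmetry: folder artifacts can only move the step forward, matching the rule that the folder structure wins on disagreement.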
## Execution Entry Point

Every invocation follows this sequence:

```
1. Read _docs/_autopilot_state.md (if exists)
2. Read all File Index files above
3. Cross-check state file against _docs/ folder structure (rules in state.md)
4. Resolve flow (see Flow Resolution above)
5. Resolve current step (detection rules from the active flow file)
6. Present Status Summary (format in protocols.md)
7. Execute:
   a. Delegate to current skill (see Skill Delegation below)
   b. When skill completes → update state file (rules in state.md)
   c. Re-detect next step from the active flow's detection rules
   d. If next skill is ready → auto-chain (go to 7a with next skill)
   e. If session boundary reached → update state, suggest new conversation (rules in state.md)
   f. If all steps done → update state → report completion
```

## State Detection

Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning.

### Folder Scan Rules (fallback)

Scan `_docs/` to determine the current workflow position. Check rules in order — first match wins.

### Detection Rules

**Pre-Step — Existing Codebase Detection**
Condition: `_docs/` does not exist AND the workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`, `src/`, `Cargo.toml`, `*.csproj`, `package.json`)

Action: an existing codebase without documentation was detected. Present using the Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Existing codebase detected
══════════════════════════════════════
A) Start fresh — define the problem from scratch (normal workflow)
B) Document existing codebase first — run /document to reverse-engineer docs, then continue
══════════════════════════════════════
Recommendation: B — the /document skill analyzes your code
bottom-up and produces _docs/ artifacts automatically,
then you can continue with refactor or the normal workflow.
══════════════════════════════════════
```

- If the user picks A → proceed to Step 0 (Problem Gathering) as normal
- If the user picks B → read and execute `.cursor/skills/document/SKILL.md`. After the document skill completes, re-detect state (the produced `_docs/` artifacts will place the project at Step 2 or later).

---

**Step 0 — Problem Gathering**
Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty:
- `problem.md`
- `restrictions.md`
- `acceptance_criteria.md`
- `input_data/` (must contain at least one file)

Action: Read and execute `.cursor/skills/problem/SKILL.md`

---

**Step 1 — Research (Initial)**
Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files

Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A)

---

**Step 1b — Research Decision**
Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_document/architecture.md` does not exist

Action: Present the current research state to the user:
- How many solution drafts exist
- Whether tech_stack.md and security_analysis.md exist
- A one-line summary from the latest draft

Then present using the **Choose format**:

```
══════════════════════════════════════
DECISION REQUIRED: Research complete — next action?
══════════════════════════════════════
A) Run another research round (Mode B assessment)
B) Proceed to planning with current draft
══════════════════════════════════════
Recommendation: [A or B] — [reason based on draft quality]
══════════════════════════════════════
```

- If the user picks A → read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B)
- If the user picks B → auto-chain to Step 2 (Plan)

---

**Step 2 — Plan**
Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_document/architecture.md` does not exist

Action:
1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself
2. Read and execute `.cursor/skills/plan/SKILL.md`

If `_docs/02_document/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it.

---

**Step 3 — Decompose**
Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`)

Action: Read and execute `.cursor/skills/decompose/SKILL.md`

If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.

---

**Step 4 — Implement**
Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

Action: Read and execute `.cursor/skills/implement/SKILL.md`

If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

---

**Step 5 — Deploy**
Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND `_docs/04_deploy/` does not exist or is incomplete

Action: Read and execute `.cursor/skills/deploy/SKILL.md`

---

**Done**
Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md)

Action: Report project completion with a summary.

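The first-match-wins scan above amounts to an ordered rule list. The sketch below is a simplified illustration: the conditions are abbreviated, the step labels are shorthand, and the real rules (Pre-Step, Step 1b, component checks) are omitted.

```python
from pathlib import Path

def detect_step(docs: Path) -> str:
    """Simplified first-match-wins scan over _docs/ (conditions abbreviated)."""
    def missing(*parts) -> bool:
        return not docs.joinpath(*parts).exists()

    def none_match(subdir: str, pattern: str) -> bool:
        # glob on a missing directory yields nothing, so this is safe
        return not any((docs / subdir).glob(pattern))

    rules = [
        ("Step 0 (Problem)",   lambda: missing("00_problem")),
        ("Step 1 (Research)",  lambda: none_match("01_solution", "solution_draft*.md")),
        ("Step 2 (Plan)",      lambda: missing("02_document", "architecture.md")),
        ("Step 3 (Decompose)", lambda: none_match("02_tasks", "*.md")),
        ("Step 4 (Implement)", lambda: missing("03_implementation", "FINAL_implementation_report.md")),
        ("Step 5 (Deploy)",    lambda: missing("04_deploy")),
    ]
    # First rule whose "not yet done" condition holds wins.
    for name, not_done in rules:
        if not_done():
            return name
    return "Done"
```

Ordering is the whole point of the design: each rule only needs to express "this step's output is missing," because every earlier step has already been ruled out.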
## Status Summary

On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback).

Format:

```
═══════════════════════════════════════════════════
AUTOPILOT STATUS
═══════════════════════════════════════════════════
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5 Deploy [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```

For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section

## Auto-Chain Rules

After a skill completes, apply these rules:

| Completed Step | Next Action |
|---------------|-------------|
| Problem Gathering | Auto-chain → Research (Mode A) |
| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) |
| Research Decision → proceed | Auto-chain → Plan |
| Plan | Auto-chain → Decompose |
| Decompose | **Session boundary** — suggest new conversation before Implement |
| Implement | Auto-chain → Deploy |
| Deploy | Report completion |

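The table can be held as a plain transition map. The step keys below are illustrative shorthand, not identifiers used anywhere in the skill files:

```python
# Transition map for the auto-chain table above.
# The second tuple element is the transition kind:
#   "auto" chains immediately, "ask" pauses for a user choice,
#   "session_boundary" suggests a new conversation first.
AUTO_CHAIN = {
    "problem": ("research", "auto"),
    "research": ("research_decision", "ask"),        # another round or proceed?
    "research_decision": ("plan", "auto"),           # when the user chooses "proceed"
    "plan": ("decompose", "auto"),
    "decompose": ("implement", "session_boundary"),  # the one hard break
    "implement": ("deploy", "auto"),
    "deploy": (None, "report_completion"),
}

def next_step(completed: str):
    """Look up the next step and transition kind for a completed step."""
    return AUTO_CHAIN[completed]
```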
### Session Boundary: Decompose → Implement

After decompose completes, **do not auto-chain to implement**. Instead:

1. Update the state file: mark Decompose as completed, set current step to 4 (Implement) with status `not_started`
2. Write the `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
3. Present a summary: number of tasks, estimated batches, total complexity points
4. Use the Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Decompose complete — start implementation?
══════════════════════════════════════
A) Start a new conversation for implementation (recommended for context freshness)
B) Continue implementation in this conversation
══════════════════════════════════════
Recommendation: A — implementation is the longest phase, fresh context helps
══════════════════════════════════════
```

This is the only hard session boundary. All other transitions auto-chain.

## Skill Delegation

For each step, the delegation pattern is:

1. Update the state file: set `step` to the autopilot step number (0–5), status to `in_progress`, and `sub_step` to the sub-skill's current internal step/phase number and name
2. Announce: "Starting [Skill Name]..."
3. Read the skill file: `.cursor/skills/[name]/SKILL.md`
4. Execute the skill's workflow exactly as written, including:
   - All BLOCKING gates (present to user, wait for confirmation)
   - All self-verification checklists
   - All save actions
   - All escalation rules
   - Update `sub_step` in the state file each time the sub-skill advances to a new internal step/phase
5. When the skill's workflow is fully complete:
   - Update the state file: mark the step as `completed`, record the date, write a one-line key outcome
   - Add any key decisions made during this step to the `Key Decisions` section
   - Return to the auto-chain rules (from the active flow file)

Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer.

## Re-Entry Protocol

When the user invokes `/autopilot` and work already exists:

1. Read `_docs/_autopilot_state.md`
2. Cross-check against the `_docs/` folder structure
3. Present the Status Summary with context from the state file (key decisions, last session, blockers)
4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, and deploy all do), the sub-skill handles mid-step recovery
5. Continue execution from the detected state

## Error Handling

All error situations that require user input MUST use the **Choose A / B / C / D** format.

| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |

## Trigger Conditions

This skill activates when the user wants to:

```
┌────────────────────────────────────────────────────────────────┐
│ Autopilot (Auto-Chain Orchestrator) │
├────────────────────────────────────────────────────────────────┤
│ EVERY INVOCATION: │
│ 1. Read state file + module files │
│ 2. Resolve flow & current step │
│ 3. Status Summary → Execute → Auto-chain (loop) │
│ │
│ GREENFIELD FLOW (flows/greenfield.md): │
│ Step 0 Problem → Step 1 Research → Step 2 Plan │
│ → Step 3 Decompose → [SESSION] → Step 4 Implement │
│ → Step 5 Run Tests → Step 6 Deploy → DONE │
│ │
│ EXISTING CODE FLOW (flows/existing-code.md): │
│ Pre-Step Document → 2b Test Spec → 2c Decompose Tests │
│ → [SESSION] → 2d Implement Tests → 2e Refactor │
│ → 2f New Task → [SESSION] → 2g Implement │
│ → 2h Run Tests → 2i Deploy → DONE │
│ │
│ STATE: _docs/_autopilot_state.md (see state.md) │
│ PROTOCOLS: choice format, Jira auth, errors (see protocols.md) │
│ PAUSE POINTS: sub-skill BLOCKING gates only │
│ USER INPUT: Choose A/B/C/D format at genuine decisions only │
│ AUTO-TRANSITION: when path is unambiguous, don't ask │
│ SESSION BREAK: after Decompose/New Task (before Implement) │
├────────────────────────────────────────────────────────────────┤
│ Auto-chain · State to file · Rich re-entry · Delegate │
│ Pause at decisions only · Minimize interruptions │
└────────────────────────────────────────────────────────────────┘
```
# Existing Code Workflow

Workflow for projects with an existing codebase. Starts with documentation, produces test specs, decomposes and implements tests, refactors with that safety net, then adds new functionality and deploys.

## Step Reference Table

| Step | Name | Sub-Skill | Internal SubSteps |
|------|-------------------------|---------------------------------|---------------------------------------|
| — | Document (pre-step) | document/SKILL.md | Steps 1–8 |
| 2b | Blackbox Test Spec | blackbox-test-spec/SKILL.md | Phase 1a–1b |
| 2c | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
| 2d | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 2e | Refactor | refactor/SKILL.md | Phases 0–5 (6-phase method) |
| 2f | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
| 2g | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 2h | Run Tests | (autopilot-managed) | Unit tests → Integration/blackbox tests |
| 2i | Deploy | deploy/SKILL.md | Steps 1–7 |

After Step 2i, the existing-code workflow is complete.

## Detection Rules

Check rules in order — first match wins.

---

**Pre-Step — Existing Codebase Detection**
Condition: `_docs/` does not exist AND the workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`, `src/`, `Cargo.toml`, `*.csproj`, `package.json`)

Action: an existing codebase without documentation was detected. Present using the Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Existing codebase detected
══════════════════════════════════════
A) Start fresh — define the problem from scratch (greenfield workflow)
B) Document existing codebase first — run /document to reverse-engineer docs, then continue
══════════════════════════════════════
Recommendation: B — the /document skill analyzes your code
bottom-up and produces _docs/ artifacts automatically,
then you can continue with test specs, refactor, and new features.
══════════════════════════════════════
```

- If the user picks A → proceed to Step 0 (Problem Gathering) in the greenfield flow
- If the user picks B → read and execute `.cursor/skills/document/SKILL.md`. After the document skill completes, re-detect state (the produced `_docs/` artifacts will place the project at Step 2b or later).

---

**Step 2b — Blackbox Test Spec**
Condition: `_docs/02_document/FINAL_report.md` exists AND the workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/integration_tests/traceability_matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for a "Document" entry)

Action: Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`

This step applies when the codebase was documented via the `/document` skill. Test specifications must be produced before refactoring or further development.

---

**Step 2c — Decompose Tests**
Condition: `_docs/02_document/integration_tests/traceability_matrix.md` exists AND the workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)

Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/integration_tests/` as input). The decompose skill will:
1. Run Step 1t (test infrastructure bootstrap)
2. Run Step 3 (integration test task decomposition)
3. Run Step 4 (cross-verification against test coverage)

If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.

---

**Step 2d — Implement Tests**
Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 2c (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

Action: Read and execute `.cursor/skills/implement/SKILL.md`

The implement skill reads test tasks from `_docs/02_tasks/` and implements them.

If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

---

**Step 2e — Refactor**
Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactor/FINAL_refactor_report.md` does not exist

Action: Read and execute `.cursor/skills/refactor/SKILL.md`

The refactor skill runs the full 6-phase method using the implemented tests as a safety net.

If `_docs/04_refactor/` has phase reports, the refactor skill detects completed phases and continues.

---

**Step 2f — New Task**
Condition: `_docs/04_refactor/FINAL_refactor_report.md` exists AND the autopilot state shows Step 2e (Refactor) is completed AND the autopilot state does NOT show Step 2f (New Task) as completed

Action: Read and execute `.cursor/skills/new-task/SKILL.md`

The new-task skill interactively guides the user through defining new functionality. It loops until the user is done adding tasks. New task files are written to `_docs/02_tasks/`.

---

**Step 2g — Implement**

Condition: the autopilot state shows Step 2f (New Task) is completed AND `_docs/03_implementation/` does not contain a FINAL report covering the new tasks (check the state file to distinguish test implementation from feature implementation)

Action: Read and execute `.cursor/skills/implement/SKILL.md`

The implement skill reads the new tasks from `_docs/02_tasks/` and implements them. Tasks already implemented in Step 2d are skipped (the implement skill tracks completed tasks in batch reports).

If `_docs/03_implementation/` has batch reports from this phase, the implement skill detects completed tasks and continues.

---

**Step 2h — Run Tests**

Condition: the autopilot state shows Step 2g (Implement) is completed AND the autopilot state does NOT show Step 2h (Run Tests) as completed

Action: Run the full test suite to verify the implementation before deployment.

1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
3. **Report results**: present a summary of passed/failed/skipped tests

If all tests pass → auto-chain to Step 2i (Deploy).

If tests fail → present using Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped]
══════════════════════════════════════
A) Fix failing tests and re-run
B) Proceed to deploy anyway (not recommended)
C) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before deploying
══════════════════════════════════════
```

---

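The runner detection in step 1 of Run Tests can be sketched as a marker-file lookup. This is a minimal sketch: the marker-to-command mapping below is an illustrative assumption, not an exhaustive list.

```python
from __future__ import annotations

from pathlib import Path

# Marker file (glob pattern) mapped to a unit-test command, checked in order.
# First match wins. Illustrative subset: extend with other ecosystems as needed.
RUNNERS = [
    ("pyproject.toml", "pytest"),
    ("Cargo.toml", "cargo test"),
    ("*.csproj", "dotnet test"),
    ("package.json", "npm test"),
]


def detect_test_command(project_root: str) -> str | None:
    """Return the unit-test command for the project, or None if unknown."""
    root = Path(project_root)
    for marker, command in RUNNERS:
        if any(root.glob(marker)):
            return command
    return None
```

When no runner is detected, a reasonable fallback is to ask the user rather than guess.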
**Step 2i — Deploy**

Condition: the autopilot state shows Step 2h (Run Tests) is completed AND (`_docs/04_deploy/` does not exist or is incomplete)

Action: Read and execute `.cursor/skills/deploy/SKILL.md`

After deployment completes, the existing-code workflow is done.

---

**Re-Entry After Completion**

Condition: the autopilot state shows `step: done` OR all steps through 2i (Deploy) are completed

Action: The project completed a full cycle. Present status and loop back to New Task:

```
══════════════════════════════════════
PROJECT CYCLE COMPLETE
══════════════════════════════════════
The previous cycle finished successfully.
You can now add new functionality.
══════════════════════════════════════
A) Add new features (start New Task)
B) Done — no more changes needed
══════════════════════════════════════
```

- If user picks A → set `step: 2f`, `status: not_started` in the state file, then auto-chain to Step 2f (New Task). Previous cycle history stays in Completed Steps.
- If user picks B → report final project status and exit.

## Auto-Chain Rules

| Completed Step | Next Action |
|---------------|-------------|
| Document (existing code) | Auto-chain → Blackbox Test Spec (Step 2b) |
| Blackbox Test Spec (Step 2b) | Auto-chain → Decompose Tests (Step 2c) |
| Decompose Tests (Step 2c) | **Session boundary** — suggest new conversation before Implement Tests |
| Implement Tests (Step 2d) | Auto-chain → Refactor (Step 2e) |
| Refactor (Step 2e) | Auto-chain → New Task (Step 2f) |
| New Task (Step 2f) | **Session boundary** — suggest new conversation before Implement |
| Implement (Step 2g) | Auto-chain → Run Tests (Step 2h) |
| Run Tests (Step 2h, all pass) | Auto-chain → Deploy (Step 2i) |
| Deploy (Step 2i) | **Workflow complete** — existing-code flow done |

@@ -0,0 +1,146 @@
# Greenfield Workflow

Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → Decompose → Implement → Run Tests → Deploy.

## Step Reference Table

| Step | Name | Sub-Skill | Internal SubSteps |
|------|-----------|------------------------|---------------------------------------|
| 0 | Problem | problem/SKILL.md | Phase 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
| 2 | Plan | plan/SKILL.md | Step 1–6 |
| 3 | Decompose | decompose/SKILL.md | Step 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Run Tests | (autopilot-managed) | Unit tests → Integration/blackbox tests |
| 6 | Deploy | deploy/SKILL.md | Step 1–7 |

## Detection Rules

Check rules in order — first match wins.

---

**Step 0 — Problem Gathering**

Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty:
- `problem.md`
- `restrictions.md`
- `acceptance_criteria.md`
- `input_data/` (must contain at least one file)

Action: Read and execute `.cursor/skills/problem/SKILL.md`

---

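The Step 0 condition reduces to a completeness check over `_docs/00_problem/`. A minimal sketch, assuming only the paths named in the rule:

```python
from pathlib import Path

REQUIRED_FILES = ["problem.md", "restrictions.md", "acceptance_criteria.md"]


def problem_step_needed(docs_root: str = "_docs") -> bool:
    """True when Step 0 (Problem Gathering) should run."""
    problem_dir = Path(docs_root) / "00_problem"
    if not problem_dir.is_dir():
        return True
    for name in REQUIRED_FILES:
        f = problem_dir / name
        if not f.is_file() or f.stat().st_size == 0:
            return True  # a required file is missing or empty
    input_data = problem_dir / "input_data"
    # input_data/ must exist and contain at least one file
    return not (input_data.is_dir() and any(p.is_file() for p in input_data.iterdir()))
```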
**Step 1 — Research (Initial)**

Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files

Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A)

---

**Step 1b — Research Decision**

Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_document/architecture.md` does not exist

Action: Present the current research state to the user:
- How many solution drafts exist
- Whether tech_stack.md and security_analysis.md exist
- One-line summary from the latest draft

Then present using the **Choose format**:

```
══════════════════════════════════════
DECISION REQUIRED: Research complete — next action?
══════════════════════════════════════
A) Run another research round (Mode B assessment)
B) Proceed to planning with current draft
══════════════════════════════════════
Recommendation: [A or B] — [reason based on draft quality]
══════════════════════════════════════
```

- If user picks A → Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B)
- If user picks B → auto-chain to Step 2 (Plan)

---

**Step 2 — Plan**

Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_document/architecture.md` does not exist

Action:
1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself
2. Read and execute `.cursor/skills/plan/SKILL.md`

If `_docs/02_document/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it.

---

**Step 3 — Decompose**

Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`) AND (workspace has no source code files OR the user explicitly chose the normal workflow in Step 2c)

Action: Read and execute `.cursor/skills/decompose/SKILL.md`

If `_docs/02_tasks/` already has some task files, the decompose skill's resumability handles it.

---

**Step 4 — Implement**

Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

Action: Read and execute `.cursor/skills/implement/SKILL.md`

If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

---

**Step 5 — Run Tests**

Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state does NOT show Step 5 (Run Tests) as completed AND (`_docs/04_deploy/` does not exist or is incomplete)

Action: Run the full test suite to verify the implementation before deployment.

1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
3. **Report results**: present a summary of passed/failed/skipped tests

If all tests pass → auto-chain to Step 6 (Deploy).

If tests fail → present using Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped]
══════════════════════════════════════
A) Fix failing tests and re-run
B) Proceed to deploy anyway (not recommended)
C) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before deploying
══════════════════════════════════════
```

---

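The integration-test part of Run Tests (step 2) can be sketched as a compose wrapper. Only the compose file name comes from the text; the one-shot `tests` service name is a hypothetical convention.

```python
import subprocess
from pathlib import Path


def run_integration_tests(compose_file: str = "docker-compose.test.yml") -> bool:
    """Spin up the test environment, run the suite, and always tear down."""
    if not Path(compose_file).is_file():
        return True  # no integration environment defined; nothing to run
    base = ["docker", "compose", "-f", compose_file]
    try:
        subprocess.run(base + ["up", "-d", "--wait"], check=True)
        # Assumes a one-shot `tests` service that exits non-zero on failure.
        return subprocess.run(base + ["run", "--rm", "tests"]).returncode == 0
    finally:
        subprocess.run(base + ["down", "-v"], check=False)
```

The `finally` block guarantees teardown even when the suite fails, so repeated runs start from a clean environment.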
**Step 6 — Deploy**

Condition: the autopilot state shows Step 5 (Run Tests) is completed AND (`_docs/04_deploy/` does not exist or is incomplete)

Action: Read and execute `.cursor/skills/deploy/SKILL.md`

---

**Done**

Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md)

Action: Report project completion with a summary. If the user runs autopilot again after greenfield completion, Flow Resolution rule 3 routes to the existing-code flow (re-entry after completion) so they can add new features.

## Auto-Chain Rules

| Completed Step | Next Action |
|---------------|-------------|
| Problem Gathering | Auto-chain → Research (Mode A) |
| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) |
| Research Decision → proceed | Auto-chain → Plan |
| Plan | Auto-chain → Decompose |
| Decompose | **Session boundary** — suggest new conversation before Implement |
| Implement | Auto-chain → Run Tests (Step 5) |
| Run Tests (all pass) | Auto-chain → Deploy (Step 6) |
| Deploy | Report completion |

@@ -0,0 +1,158 @@
# Autopilot Protocols

## User Interaction Protocol

Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:

- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)

### When to Ask (MUST ask)

- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs

### When NOT to Ask (auto-transition)

- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer

### Choice Format

Always present decisions in this format:

```
══════════════════════════════════════
DECISION REQUIRED: [brief context]
══════════════════════════════════════
A) [Option A — short description]
B) [Option B — short description]
C) [Option C — short description, if applicable]
D) [Option D — short description, if applicable]
══════════════════════════════════════
Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```

Rules:
1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive

## Jira MCP Authentication

Several workflow steps create Jira artifacts (epics, tasks, links). The Jira MCP server must be authenticated **before** any step that writes to Jira.

### Steps That Require Jira MCP

| Step | Sub-Step | Jira Action |
|------|----------|-------------|
| 2 (Plan) | Step 6 — Jira Epics | Create epics for each component |
| 2c (Decompose Tests) | Step 1t + Step 3 — All test tasks | Create Jira ticket per task, link to epic |
| 2f (New Task) | Step 7 — Jira ticket | Create Jira ticket per task, link to epic |
| 3 (Decompose) | Step 1–3 — All tasks | Create Jira ticket per task, link to epic |

### Authentication Gate

Before entering **Step 2 (Plan)**, **Step 2c (Decompose Tests)**, **Step 2f (New Task)**, or **Step 3 (Decompose)** for the first time, the autopilot must:

1. Call `mcp_auth` on the Jira MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** or authentication fails → present using Choose format:

```
══════════════════════════════════════
Jira MCP authentication failed
══════════════════════════════════════
A) Retry authentication (retry mcp_auth)
B) Continue without Jira (tasks saved locally only)
══════════════════════════════════════
Recommendation: A — Jira IDs drive task referencing,
dependency tracking, and implementation batching.
Without Jira, task files use numeric prefixes instead.
══════════════════════════════════════
```

If user picks **B** (continue without Jira):
- Set a flag in the state file: `jira_enabled: false`
- All skills that would create Jira tickets instead save metadata locally in the task/epic files with `Jira: pending` status
- Task files keep numeric prefixes (e.g., `01_initial_structure.md`) instead of Jira ID prefixes
- The workflow proceeds normally in all other respects

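The numeric-prefix fallback can be sketched as a small naming helper, with `jira_enabled` read from the state file. Only the numeric example (`01_initial_structure.md`) comes from the text; the Jira-ID prefix format shown here is an assumption.

```python
from __future__ import annotations


def task_filename(index: int, slug: str, jira_id: str | None, jira_enabled: bool) -> str:
    """Numeric prefix without Jira; Jira ID prefix when a ticket exists.

    The `PROJ-12_slug.md` shape is a hypothetical convention for the
    Jira-enabled case.
    """
    if jira_enabled and jira_id:
        return f"{jira_id}_{slug}.md"
    return f"{index:02d}_{slug}.md"
```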
### Re-Authentication

If Jira MCP was already authenticated in a previous invocation (verify by listing available Jira tools beyond `mcp_auth`), skip the auth gate.

## Error Handling

All error situations that require user input MUST use the **Choose A / B / C / D** format.

| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |

## Status Summary

On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback). Use the template matching the active flow (see Flow Resolution in SKILL.md).

### Greenfield Flow

```
═══════════════════════════════════════════════════
AUTOPILOT STATUS (greenfield)
═══════════════════════════════════════════════════
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
Step 6 Deploy [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```

### Existing Code Flow

```
═══════════════════════════════════════════════════
AUTOPILOT STATUS (existing-code)
═══════════════════════════════════════════════════
Pre Document [DONE / IN PROGRESS / NOT STARTED]
Step 2b Blackbox Test Spec [DONE / IN PROGRESS / NOT STARTED]
Step 2c Decompose Tests [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 2d Implement Tests [DONE / IN PROGRESS (batch M) / NOT STARTED]
Step 2e Refactor [DONE / IN PROGRESS (phase N) / NOT STARTED]
Step 2f New Task [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 2g Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 2h Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
Step 2i Deploy [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```

For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section

@@ -0,0 +1,102 @@
# Autopilot State Management

## State File: `_docs/_autopilot_state.md`

The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.

### Format

```markdown
# Autopilot State

## Current Step
step: [0-6 or "2b" / "2c" / "2d" / "2e" / "2f" / "2g" / "2h" / "2i" or "done"]
name: [Problem / Research / Plan / Blackbox Test Spec / Decompose Tests / Implement Tests / Refactor / New Task / Implement / Run Tests / Deploy / Decompose / Done]
status: [not_started / in_progress / completed]
sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]

## Step ↔ SubStep Reference
(include the step reference table from the active flow file)

When updating `Current Step`, always write it as:
step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2i)
sub_step: M ← sub-skill's own internal step/phase number + name
Example:
step: 2
name: Plan
status: in_progress
sub_step: 4 — Architecture Review & Risk Assessment

## Completed Steps

| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| 0 | Problem | [date] | [one-line summary] |
| 1 | Research | [date] | [N drafts, final approach summary] |
| ... | ... | ... | ... |

## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision N]

## Last Session
date: [date]
ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
reason: [completed step / session boundary / user paused / context limit]
notes: [any context for next session]

## Blockers
- [blocker 1, if any]
- [none]
```

### State File Rules

1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
3. **Read** the state file as the first action on every invocation — before folder scanning
4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., the state file says Step 2 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
5. **Never delete** the state file. It accumulates history across the entire project lifecycle

## State Detection

Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning.

### Folder Scan Rules (fallback)

Scan `_docs/` to determine the current workflow position. The detection rules are defined in each flow file (`flows/greenfield.md` and `flows/existing-code.md`). Check the existing-code flow first (Pre-Step detection), then the greenfield flow rules. First match wins.

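The fallback scan plus the rule-4 cross-check can be sketched as follows. The artifact-to-step mapping is a small illustrative subset of the full detection rules in the flow files, not the complete rule set:

```python
from __future__ import annotations

from pathlib import Path

# Most advanced artifact first, mirroring the ordered detection rules.
ARTIFACT_TO_STEP = [
    ("04_deploy/deployment_procedures.md", "done"),
    ("03_implementation/FINAL_implementation_report.md", "5"),
    ("02_document/architecture.md", "3"),
    ("01_solution/solution.md", "2"),
]


def scan_step(docs_root: str = "_docs") -> str:
    """Fallback folder scan: the most advanced artifact determines the step."""
    for artifact, step in ARTIFACT_TO_STEP:
        if (Path(docs_root) / artifact).is_file():
            return step
    return "0"


def resolve_step(state_step: str | None, docs_root: str = "_docs") -> str:
    """Prefer the state file, but trust the folders when the two disagree."""
    scanned = scan_step(docs_root)
    if state_step is None or state_step != scanned:
        return scanned  # folder structure is ground truth on disagreement
    return state_step
```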
## Re-Entry Protocol

When the user invokes `/autopilot` and work already exists:

1. Read `_docs/_autopilot_state.md`
2. Cross-check against the `_docs/` folder structure
3. Present the Status Summary with context from the state file (key decisions, last session, blockers)
4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, and deploy all do), the sub-skill handles mid-step recovery
5. Continue execution from the detected state

## Session Boundaries

After any decompose/planning step completes (Step 2c, Step 2f, or Step 3), **do not auto-chain to implement**. Instead:

1. Update the state file: mark the step as completed, set the current step to the next implement step with status `not_started`
   - After Step 2c (Decompose Tests) → set current step to 2d (Implement Tests)
   - After Step 2f (New Task) → set current step to 2g (Implement)
   - After Step 3 (Decompose) → set current step to 4 (Implement)
2. Write the `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
3. Present a summary: number of tasks, estimated batches, total complexity points
4. Use the Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Decompose complete — start implementation?
══════════════════════════════════════
A) Start a new conversation for implementation (recommended for context freshness)
B) Continue implementation in this conversation
══════════════════════════════════════
Recommendation: A — implementation is the longest phase, fresh context helps
══════════════════════════════════════
```

These are the only hard session boundaries. All other transitions auto-chain.

@@ -0,0 +1,218 @@
---
name: blackbox-test-spec
description: |
  Black-box integration test specification skill. Analyzes input data completeness and produces
  detailed E2E test scenarios (functional + non-functional) that treat the system as a black box.
  2-phase workflow: input data completeness analysis, then test scenario specification.
  Produces 5 artifacts under integration_tests/.
  Trigger phrases:
  - "blackbox test spec", "black box tests", "integration test spec"
  - "test specification", "e2e test spec"
  - "test scenarios", "black box scenarios"
category: build
tags: [testing, black-box, integration-tests, e2e, test-specification, qa]
disable-model-invocation: true
---

# Black-Box Test Scenario Specification

Analyze input data completeness and produce detailed black-box integration test specifications. Tests describe what the system should do given specific inputs — they never reference internals.

## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code

## Context Resolution

Fixed paths — no mode detection needed:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/integration_tests/`

Announce the resolved paths to the user before proceeding.

## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples |
| `_docs/01_solution/solution.md` | Finalized solution |

### Optional Files (used when available)

| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |

### Prerequisite Checks (BLOCKING)

1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `problem.md` exists and is non-empty — **STOP if missing**
5. `solution.md` exists and is non-empty — **STOP if missing**
6. Create TESTS_OUTPUT_DIR if it does not exist
7. If TESTS_OUTPUT_DIR already contains files, ask the user: **resume from last checkpoint or start fresh?**

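The blocking checks above can be sketched as a single pass that collects every failure before stopping, so the user sees all problems at once. A sketch, assuming the fixed paths from Context Resolution:

```python
from pathlib import Path

REQUIRED = ["acceptance_criteria.md", "restrictions.md", "problem.md"]


def prerequisite_failures(problem_dir: str, solution_md: str) -> list[str]:
    """Return the blocking problems found; an empty list means proceed."""
    failures = []
    pd = Path(problem_dir)
    for name in REQUIRED:
        f = pd / name
        if not f.is_file() or f.stat().st_size == 0:
            failures.append(f"{name} missing or empty")
    input_data = pd / "input_data"
    if not input_data.is_dir() or not any(p.is_file() for p in input_data.iterdir()):
        failures.append("input_data/ missing or has no files")
    sol = Path(solution_md)
    if not sol.is_file() or sol.stat().st_size == 0:
        failures.append("solution.md missing or empty")
    return failures
```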
## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test_data.md
├── functional_tests.md
├── non_functional_tests.md
└── traceability_matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1a | Input data analysis (no file — findings feed Phase 1b) | — |
| Phase 1b | Environment spec | `environment.md` |
| Phase 1b | Test data spec | `test_data.md` |
| Phase 1b | Functional tests | `functional_tests.md` |
| Phase 1b | Non-functional tests | `non_functional_tests.md` |
| Phase 1b | Traceability matrix | `traceability_matrix.md` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped

## Progress Tracking

At the start of execution, create a TodoWrite with both phases. Update status as each phase completes.

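The resumability steps above can be sketched as a scan against the save-timing order, assuming the artifact list from the directory structure:

```python
from __future__ import annotations

from pathlib import Path

# Artifact order mirrors the Phase 1b save-timing table.
EXPECTED_ARTIFACTS = [
    "environment.md",
    "test_data.md",
    "functional_tests.md",
    "non_functional_tests.md",
    "traceability_matrix.md",
]


def next_artifact(tests_output_dir: str) -> str | None:
    """First artifact not yet on disk; None means the phase is complete."""
    out = Path(tests_output_dir)
    for name in EXPECTED_ARTIFACTS:
        if not (out / name).is_file():
            return name
    return None
```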
## Workflow

### Phase 1a: Input Data Completeness Analysis

**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet

1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md` and `restrictions.md`
3. Read the testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Analyze `input_data/` contents against:
   - Coverage of acceptance criteria scenarios
   - Coverage of restriction edge cases
   - Coverage of testing strategy requirements
6. Threshold: require at least 70% coverage of these scenarios
7. If coverage is below the threshold, search the internet for supplementary data, assess its quality with the user, and, if the user agrees, add it to `input_data/`
8. Present the coverage assessment to the user

**BLOCKING**: Do NOT proceed until the user confirms the input data coverage is sufficient.

---

### Phase 1b: Black-Box Test Scenario Specification

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering functional and non-functional scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.

Based on all acquired data, acceptance criteria, and restrictions, form detailed test scenarios:

1. Define the test environment using `.cursor/skills/plan/templates/integration-environment.md` as the structure
2. Define test data management using `.cursor/skills/plan/templates/integration-test-data.md` as the structure
3. Write functional test scenarios (positive + negative) using `.cursor/skills/plan/templates/integration-functional-tests.md` as the structure
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `.cursor/skills/plan/templates/integration-non-functional-tests.md` as the structure
5. Build the traceability matrix using `.cursor/skills/plan/templates/integration-traceability-matrix.md` as the structure

**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Positive and negative scenarios are balanced
- [ ] The consumer app has no direct access to system internals
- [ ] The Docker environment is self-contained (`docker compose up` is sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] The traceability matrix has no uncovered AC or restrictions

**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test_data.md`
- `functional_tests.md`
- `non_functional_tests.md`
- `traceability_matrix.md`

**BLOCKING**: Present the test coverage summary (from traceability_matrix.md) to the user. Do NOT proceed until confirmed.

Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).

---
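The "no uncovered AC or restrictions" checklist item is mechanical once the matrix exists. A hedged sketch, assuming the matrix maps test IDs to the criteria they trace to (the shape and IDs are illustrative):

```python
def uncovered_criteria(criteria_ids, matrix):
    """Return criteria (ACs or restrictions) no test scenario traces to.

    matrix: {test_id: iterable of AC/restriction ids}, e.g. {"FT-01": {"AC-1"}}.
    """
    traced = set()
    for traced_ids in matrix.values():
        traced |= set(traced_ids)
    return sorted(set(criteria_ids) - traced)
```

An empty return list is the pass condition for the traceability checklist items.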
## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% | Search the internet for supplementary data, ASK user to validate |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific, measurable values
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code
## Trigger Conditions

When the user wants to:
- Specify black-box integration tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce E2E test scenarios from acceptance criteria

**Keywords**: "blackbox test spec", "black box tests", "integration test spec", "test specification", "e2e test spec", "test scenarios"
## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────┐
│ Black-Box Test Scenario Specification (2-Phase)                │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                   │
│   → verify AC, restrictions, input_data, solution exist        │
│                                                                │
│ Phase 1a: Input Data Completeness Analysis                     │
│   → assess input_data/ coverage vs AC scenarios (≥70%)         │
│   [BLOCKING: user confirms input data coverage]                │
│                                                                │
│ Phase 1b: Black-Box Test Scenario Specification                │
│   → environment.md                                             │
│   → test_data.md                                               │
│   → functional_tests.md (positive + negative)                  │
│   → non_functional_tests.md (perf, resilience, security, limits)│
│   → traceability_matrix.md                                     │
│   [BLOCKING: user confirms test coverage]                      │
├────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately   │
│   Ask don't assume · Spec don't code                           │
└────────────────────────────────────────────────────────────────┘
```
@@ -3,11 +3,12 @@ name: decompose
description: |
  Decompose planned components into atomic implementable tasks with a bootstrap structure plan.
  4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
  Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
  Trigger phrases:
  - "decompose", "decompose features", "feature decomposition"
  - "task decomposition", "break down components"
  - "prepare for implementation"
  - "decompose tests", "test decomposition"
category: build
tags: [decomposition, tasks, dependencies, jira, implementation-prep]
disable-model-invocation: true
@@ -44,6 +45,14 @@ Determine the operating mode based on invocation before any other logic runs.
- Ask the user for the parent Epic ID
- Runs Step 2 (that component only, appending to the existing task numbering)

**Tests-only mode** (the provided file/directory is within `integration_tests/`, or `DOCUMENT_DIR/integration_tests/` exists and the input explicitly requests test decomposition):
- DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/`
- TESTS_DIR: `DOCUMENT_DIR/integration_tests/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
- Runs Step 1t (test infrastructure bootstrap) + Step 3 (integration test decomposition) + Step 4 (cross-verification against test coverage)
- Skips Step 1 (project bootstrap) and Step 2 (component decomposition) — the codebase already exists

Announce the detected mode and resolved paths to the user before proceeding.
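The mode detection described above reduces to a path check; a real implementation would also weigh explicit user intent, which this illustrative helper omits:

```python
from pathlib import Path


def detect_mode(provided=None, docs_root="_docs"):
    """Resolve the decompose mode from the (optional) user-provided path."""
    tests_dir = Path(docs_root) / "02_document" / "integration_tests"
    if provided is None:
        return "default"
    p = Path(provided)
    # Anything inside integration_tests/ (or the directory itself) is tests-only.
    if p == tests_dir or tests_dir in p.parents:
        return "tests-only"
    return "single-component"
```
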
## Input Specification

@@ -70,6 +79,19 @@ Announce the detected mode and resolved paths to the user before proceeding.
| The provided component `description.md` | Component spec to decompose |
| Corresponding `tests.md` in the same directory (if available) | Test specs for context |

**Tests-only mode:**

| File | Purpose |
|------|---------|
| `TESTS_DIR/environment.md` | Test environment specification (Docker services, networks, volumes) |
| `TESTS_DIR/test_data.md` | Test data management (seed data, mocks, isolation) |
| `TESTS_DIR/functional_tests.md` | Functional test scenarios (positive + negative) |
| `TESTS_DIR/non_functional_tests.md` | Non-functional test scenarios (perf, resilience, security, limits) |
| `TESTS_DIR/traceability_matrix.md` | AC/restriction coverage mapping |
| `_docs/00_problem/problem.md` | Problem context |
| `_docs/00_problem/restrictions.md` | Constraints for test design |
| `_docs/00_problem/acceptance_criteria.md` | Acceptance criteria being verified |

### Prerequisite Checks (BLOCKING)

**Default:**
@@ -80,6 +102,12 @@ Announce the detected mode and resolved paths to the user before proceeding.
**Single component mode:**
1. The provided component file exists and is non-empty — **STOP if missing**

**Tests-only mode:**
1. `TESTS_DIR/functional_tests.md` exists and is non-empty — **STOP if missing**
2. `TESTS_DIR/environment.md` exists — **STOP if missing**
3. Create TASKS_DIR if it does not exist
4. If TASKS_DIR already contains task files, ask the user: **resume from the last checkpoint or start fresh?**

## Artifact Management

### Directory Structure
@@ -100,6 +128,7 @@ TASKS_DIR/
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 4 | Cross-task verification complete | `_dependencies_table.md` |
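The rename convention in the table (temporary numeric prefix swapped for the Jira ID) can be sketched as:

```python
def jira_filename(temp_name: str, jira_id: str) -> str:
    """Swap the temporary numeric prefix for the Jira ID, keeping the short name.

    '01_test_infrastructure.md' + 'PROJ-42' -> 'PROJ-42_test_infrastructure.md'
    (PROJ-42 is a made-up ticket ID for illustration).
    """
    short_name = temp_name.split("_", 1)[1]  # drop the leading numeric prefix
    return f"{jira_id}_{short_name}"
```
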
@@ -118,6 +147,42 @@ At the start of execution, create a TodoWrite with all applicable steps. Update
## Workflow

### Step 1t: Test Infrastructure Bootstrap (tests-only mode only)

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce `01_test_infrastructure.md` — the first task, describing the test project scaffold
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.

1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test_data.md`
2. Read problem.md, restrictions.md, and acceptance_criteria.md for domain context
3. Document the test infrastructure plan using `templates/test-infrastructure-task.md`

The test infrastructure bootstrap must include:
- Test project folder layout (`e2e/` directory structure)
- Mock/stub service definitions for each external dependency
- `docker-compose.test.yml` structure from environment.md
- Test runner configuration (framework, plugins, fixtures)
- Test data fixture setup from the test_data.md seed data sets
- Test reporting configuration (format, output path)
- Data isolation strategy

**Self-verification**:
- [ ] Every external dependency from environment.md has a mock service defined
- [ ] The Docker Compose structure covers all services from environment.md
- [ ] Test data fixtures cover all seed data sets from test_data.md
- [ ] The test runner configuration matches the consumer app tech stack from environment.md
- [ ] A data isolation strategy is defined

**Save action**: Write `01_test_infrastructure.md` (temporary numeric name)

**Jira action**: Create a Jira ticket for this task under the "Integration Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.

**Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.

**BLOCKING**: Present the test infrastructure plan summary to the user. Do NOT proceed until the user confirms.

---

### Step 1: Bootstrap Structure Plan (default mode only)

**Role**: Professional software architect
@@ -166,7 +231,7 @@ The bootstrap structure plan must include:

---

### Step 2: Task Decomposition (default and single component modes)

**Role**: Professional software architect
**Goal**: Decompose each component into atomic, implementable task specs — numbered sequentially starting from 02
@@ -200,18 +265,22 @@ For each component (or the single provided component):

---

### Step 3: Integration Test Task Decomposition (default and tests-only modes)

**Role**: Professional Quality Assurance Engineer
**Goal**: Decompose integration test specs into atomic, implementable task specs
**Constraints**: Behavioral specs only — describe what, not how. No test code.

**Numbering**:
- In default mode: continue sequential numbering from where Step 2 left off.
- In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).

1. Read all test specs from `DOCUMENT_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
4. Dependencies:
   - In default mode: integration test tasks depend on the component implementation tasks they exercise
   - In tests-only mode: integration test tasks depend on the test infrastructure bootstrap task (Step 1t)
5. Write each task spec using `templates/task.md`
6. Estimate complexity per task (1, 2, 3, or 5 points); no task should exceed 5 points — split it if it does
7. Note task dependencies (referencing the Jira IDs of already-created dependency tasks)
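The point-scale rule in step 6 amounts to two small checks (a sketch; the scale values come from the skill, the function names do not):

```python
ALLOWED_POINTS = {1, 2, 3, 5}


def valid_estimate(points: int) -> bool:
    """Estimates must come from the 1/2/3/5 scale."""
    return points in ALLOWED_POINTS


def needs_split(points: int) -> bool:
    """Any task estimated above 5 points must be split into smaller tasks."""
    return points > 5
```
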
@@ -221,31 +290,41 @@ For each component (or the single provided component):
- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
- [ ] No task exceeds 5 complexity points
- [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic

**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create the Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.

---

### Step 4: Cross-Task Verification (default and tests-only modes)

**Role**: Professional software architect and analyst
**Goal**: Verify task consistency and produce `_dependencies_table.md`
**Constraints**: Review step — fix gaps found, do not add new tasks

1. Verify task dependencies across all tasks are consistent
2. Check no gaps:
   - In default mode: every interface in architecture.md has tasks covering it
   - In tests-only mode: every test scenario in `traceability_matrix.md` is covered by a task
3. Check no overlaps: tasks don't duplicate work
4. Check no circular dependencies in the task graph
5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`

**Self-verification**:

Default mode:
- [ ] Every architecture interface is covered by at least one task
- [ ] No circular dependencies in the task graph
- [ ] Cross-component dependencies are explicitly noted in affected task specs
- [ ] `_dependencies_table.md` contains every task with correct dependencies

Tests-only mode:
- [ ] Every test scenario from traceability_matrix.md "Covered" entries has a corresponding task
- [ ] No circular dependencies in the task graph
- [ ] Test task dependencies reference the test infrastructure bootstrap
- [ ] `_dependencies_table.md` contains every task with correct dependencies

**Save action**: Write `_dependencies_table.md`

**BLOCKING**: Present the dependency summary to the user. Do NOT proceed until the user confirms.
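The circular-dependency check in step 4 is a standard cycle search over the task graph; a minimal sketch using the dependency lists from the task specs (task IDs are illustrative):

```python
def has_cycle(deps):
    """deps maps task id -> ids it depends on; True if the graph has a cycle."""
    state = {}  # task id -> "visiting" (on the current path) or "done"

    def visit(task):
        if state.get(task) == "done":
            return False
        if state.get(task) == "visiting":
            return True  # back edge: cycle found
        state[task] = "visiting"
        if any(visit(dep) for dep in deps.get(task, [])):
            return True
        state[task] = "done"
        return False

    return any(visit(task) for task in deps)
```
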
@@ -279,15 +358,27 @@ For each component (or the single provided component):

```
┌────────────────────────────────────────────────────────────────┐
│ Task Decomposition (Multi-Mode)                                │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (default / single component / tests-only)│
│                                                                │
│ DEFAULT MODE:                                                  │
│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md        │
│    [BLOCKING: user confirms structure]                         │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each            │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each          │
│ 4. Cross-Verification → _dependencies_table.md                 │
│    [BLOCKING: user confirms dependencies]                      │
│                                                                │
│ TESTS-ONLY MODE:                                               │
│ 1t. Test Infrastructure → [JIRA-ID]_test_infrastructure.md     │
│     [BLOCKING: user confirms test scaffold]                    │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each          │
│ 4. Cross-Verification → _dependencies_table.md                 │
│    [BLOCKING: user confirms dependencies]                      │
│                                                                │
│ SINGLE COMPONENT MODE:                                         │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each            │
├────────────────────────────────────────────────────────────────┤
│ Principles: Atomic tasks · Behavioral specs · Flat structure   │
│   Jira inline · Rename to Jira ID · Save now · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,129 @@
# Test Infrastructure Task Template

Use this template for the test infrastructure bootstrap (Step 1t in tests-only mode). Save as `TASKS_DIR/01_test_infrastructure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_test_infrastructure.md` after Jira ticket creation.

---

```markdown
# Test Infrastructure

**Task**: [JIRA-ID]_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the E2E test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: [3|5] points
**Dependencies**: None
**Component**: Integration Tests
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]

## Test Project Folder Layout

```
e2e/
├── conftest.py
├── requirements.txt
├── Dockerfile
├── mocks/
│   ├── [mock_service_1]/
│   │   ├── Dockerfile
│   │   └── [entrypoint file]
│   └── [mock_service_2]/
│       ├── Dockerfile
│       └── [entrypoint file]
├── fixtures/
│   └── [test data files]
├── tests/
│   ├── test_[category_1].py
│   ├── test_[category_2].py
│   └── ...
└── docker-compose.test.yml
```

### Layout Rationale

[Brief explanation of directory structure choices — framework conventions, separation of mocks from tests, fixture management]

## Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|--------------|----------|-----------|----------|
| [name] | [external service] | [endpoints it serves] | [response behavior, configurable via control API] |

### Mock Control API

Each mock service exposes a `POST /mock/config` endpoint for test-time behavior control (e.g., simulate downtime, inject errors). A `GET /mock/[resource]` endpoint returns recorded interactions for assertion.

## Docker Test Environment

### docker-compose.test.yml Structure

| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| [system-under-test] | [build context] | Main system being tested | [mock services] |
| [mock-1] | [build context] | Mock for [external service] | — |
| [e2e-consumer] | [build from e2e/] | Test runner | [system-under-test] |

### Networks and Volumes

[Isolated test network, volume mounts for test data, model files, results output]

## Test Runner Configuration

**Framework**: [e.g., pytest]
**Plugins**: [e.g., pytest-csv, sseclient-py, requests]
**Entry point**: [e.g., pytest --csv=/results/report.csv]

### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| [name] | [session/module/function] | [what it provides] |

## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| [name] | [volume mount / generated / API seed] | [format] | [test categories] |

### Data Isolation

[Strategy: fresh containers per run, volume cleanup, mock state reset]

## Test Reporting

**Format**: [e.g., CSV]
**Columns**: [e.g., Test ID, Test Name, Execution Time (ms), Result, Error Message]
**Output path**: [e.g., /results/report.csv → mounted to host]

## Acceptance Criteria

**AC-1: Test environment starts**
Given the docker-compose.test.yml
When `docker compose -f docker-compose.test.yml up` is executed
Then all services start and the system-under-test is reachable

**AC-2: Mock services respond**
Given the test environment is running
When the e2e-consumer sends requests to mock services
Then mock services respond with configured behavior

**AC-3: Test runner executes**
Given the test environment is running
When the e2e-consumer starts
Then the test runner discovers and executes test files

**AC-4: Test report generated**
Given tests have been executed
When the test run completes
Then a report file exists at the configured output path with correct columns
```

---

## Guidance Notes

- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on test infrastructure decisions, not individual test implementations.
- Reference environment.md and test_data.md from the test specs — don't repeat everything.
- Mock services must be deterministic: same input always produces same output.
- The Docker environment must be self-contained: `docker compose up` sufficient.
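The mock control contract in the template (POST `/mock/config` to set behavior, a GET endpoint to read recorded interactions) reduces to a small deterministic state machine. An in-memory sketch with illustrative names; a real mock would wrap this state in a tiny HTTP service:

```python
class MockState:
    """State behind a mock service's control API (names are illustrative)."""

    def __init__(self):
        self.config = {"mode": "normal"}  # e.g. {"mode": "down"} simulates outage
        self.recorded = []

    def configure(self, body: dict) -> None:
        """Backs POST /mock/config: update behavior at test time."""
        self.config.update(body)

    def handle(self, request: dict) -> dict:
        """The mocked endpoint itself: record the call, answer per config."""
        self.recorded.append(request)
        if self.config.get("mode") == "down":
            return {"status": 503}
        return {"status": 200, "body": {"ok": True}}

    def interactions(self) -> list:
        """Backs GET /mock/[resource]: recorded calls, for test assertions."""
        return list(self.recorded)
```
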
@@ -45,13 +45,13 @@ Announce the resolved paths to the user before proceeding.

### Required Files

| File | Purpose | Required |
|------|---------|----------|
| `_docs/00_problem/problem.md` | Problem description and context | Greenfield only |
| `_docs/00_problem/restrictions.md` | Constraints and limitations | Greenfield only |
| `_docs/01_solution/solution.md` | Finalized solution | Greenfield only |
| `DOCUMENT_DIR/architecture.md` | Architecture (from plan or document skill) | Always |
| `DOCUMENT_DIR/components/` | Component specs | Always |

### Prerequisite Checks (BLOCKING)
@@ -0,0 +1,302 @@
---
name: new-task
description: |
  Interactive skill for adding new functionality to an existing codebase.
  Guides the user through describing the feature, assessing complexity,
  optionally running research, analyzing the codebase for insertion points,
  validating assumptions with the user, and producing a task spec with a Jira ticket.
  Supports a loop — the user can add multiple tasks in one session.
  Trigger phrases:
  - "new task", "add feature", "new functionality"
  - "I want to add", "new component", "extend"
category: build
tags: [task, feature, interactive, planning, jira]
disable-model-invocation: true
---

# New Task (Interactive Feature Planning)

Guide the user through defining new functionality for an existing codebase. Produces one or more task specifications with Jira tickets, optionally running deep research for complex features.

## Core Principles

- **User-driven**: every task starts with the user's description; never invent requirements
- **Right-size research**: only invoke the research skill when the change is big enough to warrant it
- **Validate before committing**: surface all assumptions and uncertainties to the user before writing the task file
- **Save immediately**: write task files to disk as soon as they are ready; never accumulate unsaved work
- **Ask, don't assume**: when scope, insertion point, or approach is unclear, STOP and ask the user
## Context Resolution

Fixed paths:

- TASKS_DIR: `_docs/02_tasks/`
- PLANS_DIR: `_docs/02_task_plans/`
- DOCUMENT_DIR: `_docs/02_document/`
- DEPENDENCIES_TABLE: `_docs/02_tasks/_dependencies_table.md`

Create TASKS_DIR and PLANS_DIR if they don't exist.

If TASKS_DIR already contains task files, scan them to determine the next numeric prefix for temporary file naming.

## Workflow

The skill runs as a loop. Each iteration produces one task. After each task the user chooses to add another or finish.

---
### Step 1: Gather Feature Description

**Role**: Product analyst
**Goal**: Get a clear, detailed description of the new functionality from the user.

Ask the user:

```
══════════════════════════════════════
NEW TASK: Describe the functionality
══════════════════════════════════════
Please describe in detail the new functionality you want to add:
- What should it do?
- Who is it for?
- Any specific requirements or constraints?
══════════════════════════════════════
```

**BLOCKING**: Do NOT proceed until the user provides a description.

Record the description verbatim for use in subsequent steps.

---
### Step 2: Analyze Complexity

**Role**: Technical analyst
**Goal**: Determine whether deep research is needed.

Read the user's description and the existing codebase documentation from DOCUMENT_DIR (architecture.md, components/, system-flows.md).

Assess the change along these dimensions:
- **Scope**: how many components/files are affected?
- **Novelty**: does it involve libraries, protocols, or patterns not already in the codebase?
- **Risk**: could it break existing functionality or require architectural changes?

Classification:

| Category | Criteria | Action |
|----------|----------|--------|
| **Needs research** | New libraries/frameworks, unfamiliar protocols, significant architectural change, multiple unknowns | Proceed to Step 3 (Research) |
| **Skip research** | Extends existing functionality, uses patterns already in the codebase, straightforward new component with known tech | Skip to Step 4 (Codebase Analysis) |

Present the assessment to the user:

```
══════════════════════════════════════
COMPLEXITY ASSESSMENT
══════════════════════════════════════
Scope:   [low / medium / high]
Novelty: [low / medium / high]
Risk:    [low / medium / high]
══════════════════════════════════════
Recommendation: [Research needed / Skip research]
Reason: [one-line justification]
══════════════════════════════════════
```

**BLOCKING**: Ask the user to confirm or override the recommendation before proceeding.

---
### Step 3: Research (conditional)

**Role**: Researcher
**Goal**: Investigate unknowns before task specification.

This step only runs if Step 2 determined research is needed.

1. Create a problem description file at `PLANS_DIR/<task_slug>/problem.md` summarizing the feature request and the specific unknowns to investigate
2. Invoke `.cursor/skills/research/SKILL.md` in standalone mode:
   - INPUT_FILE: `PLANS_DIR/<task_slug>/problem.md`
   - BASE_DIR: `PLANS_DIR/<task_slug>/`
3. After research completes, read the solution draft from `PLANS_DIR/<task_slug>/01_solution/solution_draft01.md`
4. Extract the key findings relevant to the task specification

The `<task_slug>` is a short kebab-case name derived from the feature description (e.g., `auth-provider-integration`, `real-time-notifications`).

---
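Deriving `<task_slug>` can be as simple as lowercasing and joining the first few words (a sketch; the word limit is an assumption, not part of the skill):

```python
import re


def task_slug(description: str, max_words: int = 4) -> str:
    """Short kebab-case slug from a feature description."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return "-".join(words[:max_words])
```
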
### Step 4: Codebase Analysis

**Role**: Software architect
**Goal**: Determine where and how to insert the new functionality.

1. Read the codebase documentation from DOCUMENT_DIR:
   - `architecture.md` — overall structure
   - `components/` — component specs
   - `system-flows.md` — data flows (if exists)
   - `data_model.md` — data model (if exists)
2. If research was performed (Step 3), incorporate findings
3. Analyze and determine:
   - Which existing components are affected
   - Where new code should be inserted (which layers, modules, files)
   - What interfaces need to change
   - What new interfaces or models are needed
   - How data flows through the change
4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points

Present the analysis:

```
══════════════════════════════════════
CODEBASE ANALYSIS
══════════════════════════════════════
Affected components: [list]
Insertion points:    [list of modules/layers]
Interface changes:   [list or "None"]
New interfaces:      [list or "None"]
Data flow impact:    [summary]
══════════════════════════════════════
```

---
### Step 5: Validate Assumptions

**Role**: Quality gate
**Goal**: Surface every uncertainty and get user confirmation.

Review all decisions and assumptions made in Steps 2–4. For each uncertainty:
1. State the assumption clearly
2. Propose a solution or approach
3. List alternatives if they exist

Present using the Choose format for each decision that has meaningful alternatives:

```
══════════════════════════════════════
ASSUMPTION VALIDATION
══════════════════════════════════════
1. [Assumption]: [proposed approach]
   Alternative: [other option, if any]
2. [Assumption]: [proposed approach]
   Alternative: [other option, if any]
...
══════════════════════════════════════
Please confirm or correct these assumptions.
══════════════════════════════════════
```

**BLOCKING**: Do NOT proceed until the user confirms or corrects all assumptions.

---
### Step 6: Create Task

**Role**: Technical writer
**Goal**: Produce the task specification file.

1. Determine the next numeric prefix by scanning TASKS_DIR for existing files
2. Write the task file using `templates/task.md`:
   - Fill all fields from the gathered information
   - Set **Complexity** based on the assessment from Step 2
   - Set **Dependencies** by cross-referencing existing tasks in TASKS_DIR
   - Set **Jira** and **Epic** to `pending` (filled in Step 7)
3. Save as `TASKS_DIR/[##]_[short_name].md`

**Self-verification**:
- [ ] Problem section clearly describes the user need
- [ ] Acceptance criteria are testable (Gherkin format)
- [ ] Scope boundaries are explicit
- [ ] Complexity points match the assessment
- [ ] Dependencies reference existing task Jira IDs where applicable
- [ ] No implementation details leaked into the spec

---
### Step 7: Jira Ticket
|
||||
|
||||
**Role**: Project coordinator
|
||||
**Goal**: Create a Jira ticket and link it to the task file.
|
||||
|
||||
1. Create a Jira ticket for the task:
|
||||
- Summary: the task's **Name** field
|
||||
- Description: the task's **Problem** and **Acceptance Criteria** sections
|
||||
- Story points: the task's **Complexity** value
|
||||
- Link to the appropriate epic (ask user if unclear which epic)
|
||||
2. Write the Jira ticket ID and Epic ID back into the task file header:
|
||||
- Update **Task** field: `[JIRA-ID]_[short_name]`
|
||||
- Update **Jira** field: `[JIRA-ID]`
|
||||
- Update **Epic** field: `[EPIC-ID]`
|
||||
3. Rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`
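
The write-back and rename in items 2 and 3 can be sketched like this (illustrative only; it assumes the header fields still read `pending`, and the helper name is hypothetical):

```python
import re
from pathlib import Path

def link_jira(task_file: str, jira_id: str, epic_id: str) -> str:
    """Fill Jira/Epic header fields and rename the task file; return the new name."""
    path = Path(task_file)
    text = path.read_text()
    short_name = re.sub(r"^\d+_", "", path.stem)  # drop the numeric prefix
    text = text.replace("**Jira**: pending", f"**Jira**: {jira_id}")
    text = text.replace("**Epic**: pending", f"**Epic**: {epic_id}")
    text = re.sub(r"\*\*Task\*\*: .*", f"**Task**: {jira_id}_{short_name}", text)
    new_path = path.with_name(f"{jira_id}_{short_name}.md")
    new_path.write_text(text)
    path.unlink()
    return new_path.name
```

So `02_login.md` with Jira ID `AZ-7` becomes `AZ-7_login.md`, with all three header fields updated.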

If Jira MCP is not authenticated or unavailable:
- Keep the numeric prefix
- Set **Jira** to `pending`
- Set **Epic** to `pending`
- The task is still valid and can be implemented; Jira sync happens later

---

### Step 8: Loop Gate

Ask the user:

```
══════════════════════════════════════
Task created: [JIRA-ID or ##] — [task name]
══════════════════════════════════════
A) Add another task
B) Done — finish and update dependencies
══════════════════════════════════════
```

- If **A** → loop back to Step 1
- If **B** → proceed to Finalize

---

### Finalize

After the user chooses **Done**:

1. Update (or create) `TASKS_DIR/_dependencies_table.md` — add all newly created tasks to the dependencies table
2. Present a summary of all tasks created in this session:

```
══════════════════════════════════════
NEW TASK SUMMARY
══════════════════════════════════════
Tasks created: N
Total complexity: M points
─────────────────────────────────────
[JIRA-ID] [name] ([complexity] pts)
[JIRA-ID] [name] ([complexity] pts)
...
══════════════════════════════════════
```

## Escalation Rules

| Situation | Action |
|-----------|--------|
| User description is vague or incomplete | **ASK** for more detail — do not guess |
| Unclear which epic to link to | **ASK** user for the epic |
| Research skill hits a blocker | Follow research skill's own escalation rules |
| Codebase analysis reveals conflicting architectures | **ASK** user which pattern to follow |
| Complexity exceeds 5 points | **WARN** user and suggest splitting into multiple tasks |
| Jira MCP unavailable | **WARN**, continue with local-only task files |

## Trigger Conditions

When the user wants to:
- Add new functionality to an existing codebase
- Plan a new feature or component
- Create task specifications for upcoming work

**Keywords**: "new task", "add feature", "new functionality", "extend", "I want to add"

**Differentiation**:
- User wants to decompose an existing plan into tasks → use `/decompose`
- User wants to research a topic without creating tasks → use `/research`
- User wants to refactor existing code → use `/refactor`
- User wants to define and plan a new feature → use this skill

@@ -0,0 +1,113 @@
# Task Specification Template

Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[JIRA-ID]_[short_name].md` after Jira ticket creation.

---

```markdown
# [Feature Name]

**Task**: [JIRA-ID]_[short_name]
**Name**: [short human name]
**Description**: [one-line description of what this task delivers]
**Complexity**: [1|2|3|5] points
**Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
**Component**: [component name for context]
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]

## Problem

Clear, concise statement of the problem users are facing.

## Outcome

- Measurable or observable goal 1
- Measurable or observable goal 2
- ...

## Scope

### Included
- What's in scope for this task

### Excluded
- Explicitly what's NOT in scope

## Acceptance Criteria

**AC-1: [Title]**
Given [precondition]
When [action]
Then [expected result]

**AC-2: [Title]**
Given [precondition]
When [action]
Then [expected result]

## Non-Functional Requirements

**Performance**
- [requirement if relevant]

**Compatibility**
- [requirement if relevant]

**Reliability**
- [requirement if relevant]

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | [test subject] | [expected result] |

## Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | [setup] | [test subject] | [expected behavior] | [NFR if any] |

## Constraints

- [Architectural pattern constraint if critical]
- [Technical limitation]
- [Integration requirement]

## Risks & Mitigation

**Risk 1: [Title]**
- *Risk*: [Description]
- *Mitigation*: [Approach]
```

---

## Complexity Points Guide

- 1 point: Trivial, self-contained, no dependencies
- 2 points: Non-trivial, low complexity, minimal coordination
- 3 points: Multi-step, moderate complexity, potential alignment needed
- 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: Too complex — split into smaller tasks

## Output Guidelines

**DO:**
- Focus on behavior and user experience
- Use clear, simple language
- Keep acceptance criteria testable (Gherkin format)
- Include realistic scope boundaries
- Write from the user's perspective
- Include complexity estimation
- Reference dependencies by Jira ID (e.g., AZ-43_shared_models)

**DON'T:**
- Include implementation details (file paths, classes, methods)
- Prescribe technical solutions or libraries
- Add architectural diagrams or code examples
- Specify exact API endpoints or data structures
- Include step-by-step implementation instructions
- Add "how to build" guidance

+12 −442
@@ -35,9 +35,7 @@ Fixed paths — no mode detection needed:

Announce the resolved paths to the user before proceeding.

## Input Specification

### Required Files
## Required Files

| File | Purpose |
|------|---------|
@@ -47,115 +45,13 @@ Announce the resolved paths to the user before proceeding.
| `_docs/00_problem/input_data/` | Reference data examples |
| `_docs/01_solution/solution.md` | Finalized solution to decompose |

### Prerequisite Checks (BLOCKING)
## Prerequisites

Run sequentially before any planning step:

**Prereq 1: Data Gate**

1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing**
3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing**
4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing**

All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop.
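
A minimal sketch of how an agent could run the Data Gate checks (the four paths are from this spec; the function name and return convention are assumptions):

```python
from pathlib import Path

REQUIRED = [
    "acceptance_criteria.md",
    "restrictions.md",
    "problem.md",
]

def data_gate(problem_dir: str) -> list:
    """Return a list of blocking problems; an empty list means the gate passes."""
    root = Path(problem_dir)
    missing = []
    for name in REQUIRED:
        f = root / name
        if not f.is_file() or f.stat().st_size == 0:  # missing OR empty blocks
            missing.append(name)
    input_data = root / "input_data"
    if not input_data.is_dir() or not any(input_data.iterdir()):
        missing.append("input_data/ (needs at least one data file)")
    return missing
```

Any non-empty result should STOP the workflow and be shown to the user verbatim.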

**Prereq 2: Finalize Solution Draft**

Only runs after the Data Gate passes:

1. Scan `_docs/01_solution/` for files matching `solution_draft*.md`
2. Identify the highest-numbered draft (e.g. `solution_draft06.md`)
3. **Rename** it to `_docs/01_solution/solution.md`
4. If `solution.md` already exists, ask the user whether to overwrite or keep existing
5. Verify `solution.md` is non-empty — **STOP if missing or empty**
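
Prereq 2 can be sketched as follows (illustrative; the overwrite question in item 4 is reduced here to raising an error so a caller can ask the user):

```python
import re
from pathlib import Path

def finalize_draft(solution_dir: str):
    """Rename the highest-numbered solution_draft##.md to solution.md."""
    root = Path(solution_dir)
    drafts = []
    for path in root.glob("solution_draft*.md"):
        match = re.search(r"solution_draft(\d+)\.md$", path.name)
        if match:
            drafts.append((int(match.group(1)), path))
    if not drafts:
        return None  # nothing to finalize
    _, latest = max(drafts)
    target = root / "solution.md"
    if target.exists():
        raise FileExistsError("solution.md exists; ask the user before overwriting")
    latest.rename(target)
    return latest.name
```

With `solution_draft01.md` and `solution_draft06.md` present, draft 06 becomes `solution.md` and earlier drafts are left in place.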

**Prereq 3: Workspace Setup**

1. Create DOCUMENT_DIR if it does not exist
2. If DOCUMENT_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**

Read and follow `steps/00_prerequisites.md`. All three prerequisite checks are **BLOCKING** — do not start the workflow until they pass.

## Artifact Management

### Directory Structure

All artifacts are written directly under DOCUMENT_DIR:

```
DOCUMENT_DIR/
├── integration_tests/
│   ├── environment.md
│   ├── test_data.md
│   ├── functional_tests.md
│   ├── non_functional_tests.md
│   └── traceability_matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
├── deployment/
│   ├── containerization.md
│   ├── ci_cd_pipeline.md
│   ├── environment_strategy.md
│   ├── observability.md
│   └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (iterative, ## as sequence)
├── components/
│   ├── 01_[name]/
│   │   ├── description.md
│   │   └── tests.md
│   ├── 02_[name]/
│   │   ├── description.md
│   │   └── tests.md
│   └── ...
├── common-helpers/
│   ├── 01_helper_[name]/
│   ├── 02_helper_[name]/
│   └── ...
├── diagrams/
│   ├── components.drawio
│   └── flows/
│       ├── flow_[name].md (Mermaid)
│       └── ...
└── FINAL_report.md
```
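
The fixed part of this layout can be scaffolded with a short helper (a sketch; per-component and per-helper folders are created later as components are identified, so only the stable directories are listed):

```python
from pathlib import Path

SUBDIRS = [
    "integration_tests",
    "deployment",
    "components",
    "common-helpers",
    "diagrams/flows",
]

def scaffold(document_dir: str) -> list:
    """Create the fixed artifact directories; return the ones newly created."""
    created = []
    for sub in SUBDIRS:
        path = Path(document_dir) / sub
        if not path.exists():
            path.mkdir(parents=True)
            created.append(sub)
    return created
```

A second call on the same directory is a no-op, which keeps the helper safe to run on resume.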

### Save Timing

| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
| Step 4 | Risk assessment complete | `risk_mitigations.md` |
| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
| Step 6 | Epics created in Jira | Jira via MCP |
| Final | All steps complete | `FINAL_report.md` |

### Save Principles

1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
2. **Incremental updates**: same file can be updated multiple times; append or replace
3. **Preserve process**: keep all intermediate files even after integration into final report
4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)

### Resumability

If DOCUMENT_DIR already contains artifacts:

1. List existing files and match them to the save timing table above
2. Identify the last completed step based on which artifacts exist
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
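
Checkpoint detection against the save timing table can be sketched like this (illustrative; only a representative subset of artifacts is checked, and the step numbering follows the table above):

```python
from pathlib import Path

# Step -> artifact that proves the step completed (subset of the save table).
CHECKPOINTS = [
    (1, "integration_tests/traceability_matrix.md"),
    (2, "architecture.md"),
    (4, "risk_mitigations.md"),
    (6, "FINAL_report.md"),
]

def resume_step(document_dir: str) -> int:
    """Return the first step whose checkpoint artifact is missing."""
    root = Path(document_dir)
    for step, artifact in CHECKPOINTS:
        if not (root / artifact).is_file():
            return step
    return 7  # all checkpoints present; nothing left to resume
```

An empty DOCUMENT_DIR resolves to step 1; a directory containing only the Step 1 artifacts resolves to step 2, and so on.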

Read `steps/01_artifact-management.md` for directory structure, save timing, save principles, and resumability rules. Refer to it throughout the workflow.

## Progress Tracking

@@ -165,52 +61,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 6). Upda

### Step 1: Integration Tests

**Role**: Professional Quality Assurance Engineer
**Goal**: Analyze input data completeness and produce detailed black-box integration test specifications
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.

#### Phase 1a: Input Data Completeness Analysis

1. Read `_docs/01_solution/solution.md` (finalized in Prereq 2)
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md
4. Analyze `input_data/` contents against:
   - Coverage of acceptance criteria scenarios
   - Coverage of restriction edge cases
   - Coverage of testing strategy requirements
5. Threshold: at least 70% coverage of the scenarios
6. If coverage is low, search the internet for supplementary data, assess its quality with the user, and, if the user agrees, add it to `input_data/`
7. Present coverage assessment to user
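
The 70% threshold in item 5 can be expressed as a simple set computation (a sketch; how scenario identifiers are extracted from the acceptance criteria is left to the agent):

```python
def coverage_ratio(scenarios: set, covered: set) -> float:
    """Fraction of required scenarios that the input data actually covers."""
    if not scenarios:
        return 1.0  # nothing required, trivially covered
    return len(scenarios & covered) / len(scenarios)

def gate_passes(scenarios: set, covered: set, threshold: float = 0.70) -> bool:
    return coverage_ratio(scenarios, covered) >= threshold
```

For example, 7 of 10 required scenarios covered gives a ratio of 0.7, which just meets the gate; 6 of 10 does not.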

**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.

#### Phase 1b: Black-Box Test Scenario Specification

Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:

1. Define test environment using `templates/integration-environment.md` as structure
2. Define test data management using `templates/integration-test-data.md` as structure
3. Write functional test scenarios (positive + negative) using `templates/integration-functional-tests.md` as structure
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `templates/integration-non-functional-tests.md` as structure
5. Build traceability matrix using `templates/integration-traceability-matrix.md` as structure

**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions

**Save action**: Write all files under `integration_tests/`:
- `environment.md`
- `test_data.md`
- `functional_tests.md`
- `non_functional_tests.md`
- `traceability_matrix.md`

**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.

Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`.

Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.

@@ -218,285 +69,37 @@ Capture any new questions, findings, or insights that arise during test specific

### Step 2: Solution Analysis

**Role**: Professional software architect
**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
**Constraints**: No code, no component-level detail yet; focus on system-level view

#### Phase 2a: Architecture & Flows

1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
3. Research unknown or questionable topics via internet; ask user about ambiguities
4. Document architecture using `templates/architecture.md` as structure
5. Document system flows using `templates/system-flows.md` as structure

**Self-verification**:
- [ ] Architecture covers all capabilities mentioned in solution.md
- [ ] System flows cover all main user/system interactions
- [ ] No contradictions with problem.md or restrictions.md
- [ ] Technology choices are justified
- [ ] Integration test findings are reflected in architecture decisions

**Save action**: Write `architecture.md` and `system-flows.md`

**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.

#### Phase 2b: Data Model

**Role**: Professional software architect
**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy

1. Extract core entities from architecture.md and solution.md
2. Define entity attributes, types, and constraints
3. Define relationships between entities (Mermaid ERD)
4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
5. Define seed data requirements per environment (dev, staging)
6. Define backward compatibility approach for schema changes (additive-only by default)

**Self-verification**:
- [ ] Every entity mentioned in architecture.md is defined
- [ ] Relationships are explicit with cardinality
- [ ] Migration strategy specifies reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented

**Save action**: Write `data_model.md`

#### Phase 2c: Deployment Planning

**Role**: DevOps / Platform engineer
**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures

Use the `/deploy` skill's templates as structure for each artifact:

1. Read architecture.md and restrictions.md for infrastructure constraints
2. Research Docker best practices for the project's tech stack
3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
5. Define environment strategy: dev, staging, production with secrets management
6. Define observability: structured logging, metrics, tracing, alerting
7. Define deployment procedures: strategy, health checks, rollback, checklist

**Self-verification**:
- [ ] Every component has a Docker specification
- [ ] CI/CD pipeline covers lint, test, security, build, deploy
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks

**Save action**: Write all 5 files under `deployment/`:
- `containerization.md`
- `ci_cd_pipeline.md`
- `environment_strategy.md`
- `observability.md`
- `deployment_procedures.md`

Read and follow `steps/02_solution-analysis.md`.

---

### Step 3: Component Decomposition

**Role**: Professional software architect
**Goal**: Decompose the architecture into components with detailed specs
**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.

1. Identify components from the architecture; think about separation, reusability, and communication patterns
2. Use integration test scenarios from Step 1 to validate component boundaries
3. If additional components are needed (data preparation, shared helpers), create them
4. For each component, write a spec using `templates/component-spec.md` as structure
5. Generate diagrams:
   - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
   - Mermaid flowchart per main control flow
6. Components may share and reuse common logic; for such occurrences, place the shared logic in the `common-helpers/` folder

**Self-verification**:
- [ ] Each component has a single, clear responsibility
- [ ] No functionality is spread across multiple components
- [ ] All inter-component interfaces are defined (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies
- [ ] All components from architecture.md are accounted for
- [ ] Every integration test scenario can be traced through component interactions
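
The circular-dependency check in the self-verification list can be sketched as a depth-first search over the component dependency graph (illustrative; component names are hypothetical):

```python
def find_cycle(deps: dict):
    """Return a cycle as a list of component names, or None if the graph is acyclic."""
    visiting, done = set(), set()

    def visit(node, stack):
        if node in done:
            return None
        if node in visiting:  # back edge: node is an ancestor on the current path
            return stack[stack.index(node):] + [node]
        visiting.add(node)
        for dep in deps.get(node, []):
            cycle = visit(dep, stack + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in deps:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None
```

A graph like `{"ingest": ["store"], "store": ["ingest"]}` fails the check, and the returned path can be shown to the user as-is.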

**Save action**: Write:
- each component `components/[##]_[name]/description.md`
- common helper `common-helpers/[##]_helper_[name].md`
- diagrams `diagrams/`

**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.

Read and follow `steps/03_component-decomposition.md`.

---

### Step 4: Architecture Review & Risk Assessment

**Role**: Professional software architect and analyst
**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
**Constraints**: This is a review step — fix problems found, do not add new features

#### 4a. Evaluator Pass (re-read ALL artifacts)

Review checklist:
- [ ] All components follow Single Responsibility Principle
- [ ] All components follow dumb code / smart data principle
- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
- [ ] No circular dependencies in the dependency graph
- [ ] No missing interactions between components
- [ ] No over-engineering — is there a simpler decomposition?
- [ ] Security considerations addressed in component design
- [ ] Performance bottlenecks identified
- [ ] API contracts are consistent across components

Fix any issues found before proceeding to risk identification.

#### 4b. Risk Identification

1. Identify technical and project risks
2. Assess probability and impact using `templates/risk-register.md`
3. Define mitigation strategies
4. Apply mitigations to architecture, flows, and component documents where applicable

**Self-verification**:
- [ ] Every High/Critical risk has a concrete mitigation strategy
- [ ] Mitigations are reflected in the relevant component or architecture docs
- [ ] No new risks introduced by the mitigations themselves

**Save action**: Write `risk_mitigations.md`

**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.

**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.

Read and follow `steps/04_review-risk.md`.

---

### Step 5: Test Specifications

**Role**: Professional Quality Assurance Engineer

**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage

**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.

1. For each component, write tests using `templates/test-spec.md` as structure
2. Cover all 4 types: integration, performance, security, acceptance
3. Include test data management (setup, teardown, isolation)
4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
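
The traceability check in item 4 and the 75% minimum can be sketched as set arithmetic over AC identifiers (illustrative; the data shapes are assumptions):

```python
def uncovered_criteria(acceptance_ids: set, tests: dict) -> set:
    """Acceptance criteria with no test referencing them, across all components."""
    covered = {ac for refs in tests.values() for ac in refs}
    return acceptance_ids - covered

def coverage_ok(acceptance_ids: set, tests: dict, minimum: float = 0.75) -> bool:
    missing = uncovered_criteria(acceptance_ids, tests)
    total = max(len(acceptance_ids), 1)  # avoid division by zero
    return (len(acceptance_ids) - len(missing)) / total >= minimum
```

Here `tests` maps each component name to the list of AC IDs its test spec references; any non-empty `uncovered_criteria` result should be reported before the save action.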

**Self-verification**:
- [ ] Every acceptance criterion has at least one test covering it
- [ ] Test inputs are realistic and well-defined
- [ ] Expected results are specific and measurable
- [ ] No component is left without tests

**Save action**: Write each `components/[##]_[name]/tests.md`

Read and follow `steps/05_test-specifications.md`.

---

### Step 6: Jira Epics

**Role**: Professional product manager

**Goal**: Create Jira epics from components, ordered by dependency

**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.

1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
4. Include effort estimation per epic (T-shirt size or story points range)
5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
6. Generate Mermaid diagrams showing component-to-epic mapping and component relationships
7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`.

**CRITICAL — Epic description richness requirements**:

Each epic description in Jira MUST include ALL of the following sections with substantial content:
- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
- **Problem / Context**: what problem this component solves, why it exists, current pain points
- **Scope**: detailed in-scope and out-of-scope lists
- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
- **Interface specification**: full method signatures, input/output types, error types (from component description.md)
- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
- **Effort estimation**: T-shirt size and story points range
- **Child issues**: planned task breakdown with complexity points
- **Key constraints**: from restrictions.md that affect this component
- **Testing strategy**: summary of test types and coverage from tests.md

Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.

**Self-verification**:
- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to exactly one epic
- [ ] Dependency order is respected (no epic depends on a later one)
- [ ] Acceptance criteria are measurable
- [ ] Effort estimates are realistic
- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
- [ ] Epic descriptions are self-contained — readable without opening other files

**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.

Read and follow `steps/06_jira-epics.md`.
|
||||
|
||||
---
|
||||
|
||||
## Quality Checklist (before FINAL_report.md)
|
||||
### Final: Quality Checklist
|
||||
|
||||
Before writing the final report, verify ALL of the following:
|
||||
|
||||
### Integration Tests
|
||||
- [ ] Every acceptance criterion is covered in traceability_matrix.md
|
||||
- [ ] Every restriction is verified by at least one test
|
||||
- [ ] Positive and negative scenarios are balanced
|
||||
- [ ] Docker environment is self-contained
|
||||
- [ ] Consumer app treats main system as black box
|
||||
- [ ] CI/CD integration and reporting defined
|
||||
|
||||
### Architecture
|
||||
- [ ] Covers all capabilities from solution.md
|
||||
- [ ] Technology choices are justified
|
||||
- [ ] Deployment model is defined
|
||||
- [ ] Integration test findings are reflected in architecture decisions
|
||||
|
||||
### Data Model
|
||||
- [ ] Every entity from architecture.md is defined
|
||||
- [ ] Relationships have explicit cardinality
|
||||
- [ ] Migration strategy with reversibility requirement
|
||||
- [ ] Seed data requirements defined
|
||||
- [ ] Backward compatibility approach documented
|
||||
|
||||
### Deployment
|
||||
- [ ] Containerization plan covers all components
|
||||
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
|
||||
- [ ] Environment strategy covers dev, staging, production
|
||||
- [ ] Observability covers logging, metrics, tracing, alerting
|
||||
- [ ] Deployment procedures include rollback and health checks
|
||||
|
||||
### Components
|
||||
- [ ] Every component follows SRP
|
||||
- [ ] No circular dependencies
|
||||
- [ ] All inter-component interfaces are defined and consistent
|
||||
- [ ] No orphan components (unused by any flow)
|
||||
- [ ] Every integration test scenario can be traced through component interactions
|
||||
|
||||
### Risks
|
||||
- [ ] All High/Critical risks have mitigations
|
||||
- [ ] Mitigations are reflected in component/architecture docs
|
||||
- [ ] User has confirmed risk assessment is sufficient
|
||||
|
||||
### Tests
|
||||
- [ ] Every acceptance criterion is covered by at least one test
|
||||
- [ ] All 4 test types are represented per component (where applicable)
|
||||
- [ ] Test data management is defined
|
||||
|
||||
### Epics
|
||||
- [ ] "Bootstrap & Initial Structure" epic exists
|
||||
- [ ] "Integration Tests" epic exists
|
||||
- [ ] Every component maps to an epic
|
||||
- [ ] Dependency order is correct
|
||||
- [ ] Acceptance criteria are measurable
|
||||
|
||||
**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
|
||||
Read and follow `steps/07_quality-checklist.md`.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
@@ -522,36 +125,3 @@ Before writing the final report, verify ALL of the following:
| File structure within templates | PROCEED |
| Contradictions between input files | ASK user |
| Risk mitigation requires architecture change | ASK user |

## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────┐
│ Solution Planning (6-Step Method)                              │
├────────────────────────────────────────────────────────────────┤
│ PREREQ 1: Data Gate (BLOCKING)                                 │
│   → verify AC, restrictions, input_data exist — STOP if not    │
│ PREREQ 2: Finalize solution draft                              │
│   → rename highest solution_draft##.md to solution.md          │
│ PREREQ 3: Workspace setup                                      │
│   → create DOCUMENT_DIR/ if needed                             │
│                                                                │
│ 1.  Integration Tests → integration_tests/ (5 files)           │
│     [BLOCKING: user confirms test coverage]                    │
│ 2a. Architecture → architecture.md, system-flows.md            │
│     [BLOCKING: user confirms architecture]                     │
│ 2b. Data Model → data_model.md                                 │
│ 2c. Deployment → deployment/ (5 files)                         │
│ 3.  Component Decompose → components/[##]_[name]/description   │
│     [BLOCKING: user confirms decomposition]                    │
│ 4.  Review & Risk → risk_mitigations.md                        │
│     [BLOCKING: user confirms risks, iterative]                 │
│ 5.  Test Specifications → components/[##]_[name]/tests.md      │
│ 6.  Jira Epics → Jira via MCP                                  │
│     ─────────────────────────────────────────────              │
│     Quality Checklist → FINAL_report.md                        │
├────────────────────────────────────────────────────────────────┤
│ Principles: SRP · Dumb code/smart data · Save immediately      │
│             Ask don't assume · Plan don't code                 │
└────────────────────────────────────────────────────────────────┘
```

@@ -0,0 +1,27 @@
## Prerequisite Checks (BLOCKING)

Run sequentially before any planning step:

### Prereq 1: Data Gate

1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing**
3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing**
4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing**

All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop.
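The four checks above can be sketched as a single gate function. This is an illustrative sketch, not part of the skill itself; the helper name `data_gate` and its return convention are assumptions:

```python
from pathlib import Path

REQUIRED_FILES = [
    "_docs/00_problem/problem.md",
    "_docs/00_problem/restrictions.md",
    "_docs/00_problem/acceptance_criteria.md",
]

def data_gate(root: Path) -> list[str]:
    """Return a list of blocking problems; an empty list means the gate passes."""
    problems = []
    for rel in REQUIRED_FILES:
        f = root / rel
        # Each required file must exist and be non-empty.
        if not f.is_file() or f.stat().st_size == 0:
            problems.append(f"missing or empty: {rel}")
    input_data = root / "_docs/00_problem/input_data"
    # input_data/ must exist and contain at least one entry.
    if not input_data.is_dir() or not any(input_data.iterdir()):
        problems.append("input_data/ missing or contains no data files")
    return problems
```

If the returned list is non-empty, STOP and report every problem at once, rather than failing on the first one.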

### Prereq 2: Finalize Solution Draft

Only runs after the Data Gate passes:

1. Scan `_docs/01_solution/` for files matching `solution_draft*.md`
2. Identify the highest-numbered draft (e.g. `solution_draft06.md`)
3. If `solution.md` already exists, ask the user whether to overwrite it or keep the existing file
4. Otherwise **rename** the highest-numbered draft to `_docs/01_solution/solution.md`
5. Verify `solution.md` is non-empty — **STOP if missing or empty**

### Prereq 3: Workspace Setup

1. Create DOCUMENT_DIR if it does not exist
2. If DOCUMENT_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
@@ -0,0 +1,81 @@
## Artifact Management

### Directory Structure

All artifacts are written directly under DOCUMENT_DIR:

```
DOCUMENT_DIR/
├── integration_tests/
│   ├── environment.md
│   ├── test_data.md
│   ├── functional_tests.md
│   ├── non_functional_tests.md
│   └── traceability_matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
├── deployment/
│   ├── containerization.md
│   ├── ci_cd_pipeline.md
│   ├── environment_strategy.md
│   ├── observability.md
│   └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (iterative, ## as sequence number)
├── components/
│   ├── 01_[name]/
│   │   ├── description.md
│   │   └── tests.md
│   ├── 02_[name]/
│   │   ├── description.md
│   │   └── tests.md
│   └── ...
├── common-helpers/
│   ├── 01_helper_[name]/
│   ├── 02_helper_[name]/
│   └── ...
├── diagrams/
│   ├── components.drawio
│   └── flows/
│       ├── flow_[name].md (Mermaid)
│       └── ...
└── FINAL_report.md
```

### Save Timing

| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
| Step 4 | Risk assessment complete | `risk_mitigations.md` |
| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
| Step 6 | Epics created in Jira | Jira via MCP |
| Final | All steps complete | `FINAL_report.md` |

### Save Principles

1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
2. **Incremental updates**: same file can be updated multiple times; append or replace
3. **Preserve process**: keep all intermediate files even after integration into final report
4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)

### Resumability

If DOCUMENT_DIR already contains artifacts:

1. List existing files and match them to the save timing table above
2. Identify the last completed step based on which artifacts exist
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
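The detection in steps 1-3 can be sketched by probing one key artifact per step, in save order. The step keys and the choice of a single marker artifact per step are illustrative assumptions, not part of the skill:

```python
from pathlib import Path

# Ordered (step, marker artifact) pairs mirroring the save-timing table above.
# One representative file per step; the last file a step saves works best.
STEP_ARTIFACTS = [
    ("step1_integration_tests", "integration_tests/traceability_matrix.md"),
    ("step2_architecture", "architecture.md"),
    ("step2_data_model", "data_model.md"),
    ("step2_deployment", "deployment/deployment_procedures.md"),
    ("step4_risks", "risk_mitigations.md"),
    ("final_report", "FINAL_report.md"),
]

def resume_point(document_dir: Path) -> str:
    """Return the first step whose marker artifact is missing."""
    for step, artifact in STEP_ARTIFACTS:
        if not (document_dir / artifact).is_file():
            return step
    return "complete"
```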
@@ -0,0 +1,74 @@
## Step 2: Solution Analysis

**Role**: Professional software architect
**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
**Constraints**: No code, no component-level detail yet; focus on system-level view

### Phase 2a: Architecture & Flows

1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
3. Research unknown or questionable topics on the internet; ask the user about ambiguities
4. Document architecture using `templates/architecture.md` as structure
5. Document system flows using `templates/system-flows.md` as structure

**Self-verification**:
- [ ] Architecture covers all capabilities mentioned in solution.md
- [ ] System flows cover all main user/system interactions
- [ ] No contradictions with problem.md or restrictions.md
- [ ] Technology choices are justified
- [ ] Integration test findings are reflected in architecture decisions

**Save action**: Write `architecture.md` and `system-flows.md`

**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.

### Phase 2b: Data Model

**Role**: Professional software architect
**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy

1. Extract core entities from architecture.md and solution.md
2. Define entity attributes, types, and constraints
3. Define relationships between entities (Mermaid ERD)
4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
5. Define seed data requirements per environment (dev, staging)
6. Define backward compatibility approach for schema changes (additive-only by default)

**Self-verification**:
- [ ] Every entity mentioned in architecture.md is defined
- [ ] Relationships are explicit with cardinality
- [ ] Migration strategy specifies reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented

**Save action**: Write `data_model.md`

### Phase 2c: Deployment Planning

**Role**: DevOps / Platform engineer
**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures

Use the `/deploy` skill's templates as structure for each artifact:

1. Read architecture.md and restrictions.md for infrastructure constraints
2. Research Docker best practices for the project's tech stack
3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
5. Define environment strategy: dev, staging, production with secrets management
6. Define observability: structured logging, metrics, tracing, alerting
7. Define deployment procedures: strategy, health checks, rollback, checklist

**Self-verification**:
- [ ] Every component has a Docker specification
- [ ] CI/CD pipeline covers lint, test, security, build, deploy
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks

**Save action**: Write all 5 files under `deployment/`:
- `containerization.md`
- `ci_cd_pipeline.md`
- `environment_strategy.md`
- `observability.md`
- `deployment_procedures.md`
@@ -0,0 +1,29 @@
## Step 3: Component Decomposition

**Role**: Professional software architect
**Goal**: Decompose the architecture into components with detailed specs
**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.

1. Identify components from the architecture; think about separation, reusability, and communication patterns
2. Use integration test scenarios from Step 1 to validate component boundaries
3. If additional components are needed (data preparation, shared helpers), create them
4. For each component, write a spec using `templates/component-spec.md` as structure
5. Generate diagrams:
   - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
   - Mermaid flowchart per main control flow
6. Components can share and reuse common logic across multiple components; for such cases, place the shared logic in the `common-helpers/` folder

**Self-verification**:
- [ ] Each component has a single, clear responsibility
- [ ] No functionality is spread across multiple components
- [ ] All inter-component interfaces are defined (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies
- [ ] All components from architecture.md are accounted for
- [ ] Every integration test scenario can be traced through component interactions

**Save action**: Write:
- each component `components/[##]_[name]/description.md`
- common helper `common-helpers/[##]_helper_[name].md`
- diagrams `diagrams/`

**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
@@ -0,0 +1,38 @@
## Step 4: Architecture Review & Risk Assessment

**Role**: Professional software architect and analyst
**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
**Constraints**: This is a review step — fix problems found, do not add new features

### 4a. Evaluator Pass (re-read ALL artifacts)

Review checklist:
- [ ] All components follow Single Responsibility Principle
- [ ] All components follow dumb code / smart data principle
- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
- [ ] No circular dependencies in the dependency graph
- [ ] No missing interactions between components
- [ ] No over-engineering — is there a simpler decomposition?
- [ ] Security considerations addressed in component design
- [ ] Performance bottlenecks identified
- [ ] API contracts are consistent across components

Fix any issues found before proceeding to risk identification.
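The circular-dependency check in the list above can be automated with a standard three-color depth-first search over the component dependency graph. This is a generic sketch, not tied to any file format in this skill:

```python
def has_cycle(deps: dict[str, list[str]]) -> bool:
    """Detect a cycle in a component dependency graph (DFS, three-color marking)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    color = dict.fromkeys(nodes, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY
        for dep in deps.get(node, []):
            if color[dep] == GRAY:
                return True  # back edge to the current path: cycle
            if color[dep] == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)
```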

### 4b. Risk Identification

1. Identify technical and project risks
2. Assess probability and impact using `templates/risk-register.md`
3. Define mitigation strategies
4. Apply mitigations to architecture, flows, and component documents where applicable
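Probability and impact assessment can be sketched as an ordinal scoring matrix. The level names and thresholds below are assumptions; the authoritative scales live in `templates/risk-register.md`:

```python
# Assumed ordinal scales; the actual levels come from templates/risk-register.md.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def severity(probability: str, impact: str) -> str:
    """Classify a risk by its probability x impact score."""
    score = LEVELS[probability] * LEVELS[impact]
    if score >= 6:
        return "Critical"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"

def needs_mitigation(probability: str, impact: str) -> bool:
    # Per Step 4: every High/Critical risk must have a concrete mitigation.
    return severity(probability, impact) in ("High", "Critical")
```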

**Self-verification**:
- [ ] Every High/Critical risk has a concrete mitigation strategy
- [ ] Mitigations are reflected in the relevant component or architecture docs
- [ ] No new risks introduced by the mitigations themselves

**Save action**: Write `risk_mitigations.md`

**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.

**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
@@ -0,0 +1,20 @@
## Step 5: Test Specifications

**Role**: Professional Quality Assurance Engineer

**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage

**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.

1. For each component, write tests using `templates/test-spec.md` as structure
2. Cover all 4 types: integration, performance, security, acceptance
3. Include test data management (setup, teardown, isolation)
4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
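The traceability check in step 4 reduces to a set-coverage computation over parsed criterion and test IDs. The `AC-n` / `T-nn` ID scheme here is illustrative; the 75% bar matches the goal stated above:

```python
def uncovered_criteria(criteria: list[str], tests: dict[str, list[str]]) -> list[str]:
    """Return acceptance criteria not referenced by any test spec.

    `tests` maps a test ID to the list of criterion IDs it covers.
    """
    covered = {ac for refs in tests.values() for ac in refs}
    return [ac for ac in criteria if ac not in covered]

def coverage_ratio(criteria: list[str], tests: dict[str, list[str]]) -> float:
    """Fraction of criteria covered by at least one test (1.0 when no criteria)."""
    if not criteria:
        return 1.0
    missing = uncovered_criteria(criteria, tests)
    return 1 - len(missing) / len(criteria)
```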

**Self-verification**:
- [ ] Every acceptance criterion has at least one test covering it
- [ ] Test inputs are realistic and well-defined
- [ ] Expected results are specific and measurable
- [ ] No component is left without tests

**Save action**: Write each `components/[##]_[name]/tests.md`
@@ -0,0 +1,48 @@
## Step 6: Jira Epics

**Role**: Professional product manager

**Goal**: Create Jira epics from components, ordered by dependency

**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.

1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
4. Include effort estimation per epic (T-shirt size or story points range)
5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
6. Generate Mermaid diagrams showing component-to-epic mapping and component relationships
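The dependency ordering in step 3 is a topological sort with the Bootstrap epic pinned first. A sketch using Python's standard `graphlib`; the epic names in the example are illustrative:

```python
from graphlib import TopologicalSorter

def epic_order(deps: dict[str, set[str]]) -> list[str]:
    """Order epics so every epic comes after its dependencies.

    `deps` maps an epic name to the set of epics it depends on.
    """
    bootstrap = "Bootstrap & Initial Structure"
    # Pin the Bootstrap epic first by making every other epic depend on it.
    graph = {
        epic: set(d) | ({bootstrap} if epic != bootstrap else set())
        for epic, d in deps.items()
    }
    graph.setdefault(bootstrap, set())
    return list(TopologicalSorter(graph).static_order())
```

`TopologicalSorter` raises `CycleError` on circular dependencies, which doubles as a check that no epic depends on a later one.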

**CRITICAL — Epic description richness requirements**:

Each epic description in Jira MUST include ALL of the following sections with substantial content:
- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
- **Problem / Context**: what problem this component solves, why it exists, current pain points
- **Scope**: detailed in-scope and out-of-scope lists
- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
- **Interface specification**: full method signatures, input/output types, error types (from component description.md)
- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
- **Effort estimation**: T-shirt size and story points range
- **Child issues**: planned task breakdown with complexity points
- **Key constraints**: from restrictions.md that affect this component
- **Testing strategy**: summary of test types and coverage from tests.md

Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.

**Self-verification**:
- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to exactly one epic
- [ ] Dependency order is respected (no epic depends on a later one)
- [ ] Acceptance criteria are measurable
- [ ] Effort estimates are realistic
- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
- [ ] Epic descriptions are self-contained — readable without opening other files

7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`.

**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
@@ -0,0 +1,57 @@
## Quality Checklist (before FINAL_report.md)

Before writing the final report, verify ALL of the following:

### Integration Tests
- [ ] Every acceptance criterion is covered in traceability_matrix.md
- [ ] Every restriction is verified by at least one test
- [ ] Positive and negative scenarios are balanced
- [ ] Docker environment is self-contained
- [ ] Consumer app treats main system as black box
- [ ] CI/CD integration and reporting defined

### Architecture
- [ ] Covers all capabilities from solution.md
- [ ] Technology choices are justified
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions

### Data Model
- [ ] Every entity from architecture.md is defined
- [ ] Relationships have explicit cardinality
- [ ] Migration strategy with reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented

### Deployment
- [ ] Containerization plan covers all components
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks

### Components
- [ ] Every component follows SRP
- [ ] No circular dependencies
- [ ] All inter-component interfaces are defined and consistent
- [ ] No orphan components (unused by any flow)
- [ ] Every integration test scenario can be traced through component interactions

### Risks
- [ ] All High/Critical risks have mitigations
- [ ] Mitigations are reflected in component/architecture docs
- [ ] User has confirmed risk assessment is sufficient

### Tests
- [ ] Every acceptance criterion is covered by at least one test
- [ ] All 4 test types are represented per component (where applicable)
- [ ] Test data management is defined

### Epics
- [ ] "Bootstrap & Initial Structure" epic exists
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to an epic
- [ ] Dependency order is correct
- [ ] Acceptance criteria are measurable

**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
@@ -43,257 +43,51 @@ Determine the operating mode based on invocation before any other logic runs.

**Standalone mode** (explicit input file provided, e.g. `/research @some_doc.md`):
- INPUT_FILE: the provided file (treated as problem description)
- OUTPUT_DIR: `_standalone/01_solution/`
- RESEARCH_DIR: `_standalone/00_research/`
- BASE_DIR: if specified by the caller, use it; otherwise default to `_standalone/`
- OUTPUT_DIR: `BASE_DIR/01_solution/`
- RESEARCH_DIR: `BASE_DIR/00_research/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `restrictions.md` and `acceptance_criteria.md` are optional — warn if absent, proceed if user confirms
- Mode detection uses OUTPUT_DIR for `solution_draft*.md` scanning
- Draft numbering works the same, scoped to OUTPUT_DIR
- **Final step**: after all research is complete, move INPUT_FILE into `_standalone/`
- **Final step**: after all research is complete, move INPUT_FILE into BASE_DIR

Announce the detected mode and resolved paths to the user before proceeding.

## Project Integration

### Prerequisite Guardrails (BLOCKING)

Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**

**Project mode:**
1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist

**Standalone mode:**
1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
3. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist

### Mode Detection

After guardrails pass, determine the execution mode:

1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found** → **Mode A: Initial Research**
3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts

Inform the user which mode was detected and confirm before proceeding.
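Steps 1-4 can be sketched as a small detector; the function name and return values are illustrative:

```python
from pathlib import Path

def detect_mode(output_dir: Path, force_initial: bool = False) -> str:
    """Return "A" (initial research) or "B" (solution assessment)."""
    if force_initial:  # user said "research from scratch" / "initial research"
        return "A"
    drafts = sorted(output_dir.glob("solution_draft*.md"))
    return "B" if drafts else "A"
```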

### Solution Draft Numbering

All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:

1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)

Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
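The numbering rule can be sketched as follows (the helper name is illustrative; `max(..., default=0)` makes the first-ever draft come out as `solution_draft01.md`):

```python
import re
from pathlib import Path

def next_draft_name(output_dir: Path) -> str:
    """Compute the next zero-padded solution draft filename."""
    numbers = [
        int(m.group(1))
        for p in output_dir.glob("solution_draft*.md")
        if (m := re.fullmatch(r"solution_draft(\d+)", p.stem))
    ]
    # :02d pads to 2 digits minimum but does not truncate 3-digit numbers.
    return f"solution_draft{max(numbers, default=0) + 1:02d}.md"
```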

### Working Directory & Intermediate Artifact Management

#### Directory Structure

At the start of research, you **must** create a working directory under RESEARCH_DIR:

```
RESEARCH_DIR/
├── 00_ac_assessment.md           # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md  # Step 0-1 output
├── 01_source_registry.md         # Step 2 output: all consulted source links
├── 02_fact_cards.md              # Step 3 output: extracted facts
├── 03_comparison_framework.md    # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md         # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md          # Step 7 output: use-case validation results
└── raw/                          # Raw source archive (optional)
    ├── source_1.md
    └── source_2.md
```

### Save Timing & Content

| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |

### Save Principles

1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: Same file can be updated multiple times; append or replace new content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from intermediate files

Read and follow `steps/00_project-integration.md` for prerequisite guardrails, mode detection, draft numbering, working directory setup, save timing, and output file inventory.

## Execution Flow

### Mode A: Initial Research

Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.
Read and follow `steps/01_mode-a-initial-research.md`.

#### Phase 1: AC & Restrictions Assessment (BLOCKING)

**Role**: Professional software architect

A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.

**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)

**Task**:
1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
   - Unclear problem boundaries → ask
   - Ambiguous acceptance criteria values → ask
   - Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
   - Conflicting restrictions → ask which takes priority
3. Research on the internet **extensively** — use multiple search queries per question, rephrase, and search from different angles:
   - How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values
   - How critical is each criterion? Search for case studies where criteria were relaxed or tightened
   - What domain-specific acceptance criteria are we missing? Search for industry standards, regulatory requirements, and best practices in the specific domain
   - Impact of each criterion value on the whole system quality — search for research papers and engineering reports
   - Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets
   - Timeline implications — search for project timelines, development velocity reports, and comparable implementations
   - What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports
4. Research restrictions from multiple perspectives:
   - Are the restrictions realistic? Search for comparable projects that operated under similar constraints
   - Should any be tightened or relaxed? Search for what constraints similar projects actually ended up with
   - Are there additional restrictions we should add? Search for regulatory, compliance, and safety requirements in this domain
   - What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned
5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources

**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment.

**📁 Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:

```markdown
# Acceptance Criteria Assessment

## Acceptance Criteria

| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |

## Restrictions Assessment

| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |

## Key Findings
[Summary of critical findings]

## Sources
[Key references used]
```
|
||||
|
||||
**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings.
|
||||
|
||||
---
|
||||
|
||||
#### Phase 2: Problem Research & Solution Draft

**Role**: Professional researcher and software architect

Full 8-step research methodology. Produces the first solution draft.

**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts

**Task** (drives the 8-step engine):
1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors
2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
5. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
7. Include security considerations in each component analysis
8. Provide rough cost estimates for proposed solutions

Formulate concisely: the fewer words the better, but do not omit any important details.

**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`

---
#### Phase 3: Tech Stack Consolidation (OPTIONAL)

**Role**: Software architect evaluating technology choices

Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.

**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR

**Task**:
1. Extract technology options from the solution draft's component comparison tables
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
3. Produce a tech stack summary with selection rationale
4. Assess risks and learning requirements per technology choice

**📁 Save action**: Write `OUTPUT_DIR/tech_stack.md` with:
- Requirements analysis (functional, non-functional, constraints)
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
- Tech stack summary block
- Risk assessment and learning requirements tables

---
#### Phase 4: Security Deep Dive (OPTIONAL)

**Role**: Security architect

Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.

**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context

**Task**:
1. Build threat model: asset inventory, threat actors, attack vectors
2. Define security requirements and proposed controls per component (with risk level)
3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach

**📁 Save action**: Write `OUTPUT_DIR/security_analysis.md` with:
- Threat model (assets, actors, vectors)
- Per-component security requirements and controls table
- Security controls summary

Phases: AC Assessment (BLOCKING) → Problem Research → Tech Stack (optional) → Security (optional).

---
### Mode B: Solution Assessment

Triggered when `solution_draft*.md` files exist in OUTPUT_DIR. Read and follow `steps/02_mode-b-solution-assessment.md`.

**Role**: Professional software architect

Full 8-step research methodology applied to assessing and improving an existing solution draft.

**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR

**Task** (drives the 8-step engine):
1. Read the existing solution draft thoroughly
2. Research the internet extensively — for each component/decision in the draft, search for:
   - Known problems and limitations of the chosen approach
   - What practitioners say about using it in production
   - Better alternatives that may have emerged recently
   - Common failure modes and edge cases
   - How competitors/similar projects solve the same problem differently
3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
5. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
6. For each identified weak point, search for multiple solution approaches and compare them
7. Based on findings, form a new solution draft in the same format

**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`

**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions above.

---

## Research Engine (8-Step Method)

The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode.

**Investigation phase** (Steps 0–3.5): Read and follow `steps/03_engine-investigation.md`.
Covers: question classification, novelty sensitivity, question decomposition, perspective rotation, exhaustive web search, fact extraction, iterative deepening.

**Analysis phase** (Steps 4–8): Read and follow `steps/04_engine-analysis.md`.
Covers: comparison framework, baseline alignment, reasoning chain, use-case validation, deliverable formatting.

## Solution Draft Output Templates

- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`

## Escalation Rules

@@ -317,389 +111,12 @@ When the user wants to:
- Gather information and evidence for a decision
- Assess or improve an existing solution draft

**Keywords**:
- "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review draft", "improve solution"
- "comparative analysis", "concept comparison", "technical comparison"

**Differentiation from other Skills**:
- Needs a **visual knowledge graph** → use `research-to-diagram`
- Needs **written output** (articles/tutorials) → use `wsy-writer`
- Needs **material organization** → use `material-to-markdown`
- Needs **research + solution draft** → use this Skill

## Research Engine (8-Step Method)

The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode.

### Step 0: Question Type Classification

First, classify the research question type and select the corresponding strategy:

| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |

**Mode-specific classification**:

| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |
### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)

Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.

**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`

Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.

**📁 Save action**: Append timeliness assessment to the end of `00_question_decomposition.md`

---
### Step 1: Question Decomposition & Boundary Definition

**Mode-specific sub-questions**:

**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"

**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"

**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)
#### Perspective Rotation (MANDATORY)

For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries.

| Perspective | What it asks | Example queries |
|-------------|-------------|-----------------|
| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |

Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`.
#### Question Explosion (MANDATORY)

For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.

**Query variant strategies**:
- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work"
- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned"
- **Temporal**: "X 2025", "X latest developments", "X roadmap"
- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture"

Record all planned queries in `00_question_decomposition.md` alongside each sub-question.
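The variant-generation step above can be sketched as a small helper. This is a minimal sketch: the strategy templates, topic strings, and function name are illustrative assumptions, not part of the skill itself.

```python
# Hypothetical sketch: expand a sub-question topic into query variants using
# the strategies listed above. Templates are illustrative, not exhaustive.
def explode_queries(topic: str, context: str, year: int = 2025) -> list[str]:
    variants = [
        topic,                                     # specificity ladder: broad
        f"{topic} {context}",                      # specificity ladder: narrow
        f"{topic} limitations",                    # negation/failure
        f"{topic} vs alternatives for {context}",  # comparison framing
        f"{topic} in production experience",       # practitioner voice
        f"{topic} {year}",                         # temporal
    ]
    seen: set[str] = set()
    unique = []
    for q in variants:
        if q not in seen:  # de-duplicate while preserving order
            seen.add(q)
            unique.append(q)
    return unique
```

Running every variant, rather than only the broad one, is what satisfies the 3-5 variant minimum per sub-question.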
**⚠️ Research Subject Boundary Definition (BLOCKING - must be explicit)**:

When decomposing questions, you must explicitly define the **boundaries of the research subject**:

| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |

**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.
**📁 Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
   - Original question
   - Active mode (A Phase 2 or B) and rationale
   - Summary of relevant problem context from INPUT_DIR
   - Classified question type and rationale
   - **Research subject boundary definition** (population, geography, timeframe, level)
   - List of decomposed sub-questions
   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
4. Write TodoWrite to track progress
### Step 2: Source Tiering & Exhaustive Web Investigation

Tier sources by authority, **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation.

**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`

**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed
#### Exhaustive Search Requirements (MANDATORY)

Do not stop at the first few results. The goal is to build a comprehensive evidence base.

**Minimum search effort per sub-question**:
- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems

**Search broadening strategies** (use when results are thin):
- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
- Try different geographies: search in English + search for European/Asian approaches if relevant
- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"

**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
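The saturation rule can be checked mechanically — a sketch, assuming each executed search is reduced to the set of fact identifiers it produced (the function shape is an assumption):

```python
# Hypothetical sketch of the saturation rule: stop a sub-question when the
# last three searches contributed nothing beyond the existing evidence base.
def is_saturated(search_results: list[set[str]], window: int = 3) -> bool:
    """search_results: one set of fact identifiers per search, in order."""
    if len(search_results) <= window:
        return False  # not enough history to judge saturation
    known = set().union(*search_results[:-window])
    recent = set().union(*search_results[-window:])
    return recent <= known  # no substantially new facts in the last `window`
```

If the recent searches surface even one previously unseen fact, the sub-question is not yet saturated and searching continues.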
**📁 Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
### Step 3: Fact Extraction & Evidence Cards

Transform sources into **verifiable fact cards**:

```markdown
## Fact Cards

### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low

### Fact 2
...
```

**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate confidence level:
  - ✅ High: Explicitly stated in official documentation
  - ⚠️ Medium: Mentioned in official blog but not formally documented
  - ❓ Low: Inference or from unofficial sources
**📁 Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md`:

```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```

**⚠️ Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"
### Step 3.5: Iterative Deepening — Follow-Up Investigation

After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.

**Process**:

1. **Gap analysis**: Review fact cards and identify:
   - Sub-questions with fewer than 3 high-confidence facts → need more searching
   - Contradictions between sources → need tie-breaking evidence
   - Perspectives (from Step 1) that have no or weak coverage → need targeted search
   - Claims that rely only on L3/L4 sources → need L1/L2 verification

2. **Follow-up question generation**: Based on initial findings, generate new questions:
   - "Source X claims [fact] — is this consistent with other evidence?"
   - "If [approach A] has [limitation], how do practitioners work around it?"
   - "What are the second-order effects of [finding]?"
   - "Who disagrees with [common finding] and why?"
   - "What happened when [solution] was deployed at scale?"

3. **Targeted deep-dive searches**: Execute follow-up searches focusing on:
   - Specific claims that need verification
   - Alternative viewpoints not yet represented
   - Real-world case studies and experience reports
   - Failure cases and edge conditions
   - Recent developments that may change the picture

4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md`

**Exit criteria**: Proceed to Step 4 when:
- Every sub-question has at least 3 facts with at least one from L1/L2
- At least 3 perspectives from Step 1 have supporting evidence
- No unresolved contradictions remain (or they are explicitly documented as open questions)
- Follow-up searches are no longer producing new substantive information
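The exit criteria above can be expressed as a check over parsed fact cards. A minimal sketch, assuming each card is parsed into a record carrying its sub-question, source tier, and perspective (the field names and function signature are hypothetical):

```python
# Hypothetical sketch of the Step 3.5 exit check over parsed fact cards.
def ready_for_step_4(facts, perspectives_covered, unresolved_contradictions):
    """facts: dicts like {"sub_q": "...", "tier": "L1".."L4", "perspective": "..."}."""
    by_sub_q = {}
    for f in facts:
        by_sub_q.setdefault(f["sub_q"], []).append(f)
    for group in by_sub_q.values():
        if len(group) < 3:
            return False  # fewer than 3 facts for some sub-question
        if not any(f["tier"] in ("L1", "L2") for f in group):
            return False  # no primary-source (L1/L2) backing
    if len(set(perspectives_covered)) < 3:
        return False  # fewer than 3 perspectives have supporting evidence
    if unresolved_contradictions:
        return False  # contradictions must be resolved or logged as open questions
    return True
```

Any `False` result sends the researcher back into targeted deep-dive searches rather than forward to Step 4.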
### Step 4: Build Comparison/Analysis Framework

Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`

**📁 Save action**:
Write to `03_comparison_framework.md`:

```markdown
# Comparison Framework

## Selected Framework Type
[Concept Comparison / Decision Support / ...]

## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...

## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```
### Step 5: Reference Point Baseline Alignment

Ensure all compared parties have clear, consistent definitions:

**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?
### Step 6: Fact-to-Conclusion Reasoning Chain

Explicitly write out the "fact → comparison → conclusion" reasoning process:

```markdown
## Reasoning Process

### Regarding [Dimension Name]

1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```

**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated

**📁 Save action**:
Write to `04_reasoning_chain.md`:

```markdown
# Reasoning Chain

## Dimension 1: [Dimension Name]

### Fact Confirmation
According to [Fact #X], X's mechanism is...

### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])

### Conclusion
Therefore, the difference between X and Y on this dimension is...

### Confidence
✅/⚠️/❓ + rationale

---
## Dimension 2: [Dimension Name]
...
```
### Step 7: Use-Case Validation (Sanity Check)

Validate conclusions against a typical scenario:

**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?

**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Are there any important dimensions missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?

**📁 Save action**:
Write to `05_validation_log.md`:

```markdown
# Validation Log

## Validation Scenario
[Scenario description]

## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]

## Actual Validation Results
[actual situation]

## Counterexamples
[yes/no, describe if yes]

## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]

## Conclusions Requiring Revision
[if any]
```
### Step 8: Deliverable Formatting

Make the output **readable, traceable, and actionable**.

**📁 Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`

Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`
## Solution Draft Output Templates

### Mode A: Initial Research Output

Use template: `templates/solution_draft_mode_a.md`

### Mode B: Solution Assessment Output

Use template: `templates/solution_draft_mode_b.md`

## Stakeholder Perspectives

Adjust content depth based on audience:

@@ -710,75 +127,6 @@ Adjust content depth based on audience:
| **Implementers** | Specific mechanisms, how-to | Detailed, emphasize how to do it |
| **Technical experts** | Details, boundary conditions, limitations | In-depth, emphasize accuracy |

## Output Files

Default intermediate artifacts location: `RESEARCH_DIR/`

**Required files** (automatically generated through the process):

| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |

**Optional files**:
- `raw/*.md` - Raw source archives (saved when content is lengthy)
## Methodology Quick Reference Card

```
┌──────────────────────────────────────────────────────────────────┐
│ Deep Research — Mode-Aware 8-Step Method                         │
├──────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths        │
│ GUARDRAILS: Check INPUT_DIR/INPUT_FILE exists + required files   │
│ MODE DETECT: solution_draft*.md in 01_solution? → A or B         │
│                                                                  │
│ MODE A: Initial Research                                         │
│   Phase 1: AC & Restrictions Assessment (BLOCKING)               │
│   Phase 2: Full 8-step → solution_draft##.md                     │
│   Phase 3: Tech Stack Consolidation (OPTIONAL) → tech_stack.md   │
│   Phase 4: Security Deep Dive (OPTIONAL) → security_analysis.md  │
│                                                                  │
│ MODE B: Solution Assessment                                      │
│   Read latest draft → Full 8-step → solution_draft##.md (N+1)    │
│   Optional: Phase 3 / Phase 4 on revised draft                   │
│                                                                  │
│ 8-STEP ENGINE:                                                   │
│ 0. Classify question type → Select framework template            │
│ 0.5 Novelty sensitivity → Time windows for sources               │
│ 1. Decompose question → sub-questions + perspectives + queries   │
│    → Perspective Rotation (3+ viewpoints, MANDATORY)             │
│    → Question Explosion (3-5 query variants per sub-Q)           │
│ 2. Exhaustive web search → L1 > L2 > L3 > L4, broad coverage     │
│    → Execute ALL query variants, search until saturation         │
│ 3. Extract facts → Each with source, confidence level            │
│ 3.5 Iterative deepening → gaps, contradictions, follow-ups       │
│    → Keep searching until exit criteria met                      │
│ 4. Build framework → Fixed dimensions, structured compare        │
│ 5. Align references → Ensure unified definitions                 │
│ 6. Reasoning chain → Fact→Compare→Conclude, explicit             │
│ 7. Use-case validation → Sanity check, prevent armchairing       │
│ 8. Deliverable → solution_draft##.md (mode-specific format)      │
├──────────────────────────────────────────────────────────────────┤
│ Key discipline: Ask don't assume · Facts before reasoning        │
│ Conclusions from mechanism, not gut feelings                     │
│ Search broadly, from multiple perspectives, until saturation     │
└──────────────────────────────────────────────────────────────────┘
```
## Usage Examples

For detailed execution flow examples (Mode A initial, Mode B assessment, standalone, force override): Read `references/usage-examples.md`

## Source Verifiability Requirements

Every cited piece of external information must be directly verifiable by the user. All links must be publicly accessible (annotate `[login required]` if not), citations must include exact section/page/timestamp, and unverifiable information must be annotated `[limited source]`. Full checklist in `references/quality-checklists.md`.

@@ -796,7 +144,7 @@ Before completing the solution draft, run through the checklists in `references/
When replying to the user after research is complete:

**✅ Should include**:
- Active mode used (A or B) and which optional phases were executed
- One-sentence core conclusion
- Key findings summary (3-5 points)
@@ -804,7 +152,7 @@ When replying to the user after research is complete:
- Paths to optional artifacts if produced: `tech_stack.md`, `security_analysis.md`
- If there are significant uncertainties, annotate points requiring further verification

**❌ Must not include**:
- Process file listings (e.g., `00_question_decomposition.md`, `01_source_registry.md`, etc.)
- Detailed research step descriptions
- Working directory structure display

@@ -0,0 +1,103 @@
## Project Integration

### Prerequisite Guardrails (BLOCKING)

Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**

**Project mode:**
1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist
**Standalone mode:**
|
||||
1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
|
||||
2. Resolve BASE_DIR: use the caller-specified directory if provided; otherwise default to `_standalone/`
|
||||
3. Resolve OUTPUT_DIR (`BASE_DIR/01_solution/`) and RESEARCH_DIR (`BASE_DIR/00_research/`)
|
||||
4. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
|
||||
5. Create BASE_DIR, OUTPUT_DIR, and RESEARCH_DIR if they don't exist
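
The blocking file checks above can be sketched as a small validation helper. This is a minimal illustration, not part of the skill itself: the directory and file names are taken from this document, and the function name is an assumption.

```python
from pathlib import Path

# File names required in INPUT_DIR, per the project-mode guardrails above.
REQUIRED_FILES = ["problem.md", "restrictions.md", "acceptance_criteria.md"]

def check_guardrails(input_dir: str) -> list[str]:
    """Return a list of blocking problems; an empty list means all guardrails pass."""
    root = Path(input_dir)
    if not root.is_dir():
        return [f"INPUT_DIR {input_dir} does not exist"]
    problems = []
    for name in REQUIRED_FILES:
        f = root / name
        if not f.is_file() or f.stat().st_size == 0:
            problems.append(f"{name} is missing or empty")
    data_dir = root / "input_data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        problems.append("input_data/ is missing or contains no files")
    return problems
```

Any non-empty result is a reason to STOP and ask the user, rather than proceeding with partial context.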

### Mode Detection

After guardrails pass, determine the execution mode:

1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found** → **Mode A: Initial Research**
3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts

Inform the user which mode was detected and confirm before proceeding.
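
The detection rule above amounts to one glob plus an override flag. A hedged sketch, with an illustrative function name:

```python
from pathlib import Path

def detect_mode(output_dir: str, force_initial: bool = False) -> str:
    """Return 'A' (initial research) or 'B' (assessment of an existing draft)."""
    if force_initial:  # the user explicitly asked to research from scratch
        return "A"
    has_draft = any(Path(output_dir).glob("solution_draft*.md"))
    return "B" if has_draft else "A"
```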

### Solution Draft Numbering

All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:

1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)

Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
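
The four-step numbering scheme can be sketched in a few lines; the helper name is illustrative, not part of the skill:

```python
import re
from pathlib import Path

def next_draft_name(output_dir: str) -> str:
    """Scan existing drafts, take the highest number, increment, zero-pad to 2 digits."""
    numbers = [
        int(m.group(1))
        for p in Path(output_dir).glob("solution_draft*.md")
        if (m := re.search(r"solution_draft(\d+)\.md$", p.name))
    ]
    return f"solution_draft{max(numbers, default=0) + 1:02d}.md"
```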

### Working Directory & Intermediate Artifact Management

#### Directory Structure

At the start of research, you **must** create a working directory under RESEARCH_DIR:

```
RESEARCH_DIR/
├── 00_ac_assessment.md           # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md  # Step 0-1 output
├── 01_source_registry.md         # Step 2 output: all consulted source links
├── 02_fact_cards.md              # Step 3 output: extracted facts
├── 03_comparison_framework.md    # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md         # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md          # Step 7 output: use-case validation results
└── raw/                          # Raw source archive (optional)
    ├── source_1.md
    └── source_2.md
```

### Save Timing & Content

| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |

### Save Principles

1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: The same file can be updated multiple times; append new content or replace stale content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from the intermediate files

### Output Files

**Required files** (generated automatically through the process):

| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |

**Optional files**:
- `raw/*.md` — Raw source archives (saved when content is lengthy)

@@ -0,0 +1,127 @@

## Mode A: Initial Research

Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.

### Phase 1: AC & Restrictions Assessment (BLOCKING)

**Role**: Professional software architect

A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.

**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)

**Task**:
1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
   - Unclear problem boundaries → ask
   - Ambiguous acceptance criteria values → ask
   - Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
   - Conflicting restrictions → ask which takes priority
3. Research on the internet **extensively** — use multiple search queries per question, rephrase, and search from different angles:
   - How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values
   - How critical is each criterion? Search for case studies where criteria were relaxed or tightened
   - What domain-specific acceptance criteria are we missing? Search for industry standards, regulatory requirements, and best practices in the specific domain
   - Impact of each criterion value on overall system quality — search for research papers and engineering reports
   - Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets
   - Timeline implications — search for project timelines, development velocity reports, and comparable implementations
   - What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports
4. Research restrictions from multiple perspectives:
   - Are the restrictions realistic? Search for comparable projects that operated under similar constraints
   - Should any be tightened or relaxed? Search for the constraints similar projects actually ended up with
   - Are there additional restrictions we should add? Search for regulatory, compliance, and safety requirements in this domain
   - What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned
5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources

**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to the AC and restrictions assessment.

**Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:

```markdown
# Acceptance Criteria Assessment

## Acceptance Criteria

| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |

## Restrictions Assessment

| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |

## Key Findings
[Summary of critical findings]

## Sources
[Key references used]
```

**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on the findings.

---

### Phase 2: Problem Research & Solution Draft

**Role**: Professional researcher and software architect

Full 8-step research methodology. Produces the first solution draft.

**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts

**Task** (drives the 8-step engine):
1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors
2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
5. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
7. Include security considerations in each component analysis
8. Provide rough cost estimates for proposed solutions

Be concise in your formulation. The fewer words, the better — but do not omit any important details.

**Save action**: Write `OUTPUT_DIR/solution_draft##.md` using the template `templates/solution_draft_mode_a.md`

---

### Phase 3: Tech Stack Consolidation (OPTIONAL)

**Role**: Software architect evaluating technology choices

A focused synthesis step — no new 8-step cycle. Uses the research already gathered in Phase 2 to make concrete technology decisions.

**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR

**Task**:
1. Extract technology options from the solution draft's component comparison tables
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
3. Produce a tech stack summary with selection rationale
4. Assess risks and learning requirements per technology choice

**Save action**: Write `OUTPUT_DIR/tech_stack.md` with:
- Requirements analysis (functional, non-functional, constraints)
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
- Tech stack summary block
- Risk assessment and learning requirements tables

---

### Phase 4: Security Deep Dive (OPTIONAL)

**Role**: Security architect

A focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.

**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context

**Task**:
1. Build a threat model: asset inventory, threat actors, attack vectors
2. Define security requirements and proposed controls per component (with risk level)
3. Summarize the authentication/authorization, data protection, secure communication, and logging/monitoring approach

**Save action**: Write `OUTPUT_DIR/security_analysis.md` with:
- Threat model (assets, actors, vectors)
- Per-component security requirements and controls table
- Security controls summary

@@ -0,0 +1,27 @@

## Mode B: Solution Assessment

Triggered when `solution_draft*.md` files exist in OUTPUT_DIR.

**Role**: Professional software architect

The full 8-step research methodology, applied to assessing and improving an existing solution draft.

**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR

**Task** (drives the 8-step engine):
1. Read the existing solution draft thoroughly
2. Research on the internet extensively — for each component/decision in the draft, search for:
   - Known problems and limitations of the chosen approach
   - What practitioners say about using it in production
   - Better alternatives that may have emerged recently
   - Common failure modes and edge cases
   - How competitors/similar projects solve the same problem differently
3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
5. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
6. For each identified weak point, search for multiple solution approaches and compare them
7. Based on the findings, form a new solution draft in the same format

**Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using the template `templates/solution_draft_mode_b.md`

**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`.

@@ -0,0 +1,227 @@

## Research Engine — Investigation Phase (Steps 0–3.5)

### Step 0: Question Type Classification

First, classify the research question type and select the corresponding strategy:

| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |

**Mode-specific classification**:

| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |

### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)

Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.

**For the full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`

Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.

**Save action**: Append the timeliness assessment to the end of `00_question_decomposition.md`

---

### Step 1: Question Decomposition & Boundary Definition

**Mode-specific sub-questions**:

**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"

**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"

**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)

#### Perspective Rotation (MANDATORY)

For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries.

| Perspective | What it asks | Example queries |
|-------------|-------------|-----------------|
| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| **Domain expert / Academic** | What does peer-reviewed research say? What are the theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |

Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`.

#### Question Explosion (MANDATORY)

For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.

**Query variant strategies**:
- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work"
- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned"
- **Temporal**: "X 2025", "X latest developments", "X roadmap"
- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture"

Record all planned queries in `00_question_decomposition.md` alongside each sub-question.
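
The variant strategies above are essentially templates applied to a topic. A minimal sketch, assuming a handful of representative templates (the exact phrasings are illustrative, not prescribed by the skill):

```python
def explode_query(topic: str, year: int = 2025) -> list[str]:
    """Expand one sub-question topic into search-query variants."""
    templates = [
        "{t}",                           # broad (specificity ladder, top rung)
        "{t} limitations",               # negation/failure
        "{t} in production experience",  # practitioner voice
        "{t} alternatives comparison",   # comparison framing
        "{t} {y}",                       # temporal
    ]
    return [tpl.format(t=topic, y=year) for tpl in templates]
```

Applying it to "indoor drone navigation" yields five queries ready to record in `00_question_decomposition.md`.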

**Research Subject Boundary Definition (BLOCKING — must be explicit)**:

When decomposing questions, you must explicitly define the **boundaries of the research subject**:

| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs the full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |

**Common mistake**: The user asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.

**Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create the working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
   - Original question
   - Active mode (A Phase 2 or B) and rationale
   - Summary of relevant problem context from INPUT_DIR
   - Classified question type and rationale
   - **Research subject boundary definition** (population, geography, timeframe, level)
   - List of decomposed sub-questions
   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
4. Use TodoWrite to track progress

---

### Step 2: Source Tiering & Exhaustive Web Investigation

Tier sources by authority and **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation material.

**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`

**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed

#### Exhaustive Search Requirements (MANDATORY)

Do not stop at the first few results. The goal is to build a comprehensive evidence base.

**Minimum search effort per sub-question**:
- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems

**Search broadening strategies** (use when results are thin):
- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
- Try different geographies: search in English + search for European/Asian approaches if relevant
- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"

**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
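
The saturation rule above is a simple windowed check. A sketch under one assumption: each search is summarized by the count of genuinely new facts it produced.

```python
def is_saturated(new_fact_counts: list[int], window: int = 3) -> bool:
    """True if the last `window` searches each produced zero new facts."""
    if len(new_fact_counts) < window:
        return False  # too few searches to judge saturation yet
    return all(n == 0 for n in new_fact_counts[-window:])
```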

**Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.

---

### Step 3: Fact Extraction & Evidence Cards

Transform sources into **verifiable fact cards**:

```markdown
## Fact Cards

### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low

### Fact 2
...
```

**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate the confidence level:
  - ✅ High: Explicitly stated in official documentation
  - ⚠️ Medium: Mentioned in an official blog but not formally documented
  - ❓ Low: Inference or from unofficial sources

**Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md`:
```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from the source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```

**Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"

---

### Step 3.5: Iterative Deepening — Follow-Up Investigation

After initial fact extraction, review what you have found and identify the **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.

**Process**:

1. **Gap analysis**: Review the fact cards and identify:
   - Sub-questions with fewer than 3 high-confidence facts → need more searching
   - Contradictions between sources → need tie-breaking evidence
   - Perspectives (from Step 1) that have no or weak coverage → need targeted search
   - Claims that rely only on L3/L4 sources → need L1/L2 verification

2. **Follow-up question generation**: Based on the initial findings, generate new questions:
   - "Source X claims [fact] — is this consistent with other evidence?"
   - "If [approach A] has [limitation], how do practitioners work around it?"
   - "What are the second-order effects of [finding]?"
   - "Who disagrees with [common finding] and why?"
   - "What happened when [solution] was deployed at scale?"

3. **Targeted deep-dive searches**: Execute follow-up searches focusing on:
   - Specific claims that need verification
   - Alternative viewpoints not yet represented
   - Real-world case studies and experience reports
   - Failure cases and edge conditions
   - Recent developments that may change the picture

4. **Update artifacts**: Append new sources to `01_source_registry.md` and new facts to `02_fact_cards.md`

**Exit criteria**: Proceed to Step 4 when:
- Every sub-question has at least 3 facts, with at least one from L1/L2
- At least 3 perspectives from Step 1 have supporting evidence
- No unresolved contradictions remain (or they are explicitly documented as open questions)
- Follow-up searches are no longer producing new substantive information
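
The first exit criterion is mechanical enough to sketch. The fact-card shape here is an assumption for illustration (dicts with `subq` and `tier` keys), not a format the skill mandates:

```python
def ready_for_analysis(facts: list[dict], subquestions: list[str]) -> bool:
    """Exit criterion 1: every sub-question has >= 3 facts, at least one from L1/L2."""
    for q in subquestions:
        qf = [f for f in facts if f["subq"] == q]
        if len(qf) < 3 or not any(f["tier"] in ("L1", "L2") for f in qf):
            return False
    return True
```

The remaining criteria (perspective coverage, contradictions, saturation) are judgment calls and stay manual.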

@@ -0,0 +1,146 @@

## Research Engine — Analysis Phase (Steps 4–8)

### Step 4: Build Comparison/Analysis Framework

Based on the question type, select fixed analysis dimensions. **For the dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`

**Save action**:
Write to `03_comparison_framework.md`:
```markdown
# Comparison Framework

## Selected Framework Type
[Concept Comparison / Decision Support / ...]

## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...

## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```

---

### Step 5: Reference Point Baseline Alignment

Ensure all compared parties have clear, consistent definitions:

**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?

---

### Step 6: Fact-to-Conclusion Reasoning Chain

Explicitly write out the "fact → comparison → conclusion" reasoning process:

```markdown
## Reasoning Process

### Regarding [Dimension Name]

1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```

**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated

**Save action**:
Write to `04_reasoning_chain.md`:
```markdown
# Reasoning Chain

## Dimension 1: [Dimension Name]

### Fact Confirmation
According to [Fact #X], X's mechanism is...

### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])

### Conclusion
Therefore, the difference between X and Y on this dimension is...

### Confidence
✅/⚠️/❓ + rationale

---
## Dimension 2: [Dimension Name]
...
```
---

### Step 7: Use-Case Validation (Sanity Check)

Validate conclusions against a typical scenario:

**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?

**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Have any important dimensions been missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?

**Save action**:
Write to `05_validation_log.md`:
```markdown
# Validation Log

## Validation Scenario
[Scenario description]

## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]

## Actual Validation Results
[actual situation]

## Counterexamples
[yes/no, describe if yes]

## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]

## Conclusions Requiring Revision
[if any]
```

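The review checklist lends itself to a quick mechanical gate before Step 8. A minimal sketch, assuming the `- [ ]` / `- [x]` task-list syntax used in `05_validation_log.md`; the function name is illustrative:

```python
import re

def unchecked_items(log_text: str) -> list[str]:
    """Return labels of review-checklist items still marked '- [ ]'.

    A non-empty result means validation surfaced an unresolved issue:
    revise the affected conclusions before formatting the deliverable.
    """
    return [m.group(1).strip()
            for m in re.finditer(r"(?m)^- \[ \] (.+)$", log_text)]
```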
---

### Step 8: Deliverable Formatting

Make the output **readable, traceable, and actionable**.

**Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the output template for the active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`

Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`
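The integration above can be scripted rather than done by hand. This is a sketch under assumptions: the artifact file names follow the save actions in the earlier steps, while the section titles, their ordering, and the `assemble_draft` helper are illustrative, not part of the template contract (the real output should go through the mode templates):

```python
from pathlib import Path

# Intermediate artifacts in the order they feed the draft (Mode B shown;
# Mode A would also fold in 00_ac_assessment.md).
SECTIONS = [
    ("Background", "00_question_decomposition.md"),
    ("Key Facts", "02_fact_cards.md"),
    ("Conclusions", "04_reasoning_chain.md"),
    ("Use-Case Validation", "05_validation_log.md"),
    ("References", "01_source_registry.md"),
]

def assemble_draft(workdir: Path, output_dir: Path, draft_no: int = 1) -> Path:
    """Concatenate intermediate artifacts into OUTPUT_DIR/solution_draft##.md."""
    parts = ["# Solution Draft\n"]
    for title, filename in SECTIONS:
        src = workdir / filename
        if not src.exists():  # tolerate a skipped step
            continue
        parts.append(f"\n## {title}\n\n{src.read_text(encoding='utf-8').strip()}\n")
    out = output_dir / f"solution_draft{draft_no:02d}.md"
    out.write_text("".join(parts), encoding="utf-8")
    return out
```

Missing artifacts are skipped rather than treated as fatal, so a partially completed run still produces a reviewable draft.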