Update .gitignore to include additional file types and directories for Python projects, enhancing environment management and build artifacts exclusion.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-20 21:28:16 +02:00
parent 9e5b0f2cc2
commit 7556f3b012
65 changed files with 9165 additions and 7 deletions
---
name: autopilot
description: |
Auto-chaining orchestrator that drives the full BUILD-SHIP workflow from problem gathering through deployment.
Detects current project state from _docs/ folder, resumes from where it left off, and flows through
problem → research → plan → decompose → implement → deploy without manual skill invocation.
Maximizes work per conversation by auto-transitioning between skills.
Trigger phrases:
- "autopilot", "auto", "start", "continue"
- "what's next", "where am I", "project status"
category: meta
tags: [orchestrator, workflow, auto-chain, state-machine, meta-skill]
disable-model-invocation: true
---
# Autopilot Orchestrator
Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autopilot` once — the engine handles sequencing, transitions, and re-entry.
## Core Principles
- **Auto-chain**: when a skill completes, immediately start the next one — no pause between skills
- **Only pause at decision points**: BLOCKING gates inside sub-skills are the natural pause points; do not add artificial stops between steps
- **State from disk**: all progress is persisted to `_docs/_autopilot_state.md` and cross-checked against `_docs/` folder structure
- **Rich re-entry**: on every invocation, read the state file for full context before continuing
- **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
- **Sound on pause**: follow `.cursor/rules/human-input-sound.mdc` — play a notification sound before every pause that requires human input
- **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically
- **Jira MCP required**: steps that create Jira artifacts (Plan Step 6, Decompose) must have authenticated Jira MCP — never skip or substitute with local files
## Jira MCP Authentication
Several workflow steps create Jira artifacts (epics, tasks, links). The Jira MCP server must be authenticated **before** any step that writes to Jira.
### Steps That Require Jira MCP
| Step | Sub-Step | Jira Action |
|------|----------|-------------|
| 2 (Plan) | Step 6 — Jira Epics | Create epics for each component |
| 3 (Decompose) | Step 13 — All tasks | Create Jira ticket per task, link to epic |
### Authentication Gate
Before entering **Step 2 (Plan)** or **Step 3 (Decompose)** for the first time, the autopilot must:
1. Call `mcp_auth` on the Jira MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** authentication → **STOP**. Present using Choose format:
```
══════════════════════════════════════
BLOCKER: Jira MCP authentication required
══════════════════════════════════════
A) Authenticate now (retry mcp_auth)
B) Pause autopilot — resume after configuring Jira MCP
══════════════════════════════════════
Note: Jira integration is mandatory. Plan and Decompose
steps create epics and tasks that drive implementation.
Local-only workarounds are not acceptable.
══════════════════════════════════════
```
Do NOT offer a "skip Jira" or "save locally" option. The workflow depends on Jira IDs for task referencing, dependency tracking, and implementation batching.
### Re-Authentication
If Jira MCP was already authenticated in a previous invocation (verify by listing available Jira tools beyond `mcp_auth`), skip the auth gate.
## User Interaction Protocol
Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:
- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)
### When to Ask (MUST ask)
- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs
### When NOT to Ask (auto-transition)
- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer
### Choice Format
Always present decisions in this format:
```
══════════════════════════════════════
DECISION REQUIRED: [brief context]
══════════════════════════════════════
A) [Option A — short description]
B) [Option B — short description]
C) [Option C — short description, if applicable]
D) [Option D — short description, if applicable]
══════════════════════════════════════
Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```
Rules:
1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive
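The rules above can be enforced mechanically when building a decision prompt. A minimal sketch; `render_choice` and `Option` are illustrative names, not part of this skill's spec:

```python
# Minimal renderer for the Choose A/B/C/D format defined above.
# Option labels and the 2-4 option bound mirror the rules in this section.
from dataclasses import dataclass

BAR = "═" * 38

@dataclass
class Option:
    label: str        # "A", "B", "C", or "D"
    description: str  # one-line description, per Rule 3

def render_choice(context: str, options: list[Option],
                  recommendation: str, reason: str) -> str:
    if not 2 <= len(options) <= 4:
        raise ValueError("provide 2-4 concrete options")
    lines = [BAR, f"DECISION REQUIRED: {context}", BAR]
    lines += [f"{o.label}) {o.description}" for o in options]
    lines += [BAR, f"Recommendation: {recommendation} — {reason}", BAR]
    return "\n".join(lines)
```

Playing the notification sound and recording the decision (Rules 5 and 6) would wrap around this call.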
## State File: `_docs/_autopilot_state.md`
The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.
### Format
```markdown
# Autopilot State
## Current Step
step: [0-5 or "done"]
name: [Problem / Research / Plan / Decompose / Implement / Deploy / Done]
status: [not_started / in_progress / completed]
sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]
## Step ↔ SubStep Reference
| Step | Name | Sub-Skill | Internal SubSteps |
|------|------------|------------------------|------------------------------------------|
| 0 | Problem | problem/SKILL.md | Phases 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phases 1–4 · Mode B: Steps 0–8 |
| 2 | Plan | plan/SKILL.md | Steps 1–6 |
| 3 | Decompose | decompose/SKILL.md | Steps 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Deploy | deploy/SKILL.md | Steps 1–7 |
When updating `Current Step`, always write it as:
step: N ← autopilot step (0–5)
sub_step: M ← sub-skill's own internal step/phase number + name
Example:
step: 2
name: Plan
status: in_progress
sub_step: 4 — Architecture Review & Risk Assessment
## Completed Steps
| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| 0 | Problem | [date] | [one-line summary] |
| 1 | Research | [date] | [N drafts, final approach summary] |
| 2 | Plan | [date] | [N components, architecture summary] |
| 3 | Decompose | [date] | [N tasks, total complexity points] |
| 4 | Implement | [date] | [N batches, pass/fail summary] |
| 5 | Deploy | [date] | [artifacts produced] |
## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision 2: e.g. "6 research rounds, final draft: solution_draft06.md"]
- [decision N]
## Last Session
date: [date]
ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
reason: [completed step / session boundary / user paused / context limit]
notes: [any context for next session, e.g. "User asked to revisit risk assessment"]
## Blockers
- [blocker 1, if any]
- [none]
```
### State File Rules
1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
3. **Read** the state file as the first action on every invocation — before folder scanning
4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 2 but `_docs/02_plans/architecture.md` already exists), trust the folder structure and update the state file to match
5. **Never delete** the state file. It accumulates history across the entire project lifecycle
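Rule 4's reconciliation reduces to a small decision; a sketch, where `resolve_step` and its arguments are illustrative names and the scanned step comes from the folder scan defined later in this skill:

```python
# Sketch of State File Rule 4: when the state file and the folder scan
# disagree, the folder structure wins and the state file must be rewritten.
from typing import Optional

def resolve_step(state_step: Optional[int], scanned_step: int) -> tuple[int, bool]:
    """Return (resolved_step, state_file_needs_update)."""
    if state_step is None:             # no state file yet (Rule 1 will create it)
        return scanned_step, True
    if state_step != scanned_step:     # inconsistent: trust the folders (Rule 4)
        return scanned_step, True
    return state_step, False
```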
## Execution Entry Point
Every invocation of this skill follows the same sequence:
```
1. Read _docs/_autopilot_state.md (if exists)
2. Cross-check state file against _docs/ folder structure
3. Resolve current step (state file + folder scan)
4. Present Status Summary (from state file context)
5. Enter Execution Loop:
a. Read and execute the current skill's SKILL.md
b. When skill completes → update state file
c. Re-detect next step
d. If next skill is ready → auto-chain (go to 5a with next skill)
e. If session boundary reached → update state file with session notes → suggest new conversation
f. If all steps done → update state file → report completion
```
## State Detection
Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning.
### Folder Scan Rules (fallback)
Scan `_docs/` to determine the current workflow position. Check rules in order — first match wins.
### Detection Rules
**Step 0 — Problem Gathering**
Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty:
- `problem.md`
- `restrictions.md`
- `acceptance_criteria.md`
- `input_data/` (must contain at least one file)
Action: Read and execute `.cursor/skills/problem/SKILL.md`
---
**Step 1 — Research (Initial)**
Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files
Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A)
---
**Step 1b — Research Decision**
Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_plans/architecture.md` does not exist
Action: Present the current research state to the user:
- How many solution drafts exist
- Whether tech_stack.md and security_analysis.md exist
- One-line summary from the latest draft
Then present using the **Choose format**:
```
══════════════════════════════════════
DECISION REQUIRED: Research complete — next action?
══════════════════════════════════════
A) Run another research round (Mode B assessment)
B) Proceed to planning with current draft
══════════════════════════════════════
Recommendation: [A or B] — [reason based on draft quality]
══════════════════════════════════════
```
- If user picks A → Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B)
- If user picks B → auto-chain to Step 2 (Plan)
---
**Step 2 — Plan**
Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_plans/architecture.md` does not exist
Action:
1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself
2. Read and execute `.cursor/skills/plan/SKILL.md`
If `_docs/02_plans/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it.
---
**Step 3 — Decompose**
Condition: `_docs/02_plans/` contains `architecture.md` AND `_docs/02_plans/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`)
Action: Read and execute `.cursor/skills/decompose/SKILL.md`
If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.
---
**Step 4 — Implement**
Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
Action: Read and execute `.cursor/skills/implement/SKILL.md`
If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.
---
**Step 5 — Deploy**
Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND `_docs/04_deploy/` does not exist or is incomplete
Action: Read and execute `.cursor/skills/deploy/SKILL.md`
---
**Done**
Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md)
Action: Report project completion with summary.
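The detection rules above amount to a first-match-wins chain over `_docs/` contents. A condensed sketch that omits some sub-conditions (e.g. the `components/` check in Step 3 and the per-artifact deploy check); the function name is illustrative:

```python
# First-match-wins fallback scan over _docs/, per the Detection Rules above.
from pathlib import Path

def detect_step(docs: Path) -> str:
    problem = docs / "00_problem"
    inputs = problem / "input_data"
    required = ["problem.md", "restrictions.md", "acceptance_criteria.md"]
    problem_done = (
        problem.is_dir()
        and all((problem / f).exists() and (problem / f).stat().st_size > 0
                for f in required)
        and inputs.is_dir() and any(inputs.iterdir())
    )
    if not problem_done:
        return "Step 0 — Problem"
    solution = docs / "01_solution"
    drafts = list(solution.glob("solution_draft*.md")) if solution.is_dir() else []
    if not drafts:
        return "Step 1 — Research (Mode A)"
    architecture = docs / "02_plans" / "architecture.md"
    if not architecture.exists():
        if not (solution / "solution.md").exists():
            return "Step 1b — Research Decision"
        return "Step 2 — Plan"
    tasks_dir = docs / "02_tasks"
    tasks = ([p for p in tasks_dir.glob("*.md") if p.name != "_dependencies_table.md"]
             if tasks_dir.is_dir() else [])
    if not tasks:
        return "Step 3 — Decompose"
    if not (docs / "03_implementation" / "FINAL_implementation_report.md").exists():
        return "Step 4 — Implement"
    if not (docs / "04_deploy").is_dir():
        return "Step 5 — Deploy"
    return "Done"
```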
## Status Summary
On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback).
Format:
```
═══════════════════════════════════════════════════
AUTOPILOT STATUS
═══════════════════════════════════════════════════
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5 Deploy [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```
For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section
## Auto-Chain Rules
After a skill completes, apply these rules:
| Completed Step | Next Action |
|---------------|-------------|
| Problem Gathering | Auto-chain → Research (Mode A) |
| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) |
| Research Decision → proceed | Auto-chain → Plan |
| Plan | Auto-chain → Decompose |
| Decompose | **Session boundary** — suggest new conversation before Implement |
| Implement | Auto-chain → Deploy |
| Deploy | Report completion |
### Session Boundary: Decompose → Implement
After decompose completes, **do not auto-chain to implement**. Instead:
1. Update state file: mark Decompose as completed, set current step to 4 (Implement) with status `not_started`
2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
3. Present a summary: number of tasks, estimated batches, total complexity points
4. Use Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Decompose complete — start implementation?
══════════════════════════════════════
A) Start a new conversation for implementation (recommended for context freshness)
B) Continue implementation in this conversation
══════════════════════════════════════
Recommendation: A — implementation is the longest phase, fresh context helps
══════════════════════════════════════
```
This is the only hard session boundary. All other transitions auto-chain.
## Skill Delegation
For each step, the delegation pattern is:
1. Update state file: set `step` to the autopilot step number (0–5), status to `in_progress`, set `sub_step` to the sub-skill's current internal step/phase number and name
2. Announce: "Starting [Skill Name]..."
3. Read the skill file: `.cursor/skills/[name]/SKILL.md`
4. Execute the skill's workflow exactly as written, including:
- All BLOCKING gates (present to user, wait for confirmation)
- All self-verification checklists
- All save actions
- All escalation rules
- Update `sub_step` in the state file each time the sub-skill advances to a new internal step/phase
5. When the skill's workflow is fully complete:
- Update state file: mark step as `completed`, record date, write one-line key outcome
- Add any key decisions made during this step to the `Key Decisions` section
- Return to the auto-chain rules
Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer.
## Re-Entry Protocol
When the user invokes `/autopilot` and work already exists:
1. Read `_docs/_autopilot_state.md`
2. Cross-check against `_docs/` folder structure
3. Present Status Summary with context from state file (key decisions, last session, blockers)
4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery
5. Continue execution from detected state
## Error Handling
All error situations that require user input MUST use the **Choose A / B / C / D** format.
| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |
## Trigger Conditions
This skill activates when the user wants to:
- Start a new project from scratch
- Continue an in-progress project
- Check project status
- Let the AI guide them through the full workflow
**Keywords**: "autopilot", "auto", "start", "continue", "what's next", "where am I", "project status"
**Differentiation**:
- User wants only research → use `/research` directly
- User wants only planning → use `/plan` directly
- User wants the full guided workflow → use `/autopilot`
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Autopilot (Auto-Chain Orchestrator) │
├────────────────────────────────────────────────────────────────┤
│ EVERY INVOCATION: │
│ 1. State Detection (scan _docs/) │
│ 2. Status Summary (show progress) │
│ 3. Execute current skill │
│ 4. Auto-chain to next skill (loop) │
│ │
│ WORKFLOW: │
│ Step 0 Problem → .cursor/skills/problem/SKILL.md │
│ ↓ auto-chain │
│ Step 1 Research → .cursor/skills/research/SKILL.md │
│ ↓ auto-chain (ask: another round?) │
│ Step 2 Plan → .cursor/skills/plan/SKILL.md │
│ ↓ auto-chain │
│ Step 3 Decompose → .cursor/skills/decompose/SKILL.md │
│ ↓ SESSION BOUNDARY (suggest new conversation) │
│ Step 4 Implement → .cursor/skills/implement/SKILL.md │
│ ↓ auto-chain │
│ Step 5 Deploy → .cursor/skills/deploy/SKILL.md │
│ ↓ │
│ DONE │
│ │
│ STATE FILE: _docs/_autopilot_state.md │
│ FALLBACK: _docs/ folder structure scan │
│ PAUSE POINTS: sub-skill BLOCKING gates only │
│ SESSION BREAK: after Decompose (before Implement) │
│ USER INPUT: Choose A/B/C/D format at genuine decisions only │
│ AUTO-TRANSITION: when path is unambiguous, don't ask │
├────────────────────────────────────────────────────────────────┤
│ Principles: Auto-chain · State to file · Rich re-entry │
│ Delegate don't duplicate · Pause at decisions only │
│ Minimize interruptions · Choose format for decisions │
└────────────────────────────────────────────────────────────────┘
```
---
name: code-review
description: |
Multi-phase code review against task specs with structured findings output.
6-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency.
Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict.
Invoked by /implement skill after each batch, or manually.
Trigger phrases:
- "code review", "review code", "review implementation"
- "check code quality", "review against specs"
category: review
tags: [code-review, quality, security-scan, performance, SOLID]
disable-model-invocation: true
---
# Code Review
Multi-phase code review that verifies implementation against task specs, checks code quality, and produces structured findings.
## Core Principles
- **Understand intent first**: read the task specs before reviewing code — know what it should do before judging how
- **Structured output**: every finding has severity, category, location, description, and suggestion
- **Deduplicate**: same issue at the same location is reported once using `{file}:{line}:{title}` as key
- **Severity-ranked**: findings sorted Critical > High > Medium > Low
- **Verdict-driven**: clear PASS/FAIL/PASS_WITH_WARNINGS drives automation decisions
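The dedup key and severity ordering can be sketched directly; a minimal illustration in which the `Finding` shape is assumed, not mandated by this skill:

```python
# Sketch of finding deduplication and severity ranking.
# The {file}:{line}:{title} key and Critical > High > Medium > Low
# order come from this skill's Core Principles.
from dataclasses import dataclass

SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

@dataclass(frozen=True)
class Finding:
    severity: str
    category: str
    file: str
    line: int
    title: str

def dedupe_and_rank(findings: list[Finding]) -> list[Finding]:
    seen: dict[str, Finding] = {}
    for f in findings:
        key = f"{f.file}:{f.line}:{f.title}"
        seen.setdefault(key, f)  # same issue at same location: report once
    return sorted(seen.values(), key=lambda f: SEVERITY_RANK[f.severity])
```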
## Input
- List of task spec files that were just implemented (paths to `[JIRA-ID]_[short_name].md`)
- Changed files (detected via `git diff` or provided by the `/implement` skill)
- Project context: `_docs/00_problem/restrictions.md`, `_docs/01_solution/solution.md`
## Phase 1: Context Loading
Before reviewing code, build understanding of intent:
1. Read each task spec — acceptance criteria, scope, constraints, dependencies
2. Read project restrictions and solution overview
3. Map which changed files correspond to which task specs
4. Understand what the code is supposed to do before judging how it does it
## Phase 2: Spec Compliance Review
For each task, verify implementation satisfies every acceptance criterion:
- Walk through each AC (Given/When/Then) and trace it in the code
- Check that unit tests cover each AC
- Check that integration tests exist where specified in the task spec
- Flag any AC that is not demonstrably satisfied as a **Spec-Gap** finding (severity: High)
- Flag any scope creep (implementation beyond what the spec asked for) as a **Scope** finding (severity: Low)
## Phase 3: Code Quality Review
Check implemented code against quality standards:
- **SOLID principles** — single responsibility, open/closed, Liskov, interface segregation, dependency inversion
- **Error handling** — consistent strategy, no bare catch/except, meaningful error messages
- **Naming** — clear intent, follows project conventions
- **Complexity** — functions longer than 50 lines or cyclomatic complexity > 10
- **DRY** — duplicated logic across files
- **Test quality** — tests assert meaningful behavior, not just "no error thrown"
- **Dead code** — unused imports, unreachable branches
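The 50-line threshold above can be checked mechanically with the standard `ast` module. A rough sketch for Python sources only; cyclomatic complexity is deliberately left out here, since approximating it by counting branch nodes would be an assumption rather than the formal metric:

```python
# Flag functions whose source span exceeds the complexity threshold
# named in Phase 3 (functions longer than 50 lines).
import ast

def long_functions(source: str, max_lines: int = 50) -> list[str]:
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                flagged.append(f"{node.name}: {length} lines")
    return flagged
```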
## Phase 4: Security Quick-Scan
Lightweight security checks (defer deep analysis to the `/security` skill):
- SQL injection via string interpolation
- Command injection (subprocess with shell=True, exec, eval)
- Hardcoded secrets, API keys, passwords
- Missing input validation on external inputs
- Sensitive data in logs or error messages
- Insecure deserialization
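A grep-style pass catches the most obvious of these. A sketch with a few illustrative patterns; the regexes here are assumptions, not a complete rule set, and a real quick-scan would lean on a proper tool such as bandit or semgrep:

```python
# Lightweight pattern scan for a few of the Phase 4 checks.
# Patterns are illustrative only and will miss many real cases.
import re

QUICK_PATTERNS = {
    "possible SQL injection (f-string query)": re.compile(
        r'(execute|query)\(\s*f["\']', re.IGNORECASE),
    "command injection risk (shell=True)": re.compile(r"shell\s*=\s*True"),
    "hardcoded secret": re.compile(
        r'(api_key|password|secret|token)\s*=\s*["\'][^"\']+["\']', re.IGNORECASE),
}

def quick_scan(path: str, text: str) -> list[str]:
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for title, pattern in QUICK_PATTERNS.items():
            if pattern.search(line):
                hits.append(f"{path}:{lineno}: {title}")
    return hits
```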
## Phase 5: Performance Scan
Check for common performance anti-patterns:
- O(n^2) or worse algorithms where O(n) is possible
- N+1 query patterns
- Unbounded data fetching (missing pagination/limits)
- Blocking I/O in async contexts
- Unnecessary memory copies or allocations in hot paths
## Phase 6: Cross-Task Consistency
When multiple tasks were implemented in the same batch:
- Interfaces between tasks are compatible (method signatures, DTOs match)
- No conflicting patterns (e.g., one task uses repository pattern, another does raw SQL)
- Shared code is not duplicated across task implementations
- Dependencies declared in task specs are properly wired
## Output Format
Produce a structured report with findings deduplicated and sorted by severity:
```markdown
# Code Review Report
**Batch**: [task list]
**Date**: [YYYY-MM-DD]
**Verdict**: PASS | PASS_WITH_WARNINGS | FAIL
## Findings
| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Critical | Security | src/api/auth.py:42 | SQL injection via f-string |
| 2 | High | Spec-Gap | src/service/orders.py | AC-3 not satisfied |
### Finding Details
**F1: SQL injection via f-string** (Critical / Security)
- Location: `src/api/auth.py:42`
- Description: User input interpolated directly into SQL query
- Suggestion: Use parameterized query via bind parameters
- Task: 04_auth_service
**F2: AC-3 not satisfied** (High / Spec-Gap)
- Location: `src/service/orders.py`
- Description: AC-3 requires order total recalculation on item removal, but no such logic exists
- Suggestion: Add recalculation in remove_item() method
- Task: 07_order_processing
```
## Severity Definitions
| Severity | Meaning | Blocks? |
|----------|---------|---------|
| Critical | Security vulnerability, data loss, crash | Yes — verdict FAIL |
| High | Spec gap, logic bug, broken test | Yes — verdict FAIL |
| Medium | Performance issue, maintainability concern, missing validation | No — verdict PASS_WITH_WARNINGS |
| Low | Style, minor improvement, scope creep | No — verdict PASS_WITH_WARNINGS |
## Category Values
Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
## Verdict Logic
- **FAIL**: any Critical or High finding exists
- **PASS_WITH_WARNINGS**: only Medium or Low findings
- **PASS**: no findings
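Combined with the severity table, the verdict reduces to one small function; a sketch using the severity strings defined above:

```python
# Verdict logic: any Critical/High finding fails the review;
# Medium/Low findings downgrade PASS to PASS_WITH_WARNINGS.
def verdict(severities: list[str]) -> str:
    present = set(severities)
    if present & {"Critical", "High"}:
        return "FAIL"
    if present & {"Medium", "Low"}:
        return "PASS_WITH_WARNINGS"
    return "PASS"
```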
## Integration with /implement
The `/implement` skill invokes this skill after each batch completes:
1. Collects changed files from all implementer agents in the batch
2. Passes task spec paths + changed files to this skill
3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
4. If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)
---
name: decompose
description: |
Decompose planned components into atomic implementable tasks with bootstrap structure plan.
4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
Supports full decomposition (_docs/ structure) and single component mode.
Trigger phrases:
- "decompose", "decompose features", "feature decomposition"
- "task decomposition", "break down components"
- "prepare for implementation"
category: build
tags: [decomposition, tasks, dependencies, jira, implementation-prep]
disable-model-invocation: true
---
# Task Decomposition
Decompose planned components into atomic, implementable task specs through a systematic workflow, starting from a bootstrap structure plan. All tasks are named with their Jira ticket ID prefix in a flat directory.
## Core Principles
- **Atomic tasks**: each task does one thing; if it exceeds 5 complexity points, split it
- **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
- **Flat structure**: all tasks are Jira-ID-prefixed files in TASKS_DIR — no component subdirectories
- **Save immediately**: write artifacts to disk after each task; never accumulate unsaved work
- **Jira inline**: create Jira ticket immediately after writing each task file
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Plan, don't code**: this workflow produces documents and Jira tasks, never implementation code
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Default** (no explicit input file provided):
- PLANS_DIR: `_docs/02_plans/`
- TASKS_DIR: `_docs/02_tasks/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, PLANS_DIR
- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (integration tests) + Step 4 (cross-verification)
**Single component mode** (provided file is within `_docs/02_plans/` and inside a `components/` subdirectory):
- PLANS_DIR: `_docs/02_plans/`
- TASKS_DIR: `_docs/02_tasks/`
- Derive component number and component name from the file path
- Ask user for the parent Epic ID
- Runs Step 2 (that component only, appending to existing task numbering)
Announce the detected mode and resolved paths to the user before proceeding.
## Input Specification
### Required Files
**Default:**
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/system-flows.md` | System flows from plan skill |
| `PLANS_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
| `PLANS_DIR/integration_tests/` | Integration test specs from plan skill |
**Single component mode:**
| File | Purpose |
|------|---------|
| The provided component `description.md` | Component spec to decompose |
| Corresponding `tests.md` in the same directory (if available) | Test specs for context |
### Prerequisite Checks (BLOCKING)
**Default:**
1. PLANS_DIR contains `architecture.md` and `components/` — **STOP if missing**
2. Create TASKS_DIR if it does not exist
3. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
**Single component mode:**
1. The provided component file exists and is non-empty — **STOP if missing**
## Artifact Management
### Directory Structure
```
TASKS_DIR/
├── [JIRA-ID]_initial_structure.md
├── [JIRA-ID]_[short_name].md
├── [JIRA-ID]_[short_name].md
├── ...
└── _dependencies_table.md
```
**Naming convention**: Each task file is initially saved with a temporary numeric prefix (`[##]_[short_name].md`). After creating the Jira ticket, rename the file to use the Jira ticket ID as prefix (`[JIRA-ID]_[short_name].md`). For example: `01_initial_structure.md` → `AZ-42_initial_structure.md`.
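The rename step could look like the following sketch. The Jira ticket creation itself (an MCP call) is not modeled here, and `rename_after_jira` is an illustrative name:

```python
# Sketch of the post-Jira rename: swap the temporary numeric prefix for the
# Jira ID and update the task name referenced inside the file.
from pathlib import Path
import re

def rename_after_jira(task_file: Path, jira_id: str) -> Path:
    # 01_initial_structure.md -> initial_structure.md
    short_name = re.sub(r"^\d+_", "", task_file.name)
    target = task_file.with_name(f"{jira_id}_{short_name}")
    # Update the **Task** field inside the file to match the new filename.
    text = task_file.read_text()
    task_file.write_text(text.replace(task_file.stem, target.stem))
    return task_file.rename(target)
```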
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 4 | Cross-task verification complete | `_dependencies_table.md` |
### Resumability
If TASKS_DIR already contains task files:
1. List existing `*_*.md` files (excluding `_dependencies_table.md`) and count them
2. Resume numbering from the next number (for temporary numeric prefix before Jira rename)
3. Inform the user which tasks already exist and are being skipped
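Resuming the numeric sequence can be sketched as follows; files already renamed to Jira IDs still count toward the total, and the function name is illustrative:

```python
# Sketch of the resumability rule: count existing task files
# (excluding the dependencies table) and continue from the next number.
from pathlib import Path

def next_task_number(tasks_dir: Path) -> int:
    existing = [p for p in tasks_dir.glob("*_*.md")
                if p.name != "_dependencies_table.md"]
    return len(existing) + 1
```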
## Progress Tracking
At the start of execution, create a TodoWrite with all applicable steps. Update status as each step/component completes.
## Workflow
### Step 1: Bootstrap Structure Plan (default mode only)
**Role**: Professional software architect
**Goal**: Produce `01_initial_structure.md` — the first task describing the project skeleton
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.
1. Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from PLANS_DIR
2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/`
3. Research best implementation patterns for the identified tech stack
4. Document the structure plan using `templates/initial-structure-task.md`
The bootstrap structure plan must include:
- Project folder layout with all component directories
- Shared models, interfaces, and DTOs
- Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
- `docker-compose.yml` for local development (all components + database + dependencies)
- `docker-compose.test.yml` for integration test environment (black-box test runner)
- `.dockerignore`
- CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
- Database migration setup and initial seed data scripts
- Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
- Environment variable documentation (`.env.example`)
- Test structure with unit and integration test locations
**Self-verification**:
- [ ] All components have corresponding folders in the layout
- [ ] All inter-component interfaces have DTOs defined
- [ ] Dockerfile defined for each component
- [ ] `docker-compose.yml` covers all components and dependencies
- [ ] `docker-compose.test.yml` enables black-box integration testing
- [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
- [ ] Database migration setup included
- [ ] Health check endpoints specified for each service
- [ ] Structured logging configuration included
- [ ] `.env.example` with all required environment variables
- [ ] Environment strategy covers dev, staging, production
- [ ] Test structure includes unit and integration test locations
**Save action**: Write `01_initial_structure.md` (temporary numeric name)
**Jira action**: Create a Jira ticket for this task under the "Bootstrap & Initial Structure" epic. Write the Jira ticket ID and Epic ID back into the task header.
**Rename action**: Rename the file from `01_initial_structure.md` to `[JIRA-ID]_initial_structure.md` (e.g., `AZ-42_initial_structure.md`). Update the **Task** field inside the file to match the new filename.
**BLOCKING**: Present structure plan summary to user. Do NOT proceed until user confirms.
---
### Step 2: Task Decomposition (all modes)
**Role**: Professional software architect
**Goal**: Decompose each component into atomic, implementable task specs — numbered sequentially starting from 02
**Constraints**: Behavioral specs only — describe what, not how. No implementation code.
**Numbering**: Tasks are numbered sequentially across all components in dependency order. Start from 02 (01 is initial_structure). In single component mode, start from the next available number in TASKS_DIR.
**Component ordering**: Process components in dependency order — foundational components first (shared models, database), then components that depend on them.
For each component (or the single provided component):
1. Read the component's `description.md` and `tests.md` (if available)
2. Decompose into atomic tasks; create only 1 task if the component is simple or atomic
3. Split into multiple tasks only when splitting is genuinely necessary and makes implementation easier
4. Do not create tasks for other components — only tasks for the current component
5. Each task should be atomic, exposing either no APIs or a small set of semantically connected APIs
6. Write each task spec using `templates/task.md`
7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
8. Note task dependencies (referencing Jira IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
9. **Immediately after writing each task file**: create a Jira ticket, link it to the component's epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
**Self-verification** (per component):
- [ ] Every task is atomic (single concern)
- [ ] No task exceeds 5 complexity points
- [ ] Task dependencies reference correct Jira IDs
- [ ] Tasks cover all interfaces defined in the component spec
- [ ] No tasks duplicate work from other components
- [ ] Every task has a Jira ticket linked to the correct epic
**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename the file to `[JIRA-ID]_[short_name].md`. Update the **Task** field inside the file to match the new filename. Update **Dependencies** references in the file to use Jira IDs of the dependency tasks.
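The write-then-rename flow could be sketched like this. The header field names (`**Task**`, `**Jira**`, `**Epic**`) follow the task template in this skill; adjust the patterns if the template changes:

```python
import re
from pathlib import Path

def rename_to_jira(task_path: str, jira_id: str, epic_id: str) -> Path:
    """Rename 02_user_auth.md -> AZ-43_user_auth.md and update its header.

    Rewrites the **Task**, **Jira**, and **Epic** fields so the file
    content always matches the new filename.
    """
    old = Path(task_path)
    short_name = re.sub(r"^\d+_", "", old.stem)  # drop the temporary numeric prefix
    new = old.with_name(f"{jira_id}_{short_name}{old.suffix}")
    text = old.read_text(encoding="utf-8")
    text = re.sub(r"^\*\*Task\*\*: .*$", f"**Task**: {jira_id}_{short_name}",
                  text, count=1, flags=re.MULTILINE)
    text = re.sub(r"^\*\*Jira\*\*: .*$", f"**Jira**: {jira_id}",
                  text, count=1, flags=re.MULTILINE)
    text = re.sub(r"^\*\*Epic\*\*: .*$", f"**Epic**: {epic_id}",
                  text, count=1, flags=re.MULTILINE)
    new.write_text(text, encoding="utf-8")
    old.unlink()
    return new
```

Dependency references inside other task files still need a separate pass, since they point at the renamed task by its new Jira-prefixed name.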
---
### Step 3: Integration Test Task Decomposition (default mode only)
**Role**: Professional Quality Assurance Engineer
**Goal**: Decompose integration test specs into atomic, implementable task specs
**Constraints**: Behavioral specs only — describe what, not how. No test code.
**Numbering**: Continue sequential numbering from where Step 2 left off.
1. Read all test specs from `PLANS_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
4. Dependencies: integration test tasks depend on the component implementation tasks they exercise
5. Write each task spec using `templates/task.md`
6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
8. **Immediately after writing each task file**: create a Jira ticket under the "Integration Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
**Self-verification**:
- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
- [ ] No task exceeds 5 complexity points
- [ ] Dependencies correctly reference the component tasks being tested
- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic
**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
---
### Step 4: Cross-Task Verification (default mode only)
**Role**: Professional software architect and analyst
**Goal**: Verify task consistency and produce `_dependencies_table.md`
**Constraints**: Review step — fix gaps found, do not add new tasks
1. Verify task dependencies across all tasks are consistent
2. Check no gaps: every interface in architecture.md has tasks covering it
3. Check no overlaps: tasks don't duplicate work across components
4. Check no circular dependencies in the task graph
5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`
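The no-circular-dependencies check (point 4) can be sketched with a Kahn-style reduction. The dependency map shape mirrors the Dependencies column of `_dependencies_table.md`:

```python
def find_cycle(deps: dict[str, list[str]]) -> list[str]:
    """Return task IDs stuck in a dependency cycle, or [] if acyclic.

    `deps` maps each task ID to the IDs it depends on. Tasks with no
    unresolved dependencies are repeatedly removed; anything left over
    participates in a cycle.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    changed = True
    while changed:
        changed = False
        ready = [t for t, d in remaining.items() if not d]
        for t in ready:
            del remaining[t]
            for d in remaining.values():
                d.discard(t)
            changed = True
    return sorted(remaining)
```

An empty result also yields a valid execution order as a side effect, which is what the `/implement` skill needs to compute parallel batches.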
**Self-verification**:
- [ ] Every architecture interface is covered by at least one task
- [ ] No circular dependencies in the task graph
- [ ] Cross-component dependencies are explicitly noted in affected task specs
- [ ] `_dependencies_table.md` contains every task with correct dependencies
**Save action**: Write `_dependencies_table.md`
**BLOCKING**: Present dependency summary to user. Do NOT proceed until user confirms.
---
## Common Mistakes
- **Coding during decomposition**: this workflow produces specs, never code
- **Over-splitting**: don't create many tasks if the component is simple — 1 task is fine
- **Tasks exceeding 5 points**: split them; no task should be too complex for a single implementer
- **Cross-component tasks**: each task belongs to exactly one component
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Creating git branches**: branch creation is an implementation concern, not a decomposition one
- **Creating component subdirectories**: all tasks go flat in TASKS_DIR
- **Forgetting Jira**: every task must have a Jira ticket created inline — do not defer to a separate step
- **Forgetting to rename**: after Jira ticket creation, always rename the file from numeric prefix to Jira ID prefix
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Ambiguous component boundaries | ASK user |
| Task complexity exceeds 5 points after splitting | ASK user |
| Missing component specs in PLANS_DIR | ASK user |
| Cross-component dependency conflict | ASK user |
| Jira epic not found for a component | ASK user for Epic ID |
| Uncertainty over task naming | PROCEED with a sensible name, confirm at next BLOCKING gate |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Task Decomposition (4-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (default / single component) │
│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │
│ [BLOCKING: user confirms structure] │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │
│ 4. Cross-Verification → _dependencies_table.md │
│ [BLOCKING: user confirms dependencies] │
├────────────────────────────────────────────────────────────────┤
│ Principles: Atomic tasks · Behavioral specs · Flat structure │
│ Jira inline · Rename to Jira ID · Save now · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,31 @@
# Dependencies Table Template
Use this template after cross-task verification. Save as `TASKS_DIR/_dependencies_table.md`.
---
```markdown
# Dependencies Table
**Date**: [YYYY-MM-DD]
**Total Tasks**: [N]
**Total Complexity Points**: [N]
| Task | Name | Complexity | Dependencies | Epic |
|------|------|-----------|-------------|------|
| [JIRA-ID] | initial_structure | [points] | None | [EPIC-ID] |
| [JIRA-ID] | [short_name] | [points] | [JIRA-ID] | [EPIC-ID] |
| [JIRA-ID] | [short_name] | [points] | [JIRA-ID] | [EPIC-ID] |
| [JIRA-ID] | [short_name] | [points] | [JIRA-ID], [JIRA-ID] | [EPIC-ID] |
| ... | ... | ... | ... | ... |
```
---
## Guidelines
- Every task from TASKS_DIR must appear in this table
- Dependencies column lists Jira IDs (e.g., "AZ-43, AZ-44") or "None"
- No circular dependencies allowed
- Tasks should be listed in recommended execution order
- The `/implement` skill reads this table to compute parallel batches
@@ -0,0 +1,135 @@
# Initial Structure Task Template
Use this template for the bootstrap structure plan. Save as `TASKS_DIR/01_initial_structure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_initial_structure.md` after Jira ticket creation.
---
```markdown
# Initial Project Structure
**Task**: [JIRA-ID]_initial_structure
**Name**: Initial Structure
**Description**: Scaffold the project skeleton — folders, shared models, interfaces, stubs, CI/CD, DB migrations, test structure
**Complexity**: [3|5] points
**Dependencies**: None
**Component**: Bootstrap
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
## Project Folder Layout
```
project-root/
├── [folder structure based on tech stack and components]
└── ...
```
### Layout Rationale
[Brief explanation of why this structure was chosen — language conventions, framework patterns, etc.]
## DTOs and Interfaces
### Shared DTOs
| DTO Name | Used By Components | Fields Summary |
|----------|-------------------|---------------|
| [name] | [component list] | [key fields] |
### Component Interfaces
| Component | Interface | Methods | Exposed To |
|-----------|-----------|---------|-----------|
| [name] | [InterfaceName] | [method list] | [consumers] |
## CI/CD Pipeline
| Stage | Purpose | Trigger |
|-------|---------|---------|
| Build | Compile/bundle the application | Every push |
| Lint / Static Analysis | Code quality and style checks | Every push |
| Unit Tests | Run unit test suite | Every push |
| Integration Tests | Run integration test suite | Every push |
| Security Scan | SAST / dependency check | Every push |
| Deploy to Staging | Deploy to staging environment | Merge to staging branch |
### Pipeline Configuration Notes
[Framework-specific notes: CI tool, runners, caching, parallelism, etc.]
## Environment Strategy
| Environment | Purpose | Configuration Notes |
|-------------|---------|-------------------|
| Development | Local development | [local DB, mock services, debug flags] |
| Staging | Pre-production testing | [staging DB, staging services, production-like config] |
| Production | Live system | [production DB, real services, optimized config] |
### Environment Variables
| Variable | Dev | Staging | Production | Description |
|----------|-----|---------|------------|-------------|
| [VAR_NAME] | [value/source] | [value/source] | [value/source] | [purpose] |
## Database Migration Approach
**Migration tool**: [tool name]
**Strategy**: [migration strategy — e.g., versioned scripts, ORM migrations]
### Initial Schema
[Key tables/collections that need to be created, referencing component data access patterns]
## Test Structure
```
tests/
├── unit/
│ ├── [component_1]/
│ ├── [component_2]/
│ └── ...
├── integration/
│ ├── test_data/
│ └── [test files]
└── ...
```
### Test Configuration Notes
[Test runner, fixtures, test data management, isolation strategy]
## Implementation Order
| Order | Component | Reason |
|-------|-----------|--------|
| 1 | [name] | [why first — foundational, no dependencies] |
| 2 | [name] | [depends on #1] |
| ... | ... | ... |
## Acceptance Criteria
**AC-1: Project scaffolded**
Given the structure plan above
When the implementer executes this task
Then all folders, stubs, and configuration files exist
**AC-2: Tests runnable**
Given the scaffolded project
When the test suite is executed
Then all stub tests pass (even if they only assert true)
**AC-3: CI/CD configured**
Given the scaffolded project
When CI pipeline runs
Then build, lint, and test stages complete successfully
```
---
## Guidance Notes
- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on structure and organization decisions, not implementation details.
- Reference component specs for interface and DTO details — don't repeat everything.
- The folder layout should follow conventions of the identified tech stack.
- Environment strategy should account for secrets management and configuration.
@@ -0,0 +1,113 @@
# Task Specification Template
Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[JIRA-ID]_[short_name].md` after Jira ticket creation.
---
```markdown
# [Feature Name]
**Task**: [JIRA-ID]_[short_name]
**Name**: [short human name]
**Description**: [one-line description of what this task delivers]
**Complexity**: [1|2|3|5] points
**Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
**Component**: [component name for context]
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
## Problem
Clear, concise statement of the problem users are facing.
## Outcome
- Measurable or observable goal 1
- Measurable or observable goal 2
- ...
## Scope
### Included
- What's in scope for this task
### Excluded
- Explicitly what's NOT in scope
## Acceptance Criteria
**AC-1: [Title]**
Given [precondition]
When [action]
Then [expected result]
**AC-2: [Title]**
Given [precondition]
When [action]
Then [expected result]
## Non-Functional Requirements
**Performance**
- [requirement if relevant]
**Compatibility**
- [requirement if relevant]
**Reliability**
- [requirement if relevant]
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | [test subject] | [expected result] |
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | [setup] | [test subject] | [expected behavior] | [NFR if any] |
## Constraints
- [Architectural pattern constraint if critical]
- [Technical limitation]
- [Integration requirement]
## Risks & Mitigation
**Risk 1: [Title]**
- *Risk*: [Description]
- *Mitigation*: [Approach]
```
---
## Complexity Points Guide
- 1 point: Trivial, self-contained, no dependencies
- 2 points: Non-trivial, low complexity, minimal coordination
- 3 points: Multi-step, moderate complexity, potential alignment needed
- 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: Too complex — split into smaller tasks
## Output Guidelines
**DO:**
- Focus on behavior and user experience
- Use clear, simple language
- Keep acceptance criteria testable (Gherkin format)
- Include realistic scope boundaries
- Write from the user's perspective
- Include complexity estimation
- Reference dependencies by Jira ID (e.g., AZ-43_shared_models)
**DON'T:**
- Include implementation details (file paths, classes, methods)
- Prescribe technical solutions or libraries
- Add architectural diagrams or code examples
- Specify exact API endpoints or data structures
- Include step-by-step implementation instructions
- Add "how to build" guidance
@@ -0,0 +1,491 @@
---
name: deploy
description: |
Comprehensive deployment skill covering status check, env setup, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts.
7-step workflow: Status & env check, Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures, deployment scripts.
Uses _docs/04_deploy/ structure.
Trigger phrases:
- "deploy", "deployment", "deployment strategy"
- "CI/CD", "pipeline", "containerize"
- "observability", "monitoring", "logging"
- "dockerize", "docker compose"
category: ship
tags: [deployment, docker, ci-cd, observability, monitoring, containerization, scripts]
disable-model-invocation: true
---
# Deployment Planning
Plan and document the full deployment lifecycle: check deployment status and environment requirements, containerize the application, define CI/CD pipelines, configure environments, set up observability, document deployment procedures, and generate deployment scripts.
## Core Principles
- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code (except deployment scripts in Step 7)
## Context Resolution
Fixed paths:
- PLANS_DIR: `_docs/02_plans/`
- DEPLOY_DIR: `_docs/04_deploy/`
- REPORTS_DIR: `_docs/04_deploy/reports/`
- SCRIPTS_DIR: `scripts/`
- ARCHITECTURE: `_docs/02_plans/architecture.md`
- COMPONENTS_DIR: `_docs/02_plans/components/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/components/` | Component specs |
### Prerequisite Checks (BLOCKING)
1. `architecture.md` exists — **STOP if missing**, run `/plan` first
2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing**
3. Create DEPLOY_DIR, REPORTS_DIR, and SCRIPTS_DIR if they do not exist
4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
```
DEPLOY_DIR/
├── containerization.md
├── ci_cd_pipeline.md
├── environment_strategy.md
├── observability.md
├── deployment_procedures.md
├── deploy_scripts.md
└── reports/
└── deploy_status_report.md
SCRIPTS_DIR/ (project root)
├── deploy.sh
├── pull-images.sh
├── start-services.sh
├── stop-services.sh
└── health-check.sh
.env (project root, git-ignored)
.env.example (project root, committed)
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Status check & env setup complete | `reports/deploy_status_report.md` + `.env` + `.env.example` |
| Step 2 | Containerization plan complete | `containerization.md` |
| Step 3 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
| Step 4 | Environment strategy documented | `environment_strategy.md` |
| Step 5 | Observability plan complete | `observability.md` |
| Step 6 | Deployment procedures documented | `deployment_procedures.md` |
| Step 7 | Deployment scripts created | `deploy_scripts.md` + scripts in `SCRIPTS_DIR/` |
### Resumability
If DEPLOY_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed step
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 7). Update status as each step completes.
## Workflow
### Step 1: Deployment Status & Environment Setup
**Role**: DevOps / Platform engineer
**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files
**Constraints**: Must complete before any other step
1. Read architecture.md, all component specs, and restrictions.md
2. Assess deployment readiness:
- List all components and their current state (planned / implemented / tested)
- Identify external dependencies (databases, APIs, message queues, cloud services)
- Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
- Check if any deployment blockers exist
3. Identify all required environment variables by scanning:
- Component specs for configuration needs
- Database connection requirements
- External API endpoints and credentials
- Feature flags and runtime configuration
- Container registry credentials
- Cloud provider credentials
- Monitoring/logging service endpoints
4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
7. Produce a deployment status report summarizing readiness, blockers, and required setup
**Self-verification**:
- [ ] All components assessed for deployment readiness
- [ ] External dependencies catalogued
- [ ] Infrastructure prerequisites identified
- [ ] All required environment variables discovered
- [ ] `.env.example` created with placeholder values
- [ ] `.env` created with safe development defaults
- [ ] `.gitignore` updated to exclude `.env`
- [ ] Status report written to `reports/deploy_status_report.md`
**Save action**: Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`, create `.env` and `.env.example` in project root
**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.
---
### Step 2: Containerization
**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and integration test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
2. Read restrictions.md for infrastructure constraints
3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
4. For each component, define:
- Base image (pinned version, prefer alpine/distroless for production)
- Build stages (dependency install, build, production)
- Non-root user configuration
- Health check endpoint and command
- Exposed ports
- `.dockerignore` contents
5. Define `docker-compose.yml` for local development:
- All application components
- Database (Postgres) with named volume
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env`)
6. Define `docker-compose.test.yml` for integration tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
- All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
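The tagging strategy above could be computed like this. A Python sketch (the registry and project names are placeholders; assumes `git` is available where CI tags are computed):

```python
import subprocess

def image_tag(registry: str, project: str, component: str,
              local: bool = False) -> str:
    """Build an image reference per the tagging strategy.

    CI images are tagged with the short git SHA; `latest` is reserved
    for local development only.
    """
    if local:
        tag = "latest"
    else:
        tag = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return f"{registry}/{project}/{component}:{tag}"
```

Tagging with the immutable SHA is what makes the `--rollback` path in Step 7's `deploy.sh` possible: redeploying is just pulling the previous tag.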
**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box integration testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
---
### Step 3: CI/CD Pipeline
**Role**: DevOps engineer
**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
**Constraints**: Pipeline definition only — produce YAML specification, not implementation
1. Read architecture.md for tech stack and deployment targets
2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
4. Define pipeline stages:
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
6. Define parallelization: which stages can run concurrently
7. Define notifications: build failures, deployment status, security alerts
**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured
**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
---
### Step 4: Environment Strategy
**Role**: Platform engineer
**Goal**: Define environment configuration, secrets management, and environment parity
**Constraints**: Strategy document — no secrets or credentials in output
1. Define environments:
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
| **Production** | Live system | Full infrastructure | Real data |
2. Define environment variable management:
- Reference `.env.example` created in Step 1
- Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
- Validation: fail fast on missing required variables at startup
3. Define secrets management:
- Never commit secrets to version control
- Development: `.env` files (git-ignored)
- Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
- Rotation policy
4. Define database management per environment:
- Development: Docker Postgres with named volume, seed data
- Staging: managed Postgres, migrations applied via CI/CD
- Production: managed Postgres, migrations require approval
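The fail-fast validation mentioned under environment variable management could look like this. The variable names are illustrative, not taken from the component specs:

```python
import os
import sys

REQUIRED_VARS = ["DATABASE_URL", "JWT_SECRET", "LOG_LEVEL"]  # illustrative names

def validate_env(required: list[str] = REQUIRED_VARS) -> None:
    """Exit at startup if any required environment variable is missing or empty."""
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        print(f"Missing required environment variables: {', '.join(missing)}",
              file=sys.stderr)
        sys.exit(1)
```

Calling this first thing in each service's entry point surfaces misconfiguration immediately instead of as a confusing runtime failure minutes later.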
**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment
**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
---
### Step 5: Observability
**Role**: Site Reliability Engineer (SRE)
**Goal**: Define logging, metrics, tracing, and alerting strategy
**Constraints**: Strategy document — describe what to implement, not how to wire it
1. Read architecture.md and component specs for service boundaries
2. Research observability best practices for the tech stack
**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
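A minimal sketch of the logging requirements above (stdout-only, the required field set; passing `service`, `correlation_id`, and `context` via `LogRecord` attributes is an assumed convention, not from the specs):

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with the required structured fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        })

def get_logger(service: str) -> logging.Logger:
    logger = logging.getLogger(service)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)  # stdout, never files
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

PII scrubbing is deliberately not shown here; it belongs in whatever builds the `context` dict before logging.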
**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
**Alerting**:
| Severity | Response Time | Condition Examples |
|----------|---------------|-------------------|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |
**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed
**Save action**: Write `observability.md` using `templates/observability.md`
---
### Step 6: Deployment Procedures
**Role**: DevOps / Platform engineer
**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
**Constraints**: Procedures document — no implementation
1. Define deployment strategy:
- Preferred pattern: blue-green / rolling / canary (choose based on architecture)
- Zero-downtime requirement for production
- Graceful shutdown: 30-second grace period for in-flight requests
- Database migration ordering: migrate before deploy, backward-compatible only
2. Define health checks:
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
3. Define rollback procedures:
- Trigger criteria: health check failures, error rate spike, critical alert
- Rollback steps: redeploy previous image tag, verify health, rollback database if needed
- Communication: notify stakeholders during rollback
- Post-mortem: required after every production rollback
4. Define deployment checklist:
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured
- [ ] Health check endpoints responding
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified
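The startup and readiness probes in the health check table above can be sketched as a retry loop. The endpoint, timings, and function name below are assumptions drawn from this step, not a fixed implementation:

```shell
#!/bin/bash
# Sketch of a startup probe loop: run a check command until it succeeds,
# giving up after a maximum number of attempts. Names are illustrative.
wait_ready() {
  local max_attempts=$1 interval=$2
  shift 2
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "not ready after ${max_attempts} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "ready after ${attempt} attempt(s)"
}

# Assumed usage for the startup probe (30 attempts, 5s interval):
#   wait_ready 30 5 curl -fsS http://localhost:8080/health/ready
```

A real health-check script would call this once per service and aggregate the exit codes.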
**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete
**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
---
### Step 7: Deployment Scripts
**Role**: DevOps / Platform engineer
**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine
**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.
1. Read containerization.md and deployment_procedures.md from previous steps
2. Read `.env.example` for required variables
3. Create the following scripts in `SCRIPTS_DIR/`:
**`deploy.sh`** — Main deployment orchestrator:
- Validates that required environment variables are set (sources `.env` if present)
- Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
- Exits with non-zero code on any failure
- Supports `--rollback` flag to redeploy previous image tags
**`pull-images.sh`** — Pull Docker images to target machine:
- Reads image list and tags from environment or config
- Authenticates with container registry
- Pulls all required images
- Verifies image integrity (digest check)
**`start-services.sh`** — Start services on target machine:
- Runs `docker compose up -d` or individual `docker run` commands
- Applies environment variables from `.env`
- Configures networks and volumes
- Waits for containers to reach healthy state
**`stop-services.sh`** — Graceful shutdown:
- Stops services with graceful shutdown period
- Saves current image tags for rollback reference
- Cleans up orphaned containers/networks
**`health-check.sh`** — Verify deployment health:
- Checks all health endpoints
- Reports status per service
- Returns non-zero if any service is unhealthy
4. All scripts must:
   - Use Bash with strict mode (`#!/bin/bash` and `set -euo pipefail`); avoid non-portable constructs
- Source `.env` from project root or accept env vars from the environment
- Include usage/help output (`--help` flag)
- Be idempotent where possible
- Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)
5. Document all scripts in `deploy_scripts.md`
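A minimal `deploy.sh` skeleton meeting these requirements might look like the following. The script names, env var names, and flag handling are assumptions from this step to adapt, not a final implementation:

```shell
#!/bin/bash
# Sketch of deploy.sh: validate environment, then run the pipeline.
# Script and variable names mirror this step's plan and are assumptions.
set -euo pipefail

usage() { echo "Usage: deploy.sh [--rollback] [--help]"; }

# Fail fast when any named variable is unset or empty
require_env() {
  local missing=0 var
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "Missing required env var: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

main() {
  local rollback=0
  case "${1:-}" in
    --help) usage; return 0 ;;
    --rollback) rollback=1 ;;
  esac
  if [ -f .env ]; then
    set -a; . ./.env; set +a
  fi
  require_env DEPLOY_HOST REGISTRY_URL
  # A fuller version would redeploy the saved previous tags when rollback=1.
  ROLLBACK="$rollback" ./pull-images.sh
  ./stop-services.sh
  ./start-services.sh
  ./health-check.sh
}
# Call main "$@" when running as a standalone script.
```

The `require_env` check is what makes the script exit non-zero before any partial deployment starts.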
**Self-verification**:
- [ ] All five scripts created and executable
- [ ] Scripts source environment variables correctly
- [ ] `deploy.sh` orchestrates the full flow
- [ ] `pull-images.sh` handles registry auth and image pull
- [ ] `start-services.sh` starts containers with correct config
- [ ] `stop-services.sh` handles graceful shutdown
- [ ] `health-check.sh` validates all endpoints
- [ ] Rollback supported via `deploy.sh --rollback`
- [ ] Scripts work for remote deployment via SSH (DEPLOY_HOST)
- [ ] `deploy_scripts.md` documents all scripts
**Save action**: Write scripts to `SCRIPTS_DIR/`, write `deploy_scripts.md` using `templates/deploy_scripts.md`
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unknown cloud provider or hosting | **ASK user** |
| Container registry not specified | **ASK user** |
| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
| Secret manager not chosen | **ASK user** |
| Deployment pattern trade-offs | **ASK user** with recommendation |
| Missing architecture.md | **STOP** — run `/plan` first |
| Remote target machine details unknown | **ASK user** for SSH access, OS, and specs |
## Common Mistakes
- **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts)
- **Hardcoding secrets**: never include real credentials in deployment documents or scripts
- **Ignoring integration test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: pin base image versions and deploy by immutable tag (git SHA or semver); `:latest` is acceptable only for local dev
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
- **Committing `.env`**: only `.env.example` goes to version control; `.env` must be in `.gitignore`
- **Non-portable scripts**: deployment scripts must work across environments; avoid hardcoded paths
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Deployment Planning (7-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: architecture.md + component specs exist │
│ │
│ 1. Status & Env → reports/deploy_status_report.md │
│ + .env + .env.example │
│ [BLOCKING: user confirms status & env vars] │
│ 2. Containerization → containerization.md │
│ [BLOCKING: user confirms Docker plan] │
│ 3. CI/CD Pipeline → ci_cd_pipeline.md │
│ 4. Environment → environment_strategy.md │
│ 5. Observability → observability.md │
│ 6. Procedures → deployment_procedures.md │
│ [BLOCKING: user confirms deployment plan] │
│ 7. Scripts → deploy_scripts.md + scripts/ │
├────────────────────────────────────────────────────────────────┤
│ Principles: Docker-first · IaC · Observability built-in │
│ Environment parity · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,87 @@
# CI/CD Pipeline Template
Save as `_docs/04_deploy/ci_cd_pipeline.md`.
---
```markdown
# [System Name] — CI/CD Pipeline
## Pipeline Overview
| Stage | Trigger | Quality Gate |
|-------|---------|-------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | 75%+ coverage, all tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy Staging | After push | Health checks pass |
| Smoke Tests | After staging deploy | Critical paths pass |
| Deploy Production | Manual approval | Health checks pass |
## Stage Details
### Lint
- [Language-specific linters and formatters]
- Runs in parallel per language
### Test
- Unit tests: [framework and command]
- Integration tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact
### Security
- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
- SAST scan: [tool, e.g., Semgrep / SonarQube]
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings
### Build
- Docker images built using multi-stage Dockerfiles
- Tagged with git SHA: `<registry>/<component>:<sha>`
- Build cache: Docker layer cache via CI cache action
### Push
- Registry: [container registry URL]
- Authentication: [method]
### Deploy Staging
- Deployment method: [docker compose / Kubernetes / cloud service]
- Pre-deploy: run database migrations
- Post-deploy: verify health check endpoints
- Automated rollback on health check failure
### Smoke Tests
- Subset of integration tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]
### Deploy Production
- Requires manual approval via [mechanism]
- Deployment strategy: [blue-green / rolling / canary]
- Pre-deploy: database migration review
- Post-deploy: health checks + monitoring for 15 min
## Caching Strategy
| Cache | Key | Restore Keys |
|-------|-----|-------------|
| Dependencies | [lockfile hash] | [partial match] |
| Docker layers | [Dockerfile hash] | [partial match] |
| Build artifacts | [source hash] | [partial match] |
## Parallelization
[Diagram or description of which stages run concurrently]
## Notifications
| Event | Channel | Recipients |
|-------|---------|-----------|
| Build failure | [Slack/email] | [team] |
| Security alert | [Slack/email] | [team + security] |
| Deploy success | [Slack] | [team] |
| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
```
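The git-SHA tagging described in the Build stage can be sketched as a small helper. The registry and component values below are placeholders, not fixed names:

```shell
#!/bin/bash
# Sketch: compose the image reference used by the Build stage,
# in the `<registry>/<component>:<sha>` form described above.
image_tag() {
  # args: registry component sha-or-version
  printf '%s/%s:%s' "$1" "$2" "$3"
}

# In CI the last argument would typically come from:
#   git rev-parse --short HEAD
```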
@@ -0,0 +1,94 @@
# Containerization Plan Template
Save as `_docs/04_deploy/containerization.md`.
---
```markdown
# [System Name] — Containerization
## Component Dockerfiles
### [Component Name]
| Property | Value |
|----------|-------|
| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
| Stages | [dependency install → build → production] |
| User | [non-root user name] |
| Health check | [endpoint and command] |
| Exposed ports | [port list] |
| Key build args | [if any] |
### [Repeat for each component]
## Docker Compose — Local Development
```yaml
# docker-compose.yml structure
services:
[component]:
build: ./[path]
ports: ["host:container"]
environment: [reference .env.dev]
depends_on: [dependencies with health condition]
healthcheck: [command, interval, timeout, retries]
db:
image: [postgres:version-alpine]
volumes: [named volume]
environment: [credentials from .env.dev]
healthcheck: [pg_isready]
volumes:
[named volumes]
networks:
[shared network]
```
## Docker Compose — Integration Tests
```yaml
# docker-compose.test.yml structure
services:
[app components under test]
test-runner:
build: ./tests/integration
depends_on: [app components with health condition]
environment: [test configuration]
# Exit code determines test pass/fail
db:
image: [postgres:version-alpine]
volumes: [seed data mount]
```
Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
| Local dev | `<component>:latest` | `api:latest` |
## .dockerignore
```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
*.md
.env*
docker-compose*.yml
```
```
@@ -0,0 +1,73 @@
# Deployment Status Report Template
Save as `_docs/04_deploy/reports/deploy_status_report.md`.
---
```markdown
# [System Name] — Deployment Status Report
## Deployment Readiness Summary
| Aspect | Status | Notes |
|--------|--------|-------|
| Architecture defined | ✅ / ❌ | |
| Component specs complete | ✅ / ❌ | |
| Infrastructure prerequisites met | ✅ / ❌ | |
| External dependencies identified | ✅ / ❌ | |
| Blockers | [count] | [summary] |
## Component Status
| Component | State | Docker-ready | Notes |
|-----------|-------|-------------|-------|
| [Component 1] | planned / implemented / tested | yes / no | |
| [Component 2] | planned / implemented / tested | yes / no | |
## External Dependencies
| Dependency | Type | Required For | Status |
|------------|------|-------------|--------|
| [e.g., PostgreSQL] | Database | Data persistence | [available / needs setup] |
| [e.g., Redis] | Cache | Session management | [available / needs setup] |
| [e.g., External API] | API | [purpose] | [available / needs setup] |
## Infrastructure Prerequisites
| Prerequisite | Status | Action Needed |
|-------------|--------|--------------|
| Container registry | [ready / not set up] | [action] |
| Cloud account | [ready / not set up] | [action] |
| DNS configuration | [ready / not set up] | [action] |
| SSL certificates | [ready / not set up] | [action] |
| CI/CD platform | [ready / not set up] | [action] |
| Secret manager | [ready / not set up] | [action] |
## Deployment Blockers
| Blocker | Severity | Resolution |
|---------|----------|-----------|
| [blocker description] | critical / high / medium | [resolution steps] |
## Required Environment Variables
| Variable | Purpose | Required In | Default (Dev) | Source (Staging/Prod) |
|----------|---------|------------|---------------|----------------------|
| `DATABASE_URL` | Postgres connection string | All components | `postgres://dev:dev@db:5432/app` | Secret manager |
| `DEPLOY_HOST` | Remote target machine | Deployment scripts | `localhost` | Environment |
| `REGISTRY_URL` | Container registry URL | CI/CD, deploy scripts | `localhost:5000` | Environment |
| `REGISTRY_USER` | Registry username | CI/CD, deploy scripts | — | Secret manager |
| `REGISTRY_PASS` | Registry password | CI/CD, deploy scripts | — | Secret manager |
| [add all required variables] | | | | |
## .env Files Created
- `.env.example` — committed to VCS, contains all variable names with placeholder values
- `.env` — git-ignored, contains development defaults
## Next Steps
1. [Resolve any blockers listed above]
2. [Set up missing infrastructure prerequisites]
3. [Proceed to containerization planning]
```
@@ -0,0 +1,103 @@
# Deployment Procedures Template
Save as `_docs/04_deploy/deployment_procedures.md`.
---
```markdown
# [System Name] — Deployment Procedures
## Deployment Strategy
**Pattern**: [blue-green / rolling / canary]
**Rationale**: [why this pattern fits the architecture]
**Zero-downtime**: required for production deployments
### Graceful Shutdown
- Grace period: 30 seconds for in-flight requests
- Sequence: stop accepting new requests → drain connections → shutdown
- Container orchestrator: `terminationGracePeriodSeconds: 40`
### Database Migration Ordering
- Migrations run **before** new code deploys
- All migrations must be backward-compatible (old code works with new schema)
- Irreversible migrations require explicit approval
## Health Checks
| Check | Type | Endpoint | Interval | Failure Threshold | Action |
|-------|------|----------|----------|-------------------|--------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate |
### Health Check Responses
- `/health/live`: returns 200 if process is running (no dependency checks)
- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable
## Staging Deployment
1. CI/CD builds and pushes Docker images tagged with git SHA
2. Run database migrations against staging
3. Deploy new images to staging environment
4. Wait for health checks to pass (readiness probe)
5. Run smoke tests against staging
6. If smoke tests fail: automatic rollback to previous image
## Production Deployment
1. **Approval**: manual approval required via [mechanism]
2. **Pre-deploy checks**:
- [ ] Staging smoke tests passed
- [ ] Security scan clean
- [ ] Database migration reviewed
- [ ] Monitoring alerts configured
- [ ] Rollback plan confirmed
3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
4. **Verify**: health checks pass, error rate stable, latency within baseline
5. **Monitor**: observe dashboards for 15 minutes post-deploy
6. **Finalize**: mark deployment as successful or trigger rollback
## Rollback Procedures
### Trigger Criteria
- Health check failures persist after deploy
- Error rate exceeds 5% for more than 5 minutes
- Critical alert fires within 15 minutes of deploy
- Manual decision by on-call engineer
### Rollback Steps
1. Redeploy previous Docker image tag (from CI/CD artifact)
2. Verify health checks pass
3. If database migration was applied:
- Run DOWN migration if reversible
- If irreversible: assess data impact, escalate if needed
4. Notify stakeholders
5. Schedule post-mortem within 24 hours
### Post-Mortem
Required after every production rollback:
- Timeline of events
- Root cause
- What went wrong
- Prevention measures
## Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Docker images built and pushed
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured for target environment
- [ ] Health check endpoints verified
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified of deployment window
- [ ] On-call engineer available during deployment
```
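The automatic trigger criteria above can be sketched as a predicate. The thresholds mirror the table; the integer-percent inputs and function name are assumptions, and a real check would query the monitoring system:

```shell
#!/bin/bash
# Sketch of the rollback trigger criteria. Inputs are assumed to be
# integers gathered from monitoring; names are illustrative.
should_rollback() {
  # args: error_rate_percent minutes_elevated health_checks_passing(0|1)
  local error_rate=$1 minutes=$2 healthy=$3
  if [ "$healthy" -eq 0 ]; then
    return 0            # health checks still failing after deploy
  fi
  if [ "$error_rate" -gt 5 ] && [ "$minutes" -ge 5 ]; then
    return 0            # error rate above 5% for 5+ minutes
  fi
  return 1              # no automatic trigger; on-call may still decide
}
```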
@@ -0,0 +1,61 @@
# Environment Strategy Template
Save as `_docs/04_deploy/environment_strategy.md`.
---
```markdown
# [System Name] — Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose | Seed data, mocked externals |
| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
| Production | Live system | [full infrastructure] | Real data |
## Environment Variables
### Required Variables
| Variable | Purpose | Dev Default | Staging/Prod Source |
|----------|---------|-------------|-------------------|
| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
| [add all required variables] | | | |
### `.env.example`
```env
# Copy to .env and fill in values
DATABASE_URL=postgres://user:pass@host:5432/dbname
# [all required variables with placeholder values]
```
### Variable Validation
All services validate required environment variables at startup and fail fast with a clear error message if any are missing.
## Secrets Management
| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file (git-ignored) | dotenv |
| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
Rotation policy: [frequency and procedure]
## Database Management
| Environment | Type | Migrations | Data |
|-------------|------|-----------|------|
| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
| Production | Managed Postgres | Applied via CI/CD with approval | Live data |
Migration rules:
- All migrations must be backward-compatible (support old and new code simultaneously)
- Reversible migrations required (DOWN/rollback script)
- Production migrations require review before apply
```
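The fail-fast startup validation can be sketched by treating `.env.example` as the source of truth for required variable names. The simple `NAME=value` parsing below is an assumption about the file format:

```shell
#!/bin/bash
# Sketch: every non-comment variable named in .env.example must be set
# in the current environment; report each missing one and fail.
validate_env() {
  local example=$1 missing=0 name
  while IFS='=' read -r name _; do
    case "$name" in ''|\#*) continue ;; esac
    if [ -z "${!name:-}" ]; then
      echo "Missing required env var: ${name}" >&2
      missing=1
    fi
  done < "$example"
  return "$missing"
}
```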
@@ -0,0 +1,132 @@
# Observability Template
Save as `_docs/04_deploy/observability.md`.
---
```markdown
# [System Name] — Observability
## Logging
### Format
Structured JSON to stdout/stderr. No file-based logging in containers.
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |
### Retention
| Environment | Destination | Retention |
|-------------|-------------|-----------|
| Development | Console | Session |
| Staging | [log aggregator] | 7 days |
| Production | [log aggregator] | 30 days |
### PII Rules
- Never log passwords, tokens, or session IDs
- Mask email addresses and personal identifiers
- Log user IDs (opaque) instead of usernames
## Metrics
### Endpoints
Every service exposes Prometheus-compatible metrics at `/metrics`.
### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `request_count` | Counter | Total HTTP requests by method, path, status |
| `request_duration_seconds` | Histogram | Response time by method, path |
| `error_count` | Counter | Failed requests by type |
| `active_connections` | Gauge | Current open connections |
### System Metrics
- CPU usage, Memory usage, Disk I/O, Network I/O
### Business Metrics
| Metric | Type | Description | Source |
|--------|------|-------------|--------|
| [from acceptance criteria] | | | |
Collection interval: 15 seconds
## Distributed Tracing
### Configuration
- SDK: OpenTelemetry
- Propagation: W3C Trace Context via HTTP headers
- Span naming: `<service>.<operation>`
### Sampling
| Environment | Rate | Rationale |
|-------------|------|-----------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Balance cost vs observability |
### Integration Points
- HTTP requests: automatic instrumentation
- Database queries: automatic instrumentation
- Message queues: manual span creation on publish/consume
## Alerting
| Severity | Response Time | Conditions |
|----------|---------------|-----------|
| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
| Low | Next business day | Non-critical warnings, deprecated API usage |
### Notification Channels
| Severity | Channel |
|----------|---------|
| Critical | [PagerDuty / phone] |
| High | [Slack + email] |
| Medium | [Slack] |
| Low | [Dashboard only] |
## Dashboards
### Operations Dashboard
- Service health status (up/down per component)
- Request rate and error rate
- Response time percentiles (P50, P95, P99)
- Resource utilization (CPU, memory per container)
- Active alerts
### Business Dashboard
- [Key business metrics from acceptance criteria]
- [User activity indicators]
- [Transaction volumes]
```
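The structured log format above can be sketched as a formatting helper. Field names and order follow the schema; the helper itself, and passing the timestamp as an argument to keep it deterministic, are illustrative choices:

```shell
#!/bin/bash
# Sketch: emit one structured log line matching the schema above.
# A real logger would generate the timestamp with:
#   date -u +%Y-%m-%dT%H:%M:%SZ
log_json() {
  # args: timestamp level service correlation_id message
  printf '{"timestamp":"%s","level":"%s","service":"%s","correlation_id":"%s","message":"%s","context":{}}\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```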
@@ -0,0 +1,177 @@
---
name: implement
description: |
Orchestrate task implementation with dependency-aware batching, parallel subagents, and integrated code review.
Reads flat task files and _dependencies_table.md from TASKS_DIR, computes execution batches via topological sort,
launches up to 4 implementer subagents in parallel, runs code-review skill after each batch, and loops until done.
Use after /decompose has produced task files.
Trigger phrases:
- "implement", "start implementation", "implement tasks"
- "run implementers", "execute tasks"
category: build
tags: [implementation, orchestration, batching, parallel, code-review]
disable-model-invocation: true
---
# Implementation Orchestrator
Orchestrate the implementation of all tasks produced by the `/decompose` skill. This skill is a **pure orchestrator** — it does NOT write implementation code itself. It reads task specs, computes execution order, delegates to `implementer` subagents, validates results via the `/code-review` skill, and escalates issues.
The `implementer` agent is the specialist that writes all the code — it receives a task spec, analyzes the codebase, implements the feature, writes tests, and verifies acceptance criteria.
## Core Principles
- **Orchestrate, don't implement**: this skill delegates all coding to `implementer` subagents
- **Dependency-aware batching**: tasks run only when all their dependencies are satisfied
- **Max 4 parallel agents**: never launch more than 4 implementer subagents simultaneously
- **File isolation**: no two parallel agents may write to the same file
- **Integrated review**: `/code-review` skill runs automatically after each batch
- **Auto-start**: batches launch immediately — no user confirmation before a batch
- **Gate on failure**: user confirmation is required only when code review returns FAIL
- **Commit and push per batch**: after each batch is confirmed, commit and push to remote
## Context Resolution
- TASKS_DIR: `_docs/02_tasks/`
- Task files: all `*.md` files in TASKS_DIR (excluding files starting with `_`)
- Dependency table: `TASKS_DIR/_dependencies_table.md`
## Prerequisite Checks (BLOCKING)
1. TASKS_DIR exists and contains at least one task file — **STOP if missing**
2. `_dependencies_table.md` exists — **STOP if missing**
3. At least one task is not yet completed — **STOP if all done**
## Algorithm
### 1. Parse
- Read all task `*.md` files from TASKS_DIR (excluding files starting with `_`)
- Read `_dependencies_table.md` — parse into a dependency graph (DAG)
- Validate: no circular dependencies, all referenced dependencies exist
### 2. Detect Progress
- Scan the codebase to determine which tasks are already completed
- Match implemented code against task acceptance criteria
- Mark completed tasks as done in the DAG
- Report progress to user: "X of Y tasks completed"
### 3. Compute Next Batch
- Topological sort remaining tasks
- Select tasks whose dependencies are ALL satisfied (completed)
- If a ready task depends on any task currently being worked on in this batch, it must wait for the next batch
- Cap the batch at 4 parallel agents
- If the batch would exceed 20 total complexity points, suggest splitting and let the user decide
### 4. Assign File Ownership
For each task in the batch:
- Parse the task spec's Component field and Scope section
- Map the component to directories/files in the project
- Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files)
- If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel
### 5. Update Jira Status → In Progress
For each task in the batch, transition its Jira ticket status to **In Progress** via Jira MCP before launching the implementer.
### 6. Launch Implementer Subagents
For each task in the batch, launch an `implementer` subagent with:
- Path to the task spec file
- List of files OWNED (exclusive write access)
- List of files READ-ONLY
- List of files FORBIDDEN
Launch all subagents immediately — no user confirmation.
### 7. Monitor
- Wait for all subagents to complete
- Collect structured status reports from each implementer
- If any implementer reports "Blocked", log the blocker and continue with others
### 8. Code Review
- Run `/code-review` skill on the batch's changed files + corresponding task specs
- The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL
### 9. Gate
- If verdict is **FAIL**: present findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.
- If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically.
### 10. Test
- Run the full test suite
- If failures: report to user with details
### 11. Commit and Push
- After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
- `git add` all changed files from the batch
- `git commit` with a message that includes ALL JIRA-IDs of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[JIRA-ID-1] [JIRA-ID-2] ... Summary of changes`
- `git push` to the remote branch
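The commit message format above can be sketched as a helper; the Jira IDs used here are placeholders:

```shell
#!/bin/bash
# Sketch: build "[JIRA-ID-1] [JIRA-ID-2] ... Summary" from a batch's tasks.
batch_commit_message() {
  local summary=$1 ids="" id
  shift
  for id in "$@"; do
    ids+="[${id}] "
  done
  printf '%s%s' "$ids" "$summary"
}
```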
### 12. Update Jira Status → In Testing
After the batch is committed and pushed, transition the Jira ticket status of each task in the batch to **In Testing** via Jira MCP.
### 13. Loop
- Go back to step 2 until all tasks are done
- When all tasks are complete, report final summary
## Batch Report Persistence
After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_report.md`. Create the directory if it doesn't exist. When all tasks are complete, produce `_docs/03_implementation/FINAL_implementation_report.md` with a summary of all batches.
## Batch Report
After each batch, produce a structured report:
```markdown
# Batch Report
**Batch**: [N]
**Tasks**: [list]
**Date**: [YYYY-MM-DD]
## Task Results
| Task | Status | Files Modified | Tests | Issues |
|------|--------|---------------|-------|--------|
| [JIRA-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
## Next Batch: [task list] or "All tasks complete"
```
## Stop Conditions and Escalation
| Situation | Action |
|-----------|--------|
| Implementer fails same approach 3+ times | Stop it, escalate to user |
| Task blocked on external dependency (not in task list) | Report and skip |
| File ownership conflict unresolvable | ASK user |
| Test failures exceed 50% of suite after a batch | Stop and escalate |
| All tasks complete | Report final summary, suggest final commit |
| `_dependencies_table.md` missing | STOP — run `/decompose` first |
## Recovery
Each batch commit serves as a rollback checkpoint. If recovery is needed:
- **Tests fail after a batch commit**: `git revert <batch-commit-hash>` using the hash from the batch report in `_docs/03_implementation/`
- **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
- **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes
## Safety Rules
- Never launch tasks whose dependencies are not yet completed
- Never allow two parallel agents to write to the same file
- If a subagent fails, do NOT retry automatically — report and let user decide
- Always run tests after each batch completes
@@ -0,0 +1,31 @@
# Batching Algorithm Reference
## Topological Sort with Batch Grouping
The `/implement` skill uses a topological sort to determine execution order,
then groups tasks into batches for parallel execution.
## Algorithm
1. Build adjacency list from `_dependencies_table.md`
2. Compute in-degree for each task node
3. Initialize batch 0 with all nodes that have in-degree 0
4. For each batch:
a. Select up to 4 tasks from the ready set
b. Check file ownership — if two tasks would write the same file, defer one to the next batch
c. Launch selected tasks as parallel implementer subagents
d. When all complete, remove them from the graph and decrement in-degrees of dependents
e. Add newly zero-in-degree nodes to the next batch's ready set
5. Repeat until the graph is empty
## File Ownership Conflict Resolution
When two tasks in the same batch map to overlapping files:
- Prefer to run the lower-numbered task first (it's more foundational)
- Defer the higher-numbered task to the next batch
- If both have equal priority, ask the user
## Complexity Budget
Each batch should not exceed 20 total complexity points.
If it does, split the batch and let the user choose which tasks to include.
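The algorithm above can be sketched in Bash. The task names and dependencies below are illustrative; a production version would parse `_dependencies_table.md` and reject cycles before batching:

```shell
#!/bin/bash
# Sketch of Kahn-style batch grouping: deps maps each task to its
# space-separated prerequisites. Data is illustrative and assumed acyclic.
declare -A deps=( [T1]="" [T2]="T1" [T3]="T1" [T4]="T2 T3" [T5]="" )

compute_batches() {
  declare -A completed=()   # declare inside a function makes it local
  local batch_num=0 t d ok
  while [ "${#completed[@]}" -lt "${#deps[@]}" ]; do
    local ready=()
    for t in $(printf '%s\n' "${!deps[@]}" | sort); do
      if [ -n "${completed[$t]:-}" ]; then continue; fi
      ok=1
      for d in ${deps[$t]}; do
        if [ -z "${completed[$d]:-}" ]; then ok=0; fi
      done
      if [ "$ok" -eq 1 ]; then ready+=("$t"); fi
    done
    local batch=("${ready[@]:0:4}")   # cap at 4 parallel tasks
    echo "batch ${batch_num}: ${batch[*]}"
    for t in "${batch[@]}"; do completed[$t]=1; done
    batch_num=$((batch_num + 1))
  done
}

compute_batches   # prints: T1 T5, then T2 T3, then T4
```

File-ownership checks and the complexity budget would filter the `ready` set before the batch is cut to four.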
@@ -0,0 +1,36 @@
# Batch Report Template
Use this template after each implementation batch completes.
---
```markdown
# Batch Report
**Batch**: [N]
**Tasks**: [list of task names]
**Date**: [YYYY-MM-DD]
## Task Results
| Task | Status | Files Modified | Tests | Issues |
|------|--------|---------------|-------|--------|
| [JIRA-ID]_[name] | Done/Blocked/Partial | [count] files | [X/Y pass] | [count or None] |
## Code Review Verdict: [PASS / FAIL / PASS_WITH_WARNINGS]
[Link to code review report if FAIL or PASS_WITH_WARNINGS]
## Test Suite
- Total: [N] tests
- Passed: [N]
- Failed: [N]
- Skipped: [N]
## Commit
[Suggested commit message]
## Next Batch: [task list] or "All tasks complete"
```
@@ -0,0 +1,557 @@
---
name: plan
description: |
Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics.
Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
Uses _docs/ + _docs/02_plans/ structure.
Trigger phrases:
- "plan", "decompose solution", "architecture planning"
- "break down the solution", "create planning documents"
- "component decomposition", "solution analysis"
category: build
tags: [planning, architecture, components, testing, jira, epics]
disable-model-invocation: true
---
# Solution Planning
Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
## Core Principles
- **Single Responsibility**: each component does one thing well; do not spread related logic across components
- **Dumb code, smart data**: keep logic simple, push complexity into data structures and configuration
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Plan, don't code**: this workflow produces documents and specs, never implementation code
## Context Resolution
Fixed paths — no mode detection needed:
- PROBLEM_FILE: `_docs/00_problem/problem.md`
- SOLUTION_FILE: `_docs/01_solution/solution.md`
- PLANS_DIR: `_docs/02_plans/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples |
| `_docs/01_solution/solution.md` | Finalized solution to decompose |
### Prerequisite Checks (BLOCKING)
Run sequentially before any planning step:
**Prereq 1: Data Gate**
1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing**
3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing**
4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing**
All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop.
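The gate can be expressed as a small check; this is a sketch only, with paths mirroring the required-files table above:

```python
from pathlib import Path

# The three mandatory file inputs; input_data/ is checked separately below.
REQUIRED = [
    "_docs/00_problem/problem.md",
    "_docs/00_problem/acceptance_criteria.md",
    "_docs/00_problem/restrictions.md",
]

def data_gate(root: Path):
    """Return a list of gate violations; an empty list means the gate passes."""
    missing = []
    for rel in REQUIRED:
        p = root / rel
        if not p.is_file() or p.stat().st_size == 0:  # missing or empty → STOP
            missing.append(rel)
    data_dir = root / "_docs/00_problem/input_data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        missing.append("_docs/00_problem/input_data/ (needs at least one file)")
    return missing
```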
**Prereq 2: Finalize Solution Draft**
Only runs after the Data Gate passes:
1. Scan `_docs/01_solution/` for files matching `solution_draft*.md`
2. Identify the highest-numbered draft (e.g. `solution_draft06.md`)
3. **Rename** it to `_docs/01_solution/solution.md`
4. If `solution.md` already exists, ask the user whether to overwrite or keep existing
5. Verify `solution.md` is non-empty — **STOP if missing or empty**
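Draft selection (items 1–2) can be sketched as below; the numeric comparison means `solution_draft10.md` correctly beats `solution_draft06.md` even if zero-padding is inconsistent:

```python
import re

def latest_draft(filenames):
    """Return the highest-numbered solution_draft##.md, or None if no draft matches."""
    best, best_n = None, -1
    for name in filenames:
        m = re.fullmatch(r"solution_draft(\d+)\.md", name)
        if m and int(m.group(1)) > best_n:  # compare numerically, not lexically
            best, best_n = name, int(m.group(1))
    return best
```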
**Prereq 3: Workspace Setup**
1. Create PLANS_DIR if it does not exist
2. If PLANS_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
All artifacts are written directly under PLANS_DIR:
```
PLANS_DIR/
├── integration_tests/
│ ├── environment.md
│ ├── test_data.md
│ ├── functional_tests.md
│ ├── non_functional_tests.md
│ └── traceability_matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
├── deployment/
│ ├── containerization.md
│ ├── ci_cd_pipeline.md
│ ├── environment_strategy.md
│ ├── observability.md
│ └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (additional review rounds, numbered sequentially)
├── components/
│ ├── 01_[name]/
│ │ ├── description.md
│ │ └── tests.md
│ ├── 02_[name]/
│ │ ├── description.md
│ │ └── tests.md
│ └── ...
├── common-helpers/
│   ├── 01_helper_[name].md
│   ├── 02_helper_[name].md
│ └── ...
├── diagrams/
│ ├── components.drawio
│ └── flows/
│ ├── flow_[name].md (Mermaid)
│ └── ...
└── FINAL_report.md
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
| Step 4 | Risk assessment complete | `risk_mitigations.md` |
| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
| Step 6 | Epics created in Jira | Jira via MCP |
| Final | All steps complete | `FINAL_report.md` |
### Save Principles
1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
2. **Incremental updates**: same file can be updated multiple times; append or replace
3. **Preserve process**: keep all intermediate files even after integration into final report
4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)
### Resumability
If PLANS_DIR already contains artifacts:
1. List existing files and match them to the save timing table above
2. Identify the last completed step based on which artifacts exist
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
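A minimal sketch of the detection logic follows. The marker artifacts are a subset distilled from the save timing table; Steps 3 and 5 write per-component paths, which this sketch deliberately omits:

```python
# Marker artifacts per step, distilled from the save timing table.
STEP_MARKERS = [
    ("Step 1 (integration tests)", ["integration_tests/traceability_matrix.md"]),
    ("Step 2 (solution analysis)", ["architecture.md", "system-flows.md", "data_model.md"]),
    ("Step 4 (risk assessment)",   ["risk_mitigations.md"]),
    ("Final report",               ["FINAL_report.md"]),
]

def resume_point(existing):
    """Given the set of files present in PLANS_DIR, return the first incomplete step."""
    for step, markers in STEP_MARKERS:
        if not all(m in existing for m in markers):
            return step
    return "All steps complete"
```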
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 6). Update status as each step completes.
## Workflow
### Step 1: Integration Tests
**Role**: Professional Quality Assurance Engineer
**Goal**: Analyze input data completeness and produce detailed black-box integration test specifications
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
#### Phase 1a: Input Data Completeness Analysis
1. Read `_docs/01_solution/solution.md` (finalized in Prereq 2)
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md
4. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
5. Coverage threshold: the input data must cover at least 70% of the scenarios above
6. If coverage falls below the threshold, search the internet for supplementary data, review its quality with the user, and add it to `input_data/` only if the user agrees
7. Present coverage assessment to user
**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.
#### Phase 1b: Black-Box Test Scenario Specification
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `templates/integration-environment.md` as structure
2. Define test data management using `templates/integration-test-data.md` as structure
3. Write functional test scenarios (positive + negative) using `templates/integration-functional-tests.md` as structure
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `templates/integration-non-functional-tests.md` as structure
5. Build traceability matrix using `templates/integration-traceability-matrix.md` as structure
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
**Save action**: Write all files under `integration_tests/`:
- `environment.md`
- `test_data.md`
- `functional_tests.md`
- `non_functional_tests.md`
- `traceability_matrix.md`
**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.
---
### Step 2: Solution Analysis
**Role**: Professional software architect
**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
**Constraints**: No code, no component-level detail yet; focus on system-level view
#### Phase 2a: Architecture & Flows
1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
3. Research unknown or questionable topics via internet; ask user about ambiguities
4. Document architecture using `templates/architecture.md` as structure
5. Document system flows using `templates/system-flows.md` as structure
**Self-verification**:
- [ ] Architecture covers all capabilities mentioned in solution.md
- [ ] System flows cover all main user/system interactions
- [ ] No contradictions with problem.md or restrictions.md
- [ ] Technology choices are justified
- [ ] Integration test findings are reflected in architecture decisions
**Save action**: Write `architecture.md` and `system-flows.md`
**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
#### Phase 2b: Data Model
**Role**: Professional software architect
**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy
1. Extract core entities from architecture.md and solution.md
2. Define entity attributes, types, and constraints
3. Define relationships between entities (Mermaid ERD)
4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
5. Define seed data requirements per environment (dev, staging)
6. Define backward compatibility approach for schema changes (additive-only by default)
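A minimal Mermaid ERD illustrating the expected notation for item 3 — the entity names are placeholders, not part of any real model:

```mermaid
erDiagram
    USER ||--o{ ORDER : places
    ORDER ||--|{ ORDER_ITEM : contains
    PRODUCT ||--o{ ORDER_ITEM : "appears in"
```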
**Self-verification**:
- [ ] Every entity mentioned in architecture.md is defined
- [ ] Relationships are explicit with cardinality
- [ ] Migration strategy specifies reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
**Save action**: Write `data_model.md`
#### Phase 2c: Deployment Planning
**Role**: DevOps / Platform engineer
**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures
Use the `/deploy` skill's templates as structure for each artifact:
1. Read architecture.md and restrictions.md for infrastructure constraints
2. Research Docker best practices for the project's tech stack
3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
5. Define environment strategy: dev, staging, production with secrets management
6. Define observability: structured logging, metrics, tracing, alerting
7. Define deployment procedures: strategy, health checks, rollback, checklist
**Self-verification**:
- [ ] Every component has a Docker specification
- [ ] CI/CD pipeline covers lint, test, security, build, deploy
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
**Save action**: Write all 5 files under `deployment/`:
- `containerization.md`
- `ci_cd_pipeline.md`
- `environment_strategy.md`
- `observability.md`
- `deployment_procedures.md`
---
### Step 3: Component Decomposition
**Role**: Professional software architect
**Goal**: Decompose the architecture into components with detailed specs
**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
1. Identify components from the architecture; think about separation, reusability, and communication patterns
2. Use integration test scenarios from Step 1 to validate component boundaries
3. If additional components are needed (data preparation, shared helpers), create them
4. For each component, write a spec using `templates/component-spec.md` as structure
5. Generate diagrams:
- draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
- Mermaid flowchart per main control flow
6. Components may share common logic. When the same logic is reused by two or more components, extract it into the `common-helpers/` folder instead of duplicating it across components.
**Self-verification**:
- [ ] Each component has a single, clear responsibility
- [ ] No functionality is spread across multiple components
- [ ] All inter-component interfaces are defined (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies
- [ ] All components from architecture.md are accounted for
- [ ] Every integration test scenario can be traced through component interactions
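The circular-dependency check above lends itself to automation; here is a depth-first-search sketch over a hypothetical `component → dependencies` map:

```python
def find_cycle(deps):
    """Return one dependency cycle as [a, ..., a], or None if the graph is acyclic.
    `deps` maps each component name to the components it depends on."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {c: WHITE for c in deps}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:  # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and dep in deps:
                found = visit(dep)
                if found:
                    return found
        color[node] = BLACK
        stack.pop()
        return None

    for comp in list(deps):
        if color[comp] == WHITE:
            found = visit(comp)
            if found:
                return found
    return None
```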
**Save action**: Write:
- each component `components/[##]_[name]/description.md`
- common helper `common-helpers/[##]_helper_[name].md`
- diagrams `diagrams/`
**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
---
### Step 4: Architecture Review & Risk Assessment
**Role**: Professional software architect and analyst
**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
**Constraints**: This is a review step — fix problems found, do not add new features
#### 4a. Evaluator Pass (re-read ALL artifacts)
Review checklist:
- [ ] All components follow Single Responsibility Principle
- [ ] All components follow dumb code / smart data principle
- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
- [ ] No circular dependencies in the dependency graph
- [ ] No missing interactions between components
- [ ] No over-engineering — is there a simpler decomposition?
- [ ] Security considerations addressed in component design
- [ ] Performance bottlenecks identified
- [ ] API contracts are consistent across components
Fix any issues found before proceeding to risk identification.
#### 4b. Risk Identification
1. Identify technical and project risks
2. Assess probability and impact using `templates/risk-register.md`
3. Define mitigation strategies
4. Apply mitigations to architecture, flows, and component documents where applicable
**Self-verification**:
- [ ] Every High/Critical risk has a concrete mitigation strategy
- [ ] Mitigations are reflected in the relevant component or architecture docs
- [ ] No new risks introduced by the mitigations themselves
**Save action**: Write `risk_mitigations.md`
**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.
**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
---
### Step 5: Test Specifications
**Role**: Professional Quality Assurance Engineer
**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage
**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.
1. For each component, write tests using `templates/test-spec.md` as structure
2. Cover all 4 types: integration, performance, security, acceptance
3. Include test data management (setup, teardown, isolation)
4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
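The traceability check in item 4 can be run mechanically once the specs are parsed; the AC and test IDs below are hypothetical stand-ins for the parsed documents:

```python
def uncovered_criteria(criteria, tests):
    """Return acceptance-criteria IDs that no test spec traces back to.
    `tests` maps a test ID to the list of AC IDs it covers."""
    covered = {ac for traced in tests.values() for ac in traced}
    return [ac for ac in criteria if ac not in covered]
```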
**Self-verification**:
- [ ] Every acceptance criterion has at least one test covering it
- [ ] Test inputs are realistic and well-defined
- [ ] Expected results are specific and measurable
- [ ] No component is left without tests
**Save action**: Write each `components/[##]_[name]/tests.md`
---
### Step 6: Jira Epics
**Role**: Professional product manager
**Goal**: Create Jira epics from components, ordered by dependency
**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.
1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
4. Include effort estimation per epic (T-shirt size or story points range)
5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
6. Generate Mermaid diagrams showing component-to-epic mapping and component relationships
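The dependency ordering in item 3 is a topological sort; one possible sketch, with placeholder epic names (the Bootstrap epic has no prerequisites, so it naturally sorts first):

```python
def epic_order(prereqs):
    """Order epics so each appears after all of its prerequisites (Kahn's algorithm).
    `prereqs` maps an epic name to the epics that must be completed first."""
    remaining = {epic: set(p) for epic, p in prereqs.items()}
    order = []
    while remaining:
        ready = sorted(e for e, p in remaining.items() if not p)
        if not ready:  # no epic is unblocked: the graph has a cycle
            raise ValueError(f"circular dependency among: {sorted(remaining)}")
        for epic in ready:
            order.append(epic)
            del remaining[epic]
        for p in remaining.values():
            p.difference_update(ready)
    return order
```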
**CRITICAL — Epic description richness requirements**:
Each epic description in Jira MUST include ALL of the following sections with substantial content:
- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
- **Problem / Context**: what problem this component solves, why it exists, current pain points
- **Scope**: detailed in-scope and out-of-scope lists
- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
- **Interface specification**: full method signatures, input/output types, error types (from component description.md)
- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
- **Effort estimation**: T-shirt size and story points range
- **Child issues**: planned task breakdown with complexity points
- **Key constraints**: from restrictions.md that affect this component
- **Testing strategy**: summary of test types and coverage from tests.md
Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.
**Self-verification**:
- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to exactly one epic
- [ ] Dependency order is respected (no epic depends on a later one)
- [ ] Acceptance criteria are measurable
- [ ] Effort estimates are realistic
- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
- [ ] Epic descriptions are self-contained — readable without opening other files
Also **create an "Integration Tests" epic** — it parents the integration test tasks created by the `/decompose` skill and covers implementing the test scenarios defined in `integration_tests/`.
**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
---
## Quality Checklist (before FINAL_report.md)
Before writing the final report, verify ALL of the following:
### Integration Tests
- [ ] Every acceptance criterion is covered in traceability_matrix.md
- [ ] Every restriction is verified by at least one test
- [ ] Positive and negative scenarios are balanced
- [ ] Docker environment is self-contained
- [ ] Consumer app treats main system as black box
- [ ] CI/CD integration and reporting defined
### Architecture
- [ ] Covers all capabilities from solution.md
- [ ] Technology choices are justified
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions
### Data Model
- [ ] Every entity from architecture.md is defined
- [ ] Relationships have explicit cardinality
- [ ] Migration strategy with reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
### Deployment
- [ ] Containerization plan covers all components
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
### Components
- [ ] Every component follows SRP
- [ ] No circular dependencies
- [ ] All inter-component interfaces are defined and consistent
- [ ] No orphan components (unused by any flow)
- [ ] Every integration test scenario can be traced through component interactions
### Risks
- [ ] All High/Critical risks have mitigations
- [ ] Mitigations are reflected in component/architecture docs
- [ ] User has confirmed risk assessment is sufficient
### Tests
- [ ] Every acceptance criterion is covered by at least one test
- [ ] All 4 test types are represented per component (where applicable)
- [ ] Test data management is defined
### Epics
- [ ] "Bootstrap & Initial Structure" epic exists
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to an epic
- [ ] Dependency order is correct
- [ ] Acceptance criteria are measurable
**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
## Common Mistakes
- **Proceeding without input data**: all four data gate items (problem, acceptance_criteria, restrictions, input_data) must be present before any planning begins
- **Coding during planning**: this workflow produces documents, never code
- **Multi-responsibility components**: if a component does two things, split it
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Diagrams without data**: generate diagrams only after the underlying structure is documented
- **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
- **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
- **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register
- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3)
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Missing problem.md, acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — planning cannot proceed |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% | Search internet for supplementary data, ASK user to validate |
| Technology choice with multiple valid options | ASK user |
| Component naming | PROCEED, confirm at next BLOCKING gate |
| File structure within templates | PROCEED |
| Contradictions between input files | ASK user |
| Risk mitigation requires architecture change | ASK user |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Solution Planning (6-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ 1: Data Gate (BLOCKING) │
│ → verify AC, restrictions, input_data exist — STOP if not │
│ PREREQ 2: Finalize solution draft │
│ → rename highest solution_draft##.md to solution.md │
│ PREREQ 3: Workspace setup │
│ → create PLANS_DIR/ if needed │
│ │
│ 1. Integration Tests → integration_tests/ (5 files) │
│ [BLOCKING: user confirms test coverage] │
│ 2a. Architecture → architecture.md, system-flows.md │
│ [BLOCKING: user confirms architecture] │
│ 2b. Data Model → data_model.md │
│ 2c. Deployment → deployment/ (5 files) │
│ 3. Component Decompose → components/[##]_[name]/description │
│ [BLOCKING: user confirms decomposition] │
│ 4. Review & Risk → risk_mitigations.md │
│ [BLOCKING: user confirms risks, iterative] │
│ 5. Test Specifications → components/[##]_[name]/tests.md │
│ 6. Jira Epics → Jira via MCP │
│ ───────────────────────────────────────────────── │
│ Quality Checklist → FINAL_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: SRP · Dumb code/smart data · Save immediately │
│ Ask don't assume · Plan don't code │
└────────────────────────────────────────────────────────────────┘
```
# Architecture Document Template
Use this template for the architecture document. Save as `_docs/02_plans/architecture.md`.
---
```markdown
# [System Name] — Architecture
## 1. System Context
**Problem being solved**: [One paragraph summarizing the problem from problem.md]
**System boundaries**: [What is inside the system vs. external]
**External systems**:
| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| [name] | REST / Queue / DB / File | Inbound / Outbound / Both | [why] |
## 2. Technology Stack
| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | | | |
| Framework | | | |
| Database | | | |
| Cache | | | |
| Message Queue | | | |
| Hosting | | | |
| CI/CD | | | |
**Key constraints from restrictions.md**:
- [Constraint 1 and how it affects technology choices]
- [Constraint 2]
## 3. Deployment Model
**Environments**: Development, Staging, Production
**Infrastructure**:
- [Cloud provider / On-prem / Hybrid]
- [Container orchestration if applicable]
- [Scaling strategy: horizontal / vertical / auto]
**Environment-specific configuration**:
| Config | Development | Production |
|--------|-------------|------------|
| Database | [local/docker] | [managed service] |
| Secrets | [.env file] | [secret manager] |
| Logging | [console] | [centralized] |
## 4. Data Model Overview
> High-level data model covering the entire system. Detailed per-component models go in component specs.
**Core entities**:
| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| [entity] | [what it represents] | [component ##] |
**Key relationships**:
- [Entity A] → [Entity B]: [relationship description]
**Data flow summary**:
- [Source] → [Transform] → [Destination]: [what data and why]
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| [component] | [component] | Sync REST / Async Queue / Direct call | Request-Response / Event / Command | |
### External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| [system] | [REST/gRPC/etc] | [API key/OAuth/etc] | [limits] | [retry/circuit breaker/fallback] |
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Availability | [e.g., 99.9%] | [how measured] | High/Medium/Low |
| Latency (p95) | [e.g., <200ms] | [endpoint/operation] | |
| Throughput | [e.g., 1000 req/s] | [peak/sustained] | |
| Data retention | [e.g., 90 days] | [which data] | |
| Recovery (RPO/RTO) | [e.g., RPO 1hr, RTO 4hr] | | |
| Scalability | [e.g., 10x current load] | [timeline] | |
## 7. Security Architecture
**Authentication**: [mechanism — JWT / session / API key]
**Authorization**: [RBAC / ABAC / per-resource]
**Data protection**:
- At rest: [encryption method]
- In transit: [TLS version]
- Secrets management: [tool/approach]
**Audit logging**: [what is logged, where, retention]
## 8. Key Architectural Decisions
Record significant decisions that shaped the architecture.
### ADR-001: [Decision Title]
**Context**: [Why this decision was needed]
**Decision**: [What was decided]
**Alternatives considered**:
1. [Alternative 1] — rejected because [reason]
2. [Alternative 2] — rejected because [reason]
**Consequences**: [Trade-offs accepted]
### ADR-002: [Decision Title]
...
```
# Component Specification Template
Use this template for each component. Save as `components/[##]_[name]/description.md`.
---
```markdown
# [Component Name]
## 1. High-Level Overview
**Purpose**: [One sentence: what this component does and its role in the system]
**Architectural Pattern**: [e.g., Repository, Event-driven, Pipeline, Facade, etc.]
**Upstream dependencies**: [Components that this component calls or consumes from]
**Downstream consumers**: [Components that call or consume from this component]
## 2. Internal Interfaces
For each interface this component exposes internally:
### Interface: [InterfaceName]
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `method_name` | `InputDTO` | `OutputDTO` | Yes/No | `ErrorType1`, `ErrorType2` |
**Input DTOs**:
```
[DTO name]:
field_1: type (required/optional) — description
field_2: type (required/optional) — description
```
**Output DTOs**:
```
[DTO name]:
field_1: type — description
field_2: type — description
```
## 3. External API Specification
> Include this section only if the component exposes an external HTTP/gRPC API.
> Skip if the component is internal-only.
| Endpoint | Method | Auth | Rate Limit | Description |
|----------|--------|------|------------|-------------|
| `/api/v1/...` | GET/POST/PUT/DELETE | Required/Public | X req/min | Brief description |
**Request/Response schemas**: define per endpoint using OpenAPI-style notation.
**Example request/response**:
```json
// Request
{ }
// Response
{ }
```
## 4. Data Access Patterns
### Queries
| Query | Frequency | Hot Path | Index Needed |
|-------|-----------|----------|--------------|
| [describe query] | High/Medium/Low | Yes/No | Yes/No |
### Caching Strategy
| Data | Cache Type | TTL | Invalidation |
|------|-----------|-----|-------------|
| [data item] | In-memory / Redis / None | [duration] | [trigger] |
### Storage Estimates
| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|-----------------|---------------------|----------|------------|-------------|
| [table_name] | | | | /month |
### Data Management
**Seed data**: [Required seed data and how to load it]
**Rollback**: [Rollback procedure for this component's data changes]
## 5. Implementation Details
**Algorithmic Complexity**: [Big O for critical methods — only if non-trivial]
**State Management**: [Local state / Global state / Stateless — explain how state is handled]
**Key Dependencies**: [External libraries and their purpose]
| Library | Version | Purpose |
|---------|---------|---------|
| [name] | [version] | [why needed] |
**Error Handling Strategy**:
- [How errors are caught, propagated, and reported]
- [Retry policy if applicable]
- [Circuit breaker if applicable]
## 6. Extensions and Helpers
> List any shared utilities this component needs that should live in a `helpers/` folder.
| Helper | Purpose | Used By |
|--------|---------|---------|
| [helper_name] | [what it does] | [list of components] |
## 7. Caveats & Edge Cases
**Known limitations**:
- [Limitation 1]
**Potential race conditions**:
- [Race condition scenario, if any]
**Performance bottlenecks**:
- [Bottleneck description and mitigation approach]
## 8. Dependency Graph
**Must be implemented after**: [list of component numbers/names]
**Can be implemented in parallel with**: [list of component numbers/names]
**Blocks**: [list of components that depend on this one]
## 9. Logging Strategy
| Log Level | When | Example |
|-----------|------|---------|
| ERROR | Unrecoverable failures | `Failed to process order {id}: {error}` |
| WARN | Recoverable issues | `Retry attempt {n} for {operation}` |
| INFO | Key business events | `Order {id} created by user {uid}` |
| DEBUG | Development diagnostics | `Query returned {n} rows in {ms}ms` |
**Log format**: [structured JSON / plaintext — match system standard]
**Log storage**: [stdout / file / centralized logging service]
```
---
## Guidance Notes
- **Section 3 (External API)**: skip entirely for internal-only components. Include for any component that exposes HTTP endpoints, WebSocket connections, or gRPC services.
- **Section 4 (Storage Estimates)**: critical for components that manage persistent data. Skip for stateless components.
- **Section 5 (Algorithmic Complexity)**: only document if the algorithm is non-trivial (O(n^2) or worse, recursive, etc.). Simple CRUD operations don't need this.
- **Section 6 (Helpers)**: if the helper is used by only one component, keep it inside that component. Only extract to `helpers/` if shared by 2+ components.
- **Section 8 (Dependency Graph)**: this is essential for determining implementation order. Be precise about what "depends on" means — data dependency, API dependency, or shared infrastructure.
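The level table in Section 9 maps naturally onto a structured JSON logger. A minimal Python sketch, assuming stdout JSON output per the "Log format" and "Log storage" fields; the logger name and field set are illustrative, not prescribed by the template:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (structured log format)."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,       # e.g. INFO, DEBUG (Python renders WARN as WARNING)
            "logger": record.name,
            "message": record.getMessage(),  # args already interpolated
        })

def make_logger(name="component"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(sys.stdout)  # "Log storage": stdout
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace rather than stack handlers on re-creation
    return logger

log = make_logger()
log.info("Order %s created by user %s", "o-17", "u-42")  # INFO: key business event
```

Swapping `StreamHandler` for a file or centralized-logging handler changes only `make_logger`, which is the point of fixing the format in one place.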
# Jira Epic Template
Use this template for each Jira epic. Create epics via Jira MCP.
---
```markdown
## Epic: [Component Name] — [Outcome]
**Example**: Data Ingestion — Near-real-time pipeline
### Epic Summary
[1-2 sentences: what we are building + why it matters]
### Problem / Context
[Current state, pain points, constraints, business opportunities.
Link to architecture.md and relevant component spec.]
### Scope
**In Scope**:
- [Capability 1 — describe what, not how]
- [Capability 2]
- [Capability 3]
**Out of Scope**:
- [Explicit exclusion 1 — prevents scope creep]
- [Explicit exclusion 2]
### Assumptions
- [System design assumption]
- [Data structure assumption]
- [Infrastructure assumption]
### Dependencies
**Epic dependencies** (must be completed first):
- [Epic name / ID]
**External dependencies**:
- [Services, hardware, environments, certificates, data sources]
### Effort Estimation
**T-shirt size**: S / M / L / XL
**Story points range**: [min]-[max]
### Users / Consumers
| Type | Who | Key Use Cases |
|------|-----|--------------|
| Internal | [team/role] | [use case] |
| External | [user type] | [use case] |
| System | [service name] | [integration point] |
### Requirements
**Functional**:
- [API expectations, events, data handling]
- [Idempotency, retry behavior]
**Non-functional**:
- [Availability, latency, throughput targets]
- [Scalability, processing limits, data retention]
**Security / Compliance**:
- [Authentication, encryption, secrets management]
- [Logging, audit trail]
- [SOC2 / ISO / GDPR if applicable]
### Design & Architecture
- Architecture doc: `_docs/02_plans/architecture.md`
- Component spec: `_docs/02_plans/components/[##]_[name]/description.md`
- System flows: `_docs/02_plans/system-flows.md`
### Definition of Done
- [ ] All in-scope capabilities implemented
- [ ] Automated tests pass (unit + integration + e2e)
- [ ] Minimum coverage threshold met (75%)
- [ ] Runbooks written (if applicable)
- [ ] Documentation updated
### Acceptance Criteria
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | [criterion] | [how to verify] |
| 2 | [criterion] | [how to verify] |
### Risks & Mitigations
| # | Risk | Mitigation | Owner |
|---|------|------------|-------|
| 1 | [top risk] | [mitigation] | [owner] |
| 2 | | | |
| 3 | | | |
### Labels
- `component:[name]`
- `env:prod` / `env:stg`
- `type:platform` / `type:data` / `type:integration`
### Child Issues
| Type | Title | Points |
|------|-------|--------|
| Spike | [research/investigation task] | [1-3] |
| Task | [implementation task] | [1-5] |
| Task | [implementation task] | [1-5] |
| Enabler | [infrastructure/setup task] | [1-3] |
```
---
## Guidance Notes
- Be concise. Fewer words with the same meaning = better epic.
- Capabilities in scope are "what", not "how" — avoid describing implementation details.
- Dependency order matters: epics that must be done first should be listed earlier in the backlog.
- Every epic maps to exactly one component. If a component is too large for one epic, split the component first.
- Complexity points for child issues follow the project standard: 1, 2, 3, 5, 8. Do not create issues above 5 points — split them.
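The point scale in the last note can be enforced with a small check before issues are created. A hypothetical validator sketch; the 1/2/3/5/8 scale and the 5-point cap come from the note above, everything else is illustrative:

```python
ALLOWED_POINTS = {1, 2, 3, 5, 8}  # project complexity scale
MAX_CHILD_POINTS = 5              # child issues above 5 points must be split

def validate_child_issue(points):
    """Return a list of problems with a child issue's estimate (empty list = OK)."""
    problems = []
    if points not in ALLOWED_POINTS:
        problems.append(f"{points} is not on the 1/2/3/5/8 scale")
    if points > MAX_CHILD_POINTS:
        problems.append(f"{points} points exceeds the {MAX_CHILD_POINTS}-point cap; split the issue")
    return problems
```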
# Final Planning Report Template
Use this template after completing all 5 steps and the quality checklist. Save as `_docs/02_plans/FINAL_report.md`.
---
```markdown
# [System Name] — Planning Report
## Executive Summary
[2-3 sentences: what was planned, the core architectural approach, and the key outcome (number of components, epics, estimated effort)]
## Problem Statement
[Brief restatement from problem.md — transformed, not copy-pasted]
## Architecture Overview
[Key architectural decisions and technology stack summary. Reference `architecture.md` for full details.]
**Technology stack**: [language, framework, database, hosting — one line]
**Deployment**: [environment strategy — one line]
## Component Summary
| # | Component | Purpose | Dependencies | Epic |
|---|-----------|---------|-------------|------|
| 01 | [name] | [one-line purpose] | — | [Jira ID] |
| 02 | [name] | [one-line purpose] | 01 | [Jira ID] |
| ... | | | | |
**Implementation order** (based on dependency graph):
1. [Phase 1: components that can start immediately]
2. [Phase 2: components that depend on Phase 1]
3. [Phase 3: ...]
## System Flows
| Flow | Description | Key Components |
|------|-------------|---------------|
| [name] | [one-line summary] | [component list] |
[Reference `system-flows.md` for full diagrams and details.]
## Risk Summary
| Level | Count | Key Risks |
|-------|-------|-----------|
| Critical | [N] | [brief list] |
| High | [N] | [brief list] |
| Medium | [N] | — |
| Low | [N] | — |
**Iterations completed**: [N]
**All Critical/High risks mitigated**: Yes / No — [details if No]
[Reference `risk_mitigations.md` for full register.]
## Test Coverage
| Component | Integration | Performance | Security | Acceptance | AC Coverage |
|-----------|-------------|-------------|----------|------------|-------------|
| [name] | [N tests] | [N tests] | [N tests] | [N tests] | [X/Y ACs] |
| ... | | | | | |
**Overall acceptance criteria coverage**: [X / Y total ACs covered] ([percentage]%)
## Epic Roadmap
| Order | Epic | Component | Effort | Dependencies |
|-------|------|-----------|--------|-------------|
| 1 | [Jira ID]: [name] | [component] | [S/M/L/XL] | — |
| 2 | [Jira ID]: [name] | [component] | [S/M/L/XL] | Epic 1 |
| ... | | | | |
**Total estimated effort**: [sum or range]
## Key Decisions Made
| # | Decision | Rationale | Alternatives Rejected |
|---|----------|-----------|----------------------|
| 1 | [decision] | [why] | [what was rejected] |
| 2 | | | |
## Open Questions
| # | Question | Impact | Assigned To |
|---|----------|--------|-------------|
| 1 | [unresolved question] | [what it blocks or affects] | [who should answer] |
## Artifact Index
| File | Description |
|------|-------------|
| `architecture.md` | System architecture |
| `system-flows.md` | System flows and diagrams |
| `components/01_[name]/description.md` | Component spec |
| `components/01_[name]/tests.md` | Test spec |
| `risk_mitigations.md` | Risk register |
| `diagrams/components.drawio` | Component diagram |
| `diagrams/flows/flow_[name].md` | Flow diagrams |
```
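The "Implementation order" phases above can be derived mechanically from the Component Summary's Dependencies column. A sketch of Kahn-style layering, assuming dependencies are expressed as a mapping; the component names are invented for illustration:

```python
def phases(deps):
    """deps: {component: set of components it depends on} -> list of phases.

    Each phase contains only components whose dependencies all sit in earlier phases,
    so every phase can be implemented in parallel.
    """
    remaining = {c: set(d) for c, d in deps.items()}
    result = []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        result.append(ready)
        for c in ready:
            del remaining[c]
        for d in remaining.values():
            d.difference_update(ready)
    return result
```

A cycle raises instead of looping, which surfaces a planning error (two components that each "must be implemented after" the other) before any epic ordering is written down.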
# E2E Test Environment Template
Save as `PLANS_DIR/integration_tests/environment.md`.
---
```markdown
# E2E Test Environment
## Overview
**System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.]
**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
## Docker Environment
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| system-under-test | [main app image or build context] | The main system being tested | [ports] |
| test-db | [postgres/mysql/etc.] | Database for the main system | [ports] |
| e2e-consumer | [build context for consumer app] | Black-box test runner | — |
| [dependency] | [image] | [purpose — cache, queue, mock, etc.] | [ports] |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| e2e-net | all | Isolated test network |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| [name] | [service:path] | [test data, DB persistence, etc.] |
### docker-compose structure
```yaml
# Outline only — not runnable code
services:
system-under-test:
# main system
test-db:
# database
e2e-consumer:
# consumer test app
depends_on:
- system-under-test
```
## Consumer Application
**Tech stack**: [language, framework, test runner]
**Entry point**: [how it starts — e.g., pytest, jest, custom runner]
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| [API name] | [HTTP/gRPC/AMQP/etc.] | [URL or topic] | [method] |
### What the consumer does NOT have access to
- No direct database access to the main system
- No internal module imports
- No shared memory or file system with the main system
## CI/CD Integration
**When to run**: [e.g., on PR merge to dev, nightly, before production deploy]
**Pipeline stage**: [where in the CI pipeline this fits]
**Gate behavior**: [block merge / warning only / manual approval]
**Timeout**: [max total suite duration before considered failed]
## Reporting
**Format**: CSV
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
**Output path**: [where the CSV is written — e.g., ./e2e-results/report.csv]
```
---
## Guidance Notes
- The consumer app must treat the main system as a true black box — no internal imports, no direct DB queries against the main system's database.
- Docker environment should be self-contained — `docker compose up` must be sufficient to run the full suite.
- If the main system requires external services (payment gateways, third-party APIs), define mock/stub services in the Docker environment.
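The Reporting section's CSV contract (Test ID, Test Name, Execution Time, Result, Error Message) is simple enough to sketch with the standard library. Column names mirror the template; the default output path is illustrative and would come from the "Output path" field in practice:

```python
import csv

COLUMNS = ["Test ID", "Test Name", "Execution Time (ms)", "Result", "Error Message"]

def write_report(rows, path="report.csv"):
    """Write one CSV row per test result; rows are dicts keyed by COLUMNS."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)

write_report([
    {"Test ID": "FT-P-01", "Test Name": "Happy path order",
     "Execution Time (ms)": 843, "Result": "PASS", "Error Message": ""},
])
```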
# E2E Functional Tests Template
Save as `PLANS_DIR/integration_tests/functional_tests.md`.
---
```markdown
# E2E Functional Tests
## Positive Scenarios
### FT-P-01: [Scenario Name]
**Summary**: [One sentence: what end-to-end use case this validates]
**Traces to**: AC-[ID], AC-[ID]
**Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.]
**Preconditions**:
- [System state required before test]
**Input data**: [reference to specific data set or file from test_data.md]
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | [call / send / provide input] | [response / event / output] |
| 2 | [call / send / provide input] | [response / event / output] |
**Expected outcome**: [specific, measurable result]
**Max execution time**: [e.g., 10s]
---
### FT-P-02: [Scenario Name]
(repeat structure)
---
## Negative Scenarios
### FT-N-01: [Scenario Name]
**Summary**: [One sentence: what invalid/edge input this tests]
**Traces to**: AC-[ID] (negative case), RESTRICT-[ID]
**Category**: [which AC/restriction category]
**Preconditions**:
- [System state required before test]
**Input data**: [reference to specific invalid data or edge case]
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | [provide invalid input / trigger edge case] | [error response / graceful degradation / fallback behavior] |
**Expected outcome**: [system rejects gracefully / falls back to X / returns error Y]
**Max execution time**: [e.g., 5s]
---
### FT-N-02: [Scenario Name]
(repeat structure)
```
---
## Guidance Notes
- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
- Positive scenarios validate the system does what it should.
- Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept.
- Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth."
- Input data references should point to specific entries in test_data.md.
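The "within 50m of ground truth" example above is the kind of outcome a consumer app can assert directly. A sketch using the haversine great-circle distance; the 50 m threshold is the example from the note, not a project requirement:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (spherical Earth)."""
    r = 6_371_000  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_tolerance(measured, truth, max_m=50.0):
    """Specific, measurable pass condition instead of 'works correctly'."""
    return haversine_m(*measured, *truth) <= max_m
```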
# E2E Non-Functional Tests Template
Save as `PLANS_DIR/integration_tests/non_functional_tests.md`.
---
```markdown
# E2E Non-Functional Tests
## Performance Tests
### NFT-PERF-01: [Test Name]
**Summary**: [What performance characteristic this validates]
**Traces to**: AC-[ID]
**Metric**: [what is measured — latency, throughput, frame rate, etc.]
**Preconditions**:
- [System state, load profile, data volume]
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | [action] | [what to measure and how] |
**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
**Duration**: [how long the test runs]
---
## Resilience Tests
### NFT-RES-01: [Test Name]
**Summary**: [What failure/recovery scenario this validates]
**Traces to**: AC-[ID]
**Preconditions**:
- [System state before fault injection]
**Fault injection**:
- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | [inject fault] | [system behavior during fault] |
| 2 | [observe recovery] | [system behavior after recovery] |
**Pass criteria**: [recovery time, data integrity, continued operation]
---
## Security Tests
### NFT-SEC-01: [Test Name]
**Summary**: [What security property this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
**Pass criteria**: [specific security outcome]
---
## Resource Limit Tests
### NFT-RES-LIM-01: [Test Name]
**Summary**: [What resource constraint this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Preconditions**:
- [System running under specified constraints]
**Monitoring**:
- [What resources to monitor — memory, CPU, GPU, disk, temperature]
**Duration**: [how long to run]
**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
```
---
## Guidance Notes
- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
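The last note, sustained monitoring rather than short bursts, reduces to a sampling loop that runs for the full duration and fails on the first breach. A hypothetical sketch; the usage-reading callable and the timing hooks are injected so a real suite could plug in psutil or /proc readers:

```python
import time

def monitor_limit(read_usage, limit, duration_s, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Sample a resource for the full duration; fail on the first breach.

    read_usage: callable returning current usage (e.g., RSS in bytes).
    Returns (passed, samples) so the report can show the whole profile.
    """
    samples = []
    deadline = clock() + duration_s
    while clock() < deadline:
        usage = read_usage()
        samples.append(usage)
        if usage > limit:
            return False, samples  # a breach at any point fails the test
        sleep(interval_s)
    return True, samples
```

Injecting `clock` and `sleep` also makes the monitor itself unit-testable without real wall-clock time.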
# E2E Test Data Template
Save as `PLANS_DIR/integration_tests/test_data.md`.
---
```markdown
# E2E Test Data Management
## Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|----------|-------------|---------------|-----------|---------|
| [name] | [what it contains] | [test IDs] | [SQL script / API call / fixture file / volume mount] | [how removed after test] |
## Data Isolation Strategy
[e.g., each test run gets a fresh container restart, transactions are rolled back after each test, data is namespaced per test, or each test group uses a separate DB]
## Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|-----------------|----------------|-------------|-----------------|
| [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] |
## External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|-----------------|-----------|-------------|----------|
| [service name] | [mock type] | [Docker service / in-process stub / recorded responses] | [what it returns / simulates] |
## Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|-----------|-----------|-----------------|------------------------|
| [type] | [rules] | [invalid input examples] | [how system should respond] |
```
---
## Guidance Notes
- Every seed data set should be traceable to specific test scenarios.
- Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it.
- External mocks must be deterministic — same input always produces same output.
- Data isolation must guarantee no test can affect another test's outcome.
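Determinism in mocks, the same input always producing the same output, is easiest with canned responses and no clock or randomness. A minimal sketch; the service and responses are invented for illustration:

```python
class DeterministicStub:
    """Map each request key to a fixed, recorded response; no clock, no randomness."""
    def __init__(self, responses):
        self._responses = dict(responses)

    def call(self, request_key):
        try:
            return self._responses[request_key]
        except KeyError:
            # Unknown input is an explicit, repeatable failure, not a guess
            raise ValueError(f"no recorded response for {request_key!r}")

payment_stub = DeterministicStub({
    ("charge", "4242"): {"status": "approved"},
    ("charge", "0000"): {"status": "declined"},
})
```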
# E2E Traceability Matrix Template
Save as `PLANS_DIR/integration_tests/traceability_matrix.md`.
---
```markdown
# E2E Traceability Matrix
## Acceptance Criteria Coverage
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-01 | [criterion text] | FT-P-01, NFT-PERF-01 | Covered |
| AC-02 | [criterion text] | FT-P-02, FT-N-01 | Covered |
| AC-03 | [criterion text] | — | NOT COVERED — [reason and mitigation] |
## Restrictions Coverage
| Restriction ID | Restriction | Test IDs | Coverage |
|---------------|-------------|----------|----------|
| RESTRICT-01 | [restriction text] | FT-N-02, NFT-RES-LIM-01 | Covered |
| RESTRICT-02 | [restriction text] | — | NOT COVERED — [reason and mitigation] |
## Coverage Summary
| Category | Total Items | Covered | Not Covered | Coverage % |
|----------|-----------|---------|-------------|-----------|
| Acceptance Criteria | [N] | [N] | [N] | [%] |
| Restrictions | [N] | [N] | [N] | [%] |
| **Total** | [N] | [N] | [N] | [%] |
## Uncovered Items Analysis
| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
```
---
## Guidance Notes
- Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason.
- Every restriction must appear in the matrix.
- NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware").
- Coverage percentage should be at least 75% for acceptance criteria at the E2E level.
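The Coverage Summary percentages follow directly from the matrix rows. A sketch, assuming each acceptance criterion or restriction maps to the set of test IDs covering it; the IDs are illustrative:

```python
def coverage_summary(matrix):
    """matrix: {item_id: set of covering test IDs}; an empty set means NOT COVERED."""
    total = len(matrix)
    covered = sum(1 for tests in matrix.values() if tests)
    pct = 100.0 * covered / total if total else 0.0
    return {"total": total, "covered": covered,
            "not_covered": total - covered, "pct": round(pct, 1)}

acs = {
    "AC-01": {"FT-P-01", "NFT-PERF-01"},
    "AC-02": {"FT-P-02", "FT-N-01"},
    "AC-03": set(),  # NOT COVERED: needs a documented reason and mitigation
}
```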
# Risk Register Template
Use this template for risk assessment. Save as `_docs/02_plans/risk_mitigations.md`.
Subsequent iterations: `risk_mitigations_02.md`, `risk_mitigations_03.md`, etc.
---
```markdown
# Risk Assessment — [Topic] — Iteration [##]
## Risk Scoring Matrix
| | Low Impact | Medium Impact | High Impact |
|--|------------|---------------|-------------|
| **High Probability** | Medium | High | Critical |
| **Medium Probability** | Low | Medium | High |
| **Low Probability** | Low | Low | Medium |
## Acceptance Criteria by Risk Level
| Level | Action Required |
|-------|----------------|
| Low | Accepted, monitored quarterly |
| Medium | Mitigation plan required before implementation |
| High | Mitigation + contingency plan required, reviewed weekly |
| Critical | Must be resolved before proceeding to next planning step |
## Risk Register
| ID | Risk | Category | Probability | Impact | Score | Mitigation | Owner | Status |
|----|------|----------|-------------|--------|-------|------------|-------|--------|
| R01 | [risk description] | [category] | High/Med/Low | High/Med/Low | Critical/High/Med/Low | [mitigation strategy] | [owner] | Open/Mitigated/Accepted |
| R02 | | | | | | | | |
## Risk Categories
### Technical Risks
- Technology choices may not meet requirements
- Integration complexity underestimated
- Performance targets unachievable
- Security vulnerabilities in design
- Data model cannot support future requirements
### Schedule Risks
- Dependencies delayed
- Scope creep from ambiguous requirements
- Underestimated complexity
### Resource Risks
- Key person dependency
- Team lacks experience with chosen technology
- Infrastructure not available in time
### External Risks
- Third-party API changes or deprecation
- Vendor reliability or pricing changes
- Regulatory or compliance changes
- Data source availability
## Detailed Risk Analysis
### R01: [Risk Title]
**Description**: [Detailed description of the risk]
**Trigger conditions**: [What would cause this risk to materialize]
**Affected components**: [List of components impacted]
**Mitigation strategy**:
1. [Action 1]
2. [Action 2]
**Contingency plan**: [What to do if mitigation fails]
**Residual risk after mitigation**: [Low/Medium/High]
**Documents updated**: [List architecture/component docs that were updated to reflect this mitigation]
---
### R02: [Risk Title]
(repeat structure above)
## Architecture/Component Changes Applied
| Risk ID | Document Modified | Change Description |
|---------|------------------|--------------------|
| R01 | `architecture.md` §3 | [what changed] |
| R01 | `components/02_[name]/description.md` §5 | [what changed] |
## Summary
**Total risks identified**: [N]
**Critical**: [N] | **High**: [N] | **Medium**: [N] | **Low**: [N]
**Risks mitigated this iteration**: [N]
**Risks requiring user decision**: [list]
```
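The Risk Scoring Matrix at the top of the template is a pure lookup on probability and impact, which is worth encoding once so every register row scores consistently. A sketch; the level names match the matrix exactly:

```python
RISK_MATRIX = {
    # (probability, impact) -> score, exactly as in the template's matrix
    ("High", "Low"): "Medium",  ("High", "Medium"): "High",     ("High", "High"): "Critical",
    ("Medium", "Low"): "Low",   ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
    ("Low", "Low"): "Low",      ("Low", "Medium"): "Low",       ("Low", "High"): "Medium",
}

def risk_score(probability, impact):
    """Score one register row; raises KeyError on a level name outside the matrix."""
    return RISK_MATRIX[(probability, impact)]
```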
# System Flows Template
Use this template for the system flows document. Save as `_docs/02_plans/system-flows.md`.
Individual flow diagrams go in `_docs/02_plans/diagrams/flows/flow_[name].md`.
---
```markdown
# [System Name] — System Flows
## Flow Inventory
| # | Flow Name | Trigger | Primary Components | Criticality |
|---|-----------|---------|-------------------|-------------|
| F1 | [name] | [user action / scheduled / event] | [component list] | High/Medium/Low |
| F2 | [name] | | | |
| ... | | | | |
## Flow Dependencies
| Flow | Depends On | Shares Data With |
|------|-----------|-----------------|
| F1 | — | F2 (via [entity]) |
| F2 | F1 must complete first | F3 |
---
## Flow F1: [Flow Name]
### Description
[1-2 sentences: what this flow does, who triggers it, what the outcome is]
### Preconditions
- [Condition 1]
- [Condition 2]
### Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant ComponentA
participant ComponentB
participant Database
User->>ComponentA: [action]
ComponentA->>ComponentB: [call with params]
ComponentB->>Database: [query/write]
Database-->>ComponentB: [result]
ComponentB-->>ComponentA: [response]
ComponentA-->>User: [result]
```
### Flowchart
```mermaid
flowchart TD
Start([Trigger]) --> Step1[Step description]
Step1 --> Decision{Condition?}
Decision -->|Yes| Step2[Step description]
Decision -->|No| Step3[Step description]
Step2 --> EndNode([Result])
Step3 --> EndNode
```
### Data Flow
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | [source] | [destination] | [what data] | [DTO/event/etc] |
| 2 | | | | |
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| [error type] | [which step] | [how detected] | [what happens] |
### Performance Expectations
| Metric | Target | Notes |
|--------|--------|-------|
| End-to-end latency | [target] | [conditions] |
| Throughput | [target] | [peak/sustained] |
---
## Flow F2: [Flow Name]
(repeat structure above)
```
---
## Mermaid Diagram Conventions
Follow these conventions for consistency across all flow diagrams:
- **Participants**: use component names matching `components/[##]_[name]`
- **Node IDs**: camelCase, no spaces (e.g., `validateInput`, `saveOrder`)
- **Decision nodes**: use `{Question?}` format
- **Start/End**: use `([label])` stadium shape
- **External systems**: use `[[label]]` subroutine shape
- **Subgraphs**: group by component or bounded context
- **No styling**: do not add colors or CSS classes — let the renderer theme handle it
- **Edge labels**: wrap special characters in quotes (e.g., `-->|"O(n) check"|`)
# Test Specification Template
Use this template for each component's test spec. Save as `components/[##]_[name]/tests.md`.
---
```markdown
# Test Specification — [Component Name]
## Acceptance Criteria Traceability
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-01 | [criterion from acceptance_criteria.md] | IT-01, AT-01 | Covered |
| AC-02 | [criterion] | PT-01 | Covered |
| AC-03 | [criterion] | — | NOT COVERED — [reason] |
---
## Integration Tests
### IT-01: [Test Name]
**Summary**: [One sentence: what this test verifies]
**Traces to**: AC-01, AC-03
**Description**: [Detailed test scenario]
**Input data**:
```
[specific input data for this test]
```
**Expected result**:
```
[specific expected output or state]
```
**Max execution time**: [e.g., 5s]
**Dependencies**: [other components/services that must be running]
---
### IT-02: [Test Name]
(repeat structure)
---
## Performance Tests
### PT-01: [Test Name]
**Summary**: [One sentence: what performance aspect is tested]
**Traces to**: AC-02
**Load scenario**:
- Concurrent users: [N]
- Request rate: [N req/s]
- Duration: [N minutes]
- Ramp-up: [strategy]
**Expected results**:
| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| Latency (p50) | [target] | [max] |
| Latency (p95) | [target] | [max] |
| Latency (p99) | [target] | [max] |
| Throughput | [target req/s] | [min req/s] |
| Error rate | [target %] | [max %] |
**Resource limits**:
- CPU: [max %]
- Memory: [max MB/GB]
- Database connections: [max pool size]
---
### PT-02: [Test Name]
(repeat structure)
---
## Security Tests
### ST-01: [Test Name]
**Summary**: [One sentence: what security aspect is tested]
**Traces to**: AC-04
**Attack vector**: [e.g., SQL injection on search endpoint, privilege escalation via direct ID access]
**Test procedure**:
1. [Step 1]
2. [Step 2]
**Expected behavior**: [what the system should do — reject, sanitize, log, etc.]
**Pass criteria**: [specific measurable condition]
**Fail criteria**: [what constitutes a failure]
---
### ST-02: [Test Name]
(repeat structure)
---
## Acceptance Tests
### AT-01: [Test Name]
**Summary**: [One sentence: what user-facing behavior is verified]
**Traces to**: AC-01
**Preconditions**:
- [Precondition 1]
- [Precondition 2]
**Steps**:
| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | [user action] | [expected outcome] |
| 2 | [user action] | [expected outcome] |
| 3 | [user action] | [expected outcome] |
---
### AT-02: [Test Name]
(repeat structure)
---
## Test Data Management
**Required test data**:
| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| [name] | [what it contains] | [generated / fixture / copy of prod subset] | [approx size] |
**Setup procedure**:
1. [How to prepare the test environment]
2. [How to load test data]
**Teardown procedure**:
1. [How to clean up after tests]
2. [How to restore initial state]
**Data isolation strategy**: [How tests are isolated from each other — separate DB, transactions, namespacing]
```
---
## Guidance Notes
- Every test MUST trace back to at least one acceptance criterion (AC-XX). If a test doesn't trace to any, question whether it's needed.
- If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
- Performance test targets should come from the NFR section in `architecture.md`.
- Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
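The p50/p95/p99 rows of the PT-01 table can be checked from raw latency samples with nearest-rank percentiles. A sketch; the thresholds in the usage are placeholders, as in the template:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def check_latency(samples_ms, targets):
    """targets: {percentile: max_ms}; returns the set of percentiles that failed."""
    return {p for p, max_ms in targets.items() if percentile(samples_ms, p) > max_ms}
```

A failing set rather than a bare boolean lets the report name exactly which row of the Expected Results table was breached.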
---
name: problem
description: |
Interactive problem gathering skill that builds _docs/00_problem/ through structured interview.
Iteratively asks probing questions until the problem, restrictions, acceptance criteria, and input data
are fully understood. Produces all required files for downstream skills (research, plan, etc.).
Trigger phrases:
- "problem", "define problem", "problem gathering"
- "what am I building", "describe problem"
- "start project", "new project"
category: build
tags: [problem, gathering, interview, requirements, acceptance-criteria]
disable-model-invocation: true
---
# Problem Gathering
Build a complete problem definition through structured, interactive interview with the user. Produces all required files in `_docs/00_problem/` that downstream skills (research, plan, decompose, implement, deploy) depend on.
## Core Principles
- **Ask, don't assume**: never infer requirements the user hasn't stated
- **Exhaust before writing**: keep asking until all dimensions are covered; do not write files prematurely
- **Concrete over vague**: push for measurable values, specific constraints, real numbers
- **Save immediately**: once the user confirms, write all files at once
- **User is the authority**: the AI suggests, the user decides
## Context Resolution
Fixed paths:
- OUTPUT_DIR: `_docs/00_problem/`
- INPUT_DATA_DIR: `_docs/00_problem/input_data/`
## Prerequisite Checks
1. If OUTPUT_DIR already exists and contains files, present what exists and ask user: **resume and fill gaps, overwrite, or skip?**
2. If overwrite or fresh start, create OUTPUT_DIR and INPUT_DATA_DIR
## Completeness Criteria
The interview is complete when the AI can write ALL of these:
| File | Complete when |
|------|--------------|
| `problem.md` | Clear problem statement: what is being built, why, for whom, what it does |
| `restrictions.md` | All constraints identified: hardware, software, environment, operational, regulatory, budget, timeline |
| `acceptance_criteria.md` | Measurable success criteria with specific numeric targets grouped by category |
| `input_data/` | At least one reference data file or detailed data description document |
| `security_approach.md` | (optional) Security requirements identified, or explicitly marked as not applicable |
## Interview Protocol
### Phase 1: Open Discovery
Start with broad, open questions. Let the user describe the problem in their own words.
**Opening**: Ask the user to describe what they are building and what problem it solves. Do not interrupt or narrow down yet.
After the user responds, summarize what you understood and ask: "Did I get this right? What did I miss?"
### Phase 2: Structured Probing
Work through each dimension systematically. For each dimension, ask only what the user hasn't already covered. Skip dimensions that were fully answered in Phase 1.
**Dimension checklist:**
1. **Problem & Goals**
- What exactly does the system do?
- What problem does it solve? Why does it need to exist?
- Who are the users / operators / stakeholders?
- What is the expected usage pattern (frequency, load, environment)?
2. **Scope & Boundaries**
- What is explicitly IN scope?
- What is explicitly OUT of scope?
- Are there related systems this integrates with?
- What does the system NOT do (common misconceptions)?
3. **Hardware & Environment**
- What hardware does it run on? (CPU, GPU, memory, storage)
- What operating system / platform?
- What is the deployment environment? (cloud, edge, embedded, on-prem)
- Any physical constraints? (power, thermal, size, connectivity)
4. **Software & Tech Constraints**
- Required programming languages or frameworks?
- Required protocols or interfaces?
- Existing systems it must integrate with?
- Libraries or tools that must or must not be used?
5. **Acceptance Criteria**
- What does "done" look like?
- Performance targets: latency, throughput, accuracy, error rates?
- Quality bars: reliability, availability, recovery time?
- Push for specific numbers: "less than Xms", "above Y%", "within Z meters"
- Edge cases: what happens when things go wrong?
- Startup and shutdown behavior?
6. **Input Data**
- What data does the system consume?
- Formats, schemas, volumes, update frequency?
- Does the user have sample/reference data to provide?
- If no data exists yet, what would representative data look like?
7. **Security** (optional, probe gently)
- Authentication / authorization requirements?
- Data sensitivity (PII, classified, proprietary)?
- Communication security (encryption, TLS)?
- If the user says "not a concern", mark as N/A and move on
8. **Operational Constraints**
- Budget constraints?
- Timeline constraints?
- Team size / expertise constraints?
- Regulatory or compliance requirements?
- Geographic restrictions?
### Phase 3: Gap Analysis
After all dimensions are covered:
1. Internally assess completeness against the Completeness Criteria table
2. Present a completeness summary to the user:
```
Completeness Check:
- problem.md: READY / GAPS: [list missing aspects]
- restrictions.md: READY / GAPS: [list missing aspects]
- acceptance_criteria.md: READY / GAPS: [list missing aspects]
- input_data/: READY / GAPS: [list missing aspects]
- security_approach.md: READY / N/A / GAPS: [list missing aspects]
```
3. If gaps exist, ask targeted follow-up questions for each gap
4. Repeat until all required files show READY
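The mechanical half of this check can be sketched as a small Python helper (names are illustrative, not part of the skill contract; the semantic assessment — whether the content actually covers each dimension — still requires judgment):

```python
from pathlib import Path

REQUIRED = ["problem.md", "restrictions.md", "acceptance_criteria.md"]

def completeness_check(output_dir: str) -> dict:
    """Report READY / GAPS per required artifact in OUTPUT_DIR."""
    base = Path(output_dir)
    report = {}
    for name in REQUIRED:
        f = base / name
        ready = f.is_file() and f.stat().st_size > 0
        report[name] = "READY" if ready else "GAPS: missing or empty"
    # input_data/ counts as ready when it contains at least one file
    data_dir = base / "input_data"
    has_data = data_dir.is_dir() and any(data_dir.iterdir())
    report["input_data/"] = "READY" if has_data else "GAPS: no files"
    return report
```

A GAPS result only says a file is absent or empty; the targeted follow-up questions in step 3 are still driven by reading the content itself.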
### Phase 4: Draft & Confirm
1. Draft all files in the conversation (show the user what will be written)
2. Present each file's content for review
3. Ask: "Should I save these files? Any changes needed?"
4. Apply any requested changes
5. Save all files to OUTPUT_DIR
## Output File Formats
### problem.md
Free-form text. Clear, concise description of:
- What is being built
- What problem it solves
- How it works at a high level
- Key context the reader needs to understand the problem
No headers required. Paragraph format. Should be readable by someone unfamiliar with the project.
### restrictions.md
Categorized constraints with markdown headers and bullet points:
```markdown
# [Category Name]
- Constraint description with specific values where applicable
- Another constraint
```
Categories are derived from the interview (hardware, software, environment, operational, etc.). Each restriction should be specific and testable.
### acceptance_criteria.md
Categorized measurable criteria with markdown headers and bullet points:
```markdown
# [Category Name]
- Criterion with specific numeric target
- Another criterion with measurable threshold
```
Every criterion must have a measurable value. Vague criteria like "should be fast" are not acceptable — push for "less than 400ms end-to-end".
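A rough heuristic for flagging vague criteria can be sketched as follows (a sketch, assuming one bullet per criterion; the presence of a digit is used as a cheap proxy for "measurable value" and will miss criteria that are numeric in spirit but spelled out in words):

```python
import re

def flag_vague_criteria(lines):
    """Flag bullet criteria that carry no number at all."""
    has_number = re.compile(r"\d")
    flagged = []
    for line in lines:
        line = line.strip()
        if line.startswith("-") and not has_number.search(line):
            flagged.append(line)
    return flagged
```

Anything flagged should be pushed back to the user for a concrete threshold, as in the "less than 400ms end-to-end" example above.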
### input_data/
At least one file. Options:
- User provides actual data files (CSV, JSON, images, etc.) — save as-is
- User describes data parameters — save as `data_parameters.md`
- User provides URLs to data — save as `data_sources.md` with links and descriptions
### security_approach.md (optional)
If security requirements exist, document them. If the user says security is not a concern for this project, skip this file entirely.
## Progress Tracking
Create a TodoWrite with phases 1-4. Update as each phase completes.
## Escalation Rules
| Situation | Action |
|-----------|--------|
| User cannot provide acceptance criteria numbers | Suggest industry benchmarks, ASK user to confirm or adjust |
| User has no input data at all | ASK what representative data would look like, create a `data_parameters.md` describing expected data |
| User says "I don't know" to a critical dimension | Research the domain briefly, suggest reasonable defaults, ASK user to confirm |
| Conflicting requirements discovered | Present the conflict, ASK user which takes priority |
| User wants to skip a required file | Explain why downstream skills need it, ASK if they want a minimal placeholder |
## Common Mistakes
- **Writing files before the interview is complete**: gather everything first, then write
- **Accepting vague criteria**: "fast", "accurate", "reliable" are not acceptance criteria without numbers
- **Assuming technical choices**: do not suggest specific technologies unless the user constrains them
- **Over-engineering the problem statement**: problem.md should be concise, not a dissertation
- **Inventing restrictions**: only document what the user actually states as a constraint
- **Skipping input data**: downstream skills (especially research and plan) need concrete data context
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Problem Gathering (4-Phase Interview) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: Check if _docs/00_problem/ exists (resume/overwrite?) │
│ │
│ Phase 1: Open Discovery │
│ → "What are you building?" → summarize → confirm │
│ Phase 2: Structured Probing │
│ → 8 dimensions: problem, scope, hardware, software, │
│ acceptance criteria, input data, security, operations │
│ → skip what Phase 1 already covered │
│ Phase 3: Gap Analysis │
│ → assess completeness per file → fill gaps iteratively │
│ Phase 4: Draft & Confirm │
│ → show all files → user confirms → save to _docs/00_problem/ │
├────────────────────────────────────────────────────────────────┤
│ Principles: Ask don't assume · Concrete over vague │
│ Exhaust before writing · User is authority │
└────────────────────────────────────────────────────────────────┘
```
+471

@@ -0,0 +1,471 @@
---
name: refactor
description: |
Structured refactoring workflow (6-phase method) with three execution modes:
- Full Refactoring: all 6 phases — baseline, discovery, analysis, safety net, execution, hardening
- Targeted Refactoring: skip discovery if docs exist, focus on a specific component/area
- Quick Assessment: phases 0-2 only, outputs a refactoring plan without execution
Supports project mode (_docs/ structure) and standalone mode (@file.md).
Trigger phrases:
- "refactor", "refactoring", "improve code"
- "analyze coupling", "decoupling", "technical debt"
- "refactoring assessment", "code quality improvement"
category: evolve
tags: [refactoring, coupling, technical-debt, performance, hardening]
disable-model-invocation: true
---
# Structured Refactoring (6-Phase Method)
Transform existing codebases through a systematic refactoring workflow: capture baseline, document current state, research improvements, build safety net, execute changes, and harden.
## Core Principles
- **Preserve behavior first**: never refactor without a passing test suite
- **Measure before and after**: every change must be justified by metrics
- **Small incremental changes**: commit frequently, never break tests
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Project mode** (no explicit input file provided):
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- COMPONENTS_DIR: `_docs/02_components/`
- TESTS_DIR: `_docs/02_tests/`
- REFACTOR_DIR: `_docs/04_refactoring/`
- All existing guardrails apply.
**Standalone mode** (explicit input file provided, e.g. `/refactor @some_component.md`):
- INPUT_FILE: the provided file (treated as component/area description)
- REFACTOR_DIR: `_standalone/refactoring/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `acceptance_criteria.md` is optional — warn if absent
Announce the detected mode and resolved paths to the user before proceeding.
## Mode Detection
After context resolution, determine the execution mode:
1. **User explicitly says** "quick assessment" or "just assess" → **Quick Assessment**
2. **User explicitly says** "refactor [component/file/area]" with a specific target → **Targeted Refactoring**
3. **Default** → **Full Refactoring**
| Mode | Phases Executed | When to Use |
|------|----------------|-------------|
| **Full Refactoring** | 0 → 1 → 2 → 3 → 4 → 5 | Complete refactoring of a system or major area |
| **Targeted Refactoring** | 0 → (skip 1 if docs exist) → 2 → 3 → 4 → 5 | Refactor a specific component; docs already exist |
| **Quick Assessment** | 0 → 1 → 2 | Produce a refactoring roadmap without executing changes |
Inform the user which mode was detected and confirm before proceeding.
## Prerequisite Checks (BLOCKING)
**Project mode:**
1. PROBLEM_DIR exists with `problem.md` (or `problem_description.md`) — **STOP if missing**, ask user to create it
2. If `acceptance_criteria.md` is missing: **warn** and ask whether to proceed
3. Create REFACTOR_DIR if it does not exist
4. If REFACTOR_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
**Standalone mode:**
1. INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `acceptance_criteria.md` provided
3. Create REFACTOR_DIR if it does not exist
## Artifact Management
### Directory Structure
```
REFACTOR_DIR/
├── baseline_metrics.md (Phase 0)
├── discovery/
│ ├── components/
│ │ └── [##]_[name].md (Phase 1)
│ ├── solution.md (Phase 1)
│ └── system_flows.md (Phase 1)
├── analysis/
│ ├── research_findings.md (Phase 2)
│ └── refactoring_roadmap.md (Phase 2)
├── test_specs/
│ └── [##]_[test_name].md (Phase 3)
├── coupling_analysis.md (Phase 4)
├── execution_log.md (Phase 4)
├── hardening/
│ ├── technical_debt.md (Phase 5)
│ ├── performance.md (Phase 5)
│ └── security.md (Phase 5)
└── FINAL_report.md (after all phases)
```
### Save Timing
| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 0 | Baseline captured | `baseline_metrics.md` |
| Phase 1 | Each component documented | `discovery/components/[##]_[name].md` |
| Phase 1 | Solution synthesized | `discovery/solution.md`, `discovery/system_flows.md` |
| Phase 2 | Research complete | `analysis/research_findings.md` |
| Phase 2 | Roadmap produced | `analysis/refactoring_roadmap.md` |
| Phase 3 | Test specs written | `test_specs/[##]_[test_name].md` |
| Phase 4 | Coupling analyzed | `coupling_analysis.md` |
| Phase 4 | Execution complete | `execution_log.md` |
| Phase 5 | Each hardening track | `hardening/<track>.md` |
| Final | All phases done | `FINAL_report.md` |
### Resumability
If REFACTOR_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed phase based on which artifacts exist
3. Resume from the next incomplete phase
4. Inform the user which phases are being skipped
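A minimal sketch of the artifact-based resume check, assuming one representative closing artifact per phase (the mapping below is illustrative — align it with the save timing table when applying it):

```python
from pathlib import Path

# Ordered phases with one representative closing artifact each
PHASE_ARTIFACTS = [
    (0, "baseline_metrics.md"),
    (1, "discovery/solution.md"),
    (2, "analysis/refactoring_roadmap.md"),
    (3, "test_specs"),            # directory with at least one spec
    (4, "execution_log.md"),
    (5, "hardening"),             # directory with at least one track
]

def last_completed_phase(refactor_dir: str) -> int:
    """Return the highest completed phase, or -1 for a fresh start."""
    base = Path(refactor_dir)
    done = -1
    for phase, artifact in PHASE_ARTIFACTS:
        p = base / artifact
        exists = p.is_file() or (p.is_dir() and any(p.iterdir()))
        if not exists:
            break  # resume from the first phase whose artifact is missing
        done = phase
    return done
```

Resume from `last_completed_phase(...) + 1`, and tell the user which phases are being skipped.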
## Progress Tracking
At the start of execution, create a TodoWrite with all applicable phases. Update status as each phase completes.
## Workflow
### Phase 0: Context & Baseline
**Role**: Software engineer preparing for refactoring
**Goal**: Collect refactoring goals and capture baseline metrics
**Constraints**: Measurement only — no code changes
#### 0a. Collect Goals
If PROBLEM_DIR files do not yet exist, help the user create them:
1. `problem.md` — what the system currently does, what changes are needed, pain points
2. `acceptance_criteria.md` — success criteria for the refactoring
3. `security_approach.md` — security requirements (if applicable)
Store in PROBLEM_DIR.
#### 0b. Capture Baseline
1. Read problem description and acceptance criteria
2. Measure current system metrics using project-appropriate tools:
| Metric Category | What to Capture |
|----------------|-----------------|
| **Coverage** | Overall, unit, integration, critical paths |
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
| **Code Smells** | Total, critical, major |
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
| **Dependencies** | Total count, outdated, security vulnerabilities |
| **Build** | Build time, test execution time, deployment time |
3. Create functionality inventory: all features/endpoints with status and coverage
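As one concrete example of a reproducible measurement, here is a stdlib-only sketch that captures line and function counts for a Python codebase (an assumption for illustration — real baselines would use the coverage, complexity, and profiling tools appropriate to the project):

```python
import ast
from pathlib import Path

def python_loc_baseline(root: str) -> dict:
    """Count non-blank source lines and function defs across *.py files."""
    loc = funcs = files = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        loc += sum(1 for line in text.splitlines() if line.strip())
        try:
            tree = ast.parse(text)
        except SyntaxError:
            continue  # count lines, but skip unparseable files for funcs
        funcs += sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
                     for n in ast.walk(tree))
        files += 1
    return {"files": files, "loc": loc, "functions": funcs}
```

Whatever tools are used, record the exact commands alongside the numbers so the measurement stays reproducible for the after-comparison in Phase 4.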
**Self-verification**:
- [ ] All metric categories measured (or noted as N/A with reason)
- [ ] Functionality inventory is complete
- [ ] Measurements are reproducible
**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
---
### Phase 1: Discovery
**Role**: Principal software architect
**Goal**: Generate documentation from existing code and form solution description
**Constraints**: Document what exists, not what should be. No code changes.
**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
#### 1a. Document Components
For each component in the codebase:
1. Analyze project structure, directories, files
2. Go file by file, analyze each method
3. Analyze connections between components
Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
- Purpose and architectural patterns
- Mermaid diagrams for logic flows
- API reference table (name, description, input, output)
- Implementation details: algorithmic complexity, state management, dependencies
- Caveats, edge cases, known limitations
#### 1b. Synthesize Solution & Flows
1. Review all generated component documentation
2. Synthesize into a cohesive solution description
3. Create flow diagrams showing component interactions
Write:
- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
Also copy to project standard locations if in project mode:
- `SOLUTION_DIR/solution.md`
- `COMPONENTS_DIR/system_flows.md`
**Self-verification**:
- [ ] Every component in the codebase is documented
- [ ] Solution description covers all components
- [ ] Flow diagrams cover all major use cases
- [ ] Mermaid diagrams are syntactically correct
**Save action**: Write discovery artifacts
**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
---
### Phase 2: Analysis
**Role**: Researcher and software architect
**Goal**: Research improvements and produce a refactoring roadmap
**Constraints**: Analysis only — no code changes
#### 2a. Deep Research
1. Analyze current implementation patterns
2. Research modern approaches for similar systems
3. Identify what could be done differently
4. Suggest improvements based on state-of-the-art practices
Write `REFACTOR_DIR/analysis/research_findings.md`:
- Current state analysis: patterns used, strengths, weaknesses
- Alternative approaches per component: current vs alternative, pros/cons, migration effort
- Prioritized recommendations: quick wins + strategic improvements
#### 2b. Solution Assessment
1. Assess current implementation against acceptance criteria
2. Identify weak points in codebase, map to specific code areas
3. Perform gap analysis: acceptance criteria vs current state
4. Prioritize changes by impact and effort
Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
- Weak points assessment: location, description, impact, proposed solution
- Gap analysis: what's missing, what needs improvement
- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
**Self-verification**:
- [ ] All acceptance criteria are addressed in gap analysis
- [ ] Recommendations are grounded in actual code, not abstract
- [ ] Roadmap phases are prioritized by impact
- [ ] Quick wins are identified separately
**Save action**: Write analysis artifacts
**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
---
### Phase 3: Safety Net
**Role**: QA engineer and developer
**Goal**: Design and implement tests that capture current behavior before refactoring
**Constraints**: Tests must all pass on the current codebase before proceeding
#### 3a. Design Test Specs
Coverage requirements (must meet before refactoring):
- Minimum overall coverage: 75%
- Critical path coverage: 90%
- All public APIs must have integration tests
- All error handling paths must be tested
For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
- Integration tests: summary, current behavior, input data, expected result, max expected time
- Acceptance tests: summary, preconditions, steps with expected results
- Coverage analysis: current %, target %, uncovered critical paths
#### 3b. Implement Tests
1. Set up the test environment and infrastructure if they do not already exist
2. Implement each test from specs
3. Run tests, verify all pass on current codebase
4. Document any discovered issues
**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths)
- [ ] All tests pass on current codebase
- [ ] All public APIs have integration tests
- [ ] Test data fixtures are configured
**Save action**: Write test specs; implemented tests go into the project's test folder
**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
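The gate condition can be expressed as a simple check (a sketch; the thresholds mirror the coverage requirements in 3a, and the function name is illustrative):

```python
def safety_net_gate(overall: float, critical: float, failing_tests: int) -> list:
    """Return blocking reasons; an empty list means the gate is open."""
    blockers = []
    if failing_tests > 0:
        blockers.append(f"{failing_tests} failing test(s)")
    if overall < 75.0:
        blockers.append(f"overall coverage {overall:.0f}% < 75%")
    if critical < 90.0:
        blockers.append(f"critical-path coverage {critical:.0f}% < 90%")
    return blockers
```

Only proceed to Phase 4 when the returned list is empty; otherwise present the blockers to the user.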
---
### Phase 4: Execution
**Role**: Software architect and developer
**Goal**: Analyze coupling and execute decoupling changes
**Constraints**: Small incremental changes; tests must stay green after every change
#### 4a. Analyze Coupling
1. Analyze coupling between components/modules
2. Map dependencies (direct and transitive)
3. Identify circular dependencies
4. Form decoupling strategy
Write `REFACTOR_DIR/coupling_analysis.md`:
- Dependency graph (Mermaid)
- Coupling metrics per component
- Problem areas: components involved, coupling type, severity, impact
- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates
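Step 3 above (circular dependencies) comes down to a standard depth-first search over the dependency graph; a minimal sketch, assuming the graph has already been extracted as a module → dependencies mapping:

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited / on stack / done
    color = {n: WHITE for n in graph}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:   # back edge → cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = dfs(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for n in list(graph):
        if color[n] == WHITE:
            found = dfs(n)
            if found:
                return found
    return None
```

Run it once per suspected cluster, break the reported cycle (usually by introducing an interface), and re-run until it returns `None`.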
**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.
#### 4b. Execute Decoupling
For each change in the decoupling strategy:
1. Implement the change
2. Run integration tests
3. Fix any failures
4. Commit with descriptive message
Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.
Write `REFACTOR_DIR/execution_log.md`:
- Change description, files affected, test status per change
- Before/after metrics comparison against baseline
**Self-verification**:
- [ ] All tests still pass after execution
- [ ] No circular dependencies remain (or reduced per plan)
- [ ] Code smells addressed
- [ ] Metrics improved compared to baseline
**Save action**: Write execution artifacts
**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.
---
### Phase 5: Hardening (Optional, Parallel Tracks)
**Role**: Varies per track
**Goal**: Address technical debt, performance, and security
**Constraints**: Each track is optional; user picks which to run
Present the three tracks and let user choose which to execute:
#### Track A: Technical Debt
**Role**: Technical debt analyst
1. Identify and categorize debt items: design, code, test, documentation
2. Assess each: location, description, impact, effort, interest (cost of not fixing)
3. Prioritize: quick wins → strategic debt → tolerable debt
4. Create actionable plan with prevention measures
Write `REFACTOR_DIR/hardening/technical_debt.md`
#### Track B: Performance Optimization
**Role**: Performance engineer
1. Profile current performance, identify bottlenecks
2. For each bottleneck: location, symptom, root cause, impact
3. Propose optimizations with expected improvement and risk
4. Implement one at a time, benchmark after each change
5. Verify tests still pass
Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks
#### Track C: Security Review
**Role**: Security engineer
1. Review code against OWASP Top 10
2. Verify security requirements from `security_approach.md` are met
3. Check: authentication, authorization, input validation, output encoding, encryption, logging
Write `REFACTOR_DIR/hardening/security.md`:
- Vulnerability assessment: location, type, severity, exploit scenario, fix
- Security controls review
- Compliance check against `security_approach.md`
- Recommendations: critical fixes, improvements, hardening
**Self-verification** (per track):
- [ ] All findings are grounded in actual code
- [ ] Recommendations are actionable with effort estimates
- [ ] All tests still pass after any changes
**Save action**: Write hardening artifacts
---
## Final Report
After all executed phases complete, write `REFACTOR_DIR/FINAL_report.md`:
- Refactoring mode used and phases executed
- Baseline metrics vs final metrics comparison
- Changes made summary
- Remaining items (deferred to future)
- Lessons learned
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear refactoring scope | **ASK user** |
| Ambiguous acceptance criteria | **ASK user** |
| Tests failing before refactoring | **ASK user** — fix tests or fix code? |
| Coupling change risks breaking external contracts | **ASK user** |
| Performance optimization vs readability trade-off | **ASK user** |
| Missing baseline metrics (no test suite, no CI) | **WARN user**, suggest building safety net first |
| Security vulnerability found during refactoring | **WARN user** immediately, don't defer |
## Trigger Conditions
When the user wants to:
- Improve existing code structure or quality
- Reduce technical debt or coupling
- Prepare codebase for new features
- Assess code health before major changes
**Keywords**: "refactor", "refactoring", "improve code", "reduce coupling", "technical debt", "code quality", "decoupling"
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Structured Refactoring (6-Phase Method) │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
│ MODE: Full / Targeted / Quick Assessment │
│ │
│ 0. Context & Baseline → baseline_metrics.md │
│ [BLOCKING: user confirms baseline] │
│ 1. Discovery → discovery/ (components, solution) │
│ [BLOCKING: user confirms documentation] │
│ 2. Analysis → analysis/ (research, roadmap) │
│ [BLOCKING: user confirms roadmap] │
│ ── Quick Assessment stops here ── │
│ 3. Safety Net → test_specs/ + implemented tests │
│ [GATE: all tests must pass] │
│ 4. Execution → coupling_analysis, execution_log │
│ [BLOCKING: user confirms changes] │
│ 5. Hardening → hardening/ (debt, perf, security) │
│ [optional, user picks tracks] │
│ ───────────────────────────────────────────────── │
│ FINAL_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Preserve behavior · Measure before/after │
│ Small changes · Save immediately · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
+708
@@ -0,0 +1,708 @@
---
name: deep-research
description: |
Deep Research Methodology (8-Step Method) with two execution modes:
- Mode A (Initial Research): Assess acceptance criteria, then research problem and produce solution draft
- Mode B (Solution Assessment): Assess existing solution draft for weak points and produce revised draft
Supports project mode (_docs/ structure) and standalone mode (@file.md).
Auto-detects research mode based on existing solution_draft files.
Trigger phrases:
- "research", "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review solution draft"
- "comparative analysis", "concept comparison", "technical comparison"
category: build
tags: [research, analysis, solution-design, comparison, decision-support]
---
# Deep Research (8-Step Method)
Transform vague topics raised by users into high-quality, deliverable research reports through a systematic methodology. Operates in two modes: **Initial Research** (produce new solution draft) and **Solution Assessment** (assess and revise existing draft).
## Core Principles
- **Conclusions come from mechanism comparison, not "gut feelings"**
- **Pin down the facts first, then reason**
- **Prioritize authoritative sources: L1 > L2 > L3 > L4**
- **Intermediate results must be saved for traceability and reuse**
- **Ask, don't assume** — when any aspect of the problem, criteria, or restrictions is unclear, STOP and ask the user before proceeding
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Project mode** (no explicit input file provided):
- INPUT_DIR: `_docs/00_problem/`
- OUTPUT_DIR: `_docs/01_solution/`
- RESEARCH_DIR: `_docs/00_research/`
- All existing guardrails, mode detection, and draft numbering apply as-is.
**Standalone mode** (explicit input file provided, e.g. `/research @some_doc.md`):
- INPUT_FILE: the provided file (treated as problem description)
- OUTPUT_DIR: `_standalone/01_solution/`
- RESEARCH_DIR: `_standalone/00_research/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `restrictions.md` and `acceptance_criteria.md` are optional — warn if absent, proceed if user confirms
- Mode detection uses OUTPUT_DIR for `solution_draft*.md` scanning
- Draft numbering works the same, scoped to OUTPUT_DIR
- **Final step**: after all research is complete, move INPUT_FILE into `_standalone/`
Announce the detected mode and resolved paths to the user before proceeding.
## Project Integration
### Prerequisite Guardrails (BLOCKING)
Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**
**Project mode:**
1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist
**Standalone mode:**
1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
3. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist
### Mode Detection
After guardrails pass, determine the execution mode:
1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found** → **Mode A: Initial Research**
3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts
Inform the user which mode was detected and confirm before proceeding.
### Solution Draft Numbering
All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:
1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)
Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
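Both the mode detection and the numbering above reduce to one directory scan; a minimal sketch (the function name is illustrative):

```python
import re
from pathlib import Path

def next_draft(output_dir: str):
    """Detect the research mode and compute the next draft filename."""
    drafts = Path(output_dir).glob("solution_draft*.md")
    numbers = [int(m.group(1)) for p in drafts
               if (m := re.search(r"solution_draft(\d+)\.md$", p.name))]
    mode = "B" if numbers else "A"                 # any draft → assessment mode
    return mode, f"solution_draft{max(numbers, default=0) + 1:02d}.md"
```

Note the zero-padding is only 2 digits, matching the numbering rule above; drafts beyond 99 would sort lexically out of order.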
### Working Directory & Intermediate Artifact Management
#### Directory Structure
At the start of research, **must** create a working directory under RESEARCH_DIR:
```
RESEARCH_DIR/
├── 00_ac_assessment.md # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md # Step 0-1 output
├── 01_source_registry.md # Step 2 output: all consulted source links
├── 02_fact_cards.md # Step 3 output: extracted facts
├── 03_comparison_framework.md # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md # Step 7 output: use-case validation results
└── raw/ # Raw source archive (optional)
├── source_1.md
└── source_2.md
```
### Save Timing & Content
| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |
### Save Principles
1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: Same file can be updated multiple times; append or replace new content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from intermediate files
## Execution Flow
### Mode A: Initial Research
Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.
#### Phase 1: AC & Restrictions Assessment (BLOCKING)
**Role**: Professional software architect
A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.
**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)
**Task**:
1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
- Unclear problem boundaries → ask
- Ambiguous acceptance criteria values → ask
- Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
- Conflicting restrictions → ask which takes priority
3. Research on the internet:
- How realistic are the acceptance criteria for this specific domain?
- How critical is each criterion?
- What domain-specific acceptance criteria are we missing?
- Impact of each criterion value on the whole system quality
- Cost/budget implications of each criterion
- Timeline implications — how long would it take to meet each criterion
4. Research restrictions:
- Are the restrictions realistic?
- Should any be tightened or relaxed?
- Are there additional restrictions we should add?
5. Verify findings with authoritative sources (official docs, papers, benchmarks)
**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment.
**📁 Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:
```markdown
# Acceptance Criteria Assessment
## Acceptance Criteria
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
## Restrictions Assessment
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
## Key Findings
[Summary of critical findings]
## Sources
[Key references used]
```
**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings.
---
#### Phase 2: Problem Research & Solution Draft
**Role**: Professional researcher and software architect
Full 8-step research methodology. Produces the first solution draft.
**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts
**Task** (drives the 8-step engine):
1. Research existing/competitor solutions for similar problems
2. Research the problem thoroughly — all possible ways to solve it, split into components
3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches
4. Verify that suggested tools/libraries actually exist and work as described
5. Include security considerations in each component analysis
6. Provide rough cost estimates for proposed solutions
Formulate concisely: the fewer words the better, but do not omit any important details.
**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
---
#### Phase 3: Tech Stack Consolidation (OPTIONAL)
**Role**: Software architect evaluating technology choices
Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR
**Task**:
1. Extract technology options from the solution draft's component comparison tables
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
3. Produce a tech stack summary with selection rationale
4. Assess risks and learning requirements per technology choice
**📁 Save action**: Write `OUTPUT_DIR/tech_stack.md` with:
- Requirements analysis (functional, non-functional, constraints)
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
- Tech stack summary block
- Risk assessment and learning requirements tables
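Phase 3's scoring step can be sketched as a weighted sum. The criteria follow the list above; the weights, the 0-10 rating scale, and the function names are illustrative assumptions, not prescribed by this skill:

```python
# Illustrative weights per evaluation criterion; adjust to the project's priorities
CRITERIA_WEIGHTS = {
    "fitness": 0.25, "maturity": 0.20, "security": 0.20,
    "team_expertise": 0.15, "cost": 0.10, "scalability": 0.10,
}

def score_option(scores: dict[str, float]) -> float:
    """Weighted score for one technology option; each criterion rated 0-10."""
    return round(sum(CRITERIA_WEIGHTS[c] * scores.get(c, 0.0)
                     for c in CRITERIA_WEIGHTS), 2)

def rank_options(options: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank candidate technologies by weighted score, highest first."""
    return sorted(((name, score_option(s)) for name, s in options.items()),
                  key=lambda pair: pair[1], reverse=True)
```

The ranked output maps directly onto the technology evaluation tables in `tech_stack.md`; ties or near-ties are exactly the "multiple valid options" case that escalates to the user.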
---
#### Phase 4: Security Deep Dive (OPTIONAL)
**Role**: Security architect
Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context
**Task**:
1. Build threat model: asset inventory, threat actors, attack vectors
2. Define security requirements and proposed controls per component (with risk level)
3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach
**📁 Save action**: Write `OUTPUT_DIR/security_analysis.md` with:
- Threat model (assets, actors, vectors)
- Per-component security requirements and controls table
- Security controls summary
---
### Mode B: Solution Assessment
Triggered when `solution_draft*.md` files exist in OUTPUT_DIR.
**Role**: Professional software architect
Full 8-step research methodology applied to assessing and improving an existing solution draft.
**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR
**Task** (drives the 8-step engine):
1. Read the existing solution draft thoroughly
2. Research online to identify all potential weak points and problems
3. Identify security weak points and vulnerabilities
4. Identify performance bottlenecks
5. Address these problems and find ways to solve them
6. Based on findings, form a new solution draft in the same format
**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions above.
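The mode detection and incremented draft numbering described above can be sketched as a small helper. The `solution_draft##.md` naming and two-digit zero padding follow this skill's conventions; the function itself is an illustrative assumption:

```python
import re
from pathlib import Path

def detect_mode_and_next_draft(output_dir: str) -> tuple[str, str]:
    """Return (mode, next draft filename) based on existing drafts in OUTPUT_DIR."""
    numbers = []
    for draft in Path(output_dir).glob("solution_draft*.md"):
        m = re.search(r"solution_draft(\d+)\.md$", draft.name)
        if m:
            numbers.append(int(m.group(1)))
    if not numbers:
        # No drafts yet: Mode A (Initial Research) writes the first draft
        return "A", "solution_draft01.md"
    # Drafts exist: Mode B (Solution Assessment) increments the highest number
    return "B", f"solution_draft{max(numbers) + 1:02d}.md"
```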
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear problem boundaries | **ASK user** |
| Ambiguous acceptance criteria values | **ASK user** |
| Missing context files (`security_approach.md`, `input_data/`) | **ASK user** what they have |
| Conflicting restrictions | **ASK user** which takes priority |
| Technology choice with multiple valid options | **ASK user** |
| Contradictions between input files | **ASK user** |
| Missing acceptance criteria or restrictions files | **WARN user**, ask whether to proceed |
| File naming within research artifacts | PROCEED |
| Source tier classification | PROCEED |
## Trigger Conditions
When the user wants to:
- Deeply understand a concept/technology/phenomenon
- Compare similarities and differences between two or more things
- Gather information and evidence for a decision
- Assess or improve an existing solution draft
**Keywords**:
- "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review draft", "improve solution"
- "comparative analysis", "concept comparison", "technical comparison"
**Differentiation from other Skills**:
- Needs a **visual knowledge graph** → use `research-to-diagram`
- Needs **written output** (articles/tutorials) → use `wsy-writer`
- Needs **material organization** → use `material-to-markdown`
- Needs **research + solution draft** → use this Skill
## Research Engine (8-Step Method)
The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode.
### Step 0: Question Type Classification
First, classify the research question type and select the corresponding strategy:
| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |
**Mode-specific classification**:
| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |
### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)
Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.
**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`
Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.
**📁 Save action**: Append timeliness assessment to the end of `00_question_decomposition.md`
---
### Step 1: Question Decomposition & Boundary Definition
**Mode-specific sub-questions**:
**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"
**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"
**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)
**⚠️ Research Subject Boundary Definition (BLOCKING - must be explicit)**:
When decomposing questions, you must explicitly define the **boundaries of the research subject**:
| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |
**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.
**📁 Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
- Original question
- Active mode (A Phase 2 or B) and rationale
- Summary of relevant problem context from INPUT_DIR
- Classified question type and rationale
- **Research subject boundary definition** (population, geography, timeframe, level)
- List of decomposed sub-questions
4. Create a TodoWrite task list to track progress
### Step 2: Source Tiering & Authority Anchoring
Tier sources by authority and **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve only as supplementary and validation sources.
**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`
**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed
**📁 Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
### Step 3: Fact Extraction & Evidence Cards
Transform sources into **verifiable fact cards**:
```markdown
## Fact Cards
### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low
### Fact 2
...
```
**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate confidence level:
- ✅ High: Explicitly stated in official documentation
- ⚠️ Medium: Mentioned in official blog but not formally documented
- ❓ Low: Inference or from unofficial sources
**📁 Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md`:
```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```
**⚠️ Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"
### Step 4: Build Comparison/Analysis Framework
Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`
**📁 Save action**:
Write to `03_comparison_framework.md`:
```markdown
# Comparison Framework
## Selected Framework Type
[Concept Comparison / Decision Support / ...]
## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...
## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```
### Step 5: Reference Point Baseline Alignment
Ensure all compared parties have clear, consistent definitions:
**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?
### Step 6: Fact-to-Conclusion Reasoning Chain
Explicitly write out the "fact → comparison → conclusion" reasoning process:
```markdown
## Reasoning Process
### Regarding [Dimension Name]
1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```
**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated
**📁 Save action**:
Write to `04_reasoning_chain.md`:
```markdown
# Reasoning Chain
## Dimension 1: [Dimension Name]
### Fact Confirmation
According to [Fact #X], X's mechanism is...
### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])
### Conclusion
Therefore, the difference between X and Y on this dimension is...
### Confidence
✅/⚠️/❓ + rationale
---
## Dimension 2: [Dimension Name]
...
```
### Step 7: Use-Case Validation (Sanity Check)
Validate conclusions against a typical scenario:
**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?
**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Are there any important dimensions missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?
**📁 Save action**:
Write to `05_validation_log.md`:
```markdown
# Validation Log
## Validation Scenario
[Scenario description]
## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]
## Actual Validation Results
[actual situation]
## Counterexamples
[yes/no, describe if yes]
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]
## Conclusions Requiring Revision
[if any]
```
### Step 8: Deliverable Formatting
Make the output **readable, traceable, and actionable**.
**📁 Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`
Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`
## Solution Draft Output Templates
### Mode A: Initial Research Output
Use template: `templates/solution_draft_mode_a.md`
### Mode B: Solution Assessment Output
Use template: `templates/solution_draft_mode_b.md`
## Stakeholder Perspectives
Adjust content depth based on audience:
| Audience | Focus | Detail Level |
|----------|-------|--------------|
| **Decision-makers** | Conclusions, risks, recommendations | Concise, emphasize actionability |
| **Implementers** | Specific mechanisms, how-to | Detailed, emphasize how to do it |
| **Technical experts** | Details, boundary conditions, limitations | In-depth, emphasize accuracy |
## Output Files
Default intermediate artifacts location: `RESEARCH_DIR/`
**Required files** (automatically generated through the process):
| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |
**Optional files**:
- `raw/*.md` - Raw source archives (saved when content is lengthy)
## Methodology Quick Reference Card
```
┌──────────────────────────────────────────────────────────────────┐
│ Deep Research — Mode-Aware 8-Step Method │
├──────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
│ GUARDRAILS: Check INPUT_DIR/INPUT_FILE exists + required files │
│ MODE DETECT: solution_draft*.md in 01_solution? → A or B │
│ │
│ MODE A: Initial Research │
│ Phase 1: AC & Restrictions Assessment (BLOCKING) │
│ Phase 2: Full 8-step → solution_draft##.md │
│ Phase 3: Tech Stack Consolidation (OPTIONAL) → tech_stack.md │
│ Phase 4: Security Deep Dive (OPTIONAL) → security_analysis.md │
│ │
│ MODE B: Solution Assessment │
│ Read latest draft → Full 8-step → solution_draft##.md (N+1) │
│ Optional: Phase 3 / Phase 4 on revised draft │
│ │
│ 8-STEP ENGINE: │
│ 0. Classify question type → Select framework template │
│ 1. Decompose question → mode-specific sub-questions │
│ 2. Tier sources → L1 Official > L2 Blog > L3 Media > L4 │
│ 3. Extract facts → Each with source, confidence level │
│ 4. Build framework → Fixed dimensions, structured compare │
│ 5. Align references → Ensure unified definitions │
│ 6. Reasoning chain → Fact→Compare→Conclude, explicit │
│ 7. Use-case validation → Sanity check, prevent armchairing │
│ 8. Deliverable → solution_draft##.md (mode-specific format) │
├──────────────────────────────────────────────────────────────────┤
│ Key discipline: Ask don't assume · Facts before reasoning │
│ Conclusions from mechanism, not gut feelings │
└──────────────────────────────────────────────────────────────────┘
```
## Usage Examples
For detailed execution flow examples (Mode A initial, Mode B assessment, standalone, force override): Read `references/usage-examples.md`
## Source Verifiability Requirements
Every cited piece of external information must be directly verifiable by the user. All links must be publicly accessible (annotate `[login required]` if not), citations must include exact section/page/timestamp, and unverifiable information must be annotated `[limited source]`. Full checklist in `references/quality-checklists.md`.
## Quality Checklist
Before completing the solution draft, run through the checklists in `references/quality-checklists.md`. This covers:
- General quality (L1/L2 support, verifiability, actionability)
- Mode A specific (AC assessment, competitor analysis, component tables, tech stack)
- Mode B specific (findings table, self-contained draft, performance column)
- Timeliness check for high-sensitivity domains (version annotations, cross-validation, community mining)
- Target audience consistency (boundary definition, source matching, fact card audience)
## Final Reply Guidelines
When replying to the user after research is complete:
**✅ Should include**:
- Active mode used (A or B) and which optional phases were executed
- One-sentence core conclusion
- Key findings summary (3-5 points)
- Path to the solution draft: `OUTPUT_DIR/solution_draft##.md`
- Paths to optional artifacts if produced: `tech_stack.md`, `security_analysis.md`
- If there are significant uncertainties, annotate points requiring further verification
**❌ Must not include**:
- Process file listings (e.g., `00_question_decomposition.md`, `01_source_registry.md`, etc.)
- Detailed research step descriptions
- Working directory structure display
**Reason**: Process files are for retrospective review, not for the user. The user cares about conclusions, not the process.
# Comparison & Analysis Frameworks — Reference
## General Dimensions (select as needed)
1. Goal / What problem does it solve
2. Working mechanism / Process
3. Input / Output / Boundaries
4. Advantages / Disadvantages / Trade-offs
5. Applicable scenarios / Boundary conditions
6. Cost / Benefit / Risk
7. Historical evolution / Future trends
8. Security / Permissions / Controllability
## Concept Comparison Specific Dimensions
1. Definition & essence
2. Trigger / invocation method
3. Execution agent
4. Input/output & type constraints
5. Determinism & repeatability
6. Resource & context management
7. Composition & reuse patterns
8. Security boundaries & permission control
## Decision Support Specific Dimensions
1. Solution overview
2. Implementation cost
3. Maintenance cost
4. Risk assessment
5. Expected benefit
6. Applicable scenarios
7. Team capability requirements
8. Migration difficulty
# Novelty Sensitivity Assessment — Reference
## Novelty Sensitivity Classification
| Sensitivity Level | Typical Domains | Source Time Window | Description |
|-------------------|-----------------|-------------------|-------------|
| **Critical** | AI/LLMs, blockchain, cryptocurrency | 3-6 months | Technology iterates extremely fast; info from months ago may be completely outdated |
| **High** | Cloud services, frontend frameworks, API interfaces | 6-12 months | Frequent version updates; must confirm current version |
| **Medium** | Programming languages, databases, operating systems | 1-2 years | Relatively stable but still evolving |
| **Low** | Algorithm fundamentals, design patterns, theoretical concepts | No limit | Core principles change slowly |
## Critical Sensitivity Domain Special Rules
When the research topic involves the following domains, special rules must be enforced:
**Trigger word identification**:
- AI-related: LLM, GPT, Claude, Gemini, AI Agent, RAG, vector database, prompt engineering
- Cloud-native: Kubernetes new versions, Serverless, container runtimes
- Cutting-edge tech: Web3, quantum computing, AR/VR
**Mandatory rules**:
1. **Search with time constraints**:
- Use `time_range: "month"` or `time_range: "week"` to limit search results
- Prefer `start_date: "YYYY-MM-DD"` set to within the last 3 months
2. **Elevate official source priority**:
- Must first consult official documentation, official blogs, official Changelogs
- GitHub Release Notes, official X/Twitter announcements
- Academic papers (arXiv and other preprint platforms)
3. **Mandatory version number annotation**:
- Any technical description must annotate the current version number
- Example: "Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) supports..."
   - Do not use vague statements like "the latest version supports..."
4. **Outdated information handling**:
- Technical blogs/tutorials older than 6 months -> historical reference only, cannot serve as factual evidence
- Version inconsistency found -> must verify current version before using
- Obviously outdated descriptions (e.g., "will support in the future" but now already supported) -> discard directly
5. **Cross-validation**:
- Highly sensitive information must be confirmed from at least 2 independent sources
- Priority: Official docs > Official blogs > Authoritative tech media > Personal blogs
6. **Official download/release page direct verification (BLOCKING)**:
- Must directly visit official download pages to verify platform support (don't rely on search engine caches)
- Use `WebFetch` to directly extract download page content
- Search results about "coming soon" or "planned support" may be outdated; must verify in real time
- Platform support is frequently changing information; cannot infer from old sources
7. **Product-specific protocol/feature name search (BLOCKING)**:
- Beyond searching the product name, must additionally search protocol/standard names the product supports
- Common protocols/standards to search:
- AI tools: MCP, ACP (Agent Client Protocol), LSP, DAP
- Cloud services: OAuth, OIDC, SAML
- Data exchange: GraphQL, gRPC, REST
- Search format: `"<product_name> <protocol_name> support"` or `"<product_name> <protocol_name> integration"`
## Timeliness Assessment Output Template
```markdown
## Timeliness Sensitivity Assessment
- **Research Topic**: [topic]
- **Sensitivity Level**: Critical / High / Medium / Low
- **Rationale**: [why this level]
- **Source Time Window**: [X months/years]
- **Priority official sources to consult**:
1. [Official source 1]
2. [Official source 2]
- **Key version information to verify**:
- [Product/technology 1]: Current version ____
- [Product/technology 2]: Current version ____
```
# Quality Checklists — Reference
## General Quality
- [ ] All core conclusions have L1/L2 tier factual support
- [ ] No use of vague words like "possibly", "probably" without annotating uncertainty
- [ ] Comparison dimensions are complete with no key differences missed
- [ ] At least one real use case validates conclusions
- [ ] References are complete with accessible links
- [ ] Every citation can be directly verified by the user (source verifiability)
- [ ] Structure hierarchy is clear; executives can quickly locate information
## Mode A Specific
- [ ] Phase 1 completed: AC assessment was presented to and confirmed by user
- [ ] AC assessment consistent: Solution draft respects the (possibly adjusted) acceptance criteria and restrictions
- [ ] Competitor analysis included: Existing solutions were researched
- [ ] All components have comparison tables: Each component lists alternatives with tools, advantages, limitations, security, cost
- [ ] Tools/libraries verified: Suggested tools actually exist and work as described
- [ ] Testing strategy covers AC: Tests map to acceptance criteria
- [ ] Tech stack documented (if Phase 3 ran): `tech_stack.md` has evaluation tables, risk assessment, and learning requirements
- [ ] Security analysis documented (if Phase 4 ran): `security_analysis.md` has threat model and per-component controls
## Mode B Specific
- [ ] Findings table complete: All identified weak points documented with solutions
- [ ] Weak point categories covered: Functional, security, and performance assessed
- [ ] New draft is self-contained: Written as if from scratch, no "updated" markers
- [ ] Performance column included: Mode B comparison tables include performance characteristics
- [ ] Previous draft issues addressed: Every finding in the table is resolved in the new draft
## Timeliness Check (High-Sensitivity Domain BLOCKING)
When the research topic has Critical or High sensitivity level:
- [ ] Timeliness sensitivity assessment completed: `00_question_decomposition.md` contains a timeliness assessment section
- [ ] Source timeliness annotated: Every source has publication date, timeliness status, version info
- [ ] No outdated sources used as factual evidence (Critical: within 6 months; High: within 1 year)
- [ ] Version numbers explicitly annotated for all technical products/APIs/SDKs
- [ ] Official sources prioritized: Core conclusions have support from official documentation/blogs
- [ ] Cross-validation completed: Key technical information confirmed from at least 2 independent sources
- [ ] Download page directly verified: Platform support info comes from real-time extraction of official download pages
- [ ] Protocol/feature names searched: Searched for product-supported protocol names (MCP, ACP, etc.)
- [ ] GitHub Issues mined: Reviewed product's GitHub Issues popular discussions
- [ ] Community hotspots identified: Identified and recorded feature points users care most about
## Target Audience Consistency Check (BLOCKING)
- [ ] Research boundary clearly defined: `00_question_decomposition.md` has clear population/geography/timeframe/level boundaries
- [ ] Every source has target audience annotated in `01_source_registry.md`
- [ ] Mismatched sources properly handled (excluded, annotated, or marked reference-only)
- [ ] No audience confusion in fact cards: Every fact has target audience consistent with research boundary
- [ ] No audience confusion in the report: Policies/research/data cited have consistent target audiences
## Source Verifiability
- [ ] All cited links are publicly accessible (annotate `[login required]` if not)
- [ ] Citations include exact section/page/timestamp for long documents
- [ ] Cited facts have corresponding statements in the original text (no over-interpretation)
- [ ] Source publication/update dates annotated; technical docs include version numbers
- [ ] Unverifiable information annotated `[limited source]` and not sole support for core conclusions
# Source Tiering & Authority Anchoring — Reference
## Source Tiers
| Tier | Source Type | Purpose | Credibility |
|------|------------|---------|-------------|
| **L1** | Official docs, papers, specs, RFCs | Definitions, mechanisms, verifiable facts | High |
| **L2** | Official blogs, tech talks, white papers | Design intent, architectural thinking | High |
| **L3** | Authoritative media, expert commentary, tutorials | Supplementary intuition, case studies | Medium |
| **L4** | Community discussions, personal blogs, forums | Discover blind spots, validate understanding | Low |
## L4 Community Source Specifics (mandatory for product comparison research)
| Source Type | Access Method | Value |
|------------|---------------|-------|
| **GitHub Issues** | Visit `github.com/<org>/<repo>/issues` | Real user pain points, feature requests, bug reports |
| **GitHub Discussions** | Visit `github.com/<org>/<repo>/discussions` | Feature discussions, usage insights, community consensus |
| **Reddit** | Search `site:reddit.com "<product_name>"` | Authentic user reviews, comparison discussions |
| **Hacker News** | Search `site:news.ycombinator.com "<product_name>"` | In-depth technical community discussions |
| **Discord/Telegram** | Product's official community channels | Active user feedback (must annotate [limited source]) |
## Principles
- Conclusions must be traceable to L1/L2
- L3/L4 serve only as supplementary and validation sources
- L4 community discussions are used to discover "what users truly care about"
- Record all information sources
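The traceability principle lends itself to a mechanical check over the fact cards. A minimal sketch, assuming simple dict-based structures for conclusions and facts (the structures are assumptions, not prescribed by this reference):

```python
def untraceable_conclusions(conclusions, facts):
    """Return conclusions whose cited facts include no L1/L2 source.

    conclusions: list of {"text": str, "fact_ids": [int]}
    facts: dict mapping fact_id -> {"tier": "L1".."L4"}
    """
    flagged = []
    for conclusion in conclusions:
        tiers = {facts[fid]["tier"]
                 for fid in conclusion["fact_ids"] if fid in facts}
        # A conclusion resting only on L3/L4 evidence violates the principle
        if not tiers & {"L1", "L2"}:
            flagged.append(conclusion["text"])
    return flagged
```

Running such a check before Step 8 surfaces conclusions that need stronger sourcing while there is still time to search for primary sources.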
## Timeliness Filtering Rules (execute based on Step 0.5 sensitivity level)
| Sensitivity Level | Source Filtering Rule | Suggested Search Parameters |
|-------------------|----------------------|-----------------------------|
| Critical | Only accept sources within 6 months as factual evidence | `time_range: "month"` or `start_date` set to last 3 months |
| High | Prefer sources within 1 year; annotate if older than 1 year | `time_range: "year"` |
| Medium | Sources within 2 years used normally; older ones need validity check | Default search |
| Low | No time limit | Default search |
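The filtering rules above can be sketched as a small classifier. The day cutoffs approximate the table's windows, and the status labels are illustrative assumptions:

```python
from datetime import date

# Maximum source age per sensitivity level, approximating the table above
# (None means no time limit)
MAX_AGE_DAYS = {"Critical": 183, "High": 365, "Medium": 730, "Low": None}

def timeliness_status(level: str, published: date, today: date) -> str:
    """Classify a source's timeliness per its sensitivity level."""
    limit = MAX_AGE_DAYS[level]
    if limit is None or (today - published).days <= limit:
        return "ok"
    if level == "Critical":
        # Too old to serve as factual evidence at all
        return "outdated"
    # High: usable if annotated; Medium: needs a validity check
    return "needs-check"
```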
## High-Sensitivity Domain Search Strategy
```
1. Round 1: Targeted official source search
- Use include_domains to restrict to official domains
- Example: include_domains: ["anthropic.com", "openai.com", "docs.xxx.com"]
2. Round 2: Official download/release page direct verification (BLOCKING)
- Directly visit official download pages; don't rely on search caches
- Use tavily-extract or WebFetch to extract page content
- Verify: platform support, current version number, release date
3. Round 3: Product-specific protocol/feature search (BLOCKING)
- Search protocol names the product supports (MCP, ACP, LSP, etc.)
- Format: "<product_name> <protocol_name>" site:official_domain
4. Round 4: Time-limited broad search
- time_range: "month" or start_date set to recent
- Exclude obviously outdated sources
5. Round 5: Version verification
- Cross-validate version numbers from search results
- If inconsistency found, immediately consult official Changelog
6. Round 6: Community voice mining (BLOCKING - mandatory for product comparison research)
- Visit the product's GitHub Issues page, review popular/pinned issues
- Search Issues for key feature terms (e.g., "MCP", "plugin", "integration")
- Review discussion trends from the last 3-6 months
- Identify the feature points and differentiating characteristics users care most about
```
## Community Voice Mining Detailed Steps
```
GitHub Issues Mining Steps:
1. Visit github.com/<org>/<repo>/issues
2. Sort by "Most commented" to view popular discussions
3. Search keywords:
- Feature-related: feature request, enhancement, MCP, plugin, API
- Comparison-related: vs, compared to, alternative, migrate from
4. Review issue labels: enhancement, feature, discussion
5. Record frequently occurring feature demands and user pain points
Value Translation:
- Frequently discussed features -> likely differentiating highlights
- User complaints/requests -> likely product weaknesses
- Comparison discussions -> directly obtain user-perspective difference analysis
```
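The GitHub Issues mining steps above can be sketched with the public GitHub REST API. The repo name, keywords, and ranking heuristic below are illustrative assumptions, not part of the skill:

```javascript
// Step 2: build the "Most commented" listing URL (public GitHub REST API v3).
function issuesUrl(org, repo) {
  return `https://api.github.com/repos/${org}/${repo}/issues` +
         `?state=all&sort=comments&direction=desc&per_page=20`;
}

// Steps 3-5: given fetched issues, surface frequently demanded features
// by keyword match, ranked by discussion volume.
function topDemands(issues, keywords) {
  return issues
    .filter(i => keywords.some(k => i.title.toLowerCase().includes(k)))
    .sort((a, b) => b.comments - a.comments)
    .map(i => ({ title: i.title, comments: i.comments }));
}
```

Feed `topDemands` the parsed JSON from `issuesUrl` with keywords such as `["mcp", "plugin"]`; the most-commented matches are candidates for the "feature points users care most about" record.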
## Source Registry Entry Template
For each source consulted, immediately append to `01_source_registry.md`:
```markdown
## Source #[number]
- **Title**: [source title]
- **Link**: [URL]
- **Tier**: L1/L2/L3/L4
- **Publication Date**: [YYYY-MM-DD]
- **Timeliness Status**: Currently valid / Needs verification / Outdated (reference only)
- **Version Info**: [If involving a specific version, must annotate]
- **Target Audience**: [Explicitly annotate the group/geography/level this source targets]
- **Research Boundary Match**: Full match / Partial overlap / Reference only
- **Summary**: [1-2 sentence key content]
- **Related Sub-question**: [which sub-question this corresponds to]
```
## Target Audience Verification (BLOCKING)
Before including each source, verify that its target audience matches the research boundary:
| Source Type | Target audience to verify | Verification method |
|------------|---------------------------|---------------------|
| **Policy/Regulation** | Who is it for? (K-12/university/all) | Check document title, scope clauses |
| **Academic Research** | Who are the subjects? (vocational/undergraduate/graduate) | Check methodology/sample description sections |
| **Statistical Data** | Which population is measured? | Check data source description |
| **Case Reports** | What type of institution is involved? | Confirm institution type |
Handling mismatched sources:
- Target audience completely mismatched -> do not include
- Partially overlapping -> include but annotate applicable scope
- Usable as analogous reference -> include but explicitly annotate "reference only"
@@ -0,0 +1,56 @@
# Usage Examples — Reference
## Example 1: Initial Research (Mode A)
```
User: Research this problem and find the best solution
```
Execution flow:
1. Context resolution: no explicit file -> project mode (INPUT_DIR=`_docs/00_problem/`, OUTPUT_DIR=`_docs/01_solution/`)
2. Guardrails: verify INPUT_DIR exists with required files
3. Mode detection: no `solution_draft*.md` -> Mode A
4. Phase 1: Assess acceptance criteria and restrictions, ask user about unclear parts
5. BLOCKING: present AC assessment, wait for user confirmation
6. Phase 2: Full 8-step research — competitors, components, state-of-the-art solutions
7. Output: `OUTPUT_DIR/solution_draft01.md`
8. (Optional) Phase 3: Tech stack consolidation -> `tech_stack.md`
9. (Optional) Phase 4: Security deep dive -> `security_analysis.md`
## Example 2: Solution Assessment (Mode B)
```
User: Assess the current solution draft
```
Execution flow:
1. Context resolution: no explicit file -> project mode
2. Guardrails: verify INPUT_DIR exists
3. Mode detection: `solution_draft03.md` found in OUTPUT_DIR -> Mode B, read it as input
4. Full 8-step research — weak points, security, performance, solutions
5. Output: `OUTPUT_DIR/solution_draft04.md` with findings table + revised draft
## Example 3: Standalone Research
```
User: /research @my_problem.md
```
Execution flow:
1. Context resolution: explicit file -> standalone mode (INPUT_FILE=`my_problem.md`, OUTPUT_DIR=`_standalone/my_problem/01_solution/`)
2. Guardrails: verify INPUT_FILE exists and is non-empty, warn about missing restrictions/AC
3. Mode detection + full research flow as in Example 1, scoped to standalone paths
4. Output: `_standalone/my_problem/01_solution/solution_draft01.md`
5. Move `my_problem.md` into `_standalone/my_problem/`
## Example 4: Force Initial Research (Override)
```
User: Research from scratch, ignore existing drafts
```
Execution flow:
1. Context resolution: no explicit file -> project mode
2. Mode detection: drafts exist, but user explicitly requested initial research -> Mode A
3. Phase 1 + Phase 2 as in Example 1
4. Output: `OUTPUT_DIR/solution_draft##.md` (incremented from highest existing)
@@ -0,0 +1,37 @@
# Solution Draft
## Product Solution Description
[Short description of the proposed solution. Brief component interaction diagram.]
## Existing/Competitor Solutions Analysis
[Analysis of existing solutions for similar problems, if any.]
## Architecture
[Architecture solution that meets restrictions and acceptance criteria.]
### Component: [Component Name]
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
[Repeat per component]
## Testing Strategy
### Integration / Functional Tests
- [Test 1]
- [Test 2]
### Non-Functional Tests
- [Performance test 1]
- [Security test 1]
## References
[All cited source links]
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,40 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| [old] | [weak point] | [new] |
## Product Solution Description
[Short description. Brief component interaction diagram. Written as if from scratch — no "updated" markers.]
## Architecture
[Architecture solution that meets restrictions and acceptance criteria.]
### Component: [Component Name]
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
[Repeat per component]
## Testing Strategy
### Integration / Functional Tests
- [Test 1]
- [Test 2]
### Non-Functional Tests
- [Performance test 1]
- [Security test 1]
## References
[All cited source links]
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,174 @@
---
name: retrospective
description: |
Collect metrics from implementation batch reports and code review findings, analyze trends across cycles,
and produce improvement reports with actionable recommendations.
3-step workflow: collect metrics, analyze trends, produce report.
Outputs to _docs/05_metrics/.
Trigger phrases:
- "retrospective", "retro", "run retro"
- "metrics review", "feedback loop"
- "implementation metrics", "analyze trends"
category: evolve
tags: [retrospective, metrics, trends, improvement, feedback-loop]
disable-model-invocation: true
---
# Retrospective
Collect metrics from implementation artifacts, analyze trends across development cycles, and produce actionable improvement reports.
## Core Principles
- **Data-driven**: conclusions come from metrics, not impressions
- **Actionable**: every finding must have a concrete improvement suggestion
- **Cumulative**: each retrospective compares against previous ones to track progress
- **Save immediately**: write artifacts to disk after each step
- **Non-judgmental**: focus on process improvement, not blame
## Context Resolution
Fixed paths:
- IMPL_DIR: `_docs/03_implementation/`
- METRICS_DIR: `_docs/05_metrics/`
- TASKS_DIR: `_docs/02_tasks/`
Announce the resolved paths to the user before proceeding.
## Prerequisite Checks (BLOCKING)
1. `IMPL_DIR` exists and contains at least one `batch_*_report.md` — **STOP if missing** (nothing to analyze)
2. Create METRICS_DIR if it does not exist
3. Check for previous retrospective reports in METRICS_DIR to enable trend comparison
## Artifact Management
### Directory Structure
```
METRICS_DIR/
├── retro_[YYYY-MM-DD].md
├── retro_[YYYY-MM-DD].md
└── ...
```
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 3). Update status as each step completes.
## Workflow
### Step 1: Collect Metrics
**Role**: Data analyst
**Goal**: Parse all implementation artifacts and extract quantitative metrics
**Constraints**: Collection only — no interpretation yet
#### Sources
| Source | Metrics Extracted |
|--------|------------------|
| `batch_*_report.md` | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) |
| Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category |
| Task spec files in TASKS_DIR | Complexity points per task, dependency count |
| `FINAL_implementation_report.md` | Total tasks, total batches, overall duration |
| Git log (if available) | Commits per batch, files changed per batch |
#### Metrics to Compute
**Implementation Metrics**:
- Total tasks implemented
- Total batches executed
- Average tasks per batch
- Average complexity points per batch
- Total complexity points delivered
**Quality Metrics**:
- Code review pass rate (PASS / total reviews)
- Code review findings by severity: Critical, High, Medium, Low counts
- Code review findings by category: Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
- FAIL count (batches that required user intervention)
**Efficiency Metrics**:
- Blocked task count and reasons
- Tasks completed on first attempt vs requiring fixes
- Batch with most findings (identify problem areas)
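The quality metrics above reduce to simple aggregation over the parsed batch reports; a minimal sketch, where the record shape `{ verdict, findings: [{ severity }] }` is an assumption about the parser's output:

```javascript
// Aggregate review verdicts and finding counts parsed from batch reports.
function qualityMetrics(reviews) {
  const passRate =
    reviews.filter(r => r.verdict === 'PASS').length / reviews.length;
  const bySeverity = {};
  for (const r of reviews) {
    for (const f of r.findings) {
      bySeverity[f.severity] = (bySeverity[f.severity] || 0) + 1;
    }
  }
  return { passRate, bySeverity };
}
```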
**Self-verification**:
- [ ] All batch reports parsed
- [ ] All metric categories computed
- [ ] No batch reports missed
---
### Step 2: Analyze Trends
**Role**: Process improvement analyst
**Goal**: Identify patterns, recurring issues, and improvement opportunities
**Constraints**: Analysis must be grounded in the metrics from Step 1
1. If previous retrospective reports exist in METRICS_DIR, load the most recent one for comparison
2. Identify patterns:
- **Recurring findings**: which code review categories appear most frequently?
- **Problem components**: which components/files generate the most findings?
- **Complexity accuracy**: do high-complexity tasks actually produce more issues?
- **Blocker patterns**: what types of blockers occur and can they be prevented?
3. Compare against previous retrospective (if exists):
- Which metrics improved?
- Which metrics degraded?
- Were previous improvement actions effective?
4. Identify top 3 improvement actions ranked by impact
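The comparison in point 3 can be sketched as a metric-by-metric delta. The metric names and the `higherIsBetter` map are illustrative assumptions; whether a delta is an improvement depends on the metric's direction:

```javascript
// Compare current retrospective metrics against the previous one.
// Metrics present only in the current retro have no baseline and are skipped.
function compareRetros(previous, current, higherIsBetter) {
  const result = { improved: [], degraded: [], unchanged: [] };
  for (const [name, curr] of Object.entries(current)) {
    if (!(name in previous)) continue; // no baseline for this metric
    const delta = curr - previous[name];
    if (delta === 0) result.unchanged.push(name);
    else if ((delta > 0) === Boolean(higherIsBetter[name])) result.improved.push(name);
    else result.degraded.push(name);
  }
  return result;
}
```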
**Self-verification**:
- [ ] Patterns are grounded in specific metrics
- [ ] Comparison with previous retro included (if exists)
- [ ] Top 3 actions are concrete and actionable
---
### Step 3: Produce Report
**Role**: Technical writer
**Goal**: Write a structured retrospective report with metrics, trends, and recommendations
**Constraints**: Concise, data-driven, actionable
Write `METRICS_DIR/retro_[YYYY-MM-DD].md` using `templates/retrospective-report.md` as structure.
**Self-verification**:
- [ ] All metrics from Step 1 included
- [ ] Trend analysis from Step 2 included
- [ ] Top 3 improvement actions clearly stated
- [ ] Suggested rule/skill updates are specific
**Save action**: Write `retro_[YYYY-MM-DD].md`
Present the report summary to the user.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| No batch reports exist | **STOP** — nothing to analyze |
| Batch reports have inconsistent format | **WARN user**, extract what is available |
| No previous retrospective for comparison | PROCEED — report baseline metrics only |
| Metrics suggest systemic issue (>50% FAIL rate) | **WARN user** — suggest immediate process review |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Retrospective (3-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist in _docs/03_implementation/ │
│ │
│ 1. Collect Metrics → parse batch reports, compute metrics │
│ 2. Analyze Trends → patterns, comparison, improvement areas │
│ 3. Produce Report → _docs/05_metrics/retro_[date].md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Data-driven · Actionable · Cumulative │
│ Non-judgmental · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,93 @@
# Retrospective Report Template
Save as `_docs/05_metrics/retro_[YYYY-MM-DD].md`.
---
```markdown
# Retrospective — [YYYY-MM-DD]
## Implementation Summary
| Metric | Value |
|--------|-------|
| Total tasks | [count] |
| Total batches | [count] |
| Total complexity points | [sum] |
| Avg tasks per batch | [value] |
| Avg complexity per batch | [value] |
## Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | [count] | [%] |
| PASS_WITH_WARNINGS | [count] | [%] |
| FAIL | [count] | [%] |
### Findings by Severity
| Severity | Count |
|----------|-------|
| Critical | [count] |
| High | [count] |
| Medium | [count] |
| Low | [count] |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Bug | [count] | [most affected files] |
| Spec-Gap | [count] | [most affected files] |
| Security | [count] | [most affected files] |
| Performance | [count] | [most affected files] |
| Maintainability | [count] | [most affected files] |
| Style | [count] | [most affected files] |
## Efficiency
| Metric | Value |
|--------|-------|
| Blocked tasks | [count] |
| Tasks requiring fixes after review | [count] |
| Batch with most findings | Batch [N] — [reason] |
### Blocker Analysis
| Blocker Type | Count | Prevention |
|-------------|-------|-----------|
| [type] | [count] | [suggested prevention] |
## Trend Comparison
| Metric | Previous | Current | Change |
|--------|----------|---------|--------|
| Pass rate | [%] | [%] | [+/-] |
| Avg findings per batch | [value] | [value] | [+/-] |
| Blocked tasks | [count] | [count] | [+/-] |
*Previous retrospective: [date or "N/A — first retro"]*
## Top 3 Improvement Actions
1. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
2. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
3. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
## Suggested Rule/Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| [.cursor/rules/... or .cursor/skills/...] | [specific change] | [based on which metric] |
```
@@ -0,0 +1,130 @@
---
name: rollback
description: |
Revert implementation to a specific batch checkpoint using git revert, reset Jira ticket statuses,
verify rollback integrity with tests, and produce a rollback report.
Trigger phrases:
- "rollback", "revert", "revert batch"
- "undo implementation", "roll back to batch"
category: build
tags: [rollback, revert, recovery, implementation]
disable-model-invocation: true
---
# Implementation Rollback
Revert the codebase to a specific batch checkpoint, reset Jira statuses for reverted tasks, and verify integrity.
## Core Principles
- **Preserve history**: always use `git revert`, never force-push
- **Verify after revert**: run the full test suite after every rollback
- **Update tracking**: reset Jira ticket statuses for all reverted tasks
- **Atomic rollback**: if rollback fails midway, stop and report — do not leave the codebase in a partial state
- **Ask, don't assume**: if the target batch is ambiguous, present options and ask
## Context Resolution
- IMPL_DIR: `_docs/03_implementation/`
- Batch reports: `IMPL_DIR/batch_*_report.md`
## Prerequisite Checks (BLOCKING)
1. IMPL_DIR exists and contains at least one `batch_*_report.md` — **STOP if missing**
2. Git working tree is clean (no uncommitted changes) — **STOP if dirty**, ask user to commit or stash
## Input
- User specifies a target batch number or commit hash
- If not specified, present the list of available batch checkpoints and ask
## Workflow
### Step 1: Identify Checkpoints
1. Read all `batch_*_report.md` files from IMPL_DIR
2. Extract: batch number, date, tasks included, commit hash, code review verdict
3. Present batch list to user
**BLOCKING**: User must confirm which batch to roll back to.
### Step 2: Revert Commits
1. Determine which commits need to be reverted (all commits after the target batch)
2. For each commit in reverse chronological order:
- Run `git revert <commit-hash> --no-edit`
- If merge conflicts occur: present conflicts and ask user for resolution
3. If any revert fails and cannot be resolved, abort the in-progress revert with `git revert --abort`, report which commits were already reverted, and stop
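The revert ordering in step 2 can be sketched as building the command list up front; the oldest-first input ordering is an assumption, and actual execution would shell out to git per command:

```javascript
// Build the git commands for step 2: revert every commit made after the
// target batch, newest first, so each revert applies cleanly on top of
// the previous one. Input is assumed oldest-first (chronological order).
function revertPlan(commitsOldestFirst) {
  return [...commitsOldestFirst]
    .reverse() // newest commit is reverted first
    .map(hash => `git revert ${hash} --no-edit`);
}
```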
### Step 3: Verify Integrity
1. Run the full test suite
2. If tests fail: report failures to user, ask how to proceed (fix or abort)
3. If tests pass: continue
### Step 4: Update Jira
1. Identify all tasks from reverted batches
2. Reset each task's Jira ticket status to "To Do" via Jira MCP
### Step 5: Finalize
1. Commit with message: `[ROLLBACK] Reverted to batch [N]: [task list]`
2. Write rollback report to `IMPL_DIR/rollback_report.md`
## Output
Write `_docs/03_implementation/rollback_report.md`:
```markdown
# Rollback Report
**Date**: [YYYY-MM-DD]
**Target**: Batch [N] (commit [hash])
**Reverted Batches**: [list]
## Reverted Tasks
| Task | Batch | Status Before | Status After |
|------|-------|--------------|-------------|
| [JIRA-ID] | [batch #] | In Testing | To Do |
## Test Results
- [pass/fail count]
## Jira Updates
- [list of ticket transitions]
## Notes
- [any conflicts, manual steps, or issues encountered]
```
## Escalation Rules
| Situation | Action |
|-----------|--------|
| No batch reports exist | **STOP** — nothing to roll back |
| Uncommitted changes in working tree | **STOP** — ask user to commit or stash |
| Merge conflicts during revert | **ASK user** for resolution |
| Tests fail after rollback | **ASK user** — fix or abort |
| Rollback fails midway | Abort with `git revert --abort`, report to user |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Rollback (5-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist, clean working tree │
│ │
│ 1. Identify Checkpoints → present batch list │
│ [BLOCKING: user confirms target batch] │
│ 2. Revert Commits → git revert per commit │
│ 3. Verify Integrity → run full test suite │
│ 4. Update Jira → reset statuses to "To Do" │
│ 5. Finalize → commit + rollback_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Preserve history · Verify after revert │
│ Atomic rollback · Ask don't assume │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,300 @@
---
name: security-testing
description: "Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices."
category: specialized-testing
priority: critical
tokenEstimate: 1200
agents: [qe-security-scanner, qe-api-contract-validator, qe-quality-analyzer]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2025-12-02
dependencies: []
quick_reference_card: true
tags: [security, owasp, sast, dast, vulnerabilities, auth, injection]
trust_tier: 3
validation:
schema_path: schemas/output.json
validator_path: scripts/validate-config.json
eval_path: evals/security-testing.yaml
---
# Security Testing
<default_to_action>
When testing security or conducting audits:
1. TEST OWASP Top 10 vulnerabilities systematically
2. VALIDATE authentication and authorization on every endpoint
3. SCAN dependencies for known vulnerabilities (npm audit)
4. CHECK for injection attacks (SQL, XSS, command)
5. VERIFY secrets aren't exposed in code/logs
**Quick Security Checks:**
- Access control → Test horizontal/vertical privilege escalation
- Crypto → Verify password hashing, HTTPS, no sensitive data exposed
- Injection → Test SQL injection, XSS, command injection
- Auth → Test weak passwords, session fixation, MFA enforcement
- Config → Check error messages don't leak info
**Critical Success Factors:**
- Think like an attacker, build like a defender
- Security is built in, not added at the end
- Test continuously in CI/CD, not just before release
</default_to_action>
## Quick Reference Card
### When to Use
- Security audits and penetration testing
- Testing authentication/authorization
- Validating input sanitization
- Reviewing security configuration
### OWASP Top 10
Use the most recent **stable** version of the OWASP Top 10. At the start of each security audit, research the current version at https://owasp.org/www-project-top-ten/ and test against all listed categories. Do not rely on a hardcoded list — the OWASP Top 10 is updated periodically and the current version must be verified.
### Tools
| Type | Tool | Purpose |
|------|------|---------|
| SAST | SonarQube, Semgrep | Static code analysis |
| DAST | OWASP ZAP, Burp | Dynamic scanning |
| Deps | npm audit, Snyk | Dependency vulnerabilities |
| Secrets | git-secrets, TruffleHog | Secret scanning |
### Agent Coordination
- `qe-security-scanner`: Multi-layer SAST/DAST scanning
- `qe-api-contract-validator`: API security testing
- `qe-quality-analyzer`: Security code review
---
## Key Vulnerability Tests
### 1. Broken Access Control
```javascript
// Horizontal escalation - User A accessing User B's data
test('user cannot access another user\'s order', async () => {
const userAToken = await login('userA');
const userBOrder = await createOrder('userB');
const response = await api.get(`/orders/${userBOrder.id}`, {
headers: { Authorization: `Bearer ${userAToken}` }
});
expect(response.status).toBe(403);
});
// Vertical escalation - Regular user accessing admin
test('regular user cannot access admin', async () => {
const userToken = await login('regularUser');
expect((await api.get('/admin/users', {
headers: { Authorization: `Bearer ${userToken}` }
})).status).toBe(403);
});
```
### 2. Injection Attacks
```javascript
// SQL Injection
test('prevents SQL injection', async () => {
const malicious = "' OR '1'='1";
const response = await api.get(`/products?search=${malicious}`);
expect(response.body.length).toBeLessThan(100); // Not all products
});
// XSS
test('sanitizes HTML output', async () => {
const xss = '<script>alert("XSS")</script>';
await api.post('/comments', { text: xss });
const html = (await api.get('/comments')).body;
expect(html).toContain('&lt;script&gt;');
expect(html).not.toContain('<script>');
});
```
### 3. Cryptographic Failures
```javascript
test('passwords are hashed', async () => {
await db.users.create({ email: 'test@example.com', password: 'MyPassword123' });
const user = await db.users.findByEmail('test@example.com');
expect(user.password).not.toBe('MyPassword123');
expect(user.password).toMatch(/^\$2[aby]\$\d{2}\$/); // bcrypt
});
test('no sensitive data in API response', async () => {
const response = await api.get('/users/me');
expect(response.body).not.toHaveProperty('password');
expect(response.body).not.toHaveProperty('ssn');
});
```
### 4. Security Misconfiguration
```javascript
test('errors don\'t leak sensitive info', async () => {
const response = await api.post('/login', { email: 'nonexistent@test.com', password: 'wrong' });
expect(response.body.error).toBe('Invalid credentials'); // Generic message
});
test('sensitive endpoints not exposed', async () => {
const endpoints = ['/debug', '/.env', '/.git', '/admin'];
for (const ep of endpoints) {
expect((await fetch(`https://example.com${ep}`)).status).not.toBe(200);
}
});
```
### 5. Rate Limiting
```javascript
test('rate limiting prevents brute force', async () => {
const responses = [];
for (let i = 0; i < 20; i++) {
responses.push(await api.post('/login', { email: 'test@example.com', password: 'wrong' }));
}
expect(responses.filter(r => r.status === 429).length).toBeGreaterThan(0);
});
```
---
## Security Checklist
### Authentication
- [ ] Strong password requirements (12+ chars)
- [ ] Password hashing (bcrypt, scrypt, Argon2)
- [ ] MFA for sensitive operations
- [ ] Account lockout after failed attempts
- [ ] Session ID changes after login
- [ ] Session timeout
### Authorization
- [ ] Check authorization on every request
- [ ] Least privilege principle
- [ ] No horizontal escalation
- [ ] No vertical escalation
### Data Protection
- [ ] HTTPS everywhere
- [ ] Encrypted at rest
- [ ] Secrets not in code/logs
- [ ] PII compliance (GDPR)
### Input Validation
- [ ] Server-side validation
- [ ] Parameterized queries (no SQL injection)
- [ ] Output encoding (no XSS)
- [ ] Rate limiting
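The parameterized-query and output-encoding items can be sketched together. The escape function is a minimal illustration only, not a replacement for a vetted templating or sanitization library, and the `?` placeholder style assumes a mysql/pg-like driver:

```javascript
// Parameterized query: user input travels as a bound value, never as SQL text.
function findUser(db, userId) {
  return db.query('SELECT * FROM users WHERE id = ?', [userId]);
}

// Output encoding: escape the five HTML-significant characters before
// interpolating untrusted text into markup.
function escapeHtml(s) {
  return s.replace(/[&<>"']/g, c => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  }[c]));
}
```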
---
## CI/CD Integration
```yaml
# GitHub Actions (workflow fragment)
jobs:
  security-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Dependency audit
        run: npm audit --audit-level=high
      - name: SAST scan
        run: npm run sast
      - name: Secret scan
        uses: trufflesecurity/trufflehog@main
      - name: DAST scan
        if: github.ref == 'refs/heads/main'
        run: docker run owasp/zap2docker-stable zap-baseline.py -t https://staging.example.com
```
**Pre-commit hooks:**
```bash
#!/bin/sh
git-secrets --scan
npm run lint:security
```
---
## Agent-Assisted Security Testing
```typescript
// Comprehensive multi-layer scan
await Task("Security Scan", {
target: 'src/',
layers: { sast: true, dast: true, dependencies: true, secrets: true },
severity: ['critical', 'high', 'medium']
}, "qe-security-scanner");
// OWASP Top 10 testing
await Task("OWASP Scan", {
categories: ['broken-access-control', 'injection', 'cryptographic-failures'],
depth: 'comprehensive'
}, "qe-security-scanner");
// Validate fix
await Task("Validate Fix", {
vulnerability: 'CVE-2024-12345',
expectedResolution: 'upgrade package to v2.0.0',
retestAfterFix: true
}, "qe-security-scanner");
```
---
## Agent Coordination Hints
### Memory Namespace
```
aqe/security/
├── scans/* - Scan results
├── vulnerabilities/* - Found vulnerabilities
├── fixes/* - Remediation tracking
└── compliance/* - Compliance status
```
### Fleet Coordination
```typescript
const securityFleet = await FleetManager.coordinate({
strategy: 'security-testing',
agents: [
'qe-security-scanner',
'qe-api-contract-validator',
'qe-quality-analyzer',
'qe-deployment-readiness'
],
topology: 'parallel'
});
```
---
## Common Mistakes
### ❌ Security by Obscurity
Hiding admin at `/super-secret-admin` → **Use proper auth**
### ❌ Client-Side Validation Only
JavaScript validation can be bypassed → **Always validate server-side**
### ❌ Trusting User Input
Assuming input is safe → **Sanitize, validate, escape all input**
### ❌ Hardcoded Secrets
API keys in code → **Environment variables, secret management**
---
## Related Skills
- [agentic-quality-engineering](../agentic-quality-engineering/) - Security with agents
- [api-testing-patterns](../api-testing-patterns/) - API security testing
- [compliance-testing](../compliance-testing/) - GDPR, HIPAA, SOC2
---
## Remember
**Think like an attacker:** What would you try to break? Test that.
**Build like a defender:** Assume input is malicious until proven otherwise.
**Test continuously:** Security testing is ongoing, not one-time.
**With Agents:** Agents automate vulnerability scanning, track remediation, and validate fixes. Use agents to maintain security posture at scale.
@@ -0,0 +1,789 @@
# =============================================================================
# AQE Skill Evaluation Test Suite: Security Testing v1.0.0
# =============================================================================
#
# Comprehensive evaluation suite for the security-testing skill per ADR-056.
# Tests OWASP Top 10 2021 detection, severity classification, remediation
# quality, and cross-model consistency.
#
# Schema: .claude/skills/.validation/schemas/skill-eval.schema.json
# Validator: .claude/skills/security-testing/scripts/validate-config.json
#
# Coverage:
# - OWASP A01:2021 - Broken Access Control
# - OWASP A02:2021 - Cryptographic Failures
# - OWASP A03:2021 - Injection (SQL, XSS, Command)
# - OWASP A07:2021 - Identification and Authentication Failures
# - Negative tests (no false positives on secure code)
#
# =============================================================================
skill: security-testing
version: 1.0.0
description: >
Comprehensive evaluation suite for the security-testing skill.
Tests OWASP Top 10 2021 detection capabilities, CWE classification accuracy,
CVSS scoring, severity classification, and remediation quality.
Supports multi-model testing and integrates with ReasoningBank for
continuous improvement.
# =============================================================================
# Multi-Model Configuration
# =============================================================================
models_to_test:
- claude-3.5-sonnet # Primary model (high accuracy expected)
- claude-3-haiku # Fast model (minimum quality threshold)
- gpt-4o # Cross-vendor validation
# =============================================================================
# MCP Integration Configuration
# =============================================================================
mcp_integration:
enabled: true
namespace: skill-validation
# Query existing security patterns before running evals
query_patterns: true
# Track each test outcome for learning feedback loop
track_outcomes: true
# Store successful patterns after evals complete
store_patterns: true
# Share learning with fleet coordinator agents
share_learning: true
# Update quality gate with validation metrics
update_quality_gate: true
# Target agents for learning distribution
target_agents:
- qe-learning-coordinator
- qe-queen-coordinator
- qe-security-scanner
- qe-security-auditor
# =============================================================================
# ReasoningBank Learning Configuration
# =============================================================================
learning:
store_success_patterns: true
store_failure_patterns: true
pattern_ttl_days: 90
min_confidence_to_store: 0.7
cross_model_comparison: true
# =============================================================================
# Result Format Configuration
# =============================================================================
result_format:
json_output: true
markdown_report: true
include_raw_output: false
include_timing: true
include_token_usage: true
# =============================================================================
# Environment Setup
# =============================================================================
setup:
required_tools:
- jq # JSON parsing (required)
- npm # Dependency audit (optional but recommended)
environment_variables:
SECURITY_SCAN_DEPTH: "deep"
OWASP_ENABLED: "true"
SEVERITY_THRESHOLD: "medium"
fixtures:
- name: vulnerable_express_app
path: fixtures/vulnerable-express-app.js
content: |
const express = require('express');
const app = express();
// SQL Injection vulnerability
app.get('/user', (req, res) => {
const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
db.query(query);
});
// XSS vulnerability
app.get('/profile', (req, res) => {
res.send(`<h1>Hello ${req.query.name}</h1>`);
});
// Path Traversal vulnerability
app.get('/file', (req, res) => {
const path = './uploads/' + req.query.filename;
res.sendFile(path);
});
# =============================================================================
# TEST CASES
# =============================================================================
test_cases:
# ---------------------------------------------------------------------------
# CATEGORY: SQL Injection (OWASP A03:2021)
# ---------------------------------------------------------------------------
- id: tc001_sql_injection_string_concat
description: "Detect SQL injection via string concatenation in Node.js"
category: injection
priority: critical
input:
code: |
const express = require('express');
const mysql = require('mysql');
const app = express();
const db = mysql.createConnection({ /* connection config */ });
app.get('/api/users/:id', (req, res) => {
const userId = req.params.id;
const query = `SELECT * FROM users WHERE id = ${userId}`;
db.query(query, (err, results) => {
res.json(results);
});
});
context:
language: javascript
framework: express
environment: production
expected_output:
must_contain:
- "SQL injection"
- "parameterized"
must_not_contain:
- "no vulnerabilities"
- "secure"
must_match_regex:
- "CWE-89|CWE-564"
- "A03:2021"
severity_classification: critical
finding_count:
min: 1
max: 3
recommendation_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.7
grading_rubric:
completeness: 0.3
accuracy: 0.5
actionability: 0.2
timeout_ms: 30000
- id: tc002_sql_injection_parameterized_safe
description: "Verify parameterized queries are NOT flagged as vulnerable"
category: injection
priority: high
input:
code: |
app.get('/api/users/:id', (req, res) => {
const userId = parseInt(req.params.id, 10);
db.query('SELECT * FROM users WHERE id = ?', [userId], (err, results) => {
res.json(results);
});
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "parameterized"
- "secure"
must_not_contain:
- "SQL injection"
- "critical"
- "vulnerable"
severity_classification: info
finding_count:
max: 1
validation:
schema_check: true
keyword_match_threshold: 0.7
allow_partial: true
# ---------------------------------------------------------------------------
# CATEGORY: Cross-Site Scripting (OWASP A03:2021)
# ---------------------------------------------------------------------------
- id: tc003_xss_reflected_html_output
description: "Detect reflected XSS in unescaped HTML output"
category: injection
priority: critical
input:
code: |
app.get('/profile', (req, res) => {
const name = req.query.name;
res.send(`
<html>
<body>
<h1>Welcome, ${name}!</h1>
<p>Your profile has been loaded.</p>
</body>
</html>
`);
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "XSS"
- "cross-site scripting"
- "sanitize"
- "escape"
must_match_regex:
- "CWE-79"
severity_classification: high
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.75
- id: tc004_xss_dom_based_innerhtml
description: "Detect DOM-based XSS via innerHTML assignment"
category: injection
priority: high
input:
code: |
// Client-side JavaScript
const params = new URLSearchParams(window.location.search);
const message = params.get('msg');
document.getElementById('output').innerHTML = message;
context:
language: javascript
framework: vanilla
environment: production
expected_output:
must_contain:
- "DOM"
- "XSS"
- "innerHTML"
- "textContent"
must_match_regex:
- "CWE-79"
severity_classification: high
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Authentication Failures (OWASP A07:2021)
# ---------------------------------------------------------------------------
- id: tc005_hardcoded_credentials
description: "Detect hardcoded credentials and API keys"
category: authentication
priority: critical
input:
code: |
const ADMIN_PASSWORD = 'admin123';
const API_KEY = 'sk-1234567890abcdef';
const DATABASE_URL = 'postgres://admin:password123@localhost/db';
app.post('/login', (req, res) => {
if (req.body.password === ADMIN_PASSWORD) {
req.session.isAdmin = true;
res.send('Login successful');
}
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "hardcoded"
- "credentials"
- "secret"
- "environment variable"
must_match_regex:
- "CWE-798|CWE-259"
severity_classification: critical
finding_count:
min: 2
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.8
- id: tc006_weak_password_hashing
description: "Detect weak password hashing algorithms (MD5, SHA1)"
category: authentication
priority: high
input:
code: |
const crypto = require('crypto');
function hashPassword(password) {
return crypto.createHash('md5').update(password).digest('hex');
}
function verifyPassword(password, hash) {
return hashPassword(password) === hash;
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "MD5"
- "weak"
- "bcrypt"
- "argon2"
must_match_regex:
- "CWE-327|CWE-328|CWE-916"
severity_classification: high
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
# ---------------------------------------------------------------------------
# CATEGORY: Broken Access Control (OWASP A01:2021)
# ---------------------------------------------------------------------------
- id: tc007_idor_missing_authorization
description: "Detect IDOR vulnerability with missing authorization check"
category: authorization
priority: critical
input:
code: |
app.get('/api/users/:id/profile', (req, res) => {
// No authorization check - any user can access any profile
const userId = req.params.id;
db.query('SELECT * FROM profiles WHERE user_id = ?', [userId])
.then(profile => res.json(profile));
});
app.delete('/api/users/:id', (req, res) => {
// No check if requesting user owns this account
db.query('DELETE FROM users WHERE id = ?', [req.params.id]);
res.send('User deleted');
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "authorization"
- "access control"
- "IDOR"
- "ownership"
must_match_regex:
- "CWE-639|CWE-284|CWE-862"
- "A01:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Cryptographic Failures (OWASP A02:2021)
# ---------------------------------------------------------------------------
- id: tc008_weak_encryption_des
description: "Detect use of weak encryption algorithms (DES, RC4)"
category: cryptography
priority: high
input:
code: |
const crypto = require('crypto');
function encryptData(data, key) {
const cipher = crypto.createCipher('des', key);
return cipher.update(data, 'utf8', 'hex') + cipher.final('hex');
}
function decryptData(data, key) {
const decipher = crypto.createDecipher('des', key);
return decipher.update(data, 'hex', 'utf8') + decipher.final('utf8');
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "DES"
- "weak"
- "deprecated"
- "AES"
must_match_regex:
- "CWE-327|CWE-328"
- "A02:2021"
severity_classification: high
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc009_plaintext_password_storage
description: "Detect plaintext password storage"
category: cryptography
priority: critical
input:
code: |
class User {
constructor(email, password) {
this.email = email;
this.password = password; // Stored in plaintext!
}
save() {
db.query('INSERT INTO users (email, password) VALUES (?, ?)',
[this.email, this.password]);
}
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "plaintext"
- "password"
- "hash"
- "bcrypt"
must_match_regex:
- "CWE-256|CWE-312"
- "A02:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.8
# ---------------------------------------------------------------------------
# CATEGORY: Path Traversal (Related to A01:2021)
# ---------------------------------------------------------------------------
- id: tc010_path_traversal_file_access
description: "Detect path traversal vulnerability in file access"
category: injection
priority: critical
input:
code: |
const fs = require('fs');
app.get('/download', (req, res) => {
const filename = req.query.file;
const filepath = './uploads/' + filename;
res.sendFile(filepath);
});
app.get('/read/:name', (req, res) => {
const content = fs.readFileSync('./data/' + req.params.name);
res.send(content);
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "path traversal"
- "directory traversal"
- "../"
- "sanitize"
must_match_regex:
- "CWE-22|CWE-23"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Negative Tests (No False Positives)
# ---------------------------------------------------------------------------
- id: tc011_secure_code_no_false_positives
description: "Verify secure code is NOT flagged as vulnerable"
category: negative
priority: critical
input:
code: |
const express = require('express');
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');
const bcrypt = require('bcrypt');
const validator = require('validator');
const app = express();
app.use(helmet());
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));
app.post('/api/users', async (req, res) => {
const { email, password } = req.body;
// Input validation
if (!validator.isEmail(email)) {
return res.status(400).json({ error: 'Invalid email' });
}
// Secure password hashing
const hashedPassword = await bcrypt.hash(password, 12);
// Parameterized query
await db.query(
'INSERT INTO users (email, password) VALUES ($1, $2)',
[email, hashedPassword]
);
res.status(201).json({ message: 'User created' });
});
context:
language: javascript
framework: express
environment: production
expected_output:
must_contain:
- "secure"
- "best practice"
must_not_contain:
- "SQL injection"
- "XSS"
- "critical vulnerability"
- "high severity"
finding_count:
max: 2 # Allow informational findings only
validation:
schema_check: true
keyword_match_threshold: 0.6
allow_partial: true
- id: tc012_secure_auth_implementation
description: "Verify secure authentication is recognized as safe"
category: negative
priority: high
input:
code: |
const bcrypt = require('bcrypt');
const jwt = require('jsonwebtoken');
async function login(email, password) {
const user = await User.findByEmail(email);
if (!user) {
return { error: 'Invalid credentials' };
}
const match = await bcrypt.compare(password, user.passwordHash);
if (!match) {
return { error: 'Invalid credentials' };
}
const token = jwt.sign(
{ userId: user.id },
process.env.JWT_SECRET,
{ expiresIn: '1h' }
);
return { token };
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "bcrypt"
- "jwt"
- "secure"
must_not_contain:
- "vulnerable"
- "critical"
- "hardcoded"
severity_classification: info
validation:
schema_check: true
allow_partial: true
# ---------------------------------------------------------------------------
# CATEGORY: Python Security (Multi-language Support)
# ---------------------------------------------------------------------------
- id: tc013_python_sql_injection
description: "Detect SQL injection in Python Flask application"
category: injection
priority: critical
input:
code: |
from flask import Flask, request
import sqlite3
app = Flask(__name__)
@app.route('/user')
def get_user():
user_id = request.args.get('id')
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
return str(cursor.fetchone())
context:
language: python
framework: flask
expected_output:
must_contain:
- "SQL injection"
- "parameterized"
- "f-string"
must_match_regex:
- "CWE-89"
severity_classification: critical
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc014_python_ssti_jinja
description: "Detect Server-Side Template Injection in Jinja2"
category: injection
priority: critical
input:
code: |
from flask import Flask, request, render_template_string
app = Flask(__name__)
@app.route('/render')
def render():
template = request.args.get('template')
return render_template_string(template)
context:
language: python
framework: flask
expected_output:
must_contain:
- "SSTI"
- "template injection"
- "render_template_string"
- "Jinja2"
must_match_regex:
- "CWE-94|CWE-1336"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc015_python_pickle_deserialization
description: "Detect insecure deserialization with pickle"
category: injection
priority: critical
input:
code: |
import pickle
from flask import Flask, request
app = Flask(__name__)
@app.route('/load')
def load_data():
data = request.get_data()
obj = pickle.loads(data)
return str(obj)
context:
language: python
framework: flask
expected_output:
must_contain:
- "pickle"
- "deserialization"
- "untrusted"
- "RCE"
must_match_regex:
- "CWE-502"
- "A08:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# =============================================================================
# SUCCESS CRITERIA
# =============================================================================
success_criteria:
# Overall pass rate (90% of tests must pass)
pass_rate: 0.9
# Critical tests must ALL pass (100%)
critical_pass_rate: 1.0
# Average reasoning quality score
avg_reasoning_quality: 0.75
# Maximum suite execution time (5 minutes)
max_execution_time_ms: 300000
# Maximum variance between model results (15%)
cross_model_variance: 0.15
# =============================================================================
# METADATA
# =============================================================================
metadata:
author: "qe-security-auditor"
created: "2026-02-02"
last_updated: "2026-02-02"
coverage_target: >
OWASP Top 10 2021: A01 (Broken Access Control), A02 (Cryptographic Failures),
A03 (Injection - SQL, XSS, SSTI, Command), A07 (Authentication Failures),
A08 (Software Integrity - Deserialization). Covers JavaScript/Node.js
Express apps and Python Flask apps. 15 test cases with 90% pass rate
requirement and 100% critical pass rate.
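The `success_criteria` block above gates the whole suite: an overall pass rate, a stricter bar for critical tests, an average reasoning-quality floor, and a wall-clock budget. A minimal sketch of how a runner might apply those gates (the per-result field names `priority`, `passed`, `reasoning_quality`, and `duration_ms` are assumptions drawn from this config, not a fixed API):

```python
# Sketch: evaluate per-test results against the success_criteria above.
# Field names on each result dict are assumed from this config file.

def evaluate_suite(results, criteria):
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    critical = [r for r in results if r["priority"] == "critical"]
    critical_passed = sum(1 for r in critical if r["passed"])

    checks = {
        "pass_rate": passed / total >= criteria["pass_rate"],
        # All critical tests must pass (vacuously true if none exist).
        "critical_pass_rate": (
            not critical
            or critical_passed / len(critical) >= criteria["critical_pass_rate"]
        ),
        "avg_reasoning_quality": (
            sum(r["reasoning_quality"] for r in results) / total
            >= criteria["avg_reasoning_quality"]
        ),
        "max_execution_time_ms": (
            sum(r["duration_ms"] for r in results)
            <= criteria["max_execution_time_ms"]
        ),
    }
    return all(checks.values()), checks

criteria = {
    "pass_rate": 0.9,
    "critical_pass_rate": 1.0,
    "avg_reasoning_quality": 0.75,
    "max_execution_time_ms": 300000,
}
results = [
    {"id": "tc001", "priority": "critical", "passed": True,
     "reasoning_quality": 0.8, "duration_ms": 12000},
    {"id": "tc002", "priority": "high", "passed": True,
     "reasoning_quality": 0.75, "duration_ms": 9000},
]
ok, detail = evaluate_suite(results, criteria)
```

Note that a single failed critical test fails the suite regardless of the overall pass rate, which matches the 100% `critical_pass_rate` requirement.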
@@ -0,0 +1,879 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://agentic-qe.dev/schemas/security-testing-output.json",
"title": "AQE Security Testing Skill Output Schema",
"description": "Schema for security-testing skill output validation. Extends the base skill-output template with OWASP Top 10 categories, CWE identifiers, and CVSS scoring.",
"type": "object",
"required": ["skillName", "version", "timestamp", "status", "trustTier", "output"],
"properties": {
"skillName": {
"type": "string",
"const": "security-testing",
"description": "Must be 'security-testing'"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$",
"description": "Semantic version of the skill"
},
"timestamp": {
"type": "string",
"format": "date-time",
"description": "ISO 8601 timestamp of output generation"
},
"status": {
"type": "string",
"enum": ["success", "partial", "failed", "skipped"],
"description": "Overall execution status"
},
"trustTier": {
"type": "integer",
"const": 3,
"description": "Trust tier 3 indicates full validation with eval suite"
},
"output": {
"type": "object",
"required": ["summary", "findings", "owaspCategories"],
"properties": {
"summary": {
"type": "string",
"minLength": 50,
"maxLength": 2000,
"description": "Human-readable summary of security findings"
},
"score": {
"$ref": "#/$defs/securityScore",
"description": "Overall security score"
},
"findings": {
"type": "array",
"items": {
"$ref": "#/$defs/securityFinding"
},
"maxItems": 500,
"description": "List of security vulnerabilities discovered"
},
"recommendations": {
"type": "array",
"items": {
"$ref": "#/$defs/securityRecommendation"
},
"maxItems": 100,
"description": "Prioritized remediation recommendations with code examples"
},
"metrics": {
"$ref": "#/$defs/securityMetrics",
"description": "Security scan metrics and statistics"
},
"owaspCategories": {
"$ref": "#/$defs/owaspCategoryBreakdown",
"description": "OWASP Top 10 2021 category breakdown"
},
"artifacts": {
"type": "array",
"items": {
"$ref": "#/$defs/artifact"
},
"maxItems": 50,
"description": "Generated security reports and scan artifacts"
},
"timeline": {
"type": "array",
"items": {
"$ref": "#/$defs/timelineEvent"
},
"description": "Scan execution timeline"
},
"scanConfiguration": {
"$ref": "#/$defs/scanConfiguration",
"description": "Configuration used for the security scan"
}
}
},
"metadata": {
"$ref": "#/$defs/metadata"
},
"validation": {
"$ref": "#/$defs/validationResult"
},
"learning": {
"$ref": "#/$defs/learningData"
}
},
"$defs": {
"securityScore": {
"type": "object",
"required": ["value", "max"],
"properties": {
"value": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Security score (0=critical issues, 100=no issues)"
},
"max": {
"type": "number",
"const": 100,
"description": "Maximum score is always 100"
},
"grade": {
"type": "string",
"pattern": "^[A-F][+-]?$",
"description": "Letter grade: A (90-100), B (80-89), C (70-79), D (60-69), F (<60)"
},
"trend": {
"type": "string",
"enum": ["improving", "stable", "declining", "unknown"],
"description": "Trend compared to previous scans"
},
"riskLevel": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "minimal"],
"description": "Overall risk level assessment"
}
}
},
"securityFinding": {
"type": "object",
"required": ["id", "title", "severity", "owasp"],
"properties": {
"id": {
"type": "string",
"pattern": "^SEC-\\d{3,6}$",
"description": "Unique finding identifier (e.g., SEC-001)"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Finding title describing the vulnerability"
},
"description": {
"type": "string",
"maxLength": 2000,
"description": "Detailed description of the vulnerability"
},
"severity": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "info"],
"description": "Severity: critical (CVSS 9.0-10.0), high (7.0-8.9), medium (4.0-6.9), low (0.1-3.9), info (0)"
},
"owasp": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$",
"description": "OWASP Top 10 category (e.g., A01:2021, A03:2025)"
},
"owaspCategory": {
"type": "string",
"enum": [
"A01:2021-Broken-Access-Control",
"A02:2021-Cryptographic-Failures",
"A03:2021-Injection",
"A04:2021-Insecure-Design",
"A05:2021-Security-Misconfiguration",
"A06:2021-Vulnerable-Components",
"A07:2021-Identification-Authentication-Failures",
"A08:2021-Software-Data-Integrity-Failures",
"A09:2021-Security-Logging-Monitoring-Failures",
"A10:2021-Server-Side-Request-Forgery"
],
"description": "Full OWASP category name"
},
"cwe": {
"type": "string",
"pattern": "^CWE-\\d{1,4}$",
"description": "CWE identifier (e.g., CWE-79 for XSS, CWE-89 for SQLi)"
},
"cvss": {
"type": "object",
"properties": {
"score": {
"type": "number",
"minimum": 0,
"maximum": 10,
"description": "CVSS v3.1 base score"
},
"vector": {
"type": "string",
"pattern": "^CVSS:3\\.1/AV:[NALP]/AC:[LH]/PR:[NLH]/UI:[NR]/S:[UC]/C:[NLH]/I:[NLH]/A:[NLH]$",
"description": "CVSS v3.1 vector string"
},
"severity": {
"type": "string",
"enum": ["None", "Low", "Medium", "High", "Critical"],
"description": "CVSS severity rating"
}
}
},
"location": {
"$ref": "#/$defs/location",
"description": "Location of the vulnerability"
},
"evidence": {
"type": "string",
"maxLength": 5000,
"description": "Evidence: code snippet, request/response, or PoC"
},
"remediation": {
"type": "string",
"maxLength": 2000,
"description": "Specific fix instructions for this finding"
},
"references": {
"type": "array",
"items": {
"type": "object",
"required": ["title", "url"],
"properties": {
"title": { "type": "string" },
"url": { "type": "string", "format": "uri" }
}
},
"maxItems": 10,
"description": "External references (OWASP, CWE, CVE, etc.)"
},
"falsePositive": {
"type": "boolean",
"default": false,
"description": "Potential false positive flag"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence in finding accuracy (0.0-1.0)"
},
"exploitability": {
"type": "string",
"enum": ["trivial", "easy", "moderate", "difficult", "theoretical"],
"description": "How easily this vulnerability can be exploited"
},
"affectedVersions": {
"type": "array",
"items": { "type": "string" },
"description": "Affected package/library versions for dependency vulnerabilities"
},
"cve": {
"type": "string",
"pattern": "^CVE-\\d{4}-\\d{4,}$",
"description": "CVE identifier if applicable"
}
}
},
"securityRecommendation": {
"type": "object",
"required": ["id", "title", "priority", "owaspCategories"],
"properties": {
"id": {
"type": "string",
"pattern": "^REC-\\d{3,6}$",
"description": "Unique recommendation identifier"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Recommendation title"
},
"description": {
"type": "string",
"maxLength": 2000,
"description": "Detailed recommendation description"
},
"priority": {
"type": "string",
"enum": ["critical", "high", "medium", "low"],
"description": "Remediation priority"
},
"effort": {
"type": "string",
"enum": ["trivial", "low", "medium", "high", "major"],
"description": "Estimated effort: trivial(<1hr), low(1-4hr), medium(1-3d), high(1-2wk), major(>2wk)"
},
"impact": {
"type": "integer",
"minimum": 1,
"maximum": 10,
"description": "Security impact if implemented (1-10)"
},
"relatedFindings": {
"type": "array",
"items": {
"type": "string",
"pattern": "^SEC-\\d{3,6}$"
},
"description": "IDs of findings this addresses"
},
"owaspCategories": {
"type": "array",
"items": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$"
},
"description": "OWASP categories this recommendation addresses"
},
"codeExample": {
"type": "object",
"properties": {
"before": {
"type": "string",
"maxLength": 2000,
"description": "Vulnerable code example"
},
"after": {
"type": "string",
"maxLength": 2000,
"description": "Secure code example"
},
"language": {
"type": "string",
"description": "Programming language"
}
},
"description": "Before/after code examples for remediation"
},
"resources": {
"type": "array",
"items": {
"type": "object",
"required": ["title", "url"],
"properties": {
"title": { "type": "string" },
"url": { "type": "string", "format": "uri" }
}
},
"maxItems": 10,
"description": "External resources and documentation"
},
"automatable": {
"type": "boolean",
"description": "Can this fix be automated?"
},
"fixCommand": {
"type": "string",
"description": "CLI command to apply fix if automatable"
}
}
},
"owaspCategoryBreakdown": {
"type": "object",
"description": "OWASP Top 10 2021 category scores and findings",
"properties": {
"A01:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A01:2021 - Broken Access Control"
},
"A02:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A02:2021 - Cryptographic Failures"
},
"A03:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A03:2021 - Injection"
},
"A04:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A04:2021 - Insecure Design"
},
"A05:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A05:2021 - Security Misconfiguration"
},
"A06:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A06:2021 - Vulnerable and Outdated Components"
},
"A07:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A07:2021 - Identification and Authentication Failures"
},
"A08:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A08:2021 - Software and Data Integrity Failures"
},
"A09:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A09:2021 - Security Logging and Monitoring Failures"
},
"A10:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A10:2021 - Server-Side Request Forgery (SSRF)"
}
},
"additionalProperties": false
},
"owaspCategoryScore": {
"type": "object",
"required": ["tested", "score"],
"properties": {
"tested": {
"type": "boolean",
"description": "Whether this category was tested"
},
"score": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Category score (100 = no issues, 0 = critical)"
},
"grade": {
"type": "string",
"pattern": "^[A-F][+-]?$",
"description": "Letter grade for this category"
},
"findingCount": {
"type": "integer",
"minimum": 0,
"description": "Number of findings in this category"
},
"criticalCount": {
"type": "integer",
"minimum": 0,
"description": "Number of critical findings"
},
"highCount": {
"type": "integer",
"minimum": 0,
"description": "Number of high severity findings"
},
"status": {
"type": "string",
"enum": ["pass", "fail", "warn", "skip"],
"description": "Category status"
},
"description": {
"type": "string",
"description": "Category description and context"
},
"cwes": {
"type": "array",
"items": {
"type": "string",
"pattern": "^CWE-\\d{1,4}$"
},
"description": "CWEs found in this category"
}
}
},
"securityMetrics": {
"type": "object",
"properties": {
"totalFindings": {
"type": "integer",
"minimum": 0,
"description": "Total vulnerabilities found"
},
"criticalCount": {
"type": "integer",
"minimum": 0,
"description": "Critical severity findings"
},
"highCount": {
"type": "integer",
"minimum": 0,
"description": "High severity findings"
},
"mediumCount": {
"type": "integer",
"minimum": 0,
"description": "Medium severity findings"
},
"lowCount": {
"type": "integer",
"minimum": 0,
"description": "Low severity findings"
},
"infoCount": {
"type": "integer",
"minimum": 0,
"description": "Informational findings"
},
"filesScanned": {
"type": "integer",
"minimum": 0,
"description": "Number of files analyzed"
},
"linesOfCode": {
"type": "integer",
"minimum": 0,
"description": "Lines of code scanned"
},
"dependenciesChecked": {
"type": "integer",
"minimum": 0,
"description": "Number of dependencies checked"
},
"owaspCategoriesTested": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "OWASP Top 10 categories tested"
},
"owaspCategoriesPassed": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "OWASP Top 10 categories with no findings"
},
"uniqueCwes": {
"type": "integer",
"minimum": 0,
"description": "Unique CWE identifiers found"
},
"falsePositiveRate": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Estimated false positive rate"
},
"scanDurationMs": {
"type": "integer",
"minimum": 0,
"description": "Total scan duration in milliseconds"
},
"coverage": {
"type": "object",
"properties": {
"sast": {
"type": "boolean",
"description": "Static analysis performed"
},
"dast": {
"type": "boolean",
"description": "Dynamic analysis performed"
},
"dependencies": {
"type": "boolean",
"description": "Dependency scan performed"
},
"secrets": {
"type": "boolean",
"description": "Secret scanning performed"
},
"configuration": {
"type": "boolean",
"description": "Configuration review performed"
}
},
"description": "Scan coverage indicators"
}
}
},
"scanConfiguration": {
"type": "object",
"properties": {
"target": {
"type": "string",
"description": "Scan target (file path, URL, or package)"
},
"targetType": {
"type": "string",
"enum": ["source", "url", "package", "container", "infrastructure"],
"description": "Type of target being scanned"
},
"scanTypes": {
"type": "array",
"items": {
"type": "string",
"enum": ["sast", "dast", "dependency", "secret", "configuration", "container", "iac"]
},
"description": "Types of scans performed"
},
"severity": {
"type": "array",
"items": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "info"]
},
"description": "Severity levels included in scan"
},
"owaspCategories": {
"type": "array",
"items": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$"
},
"description": "OWASP categories tested"
},
"tools": {
"type": "array",
"items": { "type": "string" },
"description": "Security tools used"
},
"excludePatterns": {
"type": "array",
"items": { "type": "string" },
"description": "File patterns excluded from scan"
},
"rulesets": {
"type": "array",
"items": { "type": "string" },
"description": "Security rulesets applied"
}
}
},
"location": {
"type": "object",
"properties": {
"file": {
"type": "string",
"maxLength": 500,
"description": "File path relative to project root"
},
"line": {
"type": "integer",
"minimum": 1,
"description": "Line number"
},
"column": {
"type": "integer",
"minimum": 1,
"description": "Column number"
},
"endLine": {
"type": "integer",
"minimum": 1,
"description": "End line for multi-line findings"
},
"endColumn": {
"type": "integer",
"minimum": 1,
"description": "End column"
},
"url": {
"type": "string",
"format": "uri",
"description": "URL for web-based findings"
},
"endpoint": {
"type": "string",
"description": "API endpoint path"
},
"method": {
"type": "string",
"enum": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"],
"description": "HTTP method for API findings"
},
"parameter": {
"type": "string",
"description": "Vulnerable parameter name"
},
"component": {
"type": "string",
"description": "Affected component or module"
}
}
},
"artifact": {
"type": "object",
"required": ["type", "path"],
"properties": {
"type": {
"type": "string",
"enum": ["report", "sarif", "data", "log", "evidence"],
"description": "Artifact type"
},
"path": {
"type": "string",
"maxLength": 500,
"description": "Path to artifact"
},
"format": {
"type": "string",
"enum": ["json", "sarif", "html", "md", "txt", "xml", "csv"],
"description": "Artifact format"
},
"description": {
"type": "string",
"maxLength": 500,
"description": "Artifact description"
},
"sizeBytes": {
"type": "integer",
"minimum": 0,
"description": "File size in bytes"
},
"checksum": {
"type": "string",
"pattern": "^sha256:[a-f0-9]{64}$",
"description": "SHA-256 checksum"
}
}
},
"timelineEvent": {
"type": "object",
"required": ["timestamp", "event"],
"properties": {
"timestamp": {
"type": "string",
"format": "date-time",
"description": "Event timestamp"
},
"event": {
"type": "string",
"maxLength": 200,
"description": "Event description"
},
"type": {
"type": "string",
"enum": ["start", "checkpoint", "warning", "error", "complete"],
"description": "Event type"
},
"durationMs": {
"type": "integer",
"minimum": 0,
"description": "Duration since previous event"
},
"phase": {
"type": "string",
"enum": ["initialization", "sast", "dast", "dependency", "secret", "reporting"],
"description": "Scan phase"
}
}
},
"metadata": {
"type": "object",
"properties": {
"executionTimeMs": {
"type": "integer",
"minimum": 0,
"maximum": 3600000,
"description": "Execution time in milliseconds"
},
"toolsUsed": {
"type": "array",
"items": {
"type": "string",
"enum": ["semgrep", "npm-audit", "trivy", "owasp-zap", "bandit", "gosec", "eslint-security", "snyk", "gitleaks", "trufflehog", "bearer"]
},
"uniqueItems": true,
"description": "Security tools used"
},
"agentId": {
"type": "string",
"pattern": "^qe-[a-z][a-z0-9-]*$",
"description": "Agent ID (e.g., qe-security-scanner)"
},
"modelUsed": {
"type": "string",
"description": "LLM model used for analysis"
},
"inputHash": {
"type": "string",
"pattern": "^[a-f0-9]{64}$",
"description": "SHA-256 hash of input"
},
"targetUrl": {
"type": "string",
"format": "uri",
"description": "Target URL if applicable"
},
"targetPath": {
"type": "string",
"description": "Target path if applicable"
},
"environment": {
"type": "string",
"enum": ["development", "staging", "production", "ci"],
"description": "Execution environment"
},
"retryCount": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "Number of retries"
}
}
},
"validationResult": {
"type": "object",
"properties": {
"schemaValid": {
"type": "boolean",
"description": "Passes JSON schema validation"
},
"contentValid": {
"type": "boolean",
"description": "Passes content validation"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence score"
},
"warnings": {
"type": "array",
"items": {
"type": "string",
"maxLength": 500
},
"maxItems": 20,
"description": "Validation warnings"
},
"errors": {
"type": "array",
"items": {
"type": "string",
"maxLength": 500
},
"maxItems": 20,
"description": "Validation errors"
},
"validatorVersion": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Validator version"
}
}
},
"learningData": {
"type": "object",
"properties": {
"patternsDetected": {
"type": "array",
"items": {
"type": "string",
"maxLength": 200
},
"maxItems": 20,
"description": "Security patterns detected (e.g., sql-injection-string-concat)"
},
"reward": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Reward signal for learning (0.0-1.0)"
},
"feedbackLoop": {
"type": "object",
"properties": {
"previousRunId": {
"type": "string",
"format": "uuid",
"description": "Previous run ID for comparison"
},
"improvement": {
"type": "number",
"minimum": -1,
"maximum": 1,
"description": "Improvement over previous run"
}
}
},
"newVulnerabilityPatterns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"pattern": { "type": "string" },
"cwe": { "type": "string" },
"confidence": { "type": "number" }
}
},
"description": "New vulnerability patterns learned"
}
}
}
}
}
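The `timelineEvent` definition above can be exercised directly with the Python `jsonschema` library. This is a minimal sketch that extracts the definition as a standalone schema (the enclosing document's `$defs`/`definitions` wrapper is outside this hunk, so it is reproduced inline here):

```python
import jsonschema

# The "timelineEvent" definition, copied from the schema above.
timeline_event_schema = {
    "type": "object",
    "required": ["timestamp", "event"],
    "properties": {
        "timestamp": {"type": "string", "format": "date-time"},
        "event": {"type": "string", "maxLength": 200},
        "type": {
            "type": "string",
            "enum": ["start", "checkpoint", "warning", "error", "complete"],
        },
        "durationMs": {"type": "integer", "minimum": 0},
        "phase": {
            "type": "string",
            "enum": ["initialization", "sast", "dast",
                     "dependency", "secret", "reporting"],
        },
    },
}

good = {
    "timestamp": "2026-03-20T21:28:16Z",
    "event": "SAST phase started",
    "type": "start",
    "phase": "sast",
    "durationMs": 0,
}
bad = {"event": "missing timestamp", "type": "oops"}

# Raises no exception for a conforming event.
jsonschema.validate(good, timeline_event_schema)

# Collect all violations for a non-conforming event:
# a missing required "timestamp" and a "type" outside the enum.
validator = jsonschema.Draft7Validator(timeline_event_schema)
errors = sorted(validator.iter_errors(bad), key=lambda e: list(e.path))
for err in errors:
    print(err.message)
```

Note that `jsonschema` does not enforce `"format": "date-time"` unless a format checker is passed explicitly, so the `timestamp` string is only type-checked here.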
@@ -0,0 +1,45 @@
{
"skillName": "security-testing",
"skillVersion": "1.0.0",
"requiredTools": [
"jq"
],
"optionalTools": [
"npm",
"semgrep",
"trivy",
"ajv",
"jsonschema",
"python3"
],
"schemaPath": "schemas/output.json",
"requiredFields": [
"skillName",
"status",
"output",
"output.summary",
"output.findings",
"output.owaspCategories"
],
"requiredNonEmptyFields": [
"output.summary"
],
"mustContainTerms": [
"OWASP",
"security",
"vulnerability"
],
"mustNotContainTerms": [
"TODO",
"placeholder",
"FIXME"
],
"enumValidations": {
".status": [
"success",
"partial",
"failed",
"skipped"
]
}
}
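A validator consuming this config might apply `requiredFields`, `mustContainTerms`, and `enumValidations` roughly as follows. This is a hypothetical sketch, not the skill's actual harness (which, per `requiredTools`, is jq-based); the dotted-path walk, the `check` helper, and the trimmed config literal are all assumptions for illustration:

```python
import json

# Trimmed copy of the validation config above (illustrative subset).
config = {
    "requiredFields": ["skillName", "status", "output", "output.summary"],
    "requiredNonEmptyFields": ["output.summary"],
    "mustContainTerms": ["OWASP", "security", "vulnerability"],
    "mustNotContainTerms": ["TODO", "placeholder", "FIXME"],
    "enumValidations": {".status": ["success", "partial", "failed", "skipped"]},
}

def get_path(doc, dotted):
    """Walk a dotted path like 'output.summary'; return (found, value)."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return False, None
        cur = cur[part]
    return True, cur

def check(doc):
    """Return a list of human-readable problems; empty list means valid."""
    problems = []
    for path in config["requiredFields"]:
        found, _ = get_path(doc, path)
        if not found:
            problems.append(f"missing required field: {path}")
    for path in config["requiredNonEmptyFields"]:
        found, val = get_path(doc, path)
        if found and val in (None, "", [], {}):
            problems.append(f"field must be non-empty: {path}")
    text = json.dumps(doc)
    for term in config["mustContainTerms"]:
        if term not in text:
            problems.append(f"missing required term: {term}")
    for term in config["mustNotContainTerms"]:
        if term in text:
            problems.append(f"forbidden term present: {term}")
    for path, allowed in config["enumValidations"].items():
        found, val = get_path(doc, path.lstrip("."))
        if found and val not in allowed:
            problems.append(f"{path}: {val!r} not in {allowed}")
    return problems

doc = {
    "skillName": "security-testing",
    "status": "success",
    "output": {"summary": "OWASP Top 10 security scan: 2 vulnerability findings."},
}
print(check(doc))  # → []
```

The same dotted-path convention explains why `.status` in `enumValidations` carries a leading dot: it mirrors a jq filter expression, while the entries in `requiredFields` are plain paths.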