diff --git a/.cursor/skills/autopilot/SKILL.md b/.cursor/skills/autopilot/SKILL.md index db02045..8cec5a5 100644 --- a/.cursor/skills/autopilot/SKILL.md +++ b/.cursor/skills/autopilot/SKILL.md @@ -17,6 +17,17 @@ disable-model-invocation: true Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autopilot` once — the engine handles sequencing, transitions, and re-entry. +## File Index + +| File | Purpose | +|------|---------| +| `flows/greenfield.md` | Detection rules, step table, and auto-chain rules for new projects | +| `flows/existing-code.md` | Detection rules, step table, and auto-chain rules for existing codebases | +| `state.md` | State file format, rules, re-entry protocol, session boundaries | +| `protocols.md` | User interaction, Jira MCP auth, choice format, error handling, status summary | + +**On every invocation**: read all four files above before executing any logic. + ## Core Principles - **Auto-chain**: when a skill completes, immediately start the next one — no pause between skills @@ -24,250 +35,57 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. 
Det - **State from disk**: all progress is persisted to `_docs/_autopilot_state.md` and cross-checked against `_docs/` folder structure - **Rich re-entry**: on every invocation, read the state file for full context before continuing - **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here +- **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input +- **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically +- **Single project per workspace**: all `_docs/` paths are relative to workspace root; for monorepos, each service needs its own Cursor workspace -## State File: `_docs/_autopilot_state.md` +## Flow Resolution -The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist. +Determine which flow to use: -### Format +1. If workspace has source code files **and** `_docs/` does not exist → **existing-code flow** (Pre-Step detection) +2. If `_docs/_autopilot_state.md` exists and records Document in `Completed Steps` → **existing-code flow** +3. If `_docs/_autopilot_state.md` exists and `step: done` AND workspace contains source code → **existing-code flow** (completed project re-entry — loops to New Task) +4. Otherwise → **greenfield flow** -```markdown -# Autopilot State +After selecting the flow, apply its detection rules (first match wins) to determine the current step. -## Current Step -step: [0-5 or "done"] -name: [Problem / Research / Plan / Decompose / Implement / Deploy / Done] -status: [not_started / in_progress / completed] -sub_step: [optional — sub-skill phase if interrupted mid-step, e.g. 
"Plan Step 3: Component Decomposition"] +## Execution Loop -## Completed Steps - -| Step | Name | Completed | Key Outcome | -|------|------|-----------|-------------| -| 0 | Problem | [date] | [one-line summary] | -| 1 | Research | [date] | [N drafts, final approach summary] | -| 2 | Plan | [date] | [N components, architecture summary] | -| 3 | Decompose | [date] | [N tasks, total complexity points] | -| 4 | Implement | [date] | [N batches, pass/fail summary] | -| 5 | Deploy | [date] | [artifacts produced] | - -## Key Decisions -- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"] -- [decision 2: e.g. "6 research rounds, final draft: solution_draft06.md"] -- [decision N] - -## Last Session -date: [date] -ended_at: [step name and phase] -reason: [completed step / session boundary / user paused / context limit] -notes: [any context for next session, e.g. "User asked to revisit risk assessment"] - -## Blockers -- [blocker 1, if any] -- [none] -``` - -### State File Rules - -1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0) -2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation -3. **Read** the state file as the first action on every invocation — before folder scanning -4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 2 but `_docs/02_plans/architecture.md` already exists), trust the folder structure and update the state file to match -5. **Never delete** the state file. It accumulates history across the entire project lifecycle - -## Execution Entry Point - -Every invocation of this skill follows the same sequence: +Every invocation follows this sequence: ``` 1. Read _docs/_autopilot_state.md (if exists) -2. Cross-check state file against _docs/ folder structure -3. Resolve current step (state file + folder scan) -4. 
Present Status Summary (from state file context) -5. Enter Execution Loop: - a. Read and execute the current skill's SKILL.md - b. When skill completes → update state file - c. Re-detect next step - d. If next skill is ready → auto-chain (go to 5a with next skill) - e. If session boundary reached → update state file with session notes → suggest new conversation - f. If all steps done → update state file → report completion +2. Read all File Index files above +3. Cross-check state file against _docs/ folder structure (rules in state.md) +4. Resolve flow (see Flow Resolution above) +5. Resolve current step (detection rules from the active flow file) +6. Present Status Summary (template in active flow file) +7. Execute: + a. Delegate to current skill (see Skill Delegation below) + b. If skill returns FAILED → apply Skill Failure Retry Protocol (see protocols.md): + - Auto-retry the same skill (failure may be caused by missing user input or environment issue) + - If 3 consecutive auto-retries fail → record in state file Blockers, warn user, stop auto-retry + c. When skill completes successfully → reset retry counter, update state file (rules in state.md) + d. Re-detect next step from the active flow's detection rules + e. If next skill is ready → auto-chain (go to 7a with next skill) + f. If session boundary reached → update state, suggest new conversation (rules in state.md) + g. If all steps done → update state → report completion ``` -## State Detection - -Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning. - -### Folder Scan Rules (fallback) - -Scan `_docs/` to determine the current workflow position. Check rules in order — first match wins. 
- -### Detection Rules - -**Step 0 — Problem Gathering** -Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty: -- `problem.md` -- `restrictions.md` -- `acceptance_criteria.md` -- `input_data/` (must contain at least one file) - -Action: Read and execute `.cursor/skills/problem/SKILL.md` - ---- - -**Step 1 — Research (Initial)** -Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files - -Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A) - ---- - -**Step 1b — Research Decision** -Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_plans/architecture.md` does not exist - -Action: Present the current research state to the user: -- How many solution drafts exist -- Whether tech_stack.md and security_analysis.md exist -- One-line summary from the latest draft - -Then ask: **"Run another research round (Mode B assessment), or proceed to planning?"** -- If user wants another round → Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B) -- If user wants to proceed → auto-chain to Step 2 (Plan) - ---- - -**Step 2 — Plan** -Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_plans/architecture.md` does not exist - -Action: -1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself -2. Read and execute `.cursor/skills/plan/SKILL.md` - -If `_docs/02_plans/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it. 
- ---- - -**Step 3 — Decompose** -Condition: `_docs/02_plans/` contains `architecture.md` AND `_docs/02_plans/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`) - -Action: Read and execute `.cursor/skills/decompose/SKILL.md` - -If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it. - ---- - -**Step 4 — Implement** -Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist - -Action: Read and execute `.cursor/skills/implement/SKILL.md` - -If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues. - ---- - -**Step 5 — Deploy** -Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND `_docs/04_deploy/` does not exist or is incomplete - -Action: Read and execute `.cursor/skills/deploy/SKILL.md` - ---- - -**Done** -Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md) - -Action: Report project completion with summary. - -## Status Summary - -On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback). 
- -Format: - -``` -═══════════════════════════════════════════════════ - AUTOPILOT STATUS -═══════════════════════════════════════════════════ - Step 0 Problem [DONE / IN PROGRESS / NOT STARTED] - Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED] - Step 2 Plan [DONE / IN PROGRESS / NOT STARTED] - Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED] - Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED] - Step 5 Deploy [DONE / IN PROGRESS / NOT STARTED] -═══════════════════════════════════════════════════ - Current step: [Step N — Name] - Action: [what will happen next] -═══════════════════════════════════════════════════ -``` - -For re-entry (state file exists), also include: -- Key decisions from the state file's `Key Decisions` section -- Last session context from the `Last Session` section -- Any blockers from the `Blockers` section - -## Auto-Chain Rules - -After a skill completes, apply these rules: - -| Completed Step | Next Action | -|---------------|-------------| -| Problem Gathering | Auto-chain → Research (Mode A) | -| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) | -| Research Decision → proceed | Auto-chain → Plan | -| Plan | Auto-chain → Decompose | -| Decompose | **Session boundary** — suggest new conversation before Implement | -| Implement | Auto-chain → Deploy | -| Deploy | Report completion | - -### Session Boundary: Decompose → Implement - -After decompose completes, **do not auto-chain to implement**. Instead: - -1. Update state file: mark Decompose as completed, set current step to 4 (Implement) with status `not_started` -2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready` -3. Present a summary: number of tasks, estimated batches, total complexity points -4. Suggest: "Implementation is the longest phase and benefits from a fresh conversation context. 
Start a new conversation and type `/autopilot` to begin implementation." -5. If the user insists on continuing in the same conversation, proceed. - -This is the only hard session boundary. All other transitions auto-chain. - ## Skill Delegation For each step, the delegation pattern is: -1. Update state file: set current step to `in_progress`, record `sub_step` if applicable +1. Update state file: set `step` to the autopilot step number, status to `in_progress`, set `sub_step` to the sub-skill's current internal step/phase, reset `retry_count: 0` 2. Announce: "Starting [Skill Name]..." 3. Read the skill file: `.cursor/skills/[name]/SKILL.md` -4. Execute the skill's workflow exactly as written, including: - - All BLOCKING gates (present to user, wait for confirmation) - - All self-verification checklists - - All save actions - - All escalation rules -5. When the skill's workflow is fully complete: - - Update state file: mark step as `completed`, record date, write one-line key outcome - - Add any key decisions made during this step to the `Key Decisions` section - - Return to the auto-chain rules +4. Execute the skill's workflow exactly as written, including all BLOCKING gates, self-verification checklists, save actions, and escalation rules. Update `sub_step` in state each time the sub-skill advances. +5. If the skill **fails**: follow the Skill Failure Retry Protocol in `protocols.md` — increment `retry_count`, auto-retry up to 3 times, then escalate. +6. When complete (success): reset `retry_count: 0`, mark step `completed`, record date + key outcome, add key decisions to state file, return to auto-chain rules (from active flow file) Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer. -## Re-Entry Protocol - -When the user invokes `/autopilot` and work already exists: - -1. Read `_docs/_autopilot_state.md` -2. Cross-check against `_docs/` folder structure -3. 
Present Status Summary with context from state file (key decisions, last session, blockers) -4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery -5. Continue execution from detected state - -## Error Handling - -| Situation | Action | -|-----------|--------| -| State detection is ambiguous (artifacts suggest two different steps) | Present findings to user, ask which step to execute | -| Sub-skill fails or hits an unrecoverable blocker | Report the error, suggest the user fix it manually, then re-invoke `/autopilot` | -| User wants to skip a step | Warn about downstream dependencies, proceed if user confirms | -| User wants to go back to a previous step | Warn that re-running may overwrite artifacts, proceed if user confirms | -| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution | - ## Trigger Conditions This skill activates when the user wants to: @@ -281,41 +99,9 @@ This skill activates when the user wants to: **Differentiation**: - User wants only research → use `/research` directly - User wants only planning → use `/plan` directly +- User wants to document an existing codebase → use `/document` directly - User wants the full guided workflow → use `/autopilot` -## Methodology Quick Reference +## Flow Reference -``` -┌────────────────────────────────────────────────────────────────┐ -│ Autopilot (Auto-Chain Orchestrator) │ -├────────────────────────────────────────────────────────────────┤ -│ EVERY INVOCATION: │ -│ 1. State Detection (scan _docs/) │ -│ 2. Status Summary (show progress) │ -│ 3. Execute current skill │ -│ 4. Auto-chain to next skill (loop) │ -│ │ -│ WORKFLOW: │ -│ Step 0 Problem → .cursor/skills/problem/SKILL.md │ -│ ↓ auto-chain │ -│ Step 1 Research → .cursor/skills/research/SKILL.md │ -│ ↓ auto-chain (ask: another round?) 
│ -│ Step 2 Plan → .cursor/skills/plan/SKILL.md │ -│ ↓ auto-chain │ -│ Step 3 Decompose → .cursor/skills/decompose/SKILL.md │ -│ ↓ SESSION BOUNDARY (suggest new conversation) │ -│ Step 4 Implement → .cursor/skills/implement/SKILL.md │ -│ ↓ auto-chain │ -│ Step 5 Deploy → .cursor/skills/deploy/SKILL.md │ -│ ↓ │ -│ DONE │ -│ │ -│ STATE FILE: _docs/_autopilot_state.md │ -│ FALLBACK: _docs/ folder structure scan │ -│ PAUSE POINTS: sub-skill BLOCKING gates only │ -│ SESSION BREAK: after Decompose (before Implement) │ -├────────────────────────────────────────────────────────────────┤ -│ Principles: Auto-chain · State to file · Rich re-entry │ -│ Delegate don't duplicate · Pause at decisions only │ -└────────────────────────────────────────────────────────────────┘ -``` +See `flows/greenfield.md` and `flows/existing-code.md` for step tables, detection rules, auto-chain rules, and status summary templates. diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md new file mode 100644 index 0000000..ff31c36 --- /dev/null +++ b/.cursor/skills/autopilot/flows/existing-code.md @@ -0,0 +1,234 @@ +# Existing Code Workflow + +Workflow for projects with an existing codebase. Starts with documentation, produces test specs, decomposes and implements tests, verifies them, refactors with that safety net, then adds new functionality and deploys. 
+ +## Step Reference Table + +| Step | Name | Sub-Skill | Internal SubSteps | +|------|------|-----------|-------------------| +| 1 | Document | document/SKILL.md | Steps 1–8 | +| 2 | Test Spec | test-spec/SKILL.md | Phase 1a–1b | +| 3 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 | +| 4 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) | +| 5 | Run Tests | test-run/SKILL.md | Steps 1–4 | +| 6 | Refactor | refactor/SKILL.md | Phases 0–5 (6-phase method) | +| 7 | New Task | new-task/SKILL.md | Steps 1–8 (loop) | +| 8 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) | +| 9 | Run Tests | test-run/SKILL.md | Steps 1–4 | +| 10 | Security Audit | security/SKILL.md | Phase 1–5 (optional) | +| 11 | Performance Test | (autopilot-managed) | Load/stress tests (optional) | +| 12 | Deploy | deploy/SKILL.md | Step 1–7 | + +After Step 12, the existing-code workflow is complete. + +## Detection Rules + +Check rules in order — first match wins. + +--- + +**Step 1 — Document** +Condition: `_docs/` does not exist AND the workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`, `src/`, `Cargo.toml`, `*.csproj`, `package.json`) + +Action: An existing codebase without documentation was detected. Read and execute `.cursor/skills/document/SKILL.md`. After the document skill completes, re-detect state (the produced `_docs/` artifacts will place the project at Step 2 or later). + +--- + +**Step 2 — Test Spec** +Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry) + +Action: Read and execute `.cursor/skills/test-spec/SKILL.md` + +This step applies when the codebase was documented via the `/document` skill. 
Test specifications must be produced before refactoring or further development. + +--- + +**Step 3 — Decompose Tests** +Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files) + +Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will: +1. Run Step 1t (test infrastructure bootstrap) +2. Run Step 3 (blackbox test task decomposition) +3. Run Step 4 (cross-verification against test coverage) + +If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it. + +--- + +**Step 4 — Implement Tests** +Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 3 (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist + +Action: Read and execute `.cursor/skills/implement/SKILL.md` + +The implement skill reads test tasks from `_docs/02_tasks/` and implements them. + +If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues. + +--- + +**Step 5 — Run Tests** +Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 4 (Implement Tests) is completed AND the autopilot state does NOT show Step 5 (Run Tests) as completed + +Action: Read and execute `.cursor/skills/test-run/SKILL.md` + +Verifies the implemented test suite passes before proceeding to refactoring. The tests form the safety net for all subsequent code changes. 
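
The Step 1–5 conditions above are ordinary first-match-wins checks over `_docs/` artifacts and the state file. A minimal Python sketch of that rule shape — assuming the filesystem is summarized as a set of paths relative to `_docs/` and completed step names come from the state file's Completed Steps table (the function and helper names here are illustrative, not part of the skill):

```python
def has_task_files(existing: set[str]) -> bool:
    """True if _docs/02_tasks/ holds task files other than _dependencies_table.md."""
    return any(p.startswith("02_tasks/") and not p.endswith("_dependencies_table.md")
               for p in existing)

def detect_step(existing: set[str], completed: set[str]) -> int:
    """Existing-code Steps 1-5: rules checked in order, first match wins.
    `existing`: paths under _docs/ that exist; `completed`: step names from the
    state file's Completed Steps table."""
    final_impl = "03_implementation/FINAL_implementation_report.md"
    if not existing:
        return 1  # Document: source code present but no _docs/ yet
    if ("02_document/FINAL_report.md" in existing
            and "02_document/tests/traceability-matrix.md" not in existing
            and "Document" in completed):
        return 2  # Test Spec
    if ("02_document/tests/traceability-matrix.md" in existing
            and "Document" in completed
            and not has_task_files(existing)):
        return 3  # Decompose Tests
    if (has_task_files(existing)
            and "02_tasks/_dependencies_table.md" in existing
            and "Decompose Tests" in completed
            and final_impl not in existing):
        return 4  # Implement Tests
    if (final_impl in existing and "Implement Tests" in completed
            and "Run Tests" not in completed):
        return 5  # Run Tests
    return 6  # Steps 6+ are resolved purely from the state file
```

This sketches the rule shape only; the authoritative conditions are the prose rules above (Step 1, for example, additionally requires detecting source code files in the workspace).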
+ +--- + +**Step 6 — Refactor** +Condition: the autopilot state shows Step 5 (Run Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist + +Action: Read and execute `.cursor/skills/refactor/SKILL.md` + +The refactor skill runs the full 6-phase method using the implemented tests as a safety net. + +If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues. + +--- + +**Step 7 — New Task** +Condition: the autopilot state shows Step 6 (Refactor) is completed AND the autopilot state does NOT show Step 7 (New Task) as completed + +Action: Read and execute `.cursor/skills/new-task/SKILL.md` + +The new-task skill interactively guides the user through defining new functionality. It loops until the user is done adding tasks. New task files are written to `_docs/02_tasks/`. + +--- + +**Step 8 — Implement** +Condition: the autopilot state shows Step 7 (New Task) is completed AND `_docs/03_implementation/` does not contain a FINAL report covering the new tasks (check state for distinction between test implementation and feature implementation) + +Action: Read and execute `.cursor/skills/implement/SKILL.md` + +The implement skill reads the new tasks from `_docs/02_tasks/` and implements them. Tasks already implemented in Step 4 are skipped (the implement skill tracks completed tasks in batch reports). + +If `_docs/03_implementation/` has batch reports from this phase, the implement skill detects completed tasks and continues. 
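
From Step 5 onward, the conditions above lean on the state file rather than folder artifacts. A minimal sketch of extracting the Completed Steps table from `_docs/_autopilot_state.md` — the exact file format is owned by state.md, so the column layout assumed here (`| step | name | date | outcome |`) is illustrative:

```python
import re

def completed_steps(state_md: str) -> dict[int, str]:
    """Map step number -> step name from the '## Completed Steps' section.
    Keying by number matters: Run Tests appears twice (Steps 5 and 9)."""
    steps: dict[int, str] = {}
    in_section = False
    for line in state_md.splitlines():
        if line.strip() == "## Completed Steps":
            in_section = True
            continue
        if in_section and line.startswith("## "):
            break  # reached the next section
        m = re.match(r"\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|", line)
        if in_section and m:
            steps[int(m.group(1))] = m.group(2)
    return steps
```

A condition like "the autopilot state shows Step 8 (Implement) is completed" then reduces to a membership check such as `8 in completed_steps(text)`.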
+ +--- + +**Step 9 — Run Tests** +Condition: the autopilot state shows Step 8 (Implement) is completed AND the autopilot state does NOT show Step 9 (Run Tests) as completed + +Action: Read and execute `.cursor/skills/test-run/SKILL.md` + +--- + +**Step 10 — Security Audit (optional)** +Condition: the autopilot state shows Step 9 (Run Tests) is completed AND the autopilot state does NOT show Step 10 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Present using Choose format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: Run security audit before deploy? +══════════════════════════════════════ + A) Run security audit (recommended for production deployments) + B) Skip — proceed directly to deploy +══════════════════════════════════════ + Recommendation: A — catches vulnerabilities before production +══════════════════════════════════════ +``` + +- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 11 (Performance Test). +- If user picks B → Mark Step 10 as `skipped` in the state file, auto-chain to Step 11 (Performance Test). + +--- + +**Step 11 — Performance Test (optional)** +Condition: the autopilot state shows Step 10 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 11 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Present using Choose format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: Run performance/load tests before deploy? 
+══════════════════════════════════════ + A) Run performance tests (recommended for latency-sensitive or high-load systems) + B) Skip — proceed directly to deploy +══════════════════════════════════════ + Recommendation: [A or B — base on whether acceptance criteria + include latency, throughput, or load requirements] +══════════════════════════════════════ +``` + +- If user picks A → Run performance tests: + 1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it + 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system + 3. Present results vs acceptance criteria thresholds + 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort + 5. After completion, auto-chain to Step 12 (Deploy) +- If user picks B → Mark Step 11 as `skipped` in the state file, auto-chain to Step 12 (Deploy). + +--- + +**Step 12 — Deploy** +Condition: the autopilot state shows Step 9 (Run Tests) is completed AND (Step 10 is completed or skipped) AND (Step 11 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Read and execute `.cursor/skills/deploy/SKILL.md` + +After deployment completes, the existing-code workflow is done. + +--- + +**Re-Entry After Completion** +Condition: the autopilot state shows `step: done` OR all steps through 12 (Deploy) are completed + +Action: The project completed a full cycle. Present status and loop back to New Task: + +``` +══════════════════════════════════════ + PROJECT CYCLE COMPLETE +══════════════════════════════════════ + The previous cycle finished successfully. + You can now add new functionality. 
+══════════════════════════════════════ + A) Add new features (start New Task) + B) Done — no more changes needed +══════════════════════════════════════ +``` + +- If user picks A → set `step: 7`, `status: not_started` in the state file, then auto-chain to Step 7 (New Task). Previous cycle history stays in Completed Steps. +- If user picks B → report final project status and exit. + +## Auto-Chain Rules + +| Completed Step | Next Action | +|---------------|-------------| +| Document (1) | Auto-chain → Test Spec (2) | +| Test Spec (2) | Auto-chain → Decompose Tests (3) | +| Decompose Tests (3) | **Session boundary** — suggest new conversation before Implement Tests | +| Implement Tests (4) | Auto-chain → Run Tests (5) | +| Run Tests (5, all pass) | Auto-chain → Refactor (6) | +| Refactor (6) | Auto-chain → New Task (7) | +| New Task (7) | **Session boundary** — suggest new conversation before Implement | +| Implement (8) | Auto-chain → Run Tests (9) | +| Run Tests (9, all pass) | Auto-chain → Security Audit choice (10) | +| Security Audit (10, done or skipped) | Auto-chain → Performance Test choice (11) | +| Performance Test (11, done or skipped) | Auto-chain → Deploy (12) | +| Deploy (12) | **Workflow complete** — existing-code flow done | + +## Status Summary Template + +``` +═══════════════════════════════════════════════════ + AUTOPILOT STATUS (existing-code) +═══════════════════════════════════════════════════ + Step 1 Document [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 2 Test Spec [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 3 Decompose Tests [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 4 Implement Tests [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)] + Step 5 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 6 Refactor [DONE / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)] + Step 7 New Task [DONE (N tasks) / IN PROGRESS / 
NOT STARTED / FAILED (retry N/3)] + Step 8 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)] + Step 9 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 10 Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 11 Performance Test [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 12 Deploy [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] +═══════════════════════════════════════════════════ + Current: Step N — Name + SubStep: M — [sub-skill internal step name] + Retry: [N/3 if retrying, omit if 0] + Action: [what will happen next] +═══════════════════════════════════════════════════ +``` diff --git a/.cursor/skills/autopilot/flows/greenfield.md b/.cursor/skills/autopilot/flows/greenfield.md new file mode 100644 index 0000000..04bf16f --- /dev/null +++ b/.cursor/skills/autopilot/flows/greenfield.md @@ -0,0 +1,235 @@ +# Greenfield Workflow + +Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Decompose → Implement → Run Tests → Security Audit (optional) → Performance Test (optional) → Deploy. 
+ +## Step Reference Table + +| Step | Name | Sub-Skill | Internal SubSteps | +|------|------|-----------|-------------------| +| 1 | Problem | problem/SKILL.md | Phase 1–4 | +| 2 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 | +| 3 | Plan | plan/SKILL.md | Step 1–6 + Final | +| 4 | UI Design | ui-design/SKILL.md | Phase 0–8 (conditional — UI projects only) | +| 5 | Decompose | decompose/SKILL.md | Step 1–4 | +| 6 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) | +| 7 | Run Tests | test-run/SKILL.md | Steps 1–4 | +| 8 | Security Audit | security/SKILL.md | Phase 1–5 (optional) | +| 9 | Performance Test | (autopilot-managed) | Load/stress tests (optional) | +| 10 | Deploy | deploy/SKILL.md | Step 1–7 | + +## Detection Rules + +Check rules in order — first match wins. + +--- + +**Step 1 — Problem Gathering** +Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty: +- `problem.md` +- `restrictions.md` +- `acceptance_criteria.md` +- `input_data/` (must contain at least one file) + +Action: Read and execute `.cursor/skills/problem/SKILL.md` + +--- + +**Step 2 — Research (Initial)** +Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files + +Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A) + +--- + +**Research Decision** (inline gate between Step 2 and Step 3) +Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_document/architecture.md` does not exist + +Action: Present the current research state to the user: +- How many solution drafts exist +- Whether tech_stack.md and security_analysis.md exist +- One-line summary from the latest draft + +Then present using the **Choose format**: + +``` +══════════════════════════════════════ + DECISION REQUIRED: Research complete — next action? 
+══════════════════════════════════════ + A) Run another research round (Mode B assessment) + B) Proceed to planning with current draft +══════════════════════════════════════ + Recommendation: [A or B] — [reason based on draft quality] +══════════════════════════════════════ +``` + +- If user picks A → Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B) +- If user picks B → auto-chain to Step 3 (Plan) + +--- + +**Step 3 — Plan** +Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_document/architecture.md` does not exist + +Action: +1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself +2. Read and execute `.cursor/skills/plan/SKILL.md` + +If `_docs/02_document/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it. + +--- + +**Step 4 — UI Design (conditional)** +Condition: `_docs/02_document/architecture.md` exists AND the autopilot state does NOT show Step 4 (UI Design) as completed or skipped AND the project is a UI project + +**UI Project Detection** — the project is a UI project if ANY of the following are true: +- `package.json` exists in the workspace root or any subdirectory +- `*.html`, `*.jsx`, `*.tsx` files exist in the workspace +- `_docs/02_document/components/` contains a component whose `description.md` mentions UI, frontend, page, screen, dashboard, form, or view +- `_docs/02_document/architecture.md` mentions frontend, UI layer, SPA, or client-side rendering +- `_docs/01_solution/solution.md` mentions frontend, web interface, or user-facing UI + +If the project is NOT a UI project → mark Step 4 as `skipped` in the state file and auto-chain to Step 5. + +If the project IS a UI project → present using Choose format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: UI project detected — generate mockups? 
+══════════════════════════════════════ + A) Generate UI mockups before decomposition (recommended) + B) Skip — proceed directly to decompose +══════════════════════════════════════ + Recommendation: A — mockups before decomposition + produce better task specs for frontend components +══════════════════════════════════════ +``` + +- If user picks A → Read and execute `.cursor/skills/ui-design/SKILL.md`. After completion, auto-chain to Step 5 (Decompose). +- If user picks B → Mark Step 4 as `skipped` in the state file, auto-chain to Step 5 (Decompose). + +--- + +**Step 5 — Decompose** +Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`) + +Action: Read and execute `.cursor/skills/decompose/SKILL.md` + +If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it. + +--- + +**Step 6 — Implement** +Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist + +Action: Read and execute `.cursor/skills/implement/SKILL.md` + +If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues. 
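The Step 5 and Step 6 conditions both hinge on what counts as a task file in `_docs/02_tasks/`. A minimal sketch of that check, assuming task files are `.md` files other than the dependencies table (the function name and glob are illustrative, not part of the skill contract):

```python
from pathlib import Path

def has_task_files(tasks_dir: str = "_docs/02_tasks") -> bool:
    """True if the tasks directory holds at least one real task file.

    _dependencies_table.md is metadata, not a task, so it is excluded
    when deciding whether Step 5 (Decompose) still needs to run.
    """
    d = Path(tasks_dir)
    if not d.is_dir():
        return False
    return any(p.name != "_dependencies_table.md" for p in d.glob("*.md"))
```

Step 5 fires when this returns `False`; Step 6 additionally requires that `_dependencies_table.md` itself exists.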
+ +--- + +**Step 7 — Run Tests** +Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state does NOT show Step 7 (Run Tests) as completed AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Read and execute `.cursor/skills/test-run/SKILL.md` + +--- + +**Step 8 — Security Audit (optional)** +Condition: the autopilot state shows Step 7 (Run Tests) is completed AND the autopilot state does NOT show Step 8 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Present using Choose format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: Run security audit before deploy? +══════════════════════════════════════ + A) Run security audit (recommended for production deployments) + B) Skip — proceed directly to deploy +══════════════════════════════════════ + Recommendation: A — catches vulnerabilities before production +══════════════════════════════════════ +``` + +- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 9 (Performance Test). +- If user picks B → Mark Step 8 as `skipped` in the state file, auto-chain to Step 9 (Performance Test). + +--- + +**Step 9 — Performance Test (optional)** +Condition: the autopilot state shows Step 8 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 9 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Present using Choose format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: Run performance/load tests before deploy? 
+══════════════════════════════════════ + A) Run performance tests (recommended for latency-sensitive or high-load systems) + B) Skip — proceed directly to deploy +══════════════════════════════════════ + Recommendation: [A or B — base on whether acceptance criteria + include latency, throughput, or load requirements] +══════════════════════════════════════ +``` + +- If user picks A → Run performance tests: + 1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it + 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system + 3. Present results vs acceptance criteria thresholds + 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort + 5. After completion, auto-chain to Step 10 (Deploy) +- If user picks B → Mark Step 9 as `skipped` in the state file, auto-chain to Step 10 (Deploy). + +--- + +**Step 10 — Deploy** +Condition: the autopilot state shows Step 7 (Run Tests) is completed AND (Step 8 is completed or skipped) AND (Step 9 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete) + +Action: Read and execute `.cursor/skills/deploy/SKILL.md` + +--- + +**Done** +Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md) + +Action: Report project completion with summary. If the user runs autopilot again after greenfield completion, Flow Resolution rule 3 routes to the existing-code flow (re-entry after completion) so they can add new features. 
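Taken together, the rules above form one first-match-wins scan over `_docs/`. A condensed sketch of the first few rules (conditions are simplified and the interactive gates such as the Research Decision are omitted; treat this as illustration, not the contract):

```python
from pathlib import Path

def detect_greenfield_step(root: str = ".") -> str:
    """Return the first matching greenfield detection rule."""
    docs = Path(root, "_docs")
    problem = docs / "00_problem"
    required = ["problem.md", "restrictions.md", "acceptance_criteria.md"]
    if not problem.is_dir() or any(not (problem / f).is_file() for f in required):
        return "Step 1: Problem"
    solution = docs / "01_solution"
    drafts = list(solution.glob("solution_draft*.md")) if solution.is_dir() else []
    if not drafts:
        return "Step 2: Research"
    if not (docs / "02_document" / "architecture.md").is_file():
        return "Step 3: Plan"
    # Steps 4-10 and Done continue the same pattern, consulting the
    # state file for completed/skipped markers where folders alone
    # cannot decide (UI Design, Run Tests, Security, Performance).
    return "Step 4+: see remaining rules"
```

In the real flow, the state file, not just the folder scan, decides the optional steps and re-entry after failure.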
+ +## Auto-Chain Rules + +| Completed Step | Next Action | +|---------------|-------------| +| Problem (1) | Auto-chain → Research (2) | +| Research (2) | Auto-chain → Research Decision (ask user: another round or proceed?) | +| Research Decision → proceed | Auto-chain → Plan (3) | +| Plan (3) | Auto-chain → UI Design detection (4) | +| UI Design (4, done or skipped) | Auto-chain → Decompose (5) | +| Decompose (5) | **Session boundary** — suggest new conversation before Implement | +| Implement (6) | Auto-chain → Run Tests (7) | +| Run Tests (7, all pass) | Auto-chain → Security Audit choice (8) | +| Security Audit (8, done or skipped) | Auto-chain → Performance Test choice (9) | +| Performance Test (9, done or skipped) | Auto-chain → Deploy (10) | +| Deploy (10) | Report completion | + +## Status Summary Template + +``` +═══════════════════════════════════════════════════ + AUTOPILOT STATUS (greenfield) +═══════════════════════════════════════════════════ + Step 1 Problem [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 2 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 3 Plan [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 4 UI Design [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 5 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 6 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)] + Step 7 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 8 Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 9 Performance Test [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] + Step 10 Deploy [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)] +═══════════════════════════════════════════════════ + Current: Step N — Name + SubStep: M — [sub-skill internal step name] + Retry: [N/3 if retrying, omit if 0] + Action: [what will 
happen next] +═══════════════════════════════════════════════════ +``` diff --git a/.cursor/skills/autopilot/protocols.md b/.cursor/skills/autopilot/protocols.md new file mode 100644 index 0000000..406bf72 --- /dev/null +++ b/.cursor/skills/autopilot/protocols.md @@ -0,0 +1,314 @@ +# Autopilot Protocols + +## User Interaction Protocol + +Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to: + +- State transitions where multiple valid next actions exist +- Sub-skill BLOCKING gates that require user judgment +- Any fork where the autopilot cannot confidently pick the right path +- Trade-off decisions (tech choices, scope, risk acceptance) + +### When to Ask (MUST ask) + +- The next action is ambiguous (e.g., "another research round or proceed?") +- The decision has irreversible consequences (e.g., architecture choices, skipping a step) +- The user's intent or preference cannot be inferred from existing artifacts +- A sub-skill's BLOCKING gate explicitly requires user confirmation +- Multiple valid approaches exist with meaningfully different trade-offs + +### When NOT to Ask (auto-transition) + +- Only one logical next step exists (e.g., Problem complete → Research is the only option) +- The transition is deterministic from the state (e.g., Plan complete → Decompose) +- The decision is low-risk and reversible +- Existing artifacts or prior decisions already imply the answer + +### Choice Format + +Always present decisions in this format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: [brief context] +══════════════════════════════════════ + A) [Option A — short description] + B) [Option B — short description] + C) [Option C — short description, if applicable] + D) [Option D — short description, if applicable] +══════════════════════════════════════ + Recommendation: [A/B/C/D] — [one-line reason] +══════════════════════════════════════ +``` + +Rules: +1. 
Always provide 2–4 concrete options (never open-ended questions) +2. Always include a recommendation with a brief justification +3. Keep option descriptions to one line each +4. If only 2 options make sense, use A/B only — do not pad with filler options +5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice +6. Record every user decision in the state file's `Key Decisions` section +7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive + +## Work Item Tracker Authentication + +Several workflow steps create work items (epics, tasks, links). The system supports **Jira MCP** and **Azure DevOps MCP** as interchangeable backends. Detect which is configured by listing available MCP servers. + +### Tracker Detection + +1. Check for available MCP servers: Jira MCP (`user-Jira-MCP-Server`) or Azure DevOps MCP (`user-AzureDevops`) +2. If both are available, ask the user which to use (Choose format) +3. Record the choice in the state file: `tracker: jira` or `tracker: ado` +4. If neither is available, set `tracker: local` and proceed without external tracking + +### Steps That Require Work Item Tracker + +| Flow | Step | Sub-Step | Tracker Action | +|------|------|----------|----------------| +| greenfield | 3 (Plan) | Step 6 — Epics | Create epics for each component | +| greenfield | 5 (Decompose) | Step 1–3 — All tasks | Create ticket per task, link to epic | +| existing-code | 3 (Decompose Tests) | Step 1t + Step 3 — All test tasks | Create ticket per task, link to epic | +| existing-code | 7 (New Task) | Step 7 — Ticket | Create ticket per task, link to epic | + +### Authentication Gate + +Before entering a step that requires work item tracking (see table above) for the first time, the autopilot must: + +1. Call `mcp_auth` on the detected tracker's MCP server +2. If authentication succeeds → proceed normally +3. 
If the user **skips** or authentication fails → present using Choose format: + +``` +══════════════════════════════════════ + Tracker authentication failed +══════════════════════════════════════ + A) Retry authentication (retry mcp_auth) + B) Continue without tracker (tasks saved locally only) +══════════════════════════════════════ + Recommendation: A — Tracker IDs drive task referencing, + dependency tracking, and implementation batching. + Without tracker, task files use numeric prefixes instead. +══════════════════════════════════════ +``` + +If user picks **B** (continue without tracker): +- Set a flag in the state file: `tracker: local` +- All skills that would create tickets instead save metadata locally in the task/epic files with `Tracker: pending` status +- Task files keep numeric prefixes (e.g., `01_initial_structure.md`) instead of tracker ID prefixes +- The workflow proceeds normally in all other respects + +### Re-Authentication + +If the tracker MCP was already authenticated in a previous invocation (verify by listing available tools beyond `mcp_auth`), skip the auth gate. + +## Error Handling + +All error situations that require user input MUST use the **Choose A / B / C / D** format. + +| Situation | Action | +|-----------|--------| +| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options | +| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually | +| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step | +| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step | +| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution | + +## Skill Failure Retry Protocol + +Sub-skills can return a **failed** result. 
Failures are often caused by missing user input, environment issues, or transient errors that resolve on retry. The autopilot auto-retries before escalating. + +### Retry Flow + +``` +Skill execution → FAILED + │ + ├─ retry_count < 3 ? + │ YES → increment retry_count in state file + │ → log failure reason in state file (Retry Log section) + │ → re-read the sub-skill's SKILL.md + │ → re-execute from the current sub_step + │ → (loop back to check result) + │ + │ NO (retry_count = 3) → + │ → set status: failed in Current Step + │ → add entry to Blockers section: + │ "[Skill Name] failed 3 consecutive times at sub_step [M]. + │ Last failure: [reason]. Auto-retry exhausted." + │ → present warning to user (see Escalation below) + │ → do NOT auto-retry again until user intervenes +``` + +### Retry Rules + +1. **Auto-retry immediately**: when a skill fails, retry it without asking the user — the failure is often transient (missing user confirmation in a prior step, docker not running, file lock, etc.) +2. **Preserve sub_step**: retry from the last recorded `sub_step`, not from the beginning of the skill — unless the failure indicates corruption, in which case restart from sub_step 1 +3. **Increment `retry_count`**: update `retry_count` in the state file's `Current Step` section on each retry attempt +4. **Log each failure**: append the failure reason and timestamp to the state file's `Retry Log` section +5. **Reset on success**: when the skill eventually succeeds, reset `retry_count: 0` and clear the `Retry Log` for that step + +### Escalation (after 3 consecutive failures) + +After 3 failed auto-retries of the same skill, the failure is likely not user-related. Stop retrying and escalate: + +1. Update the state file: + - Set `status: failed` in `Current Step` + - Set `retry_count: 3` + - Add a blocker entry describing the repeated failure +2. Play notification sound (per `human-attention-sound.mdc`) +3. 
Present using Choose format: + +``` +══════════════════════════════════════ + SKILL FAILED: [Skill Name] — 3 consecutive failures +══════════════════════════════════════ + Step: [N] — [Name] + SubStep: [M] — [sub-step name] + Last failure reason: [reason] +══════════════════════════════════════ + A) Retry with fresh context (new conversation) + B) Skip this step with warning + C) Abort — investigate and fix manually +══════════════════════════════════════ + Recommendation: A — fresh context often resolves + persistent failures +══════════════════════════════════════ +``` + +### Re-Entry After Failure + +On the next autopilot invocation (new conversation), if the state file shows `status: failed` and `retry_count: 3`: + +- Present the blocker to the user before attempting execution +- If the user chooses to retry → reset `retry_count: 0`, set `status: in_progress`, and re-execute +- If the user chooses to skip → mark step as `skipped`, proceed to next step +- Do NOT silently auto-retry — the user must acknowledge the persistent failure first + +## Error Recovery Protocol + +### Stuck Detection + +When executing a sub-skill, monitor for these signals: + +- Same artifact overwritten 3+ times without meaningful change +- Sub-skill repeatedly asks the same question after receiving an answer +- No new artifacts saved for an extended period despite active execution + +### Recovery Actions (ordered) + +1. **Re-read state**: read `_docs/_autopilot_state.md` and cross-check against `_docs/` folders +2. **Retry current sub-step**: re-read the sub-skill's SKILL.md and restart from the current sub-step +3. 
**Escalate**: after 2 failed retries, present diagnostic summary to user using Choose format: + +``` +══════════════════════════════════════ + RECOVERY: [skill name] stuck at [sub-step] +══════════════════════════════════════ + A) Retry with fresh context (new conversation) + B) Skip this sub-step with warning + C) Abort and fix manually +══════════════════════════════════════ + Recommendation: A — fresh context often resolves stuck loops +══════════════════════════════════════ +``` + +### Circuit Breaker + +If the same autopilot step fails 3 consecutive times across conversations: + +- Record the failure pattern in the state file's `Blockers` section +- Do NOT auto-retry on next invocation +- Present the blocker and ask user for guidance before attempting again + +## Context Management Protocol + +### Principle + +Disk is memory. Never rely on in-context accumulation — read from `_docs/` artifacts, not from conversation history. + +### Minimal Re-Read Set Per Skill + +When re-entering a skill (new conversation or context refresh): + +- Always read: `_docs/_autopilot_state.md` +- Always read: the active skill's `SKILL.md` +- Conditionally read: only the `_docs/` artifacts the current sub-step requires (listed in each skill's Context Resolution section) +- Never bulk-read: do not load all `_docs/` files at once + +### Mid-Skill Interruption + +If context is filling up during a long skill (e.g., document, implement): + +1. Save current sub-step progress to the skill's artifact directory +2. Update `_docs/_autopilot_state.md` with exact sub-step position +3. Suggest a new conversation: "Context is getting long — recommend continuing in a fresh conversation for better results" +4. 
On re-entry, the skill's resumability protocol picks up from the saved sub-step + +### Large Artifact Handling + +When a skill needs to read large files (e.g., full solution.md, architecture.md): + +- Read only the sections relevant to the current sub-step +- Use search tools (Grep, SemanticSearch) to find specific sections rather than reading entire files +- Summarize key decisions from prior steps in the state file so they don't need to be re-read + +### Context Budget Heuristic + +Agents cannot programmatically query context window usage. Use these heuristics to avoid degradation: + +| Zone | Indicators | Action | +|------|-----------|--------| +| **Safe** | State file + SKILL.md + 2–3 focused artifacts loaded | Continue normally | +| **Caution** | 5+ artifacts loaded, or 3+ large files (architecture, solution, discovery), or conversation has 20+ tool calls | Complete current sub-step, then suggest session break | +| **Danger** | Repeated truncation in tool output, tool calls failing unexpectedly, responses becoming shallow or repetitive | Save immediately, update state file, force session boundary | + +**Skill-specific guidelines**: + +| Skill | Recommended session breaks | +|-------|---------------------------| +| **document** | After every ~5 modules in Step 1; between Step 4 (Verification) and Step 5 (Solution Extraction) | +| **implement** | Each batch is a natural checkpoint; if more than 2 batches completed in one session, suggest break | +| **plan** | Between Step 5 (Test Specifications) and Step 6 (Epics) for projects with many components | +| **research** | Between Mode A rounds; between Mode A and Mode B | + +**How to detect caution/danger zone without API**: + +1. Count tool calls made so far — if approaching 20+, context is likely filling up +2. If reading a file returns truncated content, context is under pressure +3. If the agent starts producing shorter or less detailed responses than earlier in the conversation, context quality is degrading +4. 
+ When in doubt, save and suggest a new conversation — re-entry is cheap thanks to the state file
+
+## Rollback Protocol
+
+### Implementation Steps (git-based)
+
+Handled by `/implement` skill — each batch commit is a rollback checkpoint via `git revert`.
+
+### Planning/Documentation Steps (artifact-based)
+
+For steps that produce `_docs/` artifacts (problem, research, plan, decompose, document):
+
+1. **Before overwriting**: if re-running a step that already has artifacts, the sub-skill's prerequisite check asks the user (resume/overwrite/skip)
+2. **Rollback to previous step**: use Choose format:
+
+```
+══════════════════════════════════════
+ ROLLBACK: Re-run [step name]?
+══════════════════════════════════════
+ A) Re-run the step (overwrites current artifacts)
+ B) Stay on current step
+══════════════════════════════════════
+ Warning: This will overwrite files in _docs/[folder]/
+══════════════════════════════════════
+```
+
+3. **Git safety net**: artifacts are committed with each autopilot step completion. To roll back: `git log --oneline _docs/` to find the commit, then `git checkout [commit] -- _docs/` to restore the artifacts from it
+4. **State file rollback**: when rolling back artifacts, also update `_docs/_autopilot_state.md` to reflect the rolled-back step (set it to `in_progress`, clear completed date)
+
+## Status Summary
+
+On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback). Use the Status Summary Template from the active flow file (`flows/greenfield.md` or `flows/existing-code.md`).
+ +For re-entry (state file exists), also include: +- Key decisions from the state file's `Key Decisions` section +- Last session context from the `Last Session` section +- Any blockers from the `Blockers` section diff --git a/.cursor/skills/autopilot/state.md b/.cursor/skills/autopilot/state.md new file mode 100644 index 0000000..57e6444 --- /dev/null +++ b/.cursor/skills/autopilot/state.md @@ -0,0 +1,122 @@ +# Autopilot State Management + +## State File: `_docs/_autopilot_state.md` + +The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist. + +### Format + +```markdown +# Autopilot State + +## Current Step +flow: [greenfield | existing-code] +step: [1-10 for greenfield, 1-12 for existing-code, or "done"] +name: [step name from the active flow's Step Reference Table] +status: [not_started / in_progress / completed / skipped / failed] +sub_step: [optional — sub-skill internal step number + name if interrupted mid-step] +retry_count: [0-3 — number of consecutive auto-retry attempts for current step, reset to 0 on success] + +When updating `Current Step`, always write it as: + flow: existing-code ← active flow + step: N ← autopilot step (sequential integer) + sub_step: M ← sub-skill's own internal step/phase number + name + retry_count: 0 ← reset on new step or success; increment on each failed retry +Example: + flow: greenfield + step: 3 + name: Plan + status: in_progress + sub_step: 4 — Architecture Review & Risk Assessment + retry_count: 0 +Example (failed after 3 retries): + flow: existing-code + step: 2 + name: Test Spec + status: failed + sub_step: 1b — Test Case Generation + retry_count: 3 + +## Completed Steps + +| Step | Name | Completed | Key Outcome | +|------|------|-----------|-------------| +| 1 | [name] | [date] | [one-line summary] | +| 2 | [name] | [date] | [one-line summary] | +| ... | ... | ... | ... 
| + +## Key Decisions +- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"] +- [decision N] + +## Last Session +date: [date] +ended_at: Step [N] [Name] — SubStep [M] [sub-step name] +reason: [completed step / session boundary / user paused / context limit] +notes: [any context for next session] + +## Retry Log +| Attempt | Step | Name | SubStep | Failure Reason | Timestamp | +|---------|------|------|---------|----------------|-----------| +| 1 | [step] | [name] | [sub_step] | [reason] | [date-time] | +| ... | ... | ... | ... | ... | ... | + +(Clear this table when the step succeeds or user resets. Append a row on each failed auto-retry.) + +## Blockers +- [blocker 1, if any] +- [none] +``` + +### State File Rules + +1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 1) +2. **Update** the state file after every step completion, every session boundary, every BLOCKING gate confirmation, and every failed retry attempt +3. **Read** the state file as the first action on every invocation — before folder scanning +4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 3 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match +5. **Never delete** the state file. It accumulates history across the entire project lifecycle +6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` when the step succeeds or the user manually resets. If `retry_count` reaches 3, set `status: failed` and add an entry to `Blockers` +7. **Failed state on re-entry**: if the state file shows `status: failed` with `retry_count: 3`, do NOT auto-retry — present the blocker to the user and wait for their decision before proceeding + +## State Detection + +Read `_docs/_autopilot_state.md` first. 
If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning. + +### Folder Scan Rules (fallback) + +Scan `_docs/` to determine the current workflow position. The detection rules are defined in each flow file (`flows/greenfield.md` and `flows/existing-code.md`). Check the existing-code flow first (Step 1 detection), then greenfield flow rules. First match wins. + +## Re-Entry Protocol + +When the user invokes `/autopilot` and work already exists: + +1. Read `_docs/_autopilot_state.md` +2. Cross-check against `_docs/` folder structure +3. Present Status Summary with context from state file (key decisions, last session, blockers) +4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery +5. Continue execution from detected state + +## Session Boundaries + +After any decompose/planning step completes, **do not auto-chain to implement**. Instead: + +1. Update state file: mark the step as completed, set current step to the next implement step with status `not_started` + - Existing-code flow: After Step 3 (Decompose Tests) → set current step to 4 (Implement Tests) + - Existing-code flow: After Step 7 (New Task) → set current step to 8 (Implement) + - Greenfield flow: After Step 5 (Decompose) → set current step to 6 (Implement) +2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready` +3. Present a summary: number of tasks, estimated batches, total complexity points +4. Use Choose format: + +``` +══════════════════════════════════════ + DECISION REQUIRED: Decompose complete — start implementation? 
+══════════════════════════════════════ + A) Start a new conversation for implementation (recommended for context freshness) + B) Continue implementation in this conversation +══════════════════════════════════════ + Recommendation: A — implementation is the longest phase, fresh context helps +══════════════════════════════════════ +``` + +These are the only hard session boundaries. All other transitions auto-chain. diff --git a/.cursor/skills/code-review/SKILL.md b/.cursor/skills/code-review/SKILL.md index 1c5bd4f..041013a 100644 --- a/.cursor/skills/code-review/SKILL.md +++ b/.cursor/skills/code-review/SKILL.md @@ -46,7 +46,7 @@ For each task, verify implementation satisfies every acceptance criterion: - Walk through each AC (Given/When/Then) and trace it in the code - Check that unit tests cover each AC -- Check that integration tests exist where specified in the task spec +- Check that blackbox tests exist where specified in the task spec - Flag any AC that is not demonstrably satisfied as a **Spec-Gap** finding (severity: High) - Flag any scope creep (implementation beyond what the spec asked for) as a **Scope** finding (severity: Low) @@ -152,3 +152,42 @@ The `/implement` skill invokes this skill after each batch completes: 2. Passes task spec paths + changed files to this skill 3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms 4. 
If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info) + +## Integration Contract + +### Inputs (provided by the implement skill) + +| Input | Type | Source | Required | +|-------|------|--------|----------| +| `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/` for the current batch | Yes | +| `changed_files` | list of file paths | Files modified by implementer agents (from `git diff` or agent reports) | Yes | +| `batch_number` | integer | Current batch number (for report naming) | Yes | +| `project_restrictions` | file path | `_docs/00_problem/restrictions.md` | If exists | +| `solution_overview` | file path | `_docs/01_solution/solution.md` | If exists | + +### Invocation Pattern + +The implement skill invokes code-review by: + +1. Reading `.cursor/skills/code-review/SKILL.md` +2. Providing the inputs above as context (read the files, pass content to the review phases) +3. Executing all 6 phases sequentially +4. Consuming the verdict from the output + +### Outputs (returned to the implement skill) + +| Output | Type | Description | +|--------|------|-------------| +| `verdict` | `PASS` / `PASS_WITH_WARNINGS` / `FAIL` | Drives the implement skill's auto-fix gate | +| `findings` | structured list | Each finding has: severity, category, file:line, title, description, suggestion, task reference | +| `critical_count` | integer | Number of Critical findings | +| `high_count` | integer | Number of High findings | +| `report_path` | file path | `_docs/03_implementation/reviews/batch_[NN]_review.md` | + +### Report Persistence + +Save the review report to `_docs/03_implementation/reviews/batch_[NN]_review.md` (create the `reviews/` directory if it does not exist). The report uses the Output Format defined above. 
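The `verdict` output drives the implement skill's auto-fix gate (see the Outputs table). A sketch of that gate as a pure function; the names and the two-attempt limit here are illustrative assumptions:

```python
def next_action(verdict: str, fix_attempts: int, max_fix_attempts: int = 2) -> str:
    """Map a code-review verdict to the implement skill's next move."""
    if verdict in ("PASS", "PASS_WITH_WARNINGS"):
        return "commit"        # warnings surface as info, not blockers
    if verdict == "FAIL" and fix_attempts < max_fix_attempts:
        return "auto_fix"      # fix findings, then re-run the review
    return "escalate"          # blocking: present findings to the user
```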
+ +The implement skill uses `verdict` to decide: +- `PASS` / `PASS_WITH_WARNINGS` → proceed to commit +- `FAIL` → enter auto-fix loop (up to 2 attempts), then escalate to user diff --git a/.cursor/skills/decompose/SKILL.md b/.cursor/skills/decompose/SKILL.md index 8fac9a3..ac1cb2c 100644 --- a/.cursor/skills/decompose/SKILL.md +++ b/.cursor/skills/decompose/SKILL.md @@ -2,12 +2,13 @@ name: decompose description: | Decompose planned components into atomic implementable tasks with bootstrap structure plan. - 4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification. - Supports full decomposition (_docs/ structure) and single component mode. + 4-step workflow: bootstrap structure plan, component task decomposition, blackbox test task decomposition, and cross-task verification. + Supports full decomposition (_docs/ structure), single component mode, and tests-only mode. Trigger phrases: - "decompose", "decompose features", "feature decomposition" - "task decomposition", "break down components" - "prepare for implementation" + - "decompose tests", "test decomposition" category: build tags: [decomposition, tasks, dependencies, jira, implementation-prep] disable-model-invocation: true @@ -32,18 +33,26 @@ Decompose planned components into atomic, implementable task specs with a bootst Determine the operating mode based on invocation before any other logic runs. 
**Default** (no explicit input file provided): -- PLANS_DIR: `_docs/02_plans/` +- DOCUMENT_DIR: `_docs/02_document/` - TASKS_DIR: `_docs/02_tasks/` -- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, PLANS_DIR -- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (integration tests) + Step 4 (cross-verification) +- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR +- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (blackbox tests) + Step 4 (cross-verification) -**Single component mode** (provided file is within `_docs/02_plans/` and inside a `components/` subdirectory): -- PLANS_DIR: `_docs/02_plans/` +**Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory): +- DOCUMENT_DIR: `_docs/02_document/` - TASKS_DIR: `_docs/02_tasks/` - Derive component number and component name from the file path - Ask user for the parent Epic ID - Runs Step 2 (that component only, appending to existing task numbering) +**Tests-only mode** (provided file/directory is within `tests/`, or `DOCUMENT_DIR/tests/` exists and input explicitly requests test decomposition): +- DOCUMENT_DIR: `_docs/02_document/` +- TASKS_DIR: `_docs/02_tasks/` +- TESTS_DIR: `DOCUMENT_DIR/tests/` +- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR +- Runs Step 1t (test infrastructure bootstrap) + Step 3 (blackbox test decomposition) + Step 4 (cross-verification against test coverage) +- Skips Step 1 (project bootstrap) and Step 2 (component decomposition) — the codebase already exists + Announce the detected mode and resolved paths to the user before proceeding. ## Input Specification @@ -58,10 +67,10 @@ Announce the detected mode and resolved paths to the user before proceeding. 
| `_docs/00_problem/restrictions.md` | Constraints and limitations | | `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria | | `_docs/01_solution/solution.md` | Finalized solution | -| `PLANS_DIR/architecture.md` | Architecture from plan skill | -| `PLANS_DIR/system-flows.md` | System flows from plan skill | -| `PLANS_DIR/components/[##]_[name]/description.md` | Component specs from plan skill | -| `PLANS_DIR/integration_tests/` | Integration test specs from plan skill | +| `DOCUMENT_DIR/architecture.md` | Architecture from plan skill | +| `DOCUMENT_DIR/system-flows.md` | System flows from plan skill | +| `DOCUMENT_DIR/components/[##]_[name]/description.md` | Component specs from plan skill | +| `DOCUMENT_DIR/tests/` | Blackbox test specs from plan skill | **Single component mode:** @@ -70,16 +79,38 @@ Announce the detected mode and resolved paths to the user before proceeding. | The provided component `description.md` | Component spec to decompose | | Corresponding `tests.md` in the same directory (if available) | Test specs for context | +**Tests-only mode:** + +| File | Purpose | +|------|---------| +| `TESTS_DIR/environment.md` | Test environment specification (Docker services, networks, volumes) | +| `TESTS_DIR/test-data.md` | Test data management (seed data, mocks, isolation) | +| `TESTS_DIR/blackbox-tests.md` | Blackbox functional scenarios (positive + negative) | +| `TESTS_DIR/performance-tests.md` | Performance test scenarios | +| `TESTS_DIR/resilience-tests.md` | Resilience test scenarios | +| `TESTS_DIR/security-tests.md` | Security test scenarios | +| `TESTS_DIR/resource-limit-tests.md` | Resource limit test scenarios | +| `TESTS_DIR/traceability-matrix.md` | AC/restriction coverage mapping | +| `_docs/00_problem/problem.md` | Problem context | +| `_docs/00_problem/restrictions.md` | Constraints for test design | +| `_docs/00_problem/acceptance_criteria.md` | Acceptance criteria being verified | + ### Prerequisite Checks 
(BLOCKING) **Default:** -1. PLANS_DIR contains `architecture.md` and `components/` — **STOP if missing** +1. DOCUMENT_DIR contains `architecture.md` and `components/` — **STOP if missing** 2. Create TASKS_DIR if it does not exist 3. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?** **Single component mode:** 1. The provided component file exists and is non-empty — **STOP if missing** +**Tests-only mode:** +1. `TESTS_DIR/blackbox-tests.md` exists and is non-empty — **STOP if missing** +2. `TESTS_DIR/environment.md` exists — **STOP if missing** +3. Create TASKS_DIR if it does not exist +4. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?** + ## Artifact Management ### Directory Structure @@ -100,8 +131,9 @@ TASKS_DIR/ | Step | Save immediately after | Filename | |------|------------------------|----------| | Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` | +| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` | | Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` | -| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` | +| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` | | Step 4 | Cross-task verification complete | `_dependencies_table.md` | ### Resumability @@ -118,13 +150,49 @@ At the start of execution, create a TodoWrite with all applicable steps. Update ## Workflow +### Step 1t: Test Infrastructure Bootstrap (tests-only mode only) + +**Role**: Professional Quality Assurance Engineer +**Goal**: Produce `01_test_infrastructure.md` — the first task describing the test project scaffold +**Constraints**: This is a plan document, not code. 
The `/implement` skill executes it. + +1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test-data.md` +2. Read problem.md, restrictions.md, acceptance_criteria.md for domain context +3. Document the test infrastructure plan using `templates/test-infrastructure-task.md` + +The test infrastructure bootstrap must include: +- Test project folder layout (`e2e/` directory structure) +- Mock/stub service definitions for each external dependency +- `docker-compose.test.yml` structure from environment.md +- Test runner configuration (framework, plugins, fixtures) +- Test data fixture setup from test-data.md seed data sets +- Test reporting configuration (format, output path) +- Data isolation strategy + +**Self-verification**: +- [ ] Every external dependency from environment.md has a mock service defined +- [ ] Docker Compose structure covers all services from environment.md +- [ ] Test data fixtures cover all seed data sets from test-data.md +- [ ] Test runner configuration matches the consumer app tech stack from environment.md +- [ ] Data isolation strategy is defined + +**Save action**: Write `01_test_infrastructure.md` (temporary numeric name) + +**Jira action**: Create a Jira ticket for this task under the "Blackbox Tests" epic. Write the Jira ticket ID and Epic ID back into the task header. + +**Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename. + +**BLOCKING**: Present test infrastructure plan summary to user. Do NOT proceed until user confirms. + +--- + ### Step 1: Bootstrap Structure Plan (default mode only) **Role**: Professional software architect **Goal**: Produce `01_initial_structure.md` — the first task describing the project skeleton **Constraints**: This is a plan document, not code. The `/implement` skill executes it. -1. Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from PLANS_DIR +1. 
Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from DOCUMENT_DIR 2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/` 3. Research best implementation patterns for the identified tech stack 4. Document the structure plan using `templates/initial-structure-task.md` @@ -134,27 +202,27 @@ The bootstrap structure plan must include: - Shared models, interfaces, and DTOs - Dockerfile per component (multi-stage, non-root, health checks, pinned base images) - `docker-compose.yml` for local development (all components + database + dependencies) -- `docker-compose.test.yml` for integration test environment (black-box test runner) +- `docker-compose.test.yml` for blackbox test environment (blackbox test runner) - `.dockerignore` - CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md` - Database migration setup and initial seed data scripts - Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`) - Environment variable documentation (`.env.example`) -- Test structure with unit and integration test locations +- Test structure with unit and blackbox test locations **Self-verification**: - [ ] All components have corresponding folders in the layout - [ ] All inter-component interfaces have DTOs defined - [ ] Dockerfile defined for each component - [ ] `docker-compose.yml` covers all components and dependencies -- [ ] `docker-compose.test.yml` enables black-box integration testing +- [ ] `docker-compose.test.yml` enables blackbox testing - [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages - [ ] Database migration setup included - [ ] Health check endpoints specified for each service - [ ] Structured logging configuration included - [ ] `.env.example` with all required environment variables - [ ] Environment 
strategy covers dev, staging, production -- [ ] Test structure includes unit and integration test locations +- [ ] Test structure includes unit and blackbox test locations **Save action**: Write `01_initial_structure.md` (temporary numeric name) @@ -166,7 +234,7 @@ The bootstrap structure plan must include: --- -### Step 2: Task Decomposition (all modes) +### Step 2: Task Decomposition (default and single component modes) **Role**: Professional software architect **Goal**: Decompose each component into atomic, implementable task specs — numbered sequentially starting from 02 @@ -200,52 +268,66 @@ For each component (or the single provided component): --- -### Step 3: Integration Test Task Decomposition (default mode only) +### Step 3: Blackbox Test Task Decomposition (default and tests-only modes) **Role**: Professional Quality Assurance Engineer -**Goal**: Decompose integration test specs into atomic, implementable task specs +**Goal**: Decompose blackbox test specs into atomic, implementable task specs **Constraints**: Behavioral specs only — describe what, not how. No test code. -**Numbering**: Continue sequential numbering from where Step 2 left off. +**Numbering**: +- In default mode: continue sequential numbering from where Step 2 left off. +- In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t). -1. Read all test specs from `PLANS_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md) +1. Read all test specs from `DOCUMENT_DIR/tests/` (`blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md`) 2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test) -3. Each task should reference the specific test scenarios it implements and the environment/test_data specs -4. Dependencies: integration test tasks depend on the component implementation tasks they exercise +3. 
Each task should reference the specific test scenarios it implements and the environment/test-data specs +4. Dependencies: + - In default mode: blackbox test tasks depend on the component implementation tasks they exercise + - In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t) 5. Write each task spec using `templates/task.md` 6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does 7. Note task dependencies (referencing Jira IDs of already-created dependency tasks) -8. **Immediately after writing each task file**: create a Jira ticket under the "Integration Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`. +8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`. 
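The write-then-rename convention in step 8 can be sketched as a small helper. This is an illustrative sketch, not part of the skill: the `[##]_[short_name].md` filename pattern and the `**Task**` header field come from this document, while the function itself and the example Jira ID are assumptions.

```python
import re
from pathlib import Path

def rename_task_file(task_path: Path, jira_id: str) -> Path:
    """Rename '[##]_[short_name].md' to '[JIRA-ID]_[short_name].md' and
    update the '**Task**:' header inside the file to match the new name."""
    match = re.fullmatch(r"\d{2}_(?P<short>.+)\.md", task_path.name)
    if match is None:
        raise ValueError(f"unexpected task filename: {task_path.name}")
    new_path = task_path.with_name(f"{jira_id}_{match['short']}.md")
    text = task_path.read_text(encoding="utf-8")
    # Keep the Task field in sync with the filename, as step 8 requires.
    text = re.sub(
        r"^\*\*Task\*\*: .*$",
        f"**Task**: {new_path.stem}",
        text,
        count=1,
        flags=re.MULTILINE,
    )
    new_path.write_text(text, encoding="utf-8")
    task_path.unlink()
    return new_path
```

The rename happens only after the Jira ticket exists, so a crash between the two leaves a numeric filename behind, which is exactly what the resumability checkpoint detects.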
**Self-verification**: -- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task -- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task +- [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task +- [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task - [ ] No task exceeds 5 complexity points -- [ ] Dependencies correctly reference the component tasks being tested -- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic +- [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode) +- [ ] Every task has a Jira ticket linked to the "Blackbox Tests" epic **Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`. --- -### Step 4: Cross-Task Verification (default mode only) +### Step 4: Cross-Task Verification (default and tests-only modes) **Role**: Professional software architect and analyst **Goal**: Verify task consistency and produce `_dependencies_table.md` **Constraints**: Review step — fix gaps found, do not add new tasks 1. Verify task dependencies across all tasks are consistent -2. Check no gaps: every interface in architecture.md has tasks covering it -3. Check no overlaps: tasks don't duplicate work across components +2. Check no gaps: + - In default mode: every interface in architecture.md has tasks covering it + - In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task +3. Check no overlaps: tasks don't duplicate work 4. Check no circular dependencies in the task graph 5. 
Produce `_dependencies_table.md` using `templates/dependencies-table.md` **Self-verification**: + +Default mode: - [ ] Every architecture interface is covered by at least one task - [ ] No circular dependencies in the task graph - [ ] Cross-component dependencies are explicitly noted in affected task specs - [ ] `_dependencies_table.md` contains every task with correct dependencies +Tests-only mode: +- [ ] Every test scenario from traceability-matrix.md "Covered" entries has a corresponding task +- [ ] No circular dependencies in the task graph +- [ ] Test task dependencies reference the test infrastructure bootstrap +- [ ] `_dependencies_table.md` contains every task with correct dependencies + **Save action**: Write `_dependencies_table.md` **BLOCKING**: Present dependency summary to user. Do NOT proceed until user confirms. @@ -270,7 +352,7 @@ For each component (or the single provided component): |-----------|--------| | Ambiguous component boundaries | ASK user | | Task complexity exceeds 5 points after splitting | ASK user | -| Missing component specs in PLANS_DIR | ASK user | +| Missing component specs in DOCUMENT_DIR | ASK user | | Cross-component dependency conflict | ASK user | | Jira epic not found for a component | ASK user for Epic ID | | Task naming | PROCEED, confirm at next BLOCKING gate | @@ -279,15 +361,27 @@ For each component (or the single provided component): ``` ┌────────────────────────────────────────────────────────────────┐ -│ Task Decomposition (4-Step Method) │ +│ Task Decomposition (Multi-Mode) │ ├────────────────────────────────────────────────────────────────┤ -│ CONTEXT: Resolve mode (default / single component) │ -│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │ -│ [BLOCKING: user confirms structure] │ -│ 2. Component Tasks → [JIRA-ID]_[short_name].md each │ -│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │ -│ 4. 
Cross-Verification → _dependencies_table.md │ -│ [BLOCKING: user confirms dependencies] │ +│ CONTEXT: Resolve mode (default / single component / tests-only)│ +│ │ +│ DEFAULT MODE: │ +│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │ +│ [BLOCKING: user confirms structure] │ +│ 2. Component Tasks → [JIRA-ID]_[short_name].md each │ +│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │ +│ 4. Cross-Verification → _dependencies_table.md │ +│ [BLOCKING: user confirms dependencies] │ +│ │ +│ TESTS-ONLY MODE: │ +│ 1t. Test Infrastructure → [JIRA-ID]_test_infrastructure.md │ +│ [BLOCKING: user confirms test scaffold] │ +│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │ +│ 4. Cross-Verification → _dependencies_table.md │ +│ [BLOCKING: user confirms dependencies] │ +│ │ +│ SINGLE COMPONENT MODE: │ +│ 2. Component Tasks → [JIRA-ID]_[short_name].md each │ ├────────────────────────────────────────────────────────────────┤ │ Principles: Atomic tasks · Behavioral specs · Flat structure │ │ Jira inline · Rename to Jira ID · Save now · Ask don't assume│ diff --git a/.cursor/skills/decompose/templates/initial-structure-task.md b/.cursor/skills/decompose/templates/initial-structure-task.md index 9642f65..371e5e0 100644 --- a/.cursor/skills/decompose/templates/initial-structure-task.md +++ b/.cursor/skills/decompose/templates/initial-structure-task.md @@ -49,7 +49,7 @@ project-root/ | Build | Compile/bundle the application | Every push | | Lint / Static Analysis | Code quality and style checks | Every push | | Unit Tests | Run unit test suite | Every push | -| Integration Tests | Run integration test suite | Every push | +| Blackbox Tests | Run blackbox test suite | Every push | | Security Scan | SAST / dependency check | Every push | | Deploy to Staging | Deploy to staging environment | Merge to staging branch | diff --git a/.cursor/skills/decompose/templates/task.md b/.cursor/skills/decompose/templates/task.md index d8547a9..f36ea38 100644 --- 
a/.cursor/skills/decompose/templates/task.md +++ b/.cursor/skills/decompose/templates/task.md @@ -64,7 +64,7 @@ Then [expected result] |--------|-------------|-----------------| | AC-1 | [test subject] | [expected result] | -## Integration Tests +## Blackbox Tests | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References | |--------|------------------------|-------------|-------------------|----------------| diff --git a/.cursor/skills/decompose/templates/test-infrastructure-task.md b/.cursor/skills/decompose/templates/test-infrastructure-task.md new file mode 100644 index 0000000..a07cb42 --- /dev/null +++ b/.cursor/skills/decompose/templates/test-infrastructure-task.md @@ -0,0 +1,129 @@ +# Test Infrastructure Task Template + +Use this template for the test infrastructure bootstrap (Step 1t in tests-only mode). Save as `TASKS_DIR/01_test_infrastructure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_test_infrastructure.md` after Jira ticket creation. + +--- + +```markdown +# Test Infrastructure + +**Task**: [JIRA-ID]_test_infrastructure +**Name**: Test Infrastructure +**Description**: Scaffold the Blackbox test project — test runner, mock services, Docker test environment, test data fixtures, reporting +**Complexity**: [3|5] points +**Dependencies**: None +**Component**: Blackbox Tests +**Jira**: [TASK-ID] +**Epic**: [EPIC-ID] + +## Test Project Folder Layout + +``` +e2e/ +├── conftest.py +├── requirements.txt +├── Dockerfile +├── mocks/ +│ ├── [mock_service_1]/ +│ │ ├── Dockerfile +│ │ └── [entrypoint file] +│ └── [mock_service_2]/ +│ ├── Dockerfile +│ └── [entrypoint file] +├── fixtures/ +│ └── [test data files] +├── tests/ +│ ├── test_[category_1].py +│ ├── test_[category_2].py +│ └── ... 
+└── docker-compose.test.yml +``` + +### Layout Rationale + +[Brief explanation of directory structure choices — framework conventions, separation of mocks from tests, fixture management] + +## Mock Services + +| Mock Service | Replaces | Endpoints | Behavior | +|-------------|----------|-----------|----------| +| [name] | [external service] | [endpoints it serves] | [response behavior, configurable via control API] | + +### Mock Control API + +Each mock service exposes a `POST /mock/config` endpoint for test-time behavior control (e.g., simulate downtime, inject errors). A `GET /mock/[resource]` endpoint returns recorded interactions for assertion. + +## Docker Test Environment + +### docker-compose.test.yml Structure + +| Service | Image / Build | Purpose | Depends On | +|---------|--------------|---------|------------| +| [system-under-test] | [build context] | Main system being tested | [mock services] | +| [mock-1] | [build context] | Mock for [external service] | — | +| [e2e-consumer] | [build from e2e/] | Test runner | [system-under-test] | + +### Networks and Volumes + +[Isolated test network, volume mounts for test data, model files, results output] + +## Test Runner Configuration + +**Framework**: [e.g., pytest] +**Plugins**: [e.g., pytest-csv, sseclient-py, requests] +**Entry point**: [e.g., pytest --csv=/results/report.csv] + +### Fixture Strategy + +| Fixture | Scope | Purpose | +|---------|-------|---------| +| [name] | [session/module/function] | [what it provides] | + +## Test Data Fixtures + +| Data Set | Source | Format | Used By | +|----------|--------|--------|---------| +| [name] | [volume mount / generated / API seed] | [format] | [test categories] | + +### Data Isolation + +[Strategy: fresh containers per run, volume cleanup, mock state reset] + +## Test Reporting + +**Format**: [e.g., CSV] +**Columns**: [e.g., Test ID, Test Name, Execution Time (ms), Result, Error Message] +**Output path**: [e.g., /results/report.csv → mounted to host] + +## 
Acceptance Criteria + +**AC-1: Test environment starts** +Given the docker-compose.test.yml +When `docker compose -f docker-compose.test.yml up` is executed +Then all services start and the system-under-test is reachable + +**AC-2: Mock services respond** +Given the test environment is running +When the e2e-consumer sends requests to mock services +Then mock services respond with configured behavior + +**AC-3: Test runner executes** +Given the test environment is running +When the e2e-consumer starts +Then the test runner discovers and executes test files + +**AC-4: Test report generated** +Given tests have been executed +When the test run completes +Then a report file exists at the configured output path with correct columns +``` + +--- + +## Guidance Notes + +- This is a PLAN document, not code. The `/implement` skill executes it. +- Focus on test infrastructure decisions, not individual test implementations. +- Reference environment.md and test-data.md from the test specs — don't repeat everything. +- Mock services must be deterministic: same input always produces same output. +- The Docker environment must be self-contained: `docker compose up` sufficient. 
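The mock-service contract in this template (a `POST /mock/config` control endpoint, recorded interactions for assertions, deterministic responses) can be sketched as plain state, independent of any HTTP framework. Only the endpoint semantics come from the template; the class name, the `mode` field, and the example behaviors are assumptions, and a real mock would wrap this state in a small HTTP server inside the mock's Dockerfile.

```python
class MockServiceState:
    """In-memory state behind a hypothetical mock service: the control
    API sets behavior, every call is recorded for later assertion, and
    a given configuration always produces the same response."""

    def __init__(self):
        self.config = {"mode": "ok", "status": 200}
        self.interactions = []

    def configure(self, new_config: dict) -> None:
        # Body of POST /mock/config, e.g. {"mode": "error", "status": 503}
        self.config.update(new_config)

    def handle(self, method: str, path: str, body=None) -> tuple[int, dict]:
        # Record the call so tests can assert on it afterwards.
        self.interactions.append({"method": method, "path": path, "body": body})
        if self.config["mode"] == "error":
            # Injected failure for resilience scenarios.
            return self.config["status"], {"error": "injected failure"}
        if self.config["mode"] == "down":
            # Simulated downtime.
            return 503, {"error": "simulated downtime"}
        # Deterministic: same input always produces the same output.
        return 200, {"echo": path}

    def recorded(self) -> list[dict]:
        # Body of GET /mock/[resource]-style interaction replay.
        return list(self.interactions)
```

Keeping the behavior table in pure state like this is one way to satisfy the determinism requirement: the only hidden input is the last configuration posted by the test.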
diff --git a/.cursor/skills/deploy/SKILL.md b/.cursor/skills/deploy/SKILL.md index 8767761..d325667 100644 --- a/.cursor/skills/deploy/SKILL.md +++ b/.cursor/skills/deploy/SKILL.md @@ -20,7 +20,7 @@ Plan and document the full deployment lifecycle: check deployment status and env ## Core Principles -- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker +- **Docker-first**: every component runs in a container; local dev, blackbox tests, and production all use Docker - **Infrastructure as code**: all deployment configuration is version-controlled - **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts - **Environment parity**: dev, staging, and production environments mirror each other as closely as possible @@ -32,12 +32,12 @@ Plan and document the full deployment lifecycle: check deployment status and env Fixed paths: -- PLANS_DIR: `_docs/02_plans/` +- DOCUMENT_DIR: `_docs/02_document/` - DEPLOY_DIR: `_docs/04_deploy/` - REPORTS_DIR: `_docs/04_deploy/reports/` - SCRIPTS_DIR: `scripts/` -- ARCHITECTURE: `_docs/02_plans/architecture.md` -- COMPONENTS_DIR: `_docs/02_plans/components/` +- ARCHITECTURE: `_docs/02_document/architecture.md` +- COMPONENTS_DIR: `_docs/02_document/components/` Announce the resolved paths to the user before proceeding. @@ -45,18 +45,18 @@ Announce the resolved paths to the user before proceeding. 
### Required Files -| File | Purpose | -|------|---------| -| `_docs/00_problem/problem.md` | Problem description and context | -| `_docs/00_problem/restrictions.md` | Constraints and limitations | -| `_docs/01_solution/solution.md` | Finalized solution | -| `PLANS_DIR/architecture.md` | Architecture from plan skill | -| `PLANS_DIR/components/` | Component specs | +| File | Purpose | Required | +|------|---------|----------| +| `_docs/00_problem/problem.md` | Problem description and context | Greenfield only | +| `_docs/00_problem/restrictions.md` | Constraints and limitations | Greenfield only | +| `_docs/01_solution/solution.md` | Finalized solution | Greenfield only | +| `DOCUMENT_DIR/architecture.md` | Architecture (from plan or document skill) | Always | +| `DOCUMENT_DIR/components/` | Component specs | Always | ### Prerequisite Checks (BLOCKING) 1. `architecture.md` exists — **STOP if missing**, run `/plan` first -2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing** +2. At least one component spec exists in `DOCUMENT_DIR/components/` — **STOP if missing** 3. Create DEPLOY_DIR, REPORTS_DIR, and SCRIPTS_DIR if they do not exist 4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?** @@ -157,7 +157,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda ### Step 2: Containerization **Role**: DevOps / Platform engineer -**Goal**: Define Docker configuration for every component, local development, and integration test environments +**Goal**: Define Docker configuration for every component, local development, and blackbox test environments **Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain. 1. Read architecture.md and all component specs @@ -176,7 +176,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). 
Upda - Any message queues, caches, or external service mocks - Shared network - Environment variable files (`.env`) -6. Define `docker-compose.test.yml` for integration tests: +6. Define `docker-compose.test.yml` for blackbox tests: - Application components under test - Test runner container (black-box, no internal imports) - Isolated database with seed data @@ -189,7 +189,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda - [ ] Non-root user for all containers - [ ] Health checks defined for every service - [ ] docker-compose.yml covers all components + dependencies -- [ ] docker-compose.test.yml enables black-box integration testing +- [ ] docker-compose.test.yml enables black-box testing - [ ] `.dockerignore` defined **Save action**: Write `containerization.md` using `templates/containerization.md` @@ -212,7 +212,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda | Stage | Trigger | Steps | Quality Gate | |-------|---------|-------|-------------| | **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors | -| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage | +| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds) | | **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs | | **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds | | **Push** | After build | Push to container registry | Push succeeds | @@ -458,7 +458,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). 
Upda - **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts) - **Hardcoding secrets**: never include real credentials in deployment documents or scripts -- **Ignoring integration test containerization**: the test environment must be containerized alongside the app +- **Ignoring blackbox test containerization**: the test environment must be containerized alongside the app - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation - **Using `:latest` tags**: always pin base image versions - **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions diff --git a/.cursor/skills/deploy/templates/ci_cd_pipeline.md b/.cursor/skills/deploy/templates/ci_cd_pipeline.md index 57b8b41..16102e3 100644 --- a/.cursor/skills/deploy/templates/ci_cd_pipeline.md +++ b/.cursor/skills/deploy/templates/ci_cd_pipeline.md @@ -28,7 +28,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`. ### Test - Unit tests: [framework and command] -- Integration tests: [framework and command, uses docker-compose.test.yml] +- Blackbox tests: [framework and command, uses docker-compose.test.yml] - Coverage threshold: 75% overall, 90% critical paths - Coverage report published as pipeline artifact @@ -54,7 +54,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`. 
- Automated rollback on health check failure ### Smoke Tests -- Subset of integration tests targeting staging environment +- Subset of blackbox tests targeting staging environment - Validates critical user flows - Timeout: [maximum duration] diff --git a/.cursor/skills/deploy/templates/containerization.md b/.cursor/skills/deploy/templates/containerization.md index d1025be..d6c7073 100644 --- a/.cursor/skills/deploy/templates/containerization.md +++ b/.cursor/skills/deploy/templates/containerization.md @@ -48,7 +48,7 @@ networks: [shared network] ``` -## Docker Compose — Integration Tests +## Docker Compose — Blackbox Tests ```yaml # docker-compose.test.yml structure diff --git a/.cursor/skills/deploy/templates/deploy_scripts.md b/.cursor/skills/deploy/templates/deploy_scripts.md new file mode 100644 index 0000000..24e915c --- /dev/null +++ b/.cursor/skills/deploy/templates/deploy_scripts.md @@ -0,0 +1,114 @@ +# Deployment Scripts Documentation Template + +Save as `_docs/04_deploy/deploy_scripts.md`. + +--- + +```markdown +# [System Name] — Deployment Scripts + +## Overview + +| Script | Purpose | Location | +|--------|---------|----------| +| `deploy.sh` | Main deployment orchestrator | `scripts/deploy.sh` | +| `pull-images.sh` | Pull Docker images from registry | `scripts/pull-images.sh` | +| `start-services.sh` | Start all services | `scripts/start-services.sh` | +| `stop-services.sh` | Graceful shutdown | `scripts/stop-services.sh` | +| `health-check.sh` | Verify deployment health | `scripts/health-check.sh` | + +## Prerequisites + +- Docker and Docker Compose installed on target machine +- SSH access to target machine (configured via `DEPLOY_HOST`) +- Container registry credentials configured +- `.env` file with required environment variables (see `.env.example`) + +## Environment Variables + +All scripts source `.env` from the project root or accept variables from the environment. 
+ +| Variable | Required By | Purpose | +|----------|------------|---------| +| `DEPLOY_HOST` | All (remote mode) | SSH target for remote deployment | +| `REGISTRY_URL` | `pull-images.sh` | Container registry URL | +| `REGISTRY_USER` | `pull-images.sh` | Registry authentication | +| `REGISTRY_PASS` | `pull-images.sh` | Registry authentication | +| `IMAGE_TAG` | `pull-images.sh`, `start-services.sh` | Image version to deploy (default: latest git SHA) | +| [add project-specific variables] | | | + +## Script Details + +### deploy.sh + +Main orchestrator that runs the full deployment flow. + +**Usage**: +- `./scripts/deploy.sh` — Deploy latest version +- `./scripts/deploy.sh --rollback` — Rollback to previous version +- `./scripts/deploy.sh --help` — Show usage + +**Flow**: +1. Validate required environment variables +2. Call `pull-images.sh` +3. Call `stop-services.sh` +4. Call `start-services.sh` +5. Call `health-check.sh` +6. Report success or failure + +**Rollback**: When `--rollback` is passed, reads the previous image tags saved by `stop-services.sh` and redeploys those versions. + +### pull-images.sh + +**Usage**: `./scripts/pull-images.sh [--help]` + +**Steps**: +1. Authenticate with container registry (`REGISTRY_URL`) +2. Pull all required images with specified `IMAGE_TAG` +3. Verify image integrity via digest check +4. Report pull results per image + +### start-services.sh + +**Usage**: `./scripts/start-services.sh [--help]` + +**Steps**: +1. Run `docker compose up -d` with the correct env file +2. Configure networks and volumes +3. Wait for all containers to report healthy state +4. Report startup status per service + +### stop-services.sh + +**Usage**: `./scripts/stop-services.sh [--help]` + +**Steps**: +1. Save current image tags to `previous_tags.env` (for rollback) +2. Stop services with graceful shutdown period (30s) +3. 
Clean up orphaned containers and networks + +### health-check.sh + +**Usage**: `./scripts/health-check.sh [--help]` + +**Checks**: + +| Service | Endpoint | Expected | +|---------|----------|----------| +| [Component 1] | `http://localhost:[port]/health/live` | HTTP 200 | +| [Component 2] | `http://localhost:[port]/health/ready` | HTTP 200 | +| [add all services] | | | + +**Exit codes**: +- `0` — All services healthy +- `1` — One or more services unhealthy + +## Common Script Properties + +All scripts: +- Use `#!/bin/bash` with `set -euo pipefail` +- Support `--help` flag for usage information +- Source `.env` from project root if present +- Are idempotent where possible +- Support remote execution via SSH when `DEPLOY_HOST` is set +``` diff --git a/.cursor/skills/document/SKILL.md b/.cursor/skills/document/SKILL.md new file mode 100644 index 0000000..c920555 --- /dev/null +++ b/.cursor/skills/document/SKILL.md @@ -0,0 +1,515 @@ +--- +name: document +description: | + Bottom-up codebase documentation skill. Analyzes existing code from modules up through components + to architecture, then retrospectively derives problem/restrictions/acceptance criteria. + Produces the same _docs/ artifacts as the problem, research, and plan skills, but from code + analysis instead of user interview. + Trigger phrases: + - "document", "document codebase", "document this project" + - "documentation", "generate documentation", "create documentation" + - "reverse-engineer docs", "code to docs" + - "analyze and document" +category: build +tags: [documentation, code-analysis, reverse-engineering, architecture, bottom-up] +disable-model-invocation: true +--- + +# Bottom-Up Codebase Documentation + +Analyze an existing codebase from the bottom up — individual modules first, then components, then system-level architecture — and produce the same `_docs/` artifacts that the `problem` and `plan` skills generate, without requiring user interview. 
+ +## Core Principles + +- **Bottom-up always**: module docs -> component specs -> architecture/flows -> solution -> problem extraction. Every higher level is synthesized from the level below. +- **Dependencies first**: process modules in topological order (leaves first). When documenting module X, all of X's dependencies already have docs. +- **Incremental context**: each module's doc uses already-written dependency docs as context — no ever-growing chain. +- **Verify against code**: cross-reference every entity in generated docs against actual codebase. Catch hallucinations. +- **Save immediately**: write each artifact as soon as its step completes. Enable resume from any checkpoint. +- **Ask, don't assume**: when code intent is ambiguous, ASK the user before proceeding. + +## Context Resolution + +Fixed paths: + +- DOCUMENT_DIR: `_docs/02_document/` +- SOLUTION_DIR: `_docs/01_solution/` +- PROBLEM_DIR: `_docs/00_problem/` + +Optional input: + +- FOCUS_DIR: a specific directory subtree provided by the user (e.g., `/document @src/api/`). When set, only this subtree and its transitive dependencies are analyzed. + +Announce resolved paths (and FOCUS_DIR if set) to user before proceeding. + +## Mode Detection + +Determine the execution mode before any other logic: + +| Mode | Trigger | Scope | +|------|---------|-------| +| **Full** | No input file, no existing state | Entire codebase | +| **Focus Area** | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies | +| **Resume** | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint | + +Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas. + +## Prerequisite Checks + +1. 
If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?** +2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist +3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh** +4. If FOCUS_DIR is set, verify the directory exists and contains source files — **STOP if missing** + +## Progress Tracking + +Create a TodoWrite with all steps (0 through 7). Update status as each step completes. + +## Workflow + +### Step 0: Codebase Discovery + +**Role**: Code analyst +**Goal**: Build a complete map of the codebase (or targeted subtree) before analyzing any code. + +**Focus Area scoping**: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it. + +Scan and catalog: + +1. Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts) +2. Language detection from file extensions and config files +3. Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod` +4. Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`) +5. Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts +6. Test structure: test directories, test frameworks, test runner configs +7. Existing documentation: README, `docs/`, wiki references, inline doc coverage +8. **Dependency graph**: build a module-level dependency graph by analyzing imports/references. 
Identify: + - Leaf modules (no internal dependencies) + - Entry points (no internal dependents) + - Cycles (mark for grouped analysis) + - Topological processing order + - If FOCUS_DIR: mark which modules are in-scope vs dependency-only + +**Save**: `DOCUMENT_DIR/00_discovery.md` containing: +- Directory tree (concise, relevant directories only) +- Tech stack summary table (language, framework, database, infra) +- Dependency graph (textual list + Mermaid diagram) +- Topological processing order +- Entry points and leaf modules + +**Save**: `DOCUMENT_DIR/state.json` with initial state: +```json +{ + "current_step": "module-analysis", + "completed_steps": ["discovery"], + "focus_dir": null, + "modules_total": 0, + "modules_documented": [], + "modules_remaining": [], + "module_batch": 0, + "components_written": [], + "last_updated": "" +} +``` + +Set `focus_dir` to the FOCUS_DIR path if in Focus Area mode, or `null` for Full mode. + +--- + +### Step 1: Module-Level Documentation + +**Role**: Code analyst +**Goal**: Document every identified module individually, processing in topological order (leaves first). + +**Batched processing**: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update `state.json`, present a progress summary. Between batches, evaluate whether to suggest a session break. + +For each module in topological order: + +1. **Read**: read the module's source code. Assess complexity and what context is needed. +2. **Gather context**: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage. +3. 
**Write module doc** with these sections: + - **Purpose**: one-sentence responsibility + - **Public interface**: exported functions/classes/methods with signatures, input/output types + - **Internal logic**: key algorithms, patterns, non-obvious behavior + - **Dependencies**: what it imports internally and why + - **Consumers**: what uses this module (from the dependency graph) + - **Data models**: entities/types defined in this module + - **Configuration**: env vars, config keys consumed + - **External integrations**: HTTP calls, DB queries, queue operations, file I/O + - **Security**: auth checks, encryption, input validation, secrets access + - **Tests**: what tests exist for this module, what they cover +4. **Verify**: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties. + +**Cycle handling**: modules in a dependency cycle are analyzed together as a group, producing a single combined doc. + +**Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine. + +**Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module. +**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5. 
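The batched, checkpointed loop above can be sketched as follows. This is a minimal illustration only — `document_module`, `run_module_batches`, and the `/`→`_` filename convention are hypothetical names, not part of the skill; the directory layout, batch size, and `state.json` fields mirror the schema from Step 0:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

BATCH_SIZE = 5
DOCUMENT_DIR = Path("_docs/02_document")

def document_module(name: str) -> str:
    """Stand-in for the per-module analysis; returns the module doc text."""
    return f"# Module: {name}\n"

def run_module_batches(state: dict) -> dict:
    """Process remaining modules in topological batches of ~5,
    checkpointing state.json after every completed module."""
    while state["modules_remaining"]:
        # Slice copies the batch, so removing from the list below is safe.
        for module in state["modules_remaining"][:BATCH_SIZE]:
            doc_path = DOCUMENT_DIR / "modules" / (module.replace("/", "_") + ".md")
            doc_path.parent.mkdir(parents=True, exist_ok=True)
            doc_path.write_text(document_module(module))
            # Move the module from remaining to documented, then checkpoint.
            state["modules_remaining"].remove(module)
            state["modules_documented"].append(module)
            state["last_updated"] = datetime.now(timezone.utc).isoformat()
            (DOCUMENT_DIR / "state.json").write_text(json.dumps(state, indent=2))
        state["module_batch"] += 1  # one batch of ~5 finished
    return state
```

Because the state file is rewritten after every module (not every batch), an interruption at any point loses at most the module currently being analyzed.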
+ +**Session break heuristic**: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break: + +``` +══════════════════════════════════════ + SESSION BREAK SUGGESTED +══════════════════════════════════════ + Modules documented: [X] of [Y] + Batches completed this session: [N] +══════════════════════════════════════ + A) Continue in this conversation + B) Save and continue in a fresh conversation (recommended) +══════════════════════════════════════ + Recommendation: B — fresh context improves + analysis quality for remaining modules +══════════════════════════════════════ +``` + +Re-entry is seamless: `state.json` tracks exactly which modules are done. + +--- + +### Step 2: Component Assembly + +**Role**: Software architect +**Goal**: Group related modules into logical components and produce component specs. + +1. Analyze module docs from Step 1 to identify natural groupings: + - By directory structure (most common) + - By shared data models or common purpose + - By dependency clusters (tightly coupled modules) +2. For each identified component, synthesize its module docs into a single component specification using `templates/component-spec.md` as structure: + - High-level overview: purpose, pattern, upstream/downstream + - Internal interfaces: method signatures, DTOs (from actual module code) + - External API specification (if the component exposes HTTP/gRPC endpoints) + - Data access patterns: queries, caching, storage estimates + - Implementation details: algorithmic complexity, state management, key libraries + - Extensions and helpers: shared utilities needed + - Caveats and edge cases: limitations, race conditions, bottlenecks + - Dependency graph: implementation order relative to other components + - Logging strategy +3. Identify common helpers shared across multiple components -> document in `common-helpers/` +4. 
Generate component relationship diagram (Mermaid) + +**Self-verification**: +- [ ] Every module from Step 1 is covered by exactly one component +- [ ] No component has overlapping responsibility with another +- [ ] Inter-component interfaces are explicit (who calls whom, with what) +- [ ] Component dependency graph has no circular dependencies + +**Save**: +- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component +- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper +- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram) + +**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct. + +--- + +### Step 3: System-Level Synthesis + +**Role**: Software architect +**Goal**: From component docs, synthesize system-level documents. + +All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it. + +#### 3a. Architecture + +Using `templates/architecture.md` as structure: + +- System context and boundaries from entry points and external integrations +- Tech stack table from discovery (Step 0) + component specs +- Deployment model from Dockerfiles, CI configs, environment strategies +- Data model overview from per-component data access sections +- Integration points from inter-component interfaces +- NFRs from test thresholds, config limits, health checks +- Security architecture from per-module security observations +- Key ADRs inferred from technology choices and patterns + +**Save**: `DOCUMENT_DIR/architecture.md` + +#### 3b. 
System Flows + +Using `templates/system-flows.md` as structure: + +- Trace main flows through the component interaction graph +- Entry point -> component chain -> output for each major flow +- Mermaid sequence diagrams and flowcharts +- Error scenarios from exception handling patterns +- Data flow tables per flow + +**Save**: `DOCUMENT_DIR/system-flows.md` and `DOCUMENT_DIR/diagrams/flows/flow_[name].md` + +#### 3c. Data Model + +- Consolidate all data models from module docs +- Entity-relationship diagram (Mermaid ERD) +- Migration strategy (if ORM/migration tooling detected) +- Seed data observations +- Backward compatibility approach (if versioning found) + +**Save**: `DOCUMENT_DIR/data_model.md` + +#### 3d. Deployment (if Dockerfile/CI configs exist) + +- Containerization summary +- CI/CD pipeline structure +- Environment strategy (dev, staging, production) +- Observability (logging patterns, metrics, health checks found in code) + +**Save**: `DOCUMENT_DIR/deployment/` (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists) + +--- + +### Step 4: Verification Pass + +**Role**: Quality verifier +**Goal**: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies. + +For each document generated in Steps 1-3: + +1. **Entity verification**: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist. +2. **Interface accuracy**: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code. +3. **Flow correctness**: for each system flow diagram, trace the actual code path and verify the sequence matches. +4. **Completeness check**: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps. +5. 
**Consistency check**: do component docs agree with architecture doc? Do flow diagrams match component interfaces? + +Apply corrections inline to the documents that need them. + +**Save**: `DOCUMENT_DIR/04_verification_log.md` with: +- Total entities verified vs flagged +- Corrections applied (which document, what changed) +- Remaining gaps or uncertainties +- Completeness score (modules covered / total modules) + +**BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes. + +**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context: + +``` +══════════════════════════════════════ + VERIFICATION COMPLETE — session break? +══════════════════════════════════════ + Steps 0–4 (analysis + verification) are done. + Steps 5–7 (solution + problem extraction + report) + can run in a fresh conversation. +══════════════════════════════════════ + A) Continue in this conversation + B) Save and continue in a new conversation (recommended) +══════════════════════════════════════ +``` + +If **Focus Area mode**: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run `/document` again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis. + +--- + +### Step 5: Solution Extraction (Retrospective) + +**Role**: Software architect +**Goal**: From all verified technical documentation, retrospectively create `solution.md` — the same artifact the research skill produces. This makes downstream skills (`plan`, `deploy`, `decompose`) compatible with the documented codebase. + +Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4): + +1. 
**Product Solution Description**: what the system is, brief component interaction diagram (Mermaid) +2. **Architecture**: the architecture that is implemented, with per-component solution tables: + +| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | +|----------|-------|-----------|-------------|-------------|----------|------|-----| +| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] | + +3. **Testing Strategy**: summarize integration/functional tests and non-functional tests found in the codebase +4. **References**: links to key config files, Dockerfiles, CI configs that evidence the solution choices + +**Save**: `SOLUTION_DIR/solution.md` (`_docs/01_solution/solution.md`) + +--- + +### Step 6: Problem Extraction (Retrospective) + +**Role**: Business analyst +**Goal**: From all verified technical docs, retrospectively derive the high-level problem definition — producing the same documents the `problem` skill creates through interview. + +This is the inverse of normal workflow: instead of problem -> solution -> code, we go code -> technical docs -> problem understanding. + +#### 6a. `problem.md` + +- Synthesize from architecture overview + component purposes + system flows +- What is this system? What problem does it solve? Who are the users? How does it work at a high level? +- Cross-reference with README if one exists +- Free-form text, concise, readable by someone unfamiliar with the project + +#### 6b. `restrictions.md` + +- Extract from: tech stack choices, Dockerfile specs (OS, base images), CI configs (platform constraints), dependency versions, environment configs +- Categorize with headers: Hardware, Software, Environment, Operational +- Each restriction should be specific and testable + +#### 6c. 
`acceptance_criteria.md` + +- Derive from: test assertions (expected values, thresholds), performance configs (timeouts, rate limits, batch sizes), health check endpoints, validation rules in code +- Categorize with headers by domain +- Every criterion must have a measurable value — if only implied, note the source + +#### 6d. `input_data/` + +- Document data schemas found (DB schemas, API request/response types, config file formats) +- Create `data_parameters.md` describing what data the system consumes, formats, volumes, update patterns + +#### 6e. `security_approach.md` (only if security code found) + +- Authentication mechanisms, authorization patterns, encryption, secrets handling, CORS, rate limiting, input sanitization — all from code observations +- If no security-relevant code found, skip this file + +**Save**: all files to `PROBLEM_DIR/` (`_docs/00_problem/`) + +**BLOCKING**: Present all problem documents to user. These are the most abstracted and therefore most prone to interpretation error. Do NOT proceed until user confirms or requests corrections. + +--- + +### Step 7: Final Report + +**Role**: Technical writer +**Goal**: Produce `FINAL_report.md` integrating all generated documentation. + +Using `templates/final-report.md` as structure: + +- Executive summary from architecture + problem docs +- Problem statement (transformed from problem.md, not copy-pasted) +- Architecture overview with tech stack one-liner +- Component summary table (number, name, purpose, dependencies) +- System flows summary table +- Risk observations from verification log (Step 4) +- Open questions (uncertainties flagged during analysis) +- Artifact index listing all generated documents with paths + +**Save**: `DOCUMENT_DIR/FINAL_report.md` + +**State**: update `state.json` with `current_step: "complete"`. 
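Because `state.json` anchors every checkpoint above, re-entry reconciles it against the files actually on disk, trusting files when the two disagree. A minimal sketch — the function name and the `/`→`_` filename convention are assumptions for illustration, not part of the skill:

```python
import json
from pathlib import Path

def reconcile_state(document_dir: Path) -> dict:
    """Cross-check state.json against the module docs already on disk.
    If a doc file exists but state.json still lists its module as
    remaining, the file wins: mark the module as documented."""
    state = json.loads((document_dir / "state.json").read_text())
    on_disk = {p.stem for p in (document_dir / "modules").glob("*.md")}
    for module in list(state["modules_remaining"]):
        if module.replace("/", "_") in on_disk:  # assumed filename convention
            state["modules_remaining"].remove(module)
            state["modules_documented"].append(module)
    return state
```

Running this before choosing the next module makes resume idempotent: a doc written just before an interruption is never analyzed twice.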
+ +--- + +## Artifact Management + +### Directory Structure + +``` +_docs/ +├── 00_problem/ # Step 6 (retrospective) +│ ├── problem.md +│ ├── restrictions.md +│ ├── acceptance_criteria.md +│ ├── input_data/ +│ │ └── data_parameters.md +│ └── security_approach.md +├── 01_solution/ # Step 5 (retrospective) +│ └── solution.md +└── 02_document/ # DOCUMENT_DIR + ├── 00_discovery.md # Step 0 + ├── modules/ # Step 1 + │ ├── [module_name].md + │ └── ... + ├── components/ # Step 2 + │ ├── 01_[name]/description.md + │ ├── 02_[name]/description.md + │ └── ... + ├── common-helpers/ # Step 2 + ├── architecture.md # Step 3 + ├── system-flows.md # Step 3 + ├── data_model.md # Step 3 + ├── deployment/ # Step 3 + ├── diagrams/ # Steps 2-3 + │ ├── components.md + │ └── flows/ + ├── 04_verification_log.md # Step 4 + ├── FINAL_report.md # Step 7 + └── state.json # Resumability +``` + +### Resumability + +Maintain `DOCUMENT_DIR/state.json`: + +```json +{ + "current_step": "module-analysis", + "completed_steps": ["discovery"], + "focus_dir": null, + "modules_total": 12, + "modules_documented": ["utils/helpers", "models/user"], + "modules_remaining": ["services/auth", "api/endpoints"], + "module_batch": 1, + "components_written": [], + "last_updated": "2026-03-21T14:00:00Z" +} +``` + +Update after each module/component completes. If interrupted, resume from next undocumented module. + +When resuming: +1. Read `state.json` +2. Cross-check against actual files in DOCUMENT_DIR (trust files over state if they disagree) +3. Continue from the next incomplete item +4. Inform user which steps are being skipped + +### Save Principles + +1. **Save immediately**: write each module doc as soon as analysis completes +2. **Incremental context**: each subsequent module uses already-written docs as context +3. **Preserve intermediates**: keep all module docs even after synthesis into component docs +4. 
**Enable recovery**: state file tracks exact progress for resume + +## Escalation Rules + +| Situation | Action | +|-----------|--------| +| Minified/obfuscated code detected | WARN user, skip module, note in verification log | +| Module too large for context window | Split into sub-sections, analyze parts separately, combine | +| Cycle in dependency graph | Group cycled modules, analyze together as one doc | +| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead | +| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only | +| Contradictions between code and README | Flag in verification log, ASK user | +| Binary files or non-code assets | Skip, note in discovery | +| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` | +| Code intent is ambiguous | ASK user, do not guess | + +## Common Mistakes + +- **Top-down guessing**: never infer architecture before documenting modules. Build up, don't assume down. +- **Hallucinating entities**: always verify that referenced classes/functions/endpoints actually exist in code. +- **Skipping modules**: every source module must appear in exactly one module doc and one component. +- **Monolithic analysis**: don't try to analyze the entire codebase in one pass. Module by module, in order. +- **Inventing restrictions**: only document constraints actually evidenced in code, configs, or Dockerfiles. +- **Vague acceptance criteria**: "should be fast" is not a criterion. Extract actual numeric thresholds from code. +- **Writing code**: this skill produces documents, never implementation code. 
+ +## Methodology Quick Reference + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ Bottom-Up Codebase Documentation (8-Step) │ +├──────────────────────────────────────────────────────────────────┤ +│ MODE: Full / Focus Area (@dir) / Resume (state.json) │ +│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │ +│ PREREQ: Check state.json for resume │ +│ │ +│ 0. Discovery → dependency graph, tech stack, topo order │ +│ (Focus Area: scoped to FOCUS_DIR + transitive deps) │ +│ 1. Module Docs → per-module analysis (leaves first) │ +│ (batched ~5 modules; session break between batches) │ +│ 2. Component Assembly → group modules, write component specs │ +│ [BLOCKING: user confirms components] │ +│ 3. System Synthesis → architecture, flows, data model, deploy │ +│ 4. Verification → compare all docs vs code, fix errors │ +│ [BLOCKING: user reviews corrections] │ +│ [SESSION BREAK suggested before Steps 5–7] │ +│ ── Focus Area mode stops here ── │ +│ 5. Solution Extraction → retrospective solution.md │ +│ 6. Problem Extraction → retrospective problem, restrictions, AC │ +│ [BLOCKING: user confirms problem docs] │ +│ 7. 
Final Report → FINAL_report.md │ +├──────────────────────────────────────────────────────────────────┤ +│ Principles: Bottom-up always · Dependencies first │ +│ Incremental context · Verify against code │ +│ Save immediately · Resume from checkpoint │ +│ Batch modules · Session breaks for large codebases │ +└──────────────────────────────────────────────────────────────────┘ +``` diff --git a/.cursor/skills/implement/SKILL.md b/.cursor/skills/implement/SKILL.md index fb24044..cf44a57 100644 --- a/.cursor/skills/implement/SKILL.md +++ b/.cursor/skills/implement/SKILL.md @@ -73,9 +73,9 @@ For each task in the batch: - Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files) - If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel -### 5. Update Jira Status → In Progress +### 5. Update Tracker Status → In Progress -For each task in the batch, transition its Jira ticket status to **In Progress** via Jira MCP before launching the implementer. +For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `protocols.md` for detection) before launching the implementer. If `tracker: local`, skip this step. ### 6. Launch Implementer Subagents @@ -93,15 +93,30 @@ Launch all subagents immediately — no user confirmation. - Collect structured status reports from each implementer - If any implementer reports "Blocked", log the blocker and continue with others +**Stuck detection** — while monitoring, watch for these signals per subagent: +- Same file modified 3+ times without test pass rate improving → flag as stuck, stop the subagent, report as Blocked +- Subagent has not produced new output for an extended period → flag as potentially hung +- If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report + ### 8. 
Code Review - Run `/code-review` skill on the batch's changed files + corresponding task specs - The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL -### 9. Gate +### 9. Auto-Fix Gate -- If verdict is **FAIL**: present findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding. -- If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically. +Auto-fix loop with bounded retries (max 2 attempts) before escalating to user: + +1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 10 +2. If verdict is **FAIL** (attempt 1 or 2): + - Parse the code review findings (Critical and High severity items) + - For each finding, attempt an automated fix using the finding's location, description, and suggestion + - Re-run `/code-review` on the modified files + - If now PASS or PASS_WITH_WARNINGS → continue to step 10 + - If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts +3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding. + +Track `auto_fix_attempts` count in the batch report for retrospective analysis. ### 10. Test @@ -112,12 +127,12 @@ Launch all subagents immediately — no user confirmation. - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS): - `git add` all changed files from the batch - - `git commit` with a message that includes ALL JIRA-IDs of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[JIRA-ID-1] [JIRA-ID-2] ... Summary of changes` + - `git commit` with a message that includes ALL task IDs (Jira IDs, ADO IDs, or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes` - `git push` to the remote branch -### 12. 
Update Jira Status → In Testing +### 12. Update Tracker Status → In Testing -After the batch is committed and pushed, transition the Jira ticket status of each task in the batch to **In Testing** via Jira MCP. +After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step. ### 13. Loop @@ -146,6 +161,8 @@ After each batch, produce a structured report: | [JIRA-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] | ## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS] +## Auto-Fix Attempts: [0/1/2] +## Stuck Agents: [count or None] ## Next Batch: [task list] or "All tasks complete" ``` @@ -173,5 +190,5 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed: - Never launch tasks whose dependencies are not yet completed - Never allow two parallel agents to write to the same file -- If a subagent fails, do NOT retry automatically — report and let user decide +- If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely - Always run tests after each batch completes diff --git a/.cursor/skills/new-task/SKILL.md b/.cursor/skills/new-task/SKILL.md new file mode 100644 index 0000000..e68ff4c --- /dev/null +++ b/.cursor/skills/new-task/SKILL.md @@ -0,0 +1,302 @@ +--- +name: new-task +description: | + Interactive skill for adding new functionality to an existing codebase. + Guides the user through describing the feature, assessing complexity, + optionally running research, analyzing the codebase for insertion points, + validating assumptions with the user, and producing a task spec with Jira ticket. + Supports a loop — the user can add multiple tasks in one session. 
+ Trigger phrases: + - "new task", "add feature", "new functionality" + - "I want to add", "new component", "extend" +category: build +tags: [task, feature, interactive, planning, jira] +disable-model-invocation: true +--- + +# New Task (Interactive Feature Planning) + +Guide the user through defining new functionality for an existing codebase. Produces one or more task specifications with Jira tickets, optionally running deep research for complex features. + +## Core Principles + +- **User-driven**: every task starts with the user's description; never invent requirements +- **Right-size research**: only invoke the research skill when the change is big enough to warrant it +- **Validate before committing**: surface all assumptions and uncertainties to the user before writing the task file +- **Save immediately**: write task files to disk as soon as they are ready; never accumulate unsaved work +- **Ask, don't assume**: when scope, insertion point, or approach is unclear, STOP and ask the user + +## Context Resolution + +Fixed paths: + +- TASKS_DIR: `_docs/02_tasks/` +- PLANS_DIR: `_docs/02_task_plans/` +- DOCUMENT_DIR: `_docs/02_document/` +- DEPENDENCIES_TABLE: `_docs/02_tasks/_dependencies_table.md` + +Create TASKS_DIR and PLANS_DIR if they don't exist. + +If TASKS_DIR already contains task files, scan them to determine the next numeric prefix for temporary file naming. + +## Workflow + +The skill runs as a loop. Each iteration produces one task. After each task the user chooses to add another or finish. + +--- + +### Step 1: Gather Feature Description + +**Role**: Product analyst +**Goal**: Get a clear, detailed description of the new functionality from the user. + +Ask the user: + +``` +══════════════════════════════════════ + NEW TASK: Describe the functionality +══════════════════════════════════════ + Please describe in detail the new functionality you want to add: + - What should it do? + - Who is it for? + - Any specific requirements or constraints? 
+══════════════════════════════════════
+```
+
+**BLOCKING**: Do NOT proceed until the user provides a description.
+
+Record the description verbatim for use in subsequent steps.
+
+---
+
+### Step 2: Analyze Complexity
+
+**Role**: Technical analyst
+**Goal**: Determine whether deep research is needed.
+
+Read the user's description and the existing codebase documentation from DOCUMENT_DIR (architecture.md, components/, system-flows.md).
+
+Assess the change along these dimensions:
+- **Scope**: how many components/files are affected?
+- **Novelty**: does it involve libraries, protocols, or patterns not already in the codebase?
+- **Risk**: could it break existing functionality or require architectural changes?
+
+Classification:
+
+| Category | Criteria | Action |
+|----------|----------|--------|
+| **Needs research** | New libraries/frameworks, unfamiliar protocols, significant architectural change, multiple unknowns | Proceed to Step 3 (Research) |
+| **Skip research** | Extends existing functionality, uses patterns already in codebase, straightforward new component with known tech | Skip to Step 4 (Codebase Analysis) |
+
+Present the assessment to the user:
+
+```
+══════════════════════════════════════
+ COMPLEXITY ASSESSMENT
+══════════════════════════════════════
+ Scope: [low / medium / high]
+ Novelty: [low / medium / high]
+ Risk: [low / medium / high]
+══════════════════════════════════════
+ Recommendation: [Research needed / Skip research]
+ Reason: [one-line justification]
+══════════════════════════════════════
+```
+
+**BLOCKING**: Ask the user to confirm or override the recommendation before proceeding.
+
+---
+
+### Step 3: Research (conditional)
+
+**Role**: Researcher
+**Goal**: Investigate unknowns before task specification.
+
+This step only runs if Step 2 determined research is needed.
+
+1. Create a problem description file at `PLANS_DIR/[feature-slug]/problem.md` summarizing the feature request and the specific unknowns to investigate
+2. Invoke `.cursor/skills/research/SKILL.md` in standalone mode:
+   - INPUT_FILE: `PLANS_DIR/[feature-slug]/problem.md`
+   - BASE_DIR: `PLANS_DIR/[feature-slug]/`
+3. After research completes, read the solution draft from `PLANS_DIR/[feature-slug]/01_solution/solution_draft01.md`
+4. Extract the key findings relevant to the task specification
+
+The `[feature-slug]` is a short kebab-case name derived from the feature description (e.g., `auth-provider-integration`, `real-time-notifications`).
+
+---
+
+### Step 4: Codebase Analysis
+
+**Role**: Software architect
+**Goal**: Determine where and how to insert the new functionality.
+
+1. Read the codebase documentation from DOCUMENT_DIR:
+   - `architecture.md` — overall structure
+   - `components/` — component specs
+   - `system-flows.md` — data flows (if exists)
+   - `data_model.md` — data model (if exists)
+2. If research was performed (Step 3), incorporate findings
+3. Analyze and determine:
+   - Which existing components are affected
+   - Where new code should be inserted (which layers, modules, files)
+   - What interfaces need to change
+   - What new interfaces or models are needed
+   - How data flows through the change
+4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points
+
+Present the analysis:
+
+```
+══════════════════════════════════════
+ CODEBASE ANALYSIS
+══════════════════════════════════════
+ Affected components: [list]
+ Insertion points: [list of modules/layers]
+ Interface changes: [list or "None"]
+ New interfaces: [list or "None"]
+ Data flow impact: [summary]
+══════════════════════════════════════
+```
+
+---
+
+### Step 5: Validate Assumptions
+
+**Role**: Quality gate
+**Goal**: Surface every uncertainty and get user confirmation.
+
+Review all decisions and assumptions made in Steps 2–4. For each uncertainty:
+1. State the assumption clearly
+2. Propose a solution or approach
+3. 
List alternatives if they exist + +Present using the Choose format for each decision that has meaningful alternatives: + +``` +══════════════════════════════════════ + ASSUMPTION VALIDATION +══════════════════════════════════════ + 1. [Assumption]: [proposed approach] + Alternative: [other option, if any] + 2. [Assumption]: [proposed approach] + Alternative: [other option, if any] + ... +══════════════════════════════════════ + Please confirm or correct these assumptions. +══════════════════════════════════════ +``` + +**BLOCKING**: Do NOT proceed until the user confirms or corrects all assumptions. + +--- + +### Step 6: Create Task + +**Role**: Technical writer +**Goal**: Produce the task specification file. + +1. Determine the next numeric prefix by scanning TASKS_DIR for existing files +2. Write the task file using `.cursor/skills/decompose/templates/task.md`: + - Fill all fields from the gathered information + - Set **Complexity** based on the assessment from Step 2 + - Set **Dependencies** by cross-referencing existing tasks in TASKS_DIR + - Set **Jira** and **Epic** to `pending` (filled in Step 7) +3. Save as `TASKS_DIR/[##]_[short_name].md` + +**Self-verification**: +- [ ] Problem section clearly describes the user need +- [ ] Acceptance criteria are testable (Gherkin format) +- [ ] Scope boundaries are explicit +- [ ] Complexity points match the assessment +- [ ] Dependencies reference existing task Jira IDs where applicable +- [ ] No implementation details leaked into the spec + +--- + +### Step 7: Work Item Ticket + +**Role**: Project coordinator +**Goal**: Create a work item ticket and link it to the task file. + +1. 
Create a ticket via the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md` for detection): + - Summary: the task's **Name** field + - Description: the task's **Problem** and **Acceptance Criteria** sections + - Story points: the task's **Complexity** value + - Link to the appropriate epic (ask user if unclear which epic) +2. Write the ticket ID and Epic ID back into the task file header: + - Update **Task** field: `[TICKET-ID]_[short_name]` + - Update **Jira** field: `[TICKET-ID]` + - Update **Epic** field: `[EPIC-ID]` +3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md` + +If the work item tracker is not authenticated or unavailable (`tracker: local`): +- Keep the numeric prefix +- Set **Jira** to `pending` +- Set **Epic** to `pending` +- The task is still valid and can be implemented; tracker sync happens later + +--- + +### Step 8: Loop Gate + +Ask the user: + +``` +══════════════════════════════════════ + Task created: [JIRA-ID or ##] — [task name] +══════════════════════════════════════ + A) Add another task + B) Done — finish and update dependencies +══════════════════════════════════════ +``` + +- If **A** → loop back to Step 1 +- If **B** → proceed to Finalize + +--- + +### Finalize + +After the user chooses **Done**: + +1. Update (or create) `TASKS_DIR/_dependencies_table.md` — add all newly created tasks to the dependencies table +2. Present a summary of all tasks created in this session: + +``` +══════════════════════════════════════ + NEW TASK SUMMARY +══════════════════════════════════════ + Tasks created: N + Total complexity: M points + ───────────────────────────────────── + [JIRA-ID] [name] ([complexity] pts) + [JIRA-ID] [name] ([complexity] pts) + ... 
+══════════════════════════════════════ +``` + +## Escalation Rules + +| Situation | Action | +|-----------|--------| +| User description is vague or incomplete | **ASK** for more detail — do not guess | +| Unclear which epic to link to | **ASK** user for the epic | +| Research skill hits a blocker | Follow research skill's own escalation rules | +| Codebase analysis reveals conflicting architectures | **ASK** user which pattern to follow | +| Complexity exceeds 5 points | **WARN** user and suggest splitting into multiple tasks | +| Jira MCP unavailable | **WARN**, continue with local-only task files | + +## Trigger Conditions + +When the user wants to: +- Add new functionality to an existing codebase +- Plan a new feature or component +- Create task specifications for upcoming work + +**Keywords**: "new task", "add feature", "new functionality", "extend", "I want to add" + +**Differentiation**: +- User wants to decompose an existing plan into tasks → use `/decompose` +- User wants to research a topic without creating tasks → use `/research` +- User wants to refactor existing code → use `/refactor` +- User wants to define and plan a new feature → use this skill diff --git a/.cursor/skills/new-task/templates/task.md b/.cursor/skills/new-task/templates/task.md new file mode 100644 index 0000000..3a52cf9 --- /dev/null +++ b/.cursor/skills/new-task/templates/task.md @@ -0,0 +1,2 @@ + + diff --git a/.cursor/skills/plan/SKILL.md b/.cursor/skills/plan/SKILL.md index 5ee6222..b1cc48d 100644 --- a/.cursor/skills/plan/SKILL.md +++ b/.cursor/skills/plan/SKILL.md @@ -3,7 +3,7 @@ name: plan description: | Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics. Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management. - Uses _docs/ + _docs/02_plans/ structure. + Uses _docs/ + _docs/02_document/ structure. 
Trigger phrases: - "plan", "decompose solution", "architecture planning" - "break down the solution", "create planning documents" @@ -31,13 +31,11 @@ Fixed paths — no mode detection needed: - PROBLEM_FILE: `_docs/00_problem/problem.md` - SOLUTION_FILE: `_docs/01_solution/solution.md` -- PLANS_DIR: `_docs/02_plans/` +- DOCUMENT_DIR: `_docs/02_document/` Announce the resolved paths to the user before proceeding. -## Input Specification - -### Required Files +## Required Files | File | Purpose | |------|---------| @@ -47,170 +45,23 @@ Announce the resolved paths to the user before proceeding. | `_docs/00_problem/input_data/` | Reference data examples | | `_docs/01_solution/solution.md` | Finalized solution to decompose | -### Prerequisite Checks (BLOCKING) +## Prerequisites -Run sequentially before any planning step: - -**Prereq 1: Data Gate** - -1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing** -2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing** -3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing** -4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing** - -All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop. - -**Prereq 2: Finalize Solution Draft** - -Only runs after the Data Gate passes: - -1. Scan `_docs/01_solution/` for files matching `solution_draft*.md` -2. Identify the highest-numbered draft (e.g. `solution_draft06.md`) -3. **Rename** it to `_docs/01_solution/solution.md` -4. If `solution.md` already exists, ask the user whether to overwrite or keep existing -5. Verify `solution.md` is non-empty — **STOP if missing or empty** - -**Prereq 3: Workspace Setup** - -1. Create PLANS_DIR if it does not exist -2. 
If PLANS_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?** +Read and follow `steps/00_prerequisites.md`. All three prerequisite checks are **BLOCKING** — do not start the workflow until they pass. ## Artifact Management -### Directory Structure - -All artifacts are written directly under PLANS_DIR: - -``` -PLANS_DIR/ -├── integration_tests/ -│ ├── environment.md -│ ├── test_data.md -│ ├── functional_tests.md -│ ├── non_functional_tests.md -│ └── traceability_matrix.md -├── architecture.md -├── system-flows.md -├── data_model.md -├── deployment/ -│ ├── containerization.md -│ ├── ci_cd_pipeline.md -│ ├── environment_strategy.md -│ ├── observability.md -│ └── deployment_procedures.md -├── risk_mitigations.md -├── risk_mitigations_02.md (iterative, ## as sequence) -├── components/ -│ ├── 01_[name]/ -│ │ ├── description.md -│ │ └── tests.md -│ ├── 02_[name]/ -│ │ ├── description.md -│ │ └── tests.md -│ └── ... -├── common-helpers/ -│ ├── 01_helper_[name]/ -│ ├── 02_helper_[name]/ -│ └── ... -├── diagrams/ -│ ├── components.drawio -│ └── flows/ -│ ├── flow_[name].md (Mermaid) -│ └── ... 
-└── FINAL_report.md -``` - -### Save Timing - -| Step | Save immediately after | Filename | -|------|------------------------|----------| -| Step 1 | Integration test environment spec | `integration_tests/environment.md` | -| Step 1 | Integration test data spec | `integration_tests/test_data.md` | -| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` | -| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` | -| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` | -| Step 2 | Architecture analysis complete | `architecture.md` | -| Step 2 | System flows documented | `system-flows.md` | -| Step 2 | Data model documented | `data_model.md` | -| Step 2 | Deployment plan complete | `deployment/` (5 files) | -| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` | -| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` | -| Step 3 | Diagrams generated | `diagrams/` | -| Step 4 | Risk assessment complete | `risk_mitigations.md` | -| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` | -| Step 6 | Epics created in Jira | Jira via MCP | -| Final | All steps complete | `FINAL_report.md` | - -### Save Principles - -1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end -2. **Incremental updates**: same file can be updated multiple times; append or replace -3. **Preserve process**: keep all intermediate files even after integration into final report -4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability) - -### Resumability - -If PLANS_DIR already contains artifacts: - -1. List existing files and match them to the save timing table above -2. Identify the last completed step based on which artifacts exist -3. Resume from the next incomplete step -4. 
Inform the user which steps are being skipped +Read `steps/01_artifact-management.md` for directory structure, save timing, save principles, and resumability rules. Refer to it throughout the workflow. ## Progress Tracking -At the start of execution, create a TodoWrite with all steps (1 through 6). Update status as each step completes. +At the start of execution, create a TodoWrite with all steps (1 through 6 plus Final). Update status as each step completes. ## Workflow -### Step 1: Integration Tests +### Step 1: Blackbox Tests -**Role**: Professional Quality Assurance Engineer -**Goal**: Analyze input data completeness and produce detailed black-box integration test specifications -**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built. - -#### Phase 1a: Input Data Completeness Analysis - -1. Read `_docs/01_solution/solution.md` (finalized in Prereq 2) -2. Read `acceptance_criteria.md`, `restrictions.md` -3. Read testing strategy from solution.md -4. Analyze `input_data/` contents against: - - Coverage of acceptance criteria scenarios - - Coverage of restriction edge cases - - Coverage of testing strategy requirements -5. Threshold: at least 70% coverage of the scenarios -6. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` -7. Present coverage assessment to user - -**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient. - -#### Phase 1b: Black-Box Test Scenario Specification - -Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios: - -1. Define test environment using `templates/integration-environment.md` as structure -2. Define test data management using `templates/integration-test-data.md` as structure -3. Write functional test scenarios (positive + negative) using `templates/integration-functional-tests.md` as structure -4. 
Write non-functional test scenarios (performance, resilience, security, edge cases) using `templates/integration-non-functional-tests.md` as structure -5. Build traceability matrix using `templates/integration-traceability-matrix.md` as structure - -**Self-verification**: -- [ ] Every acceptance criterion is covered by at least one test scenario -- [ ] Every restriction is verified by at least one test scenario -- [ ] Positive and negative scenarios are balanced -- [ ] Consumer app has no direct access to system internals -- [ ] Docker environment is self-contained (`docker compose up` sufficient) -- [ ] External dependencies have mock/stub services defined -- [ ] Traceability matrix has no uncovered AC or restrictions - -**Save action**: Write all files under `integration_tests/`: -- `environment.md` -- `test_data.md` -- `functional_tests.md` -- `non_functional_tests.md` -- `traceability_matrix.md` - -**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed. +Read and execute `.cursor/skills/test-spec/SKILL.md`. Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3. @@ -218,263 +69,37 @@ Capture any new questions, findings, or insights that arise during test specific ### Step 2: Solution Analysis -**Role**: Professional software architect -**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft -**Constraints**: No code, no component-level detail yet; focus on system-level view - -#### Phase 2a: Architecture & Flows - -1. Read all input files thoroughly -2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests) -3. Research unknown or questionable topics via internet; ask user about ambiguities -4. Document architecture using `templates/architecture.md` as structure -5. 
Document system flows using `templates/system-flows.md` as structure - -**Self-verification**: -- [ ] Architecture covers all capabilities mentioned in solution.md -- [ ] System flows cover all main user/system interactions -- [ ] No contradictions with problem.md or restrictions.md -- [ ] Technology choices are justified -- [ ] Integration test findings are reflected in architecture decisions - -**Save action**: Write `architecture.md` and `system-flows.md` - -**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms. - -#### Phase 2b: Data Model - -**Role**: Professional software architect -**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy - -1. Extract core entities from architecture.md and solution.md -2. Define entity attributes, types, and constraints -3. Define relationships between entities (Mermaid ERD) -4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention -5. Define seed data requirements per environment (dev, staging) -6. Define backward compatibility approach for schema changes (additive-only by default) - -**Self-verification**: -- [ ] Every entity mentioned in architecture.md is defined -- [ ] Relationships are explicit with cardinality -- [ ] Migration strategy specifies reversibility requirement -- [ ] Seed data requirements defined -- [ ] Backward compatibility approach documented - -**Save action**: Write `data_model.md` - -#### Phase 2c: Deployment Planning - -**Role**: DevOps / Platform engineer -**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures - -Use the `/deploy` skill's templates as structure for each artifact: - -1. Read architecture.md and restrictions.md for infrastructure constraints -2. Research Docker best practices for the project's tech stack -3. 
Define containerization plan: Dockerfile per component, docker-compose for dev and tests -4. Define CI/CD pipeline: stages, quality gates, caching, parallelization -5. Define environment strategy: dev, staging, production with secrets management -6. Define observability: structured logging, metrics, tracing, alerting -7. Define deployment procedures: strategy, health checks, rollback, checklist - -**Self-verification**: -- [ ] Every component has a Docker specification -- [ ] CI/CD pipeline covers lint, test, security, build, deploy -- [ ] Environment strategy covers dev, staging, production -- [ ] Observability covers logging, metrics, tracing, alerting -- [ ] Deployment procedures include rollback and health checks - -**Save action**: Write all 5 files under `deployment/`: -- `containerization.md` -- `ci_cd_pipeline.md` -- `environment_strategy.md` -- `observability.md` -- `deployment_procedures.md` +Read and follow `steps/02_solution-analysis.md`. --- ### Step 3: Component Decomposition -**Role**: Professional software architect -**Goal**: Decompose the architecture into components with detailed specs -**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly. - -1. Identify components from the architecture; think about separation, reusability, and communication patterns -2. Use integration test scenarios from Step 1 to validate component boundaries -3. If additional components are needed (data preparation, shared helpers), create them -4. For each component, write a spec using `templates/component-spec.md` as structure -5. Generate diagrams: - - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components) - - Mermaid flowchart per main control flow -6. Components can share and reuse common logic, same for multiple components. Hence for such occurences common-helpers folder is specified. 
- -**Self-verification**: -- [ ] Each component has a single, clear responsibility -- [ ] No functionality is spread across multiple components -- [ ] All inter-component interfaces are defined (who calls whom, with what) -- [ ] Component dependency graph has no circular dependencies -- [ ] All components from architecture.md are accounted for -- [ ] Every integration test scenario can be traced through component interactions - -**Save action**: Write: - - each component `components/[##]_[name]/description.md` - - common helper `common-helpers/[##]_helper_[name].md` - - diagrams `diagrams/` - -**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms. +Read and follow `steps/03_component-decomposition.md`. --- ### Step 4: Architecture Review & Risk Assessment -**Role**: Professional software architect and analyst -**Goal**: Validate all artifacts for consistency, then identify and mitigate risks -**Constraints**: This is a review step — fix problems found, do not add new features - -#### 4a. Evaluator Pass (re-read ALL artifacts) - -Review checklist: -- [ ] All components follow Single Responsibility Principle -- [ ] All components follow dumb code / smart data principle -- [ ] Inter-component interfaces are consistent (caller's output matches callee's input) -- [ ] No circular dependencies in the dependency graph -- [ ] No missing interactions between components -- [ ] No over-engineering — is there a simpler decomposition? -- [ ] Security considerations addressed in component design -- [ ] Performance bottlenecks identified -- [ ] API contracts are consistent across components - -Fix any issues found before proceeding to risk identification. - -#### 4b. Risk Identification - -1. Identify technical and project risks -2. Assess probability and impact using `templates/risk-register.md` -3. Define mitigation strategies -4. 
Apply mitigations to architecture, flows, and component documents where applicable - -**Self-verification**: -- [ ] Every High/Critical risk has a concrete mitigation strategy -- [ ] Mitigations are reflected in the relevant component or architecture docs -- [ ] No new risks introduced by the mitigations themselves - -**Save action**: Write `risk_mitigations.md` - -**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient. - -**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms. +Read and follow `steps/04_review-risk.md`. --- ### Step 5: Test Specifications -**Role**: Professional Quality Assurance Engineer - -**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage - -**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion. - -1. For each component, write tests using `templates/test-spec.md` as structure -2. Cover all 4 types: integration, performance, security, acceptance -3. Include test data management (setup, teardown, isolation) -4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test - -**Self-verification**: -- [ ] Every acceptance criterion has at least one test covering it -- [ ] Test inputs are realistic and well-defined -- [ ] Expected results are specific and measurable -- [ ] No component is left without tests - -**Save action**: Write each `components/[##]_[name]/tests.md` +Read and follow `steps/05_test-specifications.md`. --- ### Step 6: Jira Epics -**Role**: Professional product manager - -**Goal**: Create Jira epics from components, ordered by dependency - -**Constraints**: Be concise — fewer words with the same meaning is better - -1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. 
It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure. -2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md` -3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph) -4. Include effort estimation per epic (T-shirt size or story points range) -5. Ensure each epic has clear acceptance criteria cross-referenced with component specs -6. Generate updated draw.io diagram showing component-to-epic mapping - -**Self-verification**: -- [ ] "Bootstrap & Initial Structure" epic exists and is first in order -- [ ] "Integration Tests" epic exists -- [ ] Every component maps to exactly one epic -- [ ] Dependency order is respected (no epic depends on a later one) -- [ ] Acceptance criteria are measurable -- [ ] Effort estimates are realistic - -7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`. - -**Save action**: Epics created in Jira via MCP +Read and follow `steps/06_jira-epics.md`. 
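Step 6's rule that epics follow the component dependency graph (Bootstrap always first, no epic before the epics it depends on) amounts to a topological sort. A minimal sketch using Python's standard-library `graphlib` (3.9+); the component names and dependency map below are hypothetical, not taken from any real plan:

```python
from graphlib import TopologicalSorter

# Hypothetical component dependency graph: each key depends on its values.
deps = {
    "api_gateway": {"auth_service", "task_store"},
    "auth_service": {"task_store"},
    "task_store": set(),
}

# Bootstrap epic is pinned first; the rest follow dependency order,
# so every epic appears after all epics it depends on.
epic_order = ["bootstrap_initial_structure"]
epic_order += list(TopologicalSorter(deps).static_order())
print(epic_order)
# ['bootstrap_initial_structure', 'task_store', 'auth_service', 'api_gateway']
```

As a side benefit, `static_order()` raises `graphlib.CycleError` on a circular dependency, which doubles as the "no epic depends on a later one" check.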
--- -## Quality Checklist (before FINAL_report.md) +### Final: Quality Checklist -Before writing the final report, verify ALL of the following: - -### Integration Tests -- [ ] Every acceptance criterion is covered in traceability_matrix.md -- [ ] Every restriction is verified by at least one test -- [ ] Positive and negative scenarios are balanced -- [ ] Docker environment is self-contained -- [ ] Consumer app treats main system as black box -- [ ] CI/CD integration and reporting defined - -### Architecture -- [ ] Covers all capabilities from solution.md -- [ ] Technology choices are justified -- [ ] Deployment model is defined -- [ ] Integration test findings are reflected in architecture decisions - -### Data Model -- [ ] Every entity from architecture.md is defined -- [ ] Relationships have explicit cardinality -- [ ] Migration strategy with reversibility requirement -- [ ] Seed data requirements defined -- [ ] Backward compatibility approach documented - -### Deployment -- [ ] Containerization plan covers all components -- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages -- [ ] Environment strategy covers dev, staging, production -- [ ] Observability covers logging, metrics, tracing, alerting -- [ ] Deployment procedures include rollback and health checks - -### Components -- [ ] Every component follows SRP -- [ ] No circular dependencies -- [ ] All inter-component interfaces are defined and consistent -- [ ] No orphan components (unused by any flow) -- [ ] Every integration test scenario can be traced through component interactions - -### Risks -- [ ] All High/Critical risks have mitigations -- [ ] Mitigations are reflected in component/architecture docs -- [ ] User has confirmed risk assessment is sufficient - -### Tests -- [ ] Every acceptance criterion is covered by at least one test -- [ ] All 4 test types are represented per component (where applicable) -- [ ] Test data management is defined - -### Epics -- [ ] "Bootstrap & Initial 
Structure" epic exists -- [ ] "Integration Tests" epic exists -- [ ] Every component maps to an epic -- [ ] Dependency order is correct -- [ ] Acceptance criteria are measurable - -**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure +Read and follow `steps/07_quality-checklist.md`. ## Common Mistakes @@ -486,7 +111,7 @@ Before writing the final report, verify ALL of the following: - **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input - **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output - **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register -- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3) +- **Ignoring blackbox test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3) ## Escalation Rules @@ -505,31 +130,26 @@ Before writing the final report, verify ALL of the following: ``` ┌────────────────────────────────────────────────────────────────┐ -│ Solution Planning (6-Step Method) │ +│ Solution Planning (6-Step + Final) │ ├────────────────────────────────────────────────────────────────┤ -│ PREREQ 1: Data Gate (BLOCKING) │ -│ → verify AC, restrictions, input_data exist — STOP if not │ -│ PREREQ 2: Finalize solution draft │ -│ → rename highest solution_draft##.md to solution.md │ -│ PREREQ 3: Workspace setup │ -│ → create PLANS_DIR/ if needed │ +│ PREREQ: Data Gate (BLOCKING) │ +│ → verify AC, restrictions, input_data, solution exist │ │ │ -│ 1. Integration Tests → integration_tests/ (5 files) │ +│ 1. Blackbox Tests → test-spec/SKILL.md │ │ [BLOCKING: user confirms test coverage] │ -│ 2a. Architecture → architecture.md, system-flows.md │ +│ 2. 
Solution Analysis → architecture, data model, deployment │ │ [BLOCKING: user confirms architecture] │ -│ 2b. Data Model → data_model.md │ -│ 2c. Deployment → deployment/ (5 files) │ -│ 3. Component Decompose → components/[##]_[name]/description │ -│ [BLOCKING: user confirms decomposition] │ -│ 4. Review & Risk → risk_mitigations.md │ -│ [BLOCKING: user confirms risks, iterative] │ -│ 5. Test Specifications → components/[##]_[name]/tests.md │ -│ 6. Jira Epics → Jira via MCP │ +│ 3. Component Decomp → component specs + interfaces │ +│ [BLOCKING: user confirms components] │ +│ 4. Review & Risk → risk register, iterations │ +│ [BLOCKING: user confirms mitigations] │ +│ 5. Test Specifications → per-component test specs │ +│ 6. Jira Epics → epic per component + bootstrap │ │ ───────────────────────────────────────────────── │ -│ Quality Checklist → FINAL_report.md │ +│ Final: Quality Checklist → FINAL_report.md │ ├────────────────────────────────────────────────────────────────┤ -│ Principles: SRP · Dumb code/smart data · Save immediately │ -│ Ask don't assume · Plan don't code │ +│ Principles: Single Responsibility · Dumb code, smart data │ +│ Save immediately · Ask don't assume │ +│ Plan don't code │ └────────────────────────────────────────────────────────────────┘ ``` diff --git a/.cursor/skills/plan/steps/00_prerequisites.md b/.cursor/skills/plan/steps/00_prerequisites.md new file mode 100644 index 0000000..3eccbc8 --- /dev/null +++ b/.cursor/skills/plan/steps/00_prerequisites.md @@ -0,0 +1,27 @@ +## Prerequisite Checks (BLOCKING) + +Run sequentially before any planning step: + +### Prereq 1: Data Gate + +1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing** +2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing** +3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing** +4. 
`_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing** + +All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop. + +### Prereq 2: Finalize Solution Draft + +Only runs after the Data Gate passes: + +1. Scan `_docs/01_solution/` for files matching `solution_draft*.md` +2. Identify the highest-numbered draft (e.g. `solution_draft06.md`) +3. **Rename** it to `_docs/01_solution/solution.md` +4. If `solution.md` already exists, ask the user whether to overwrite or keep existing +5. Verify `solution.md` is non-empty — **STOP if missing or empty** + +### Prereq 3: Workspace Setup + +1. Create DOCUMENT_DIR if it does not exist +2. If DOCUMENT_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?** diff --git a/.cursor/skills/plan/steps/01_artifact-management.md b/.cursor/skills/plan/steps/01_artifact-management.md new file mode 100644 index 0000000..95af1d0 --- /dev/null +++ b/.cursor/skills/plan/steps/01_artifact-management.md @@ -0,0 +1,87 @@ +## Artifact Management + +### Directory Structure + +All artifacts are written directly under DOCUMENT_DIR: + +``` +DOCUMENT_DIR/ +├── tests/ +│ ├── environment.md +│ ├── test-data.md +│ ├── blackbox-tests.md +│ ├── performance-tests.md +│ ├── resilience-tests.md +│ ├── security-tests.md +│ ├── resource-limit-tests.md +│ └── traceability-matrix.md +├── architecture.md +├── system-flows.md +├── data_model.md +├── deployment/ +│ ├── containerization.md +│ ├── ci_cd_pipeline.md +│ ├── environment_strategy.md +│ ├── observability.md +│ └── deployment_procedures.md +├── risk_mitigations.md +├── risk_mitigations_02.md (iterative, ## as sequence) +├── components/ +│ ├── 01_[name]/ +│ │ ├── description.md +│ │ └── tests.md +│ ├── 02_[name]/ +│ │ ├── description.md +│ │ └── tests.md +│ └── ... 
+├── common-helpers/ +│ ├── 01_helper_[name]/ +│ ├── 02_helper_[name]/ +│ └── ... +├── diagrams/ +│ ├── components.drawio +│ └── flows/ +│ ├── flow_[name].md (Mermaid) +│ └── ... +└── FINAL_report.md +``` + +### Save Timing + +| Step | Save immediately after | Filename | +|------|------------------------|----------| +| Step 1 | Blackbox test environment spec | `tests/environment.md` | +| Step 1 | Blackbox test data spec | `tests/test-data.md` | +| Step 1 | Blackbox tests | `tests/blackbox-tests.md` | +| Step 1 | Blackbox performance tests | `tests/performance-tests.md` | +| Step 1 | Blackbox resilience tests | `tests/resilience-tests.md` | +| Step 1 | Blackbox security tests | `tests/security-tests.md` | +| Step 1 | Blackbox resource limit tests | `tests/resource-limit-tests.md` | +| Step 1 | Blackbox traceability matrix | `tests/traceability-matrix.md` | +| Step 2 | Architecture analysis complete | `architecture.md` | +| Step 2 | System flows documented | `system-flows.md` | +| Step 2 | Data model documented | `data_model.md` | +| Step 2 | Deployment plan complete | `deployment/` (5 files) | +| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` | +| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` | +| Step 3 | Diagrams generated | `diagrams/` | +| Step 4 | Risk assessment complete | `risk_mitigations.md` | +| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` | +| Step 6 | Epics created in Jira | Jira via MCP | +| Final | All steps complete | `FINAL_report.md` | + +### Save Principles + +1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end +2. **Incremental updates**: same file can be updated multiple times; append or replace +3. **Preserve process**: keep all intermediate files even after integration into final report +4. 
**Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability) + +### Resumability + +If DOCUMENT_DIR already contains artifacts: + +1. List existing files and match them to the save timing table above +2. Identify the last completed step based on which artifacts exist +3. Resume from the next incomplete step +4. Inform the user which steps are being skipped diff --git a/.cursor/skills/plan/steps/02_solution-analysis.md b/.cursor/skills/plan/steps/02_solution-analysis.md new file mode 100644 index 0000000..701f409 --- /dev/null +++ b/.cursor/skills/plan/steps/02_solution-analysis.md @@ -0,0 +1,74 @@ +## Step 2: Solution Analysis + +**Role**: Professional software architect +**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft +**Constraints**: No code, no component-level detail yet; focus on system-level view + +### Phase 2a: Architecture & Flows + +1. Read all input files thoroughly +2. Incorporate findings, questions, and insights discovered during Step 1 (blackbox tests) +3. Research unknown or questionable topics via internet; ask user about ambiguities +4. Document architecture using `templates/architecture.md` as structure +5. Document system flows using `templates/system-flows.md` as structure + +**Self-verification**: +- [ ] Architecture covers all capabilities mentioned in solution.md +- [ ] System flows cover all main user/system interactions +- [ ] No contradictions with problem.md or restrictions.md +- [ ] Technology choices are justified +- [ ] Blackbox test findings are reflected in architecture decisions + +**Save action**: Write `architecture.md` and `system-flows.md` + +**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms. + +### Phase 2b: Data Model + +**Role**: Professional software architect +**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy + +1. 
Extract core entities from architecture.md and solution.md +2. Define entity attributes, types, and constraints +3. Define relationships between entities (Mermaid ERD) +4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention +5. Define seed data requirements per environment (dev, staging) +6. Define backward compatibility approach for schema changes (additive-only by default) + +**Self-verification**: +- [ ] Every entity mentioned in architecture.md is defined +- [ ] Relationships are explicit with cardinality +- [ ] Migration strategy specifies reversibility requirement +- [ ] Seed data requirements defined +- [ ] Backward compatibility approach documented + +**Save action**: Write `data_model.md` + +### Phase 2c: Deployment Planning + +**Role**: DevOps / Platform engineer +**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures + +Use the `/deploy` skill's templates as structure for each artifact: + +1. Read architecture.md and restrictions.md for infrastructure constraints +2. Research Docker best practices for the project's tech stack +3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests +4. Define CI/CD pipeline: stages, quality gates, caching, parallelization +5. Define environment strategy: dev, staging, production with secrets management +6. Define observability: structured logging, metrics, tracing, alerting +7. 
Define deployment procedures: strategy, health checks, rollback, checklist
+
+**Self-verification**:
+- [ ] Every component has a Docker specification
+- [ ] CI/CD pipeline covers lint, test, security, build, deploy
+- [ ] Environment strategy covers dev, staging, production
+- [ ] Observability covers logging, metrics, tracing, alerting
+- [ ] Deployment procedures include rollback and health checks
+
+**Save action**: Write all 5 files under `deployment/`:
+- `containerization.md`
+- `ci_cd_pipeline.md`
+- `environment_strategy.md`
+- `observability.md`
+- `deployment_procedures.md`
diff --git a/.cursor/skills/plan/steps/03_component-decomposition.md b/.cursor/skills/plan/steps/03_component-decomposition.md
new file mode 100644
index 0000000..c026e65
--- /dev/null
+++ b/.cursor/skills/plan/steps/03_component-decomposition.md
@@ -0,0 +1,29 @@
+## Step 3: Component Decomposition
+
+**Role**: Professional software architect
+**Goal**: Decompose the architecture into components with detailed specs
+**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
+
+1. Identify components from the architecture; think about separation, reusability, and communication patterns
+2. Use blackbox test scenarios from Step 1 to validate component boundaries
+3. If additional components are needed (data preparation, shared helpers), create them
+4. For each component, write a spec using `templates/component-spec.md` as structure
+5. Generate diagrams:
+   - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
+   - Mermaid flowchart per main control flow
+6. Components can share and reuse common logic. When the same logic is needed by multiple components, extract it into the `common-helpers/` folder.
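
For illustration, a minimal Mermaid flowchart of a hypothetical decomposition — the component names (`ingest`, `processor`, `reporter`) and the shared validation helper are invented examples, not prescribed names:

```mermaid
flowchart LR
    ingest["01_ingest"] --> processor["02_processor"]
    processor --> reporter["03_reporter"]
    helper["common-helpers/01_helper_validation"]
    ingest -. uses .-> helper
    processor -. uses .-> helper
```

Dotted edges mark shared-helper usage; solid edges mark the main data flow between components.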
+ +**Self-verification**: +- [ ] Each component has a single, clear responsibility +- [ ] No functionality is spread across multiple components +- [ ] All inter-component interfaces are defined (who calls whom, with what) +- [ ] Component dependency graph has no circular dependencies +- [ ] All components from architecture.md are accounted for +- [ ] Every blackbox test scenario can be traced through component interactions + +**Save action**: Write: + - each component `components/[##]_[name]/description.md` + - common helper `common-helpers/[##]_helper_[name].md` + - diagrams `diagrams/` + +**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms. diff --git a/.cursor/skills/plan/steps/04_review-risk.md b/.cursor/skills/plan/steps/04_review-risk.md new file mode 100644 index 0000000..747b7cf --- /dev/null +++ b/.cursor/skills/plan/steps/04_review-risk.md @@ -0,0 +1,38 @@ +## Step 4: Architecture Review & Risk Assessment + +**Role**: Professional software architect and analyst +**Goal**: Validate all artifacts for consistency, then identify and mitigate risks +**Constraints**: This is a review step — fix problems found, do not add new features + +### 4a. Evaluator Pass (re-read ALL artifacts) + +Review checklist: +- [ ] All components follow Single Responsibility Principle +- [ ] All components follow dumb code / smart data principle +- [ ] Inter-component interfaces are consistent (caller's output matches callee's input) +- [ ] No circular dependencies in the dependency graph +- [ ] No missing interactions between components +- [ ] No over-engineering — is there a simpler decomposition? +- [ ] Security considerations addressed in component design +- [ ] Performance bottlenecks identified +- [ ] API contracts are consistent across components + +Fix any issues found before proceeding to risk identification. + +### 4b. Risk Identification + +1. Identify technical and project risks +2. 
Assess probability and impact using `templates/risk-register.md` +3. Define mitigation strategies +4. Apply mitigations to architecture, flows, and component documents where applicable + +**Self-verification**: +- [ ] Every High/Critical risk has a concrete mitigation strategy +- [ ] Mitigations are reflected in the relevant component or architecture docs +- [ ] No new risks introduced by the mitigations themselves + +**Save action**: Write `risk_mitigations.md` + +**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient. + +**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms. diff --git a/.cursor/skills/plan/steps/05_test-specifications.md b/.cursor/skills/plan/steps/05_test-specifications.md new file mode 100644 index 0000000..9657359 --- /dev/null +++ b/.cursor/skills/plan/steps/05_test-specifications.md @@ -0,0 +1,20 @@ +## Step 5: Test Specifications + +**Role**: Professional Quality Assurance Engineer + +**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage + +**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion. + +1. For each component, write tests using `templates/test-spec.md` as structure +2. Cover all 4 types: integration, performance, security, acceptance +3. Include test data management (setup, teardown, isolation) +4. 
Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test + +**Self-verification**: +- [ ] Every acceptance criterion has at least one test covering it +- [ ] Test inputs are realistic and well-defined +- [ ] Expected results are specific and measurable +- [ ] No component is left without tests + +**Save action**: Write each `components/[##]_[name]/tests.md` diff --git a/.cursor/skills/plan/steps/06_jira-epics.md b/.cursor/skills/plan/steps/06_jira-epics.md new file mode 100644 index 0000000..e93d95e --- /dev/null +++ b/.cursor/skills/plan/steps/06_jira-epics.md @@ -0,0 +1,48 @@ +## Step 6: Work Item Epics + +**Role**: Professional product manager + +**Goal**: Create epics from components, ordered by dependency + +**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the epic should understand the full context without needing to open separate files. + +1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure. +2. Generate epics for each component using the configured work item tracker (Jira MCP or Azure DevOps MCP — see `autopilot/protocols.md`), structured per `templates/epic-spec.md` +3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph) +4. Include effort estimation per epic (T-shirt size or story points range) +5. Ensure each epic has clear acceptance criteria cross-referenced with component specs +6. 
Generate Mermaid diagrams showing component-to-epic mapping and component relationships + +**CRITICAL — Epic description richness requirements**: + +Each epic description MUST include ALL of the following sections with substantial content: +- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections) +- **Problem / Context**: what problem this component solves, why it exists, current pain points +- **Scope**: detailed in-scope and out-of-scope lists +- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions +- **Interface specification**: full method signatures, input/output types, error types (from component description.md) +- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram) +- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services) +- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md) +- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds +- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies +- **Effort estimation**: T-shirt size and story points range +- **Child issues**: planned task breakdown with complexity points +- **Key constraints**: from restrictions.md that affect this component +- **Testing strategy**: summary of test types and coverage from tests.md + +Do NOT create minimal epics with just a summary and short description. The epic is the primary reference document for the implementation team. 
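
As an illustration of the expected depth for the "Interface specification" section, a sketch in Python type-hint style — the `PoseEstimator` name, its method, and the error type are invented examples, not part of any real component spec:

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Input image frame with capture timestamp (illustrative type)."""
    pixels: bytes
    timestamp_ms: int


@dataclass
class Pose:
    """Estimated position output (illustrative type)."""
    x: float
    y: float
    confidence: float


class PoseEstimationError(Exception):
    """Raised when a frame cannot be processed."""


class PoseEstimator:
    """Example component interface: one public method, explicit error type."""

    def estimate(self, frame: Frame) -> Pose:
        """Return the estimated pose for a frame; raises PoseEstimationError on empty input."""
        if not frame.pixels:
            raise PoseEstimationError("empty frame")
        # Placeholder logic standing in for the real algorithm
        return Pose(x=0.0, y=0.0, confidence=1.0)
```

An epic's interface section should spell out signatures at roughly this level of detail, so the implementing developer never has to guess input/output types or error behavior.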
+
+7. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`.
+
+**Self-verification**:
+- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
+- [ ] "Blackbox Tests" epic exists
+- [ ] Every component maps to exactly one epic
+- [ ] Dependency order is respected (no epic depends on a later one)
+- [ ] Acceptance criteria are measurable
+- [ ] Effort estimates are realistic
+- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
+- [ ] Epic descriptions are self-contained — readable without opening other files
+
+**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If `tracker: local`, save locally only.
diff --git a/.cursor/skills/plan/steps/07_quality-checklist.md b/.cursor/skills/plan/steps/07_quality-checklist.md
new file mode 100644
index 0000000..f883e88
--- /dev/null
+++ b/.cursor/skills/plan/steps/07_quality-checklist.md
@@ -0,0 +1,57 @@
+## Quality Checklist (before FINAL_report.md)
+
+Before writing the final report, verify ALL of the following:
+
+### Blackbox Tests
+- [ ] Every acceptance criterion is covered in traceability-matrix.md
+- [ ] Every restriction is verified by at least one test
+- [ ] Positive and negative scenarios are balanced
+- [ ] Docker environment is self-contained
+- [ ] Consumer app treats main system as black box
+- [ ] CI/CD integration and reporting defined
+
+### Architecture
+- [ ] Covers all capabilities from solution.md
+- [ ] Technology choices are justified
+- [ ] Deployment model is defined
+- [ ] Blackbox test findings are reflected in architecture decisions
+
+### Data Model
+- [ ] Every entity from architecture.md is defined
+- [ ] Relationships have explicit cardinality
+- [ ] Migration strategy with reversibility requirement
+- [ ] Seed data requirements defined
+- [ ] 
Backward compatibility approach documented + +### Deployment +- [ ] Containerization plan covers all components +- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages +- [ ] Environment strategy covers dev, staging, production +- [ ] Observability covers logging, metrics, tracing, alerting +- [ ] Deployment procedures include rollback and health checks + +### Components +- [ ] Every component follows SRP +- [ ] No circular dependencies +- [ ] All inter-component interfaces are defined and consistent +- [ ] No orphan components (unused by any flow) +- [ ] Every blackbox test scenario can be traced through component interactions + +### Risks +- [ ] All High/Critical risks have mitigations +- [ ] Mitigations are reflected in component/architecture docs +- [ ] User has confirmed risk assessment is sufficient + +### Tests +- [ ] Every acceptance criterion is covered by at least one test +- [ ] All 4 test types are represented per component (where applicable) +- [ ] Test data management is defined + +### Epics +- [ ] "Bootstrap & Initial Structure" epic exists +- [ ] "Blackbox Tests" epic exists +- [ ] Every component maps to an epic +- [ ] Dependency order is correct +- [ ] Acceptance criteria are measurable + +**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure diff --git a/.cursor/skills/plan/templates/architecture.md b/.cursor/skills/plan/templates/architecture.md index 0884500..1d381cc 100644 --- a/.cursor/skills/plan/templates/architecture.md +++ b/.cursor/skills/plan/templates/architecture.md @@ -1,6 +1,6 @@ # Architecture Document Template -Use this template for the architecture document. Save as `_docs/02_plans/architecture.md`. +Use this template for the architecture document. Save as `_docs/02_document/architecture.md`. 
--- diff --git a/.cursor/skills/plan/templates/integration-functional-tests.md b/.cursor/skills/plan/templates/blackbox-tests.md similarity index 83% rename from .cursor/skills/plan/templates/integration-functional-tests.md rename to .cursor/skills/plan/templates/blackbox-tests.md index 9bb3eff..d522698 100644 --- a/.cursor/skills/plan/templates/integration-functional-tests.md +++ b/.cursor/skills/plan/templates/blackbox-tests.md @@ -1,24 +1,24 @@ -# E2E Functional Tests Template +# Blackbox Tests Template -Save as `PLANS_DIR/integration_tests/functional_tests.md`. +Save as `DOCUMENT_DIR/tests/blackbox-tests.md`. --- ```markdown -# E2E Functional Tests +# Blackbox Tests ## Positive Scenarios ### FT-P-01: [Scenario Name] -**Summary**: [One sentence: what end-to-end use case this validates] +**Summary**: [One sentence: what black-box use case this validates] **Traces to**: AC-[ID], AC-[ID] **Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.] **Preconditions**: - [System state required before test] -**Input data**: [reference to specific data set or file from test_data.md] +**Input data**: [reference to specific data set or file from test-data.md] **Steps**: @@ -71,8 +71,8 @@ Save as `PLANS_DIR/integration_tests/functional_tests.md`. ## Guidance Notes -- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification. +- Blackbox tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification. - Positive scenarios validate the system does what it should. - Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept. - Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth." -- Input data references should point to specific entries in test_data.md. 
+- Input data references should point to specific entries in test-data.md. diff --git a/.cursor/skills/plan/templates/epic-spec.md b/.cursor/skills/plan/templates/epic-spec.md index f8ebcfc..6cb60e6 100644 --- a/.cursor/skills/plan/templates/epic-spec.md +++ b/.cursor/skills/plan/templates/epic-spec.md @@ -1,6 +1,6 @@ -# Jira Epic Template +# Epic Template -Use this template for each Jira epic. Create epics via Jira MCP. +Use this template for each epic. Create epics via the configured work item tracker (Jira MCP or Azure DevOps MCP). --- @@ -73,14 +73,14 @@ Link to architecture.md and relevant component spec.] ### Design & Architecture -- Architecture doc: `_docs/02_plans/architecture.md` -- Component spec: `_docs/02_plans/components/[##]_[name]/description.md` -- System flows: `_docs/02_plans/system-flows.md` +- Architecture doc: `_docs/02_document/architecture.md` +- Component spec: `_docs/02_document/components/[##]_[name]/description.md` +- System flows: `_docs/02_document/system-flows.md` ### Definition of Done - [ ] All in-scope capabilities implemented -- [ ] Automated tests pass (unit + integration + e2e) +- [ ] Automated tests pass (unit + blackbox) - [ ] Minimum coverage threshold met (75%) - [ ] Runbooks written (if applicable) - [ ] Documentation updated diff --git a/.cursor/skills/plan/templates/final-report.md b/.cursor/skills/plan/templates/final-report.md index db0828b..0e27016 100644 --- a/.cursor/skills/plan/templates/final-report.md +++ b/.cursor/skills/plan/templates/final-report.md @@ -1,6 +1,6 @@ # Final Planning Report Template -Use this template after completing all 5 steps and the quality checklist. Save as `_docs/02_plans/FINAL_report.md`. +Use this template after completing all 6 steps and the quality checklist. Save as `_docs/02_document/FINAL_report.md`. 
--- diff --git a/.cursor/skills/plan/templates/integration-non-functional-tests.md b/.cursor/skills/plan/templates/integration-non-functional-tests.md deleted file mode 100644 index d1b5f3a..0000000 --- a/.cursor/skills/plan/templates/integration-non-functional-tests.md +++ /dev/null @@ -1,97 +0,0 @@ -# E2E Non-Functional Tests Template - -Save as `PLANS_DIR/integration_tests/non_functional_tests.md`. - ---- - -```markdown -# E2E Non-Functional Tests - -## Performance Tests - -### NFT-PERF-01: [Test Name] - -**Summary**: [What performance characteristic this validates] -**Traces to**: AC-[ID] -**Metric**: [what is measured — latency, throughput, frame rate, etc.] - -**Preconditions**: -- [System state, load profile, data volume] - -**Steps**: - -| Step | Consumer Action | Measurement | -|------|----------------|-------------| -| 1 | [action] | [what to measure and how] | - -**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms] -**Duration**: [how long the test runs] - ---- - -## Resilience Tests - -### NFT-RES-01: [Test Name] - -**Summary**: [What failure/recovery scenario this validates] -**Traces to**: AC-[ID] - -**Preconditions**: -- [System state before fault injection] - -**Fault injection**: -- [What fault is introduced — process kill, network partition, invalid input sequence, etc.] - -**Steps**: - -| Step | Action | Expected Behavior | -|------|--------|------------------| -| 1 | [inject fault] | [system behavior during fault] | -| 2 | [observe recovery] | [system behavior after recovery] | - -**Pass criteria**: [recovery time, data integrity, continued operation] - ---- - -## Security Tests - -### NFT-SEC-01: [Test Name] - -**Summary**: [What security property this validates] -**Traces to**: AC-[ID], RESTRICT-[ID] - -**Steps**: - -| Step | Consumer Action | Expected Response | -|------|----------------|------------------| -| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] 
| - -**Pass criteria**: [specific security outcome] - ---- - -## Resource Limit Tests - -### NFT-RES-LIM-01: [Test Name] - -**Summary**: [What resource constraint this validates] -**Traces to**: AC-[ID], RESTRICT-[ID] - -**Preconditions**: -- [System running under specified constraints] - -**Monitoring**: -- [What resources to monitor — memory, CPU, GPU, disk, temperature] - -**Duration**: [how long to run] -**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout] -``` - ---- - -## Guidance Notes - -- Performance tests should run long enough to capture steady-state behavior, not just cold-start. -- Resilience tests must define both the fault and the expected recovery — not just "system should recover." -- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities. -- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance. diff --git a/.cursor/skills/plan/templates/performance-tests.md b/.cursor/skills/plan/templates/performance-tests.md new file mode 100644 index 0000000..dfbcd14 --- /dev/null +++ b/.cursor/skills/plan/templates/performance-tests.md @@ -0,0 +1,35 @@ +# Performance Tests Template + +Save as `DOCUMENT_DIR/tests/performance-tests.md`. + +--- + +```markdown +# Performance Tests + +### NFT-PERF-01: [Test Name] + +**Summary**: [What performance characteristic this validates] +**Traces to**: AC-[ID] +**Metric**: [what is measured — latency, throughput, frame rate, etc.] 
+ +**Preconditions**: +- [System state, load profile, data volume] + +**Steps**: + +| Step | Consumer Action | Measurement | +|------|----------------|-------------| +| 1 | [action] | [what to measure and how] | + +**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms] +**Duration**: [how long the test runs] +``` + +--- + +## Guidance Notes + +- Performance tests should run long enough to capture steady-state behavior, not just cold-start. +- Define clear pass/fail thresholds with specific metrics (p50, p95, p99 latency, throughput, etc.). +- Include warm-up preconditions to separate initialization cost from steady-state performance. diff --git a/.cursor/skills/plan/templates/resilience-tests.md b/.cursor/skills/plan/templates/resilience-tests.md new file mode 100644 index 0000000..72890ae --- /dev/null +++ b/.cursor/skills/plan/templates/resilience-tests.md @@ -0,0 +1,37 @@ +# Resilience Tests Template + +Save as `DOCUMENT_DIR/tests/resilience-tests.md`. + +--- + +```markdown +# Resilience Tests + +### NFT-RES-01: [Test Name] + +**Summary**: [What failure/recovery scenario this validates] +**Traces to**: AC-[ID] + +**Preconditions**: +- [System state before fault injection] + +**Fault injection**: +- [What fault is introduced — process kill, network partition, invalid input sequence, etc.] + +**Steps**: + +| Step | Action | Expected Behavior | +|------|--------|------------------| +| 1 | [inject fault] | [system behavior during fault] | +| 2 | [observe recovery] | [system behavior after recovery] | + +**Pass criteria**: [recovery time, data integrity, continued operation] +``` + +--- + +## Guidance Notes + +- Resilience tests must define both the fault and the expected recovery — not just "system should recover." +- Include specific recovery time expectations and data integrity checks. +- Test both graceful degradation (partial failure) and full recovery scenarios. 
diff --git a/.cursor/skills/plan/templates/resource-limit-tests.md b/.cursor/skills/plan/templates/resource-limit-tests.md new file mode 100644 index 0000000..53779e3 --- /dev/null +++ b/.cursor/skills/plan/templates/resource-limit-tests.md @@ -0,0 +1,31 @@ +# Resource Limit Tests Template + +Save as `DOCUMENT_DIR/tests/resource-limit-tests.md`. + +--- + +```markdown +# Resource Limit Tests + +### NFT-RES-LIM-01: [Test Name] + +**Summary**: [What resource constraint this validates] +**Traces to**: AC-[ID], RESTRICT-[ID] + +**Preconditions**: +- [System running under specified constraints] + +**Monitoring**: +- [What resources to monitor — memory, CPU, GPU, disk, temperature] + +**Duration**: [how long to run] +**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout] +``` + +--- + +## Guidance Notes + +- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance. +- Define specific numeric limits that can be programmatically checked. +- Include both the monitoring method and the threshold in the pass criteria. diff --git a/.cursor/skills/plan/templates/risk-register.md b/.cursor/skills/plan/templates/risk-register.md index 0983d7f..786aec9 100644 --- a/.cursor/skills/plan/templates/risk-register.md +++ b/.cursor/skills/plan/templates/risk-register.md @@ -1,6 +1,6 @@ # Risk Register Template -Use this template for risk assessment. Save as `_docs/02_plans/risk_mitigations.md`. +Use this template for risk assessment. Save as `_docs/02_document/risk_mitigations.md`. Subsequent iterations: `risk_mitigations_02.md`, `risk_mitigations_03.md`, etc. --- diff --git a/.cursor/skills/plan/templates/security-tests.md b/.cursor/skills/plan/templates/security-tests.md new file mode 100644 index 0000000..b243404 --- /dev/null +++ b/.cursor/skills/plan/templates/security-tests.md @@ -0,0 +1,30 @@ +# Security Tests Template + +Save as `DOCUMENT_DIR/tests/security-tests.md`. 
+ +--- + +```markdown +# Security Tests + +### NFT-SEC-01: [Test Name] + +**Summary**: [What security property this validates] +**Traces to**: AC-[ID], RESTRICT-[ID] + +**Steps**: + +| Step | Consumer Action | Expected Response | +|------|----------------|------------------| +| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] | + +**Pass criteria**: [specific security outcome] +``` + +--- + +## Guidance Notes + +- Security tests at blackbox level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities. +- Verify the system remains operational after security-related edge cases (no crash, no hang). +- Test authentication/authorization boundaries from the consumer's perspective. diff --git a/.cursor/skills/plan/templates/system-flows.md b/.cursor/skills/plan/templates/system-flows.md index 4d5656f..6c887a8 100644 --- a/.cursor/skills/plan/templates/system-flows.md +++ b/.cursor/skills/plan/templates/system-flows.md @@ -1,7 +1,7 @@ # System Flows Template -Use this template for the system flows document. Save as `_docs/02_plans/system-flows.md`. -Individual flow diagrams go in `_docs/02_plans/diagrams/flows/flow_[name].md`. +Use this template for the system flows document. Save as `_docs/02_document/system-flows.md`. +Individual flow diagrams go in `_docs/02_document/diagrams/flows/flow_[name].md`. --- diff --git a/.cursor/skills/plan/templates/integration-test-data.md b/.cursor/skills/plan/templates/test-data.md similarity index 62% rename from .cursor/skills/plan/templates/integration-test-data.md rename to .cursor/skills/plan/templates/test-data.md index 041c963..0cee7fa 100644 --- a/.cursor/skills/plan/templates/integration-test-data.md +++ b/.cursor/skills/plan/templates/test-data.md @@ -1,11 +1,11 @@ -# E2E Test Data Template +# Test Data Template -Save as `PLANS_DIR/integration_tests/test_data.md`. +Save as `DOCUMENT_DIR/tests/test-data.md`. 
--- ```markdown -# E2E Test Data Management +# Test Data Management ## Seed Data Sets @@ -23,6 +23,12 @@ Save as `PLANS_DIR/integration_tests/test_data.md`. |-----------------|----------------|-------------|-----------------| | [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] | +## Expected Results Mapping + +| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source | +|-----------------|------------|-----------------|-------------------|-----------|----------------------| +| [test ID] | `input_data/[filename]` | [quantifiable expected output] | [exact / tolerance / pattern / threshold / file-diff] | [± value or N/A] | `input_data/expected_results/[filename]` or inline | + ## External Dependency Mocks | External Service | Mock/Stub | How Provided | Behavior | @@ -42,5 +48,8 @@ Save as `PLANS_DIR/integration_tests/test_data.md`. - Every seed data set should be traceable to specific test scenarios. - Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it. +- Every input data item MUST have a corresponding expected result in the Expected Results Mapping table. +- Expected results MUST be quantifiable: exact values, numeric tolerances, pattern matches, thresholds, or reference files. "Works correctly" is never acceptable. +- For complex expected outputs, provide machine-readable reference files (JSON, CSV) in `_docs/00_problem/input_data/expected_results/` and reference them in the mapping. - External mocks must be deterministic — same input always produces same output. - Data isolation must guarantee no test can affect another test's outcome. 
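
The comparison methods named in the Expected Results Mapping table (exact / tolerance / pattern / threshold) can be sketched as a small checker. This is a hedged illustration, not part of the skill; the `compare` function name is invented, and the file-diff method is omitted since it depends on reference-file layout:

```python
import math
import re


def compare(actual, expected, method: str, tolerance: float = 0.0) -> bool:
    """Illustrative dispatcher for the comparison methods in the mapping table."""
    if method == "exact":
        # Exact equality of values
        return actual == expected
    if method == "tolerance":
        # Numeric comparison within an absolute ± tolerance
        return math.isclose(actual, expected, abs_tol=tolerance)
    if method == "pattern":
        # Full regex match against a string output
        return re.fullmatch(expected, actual) is not None
    if method == "threshold":
        # Actual value must not exceed the expected limit
        return actual <= expected
    raise ValueError(f"unknown comparison method: {method}")
```

For example, `compare(49.2, 50.0, "tolerance", tolerance=1.0)` passes while `compare(48.0, 50.0, "tolerance", tolerance=1.0)` fails — the point being that every row in the table reduces to a programmatic check, never a subjective "works correctly".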
diff --git a/.cursor/skills/plan/templates/integration-environment.md b/.cursor/skills/plan/templates/test-environment.md similarity index 92% rename from .cursor/skills/plan/templates/integration-environment.md rename to .cursor/skills/plan/templates/test-environment.md index 6d8a0ac..b5d74fa 100644 --- a/.cursor/skills/plan/templates/integration-environment.md +++ b/.cursor/skills/plan/templates/test-environment.md @@ -1,16 +1,16 @@ -# E2E Test Environment Template +# Test Environment Template -Save as `PLANS_DIR/integration_tests/environment.md`. +Save as `DOCUMENT_DIR/tests/environment.md`. --- ```markdown -# E2E Test Environment +# Test Environment ## Overview **System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.] -**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals. +**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating black-box use cases without access to internals. ## Docker Environment diff --git a/.cursor/skills/plan/templates/test-spec.md b/.cursor/skills/plan/templates/test-spec.md index 2b6ee44..5b7b83e 100644 --- a/.cursor/skills/plan/templates/test-spec.md +++ b/.cursor/skills/plan/templates/test-spec.md @@ -17,7 +17,7 @@ Use this template for each component's test spec. Save as `components/[##]_[name --- -## Integration Tests +## Blackbox Tests ### IT-01: [Test Name] @@ -169,4 +169,4 @@ Use this template for each component's test spec. Save as `components/[##]_[name - If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2"). - Performance test targets should come from the NFR section in `architecture.md`. 
- Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component. -- Not every component needs all 4 test types. A stateless utility component may only need integration tests. +- Not every component needs all 4 test types. A stateless utility component may only need blackbox tests. diff --git a/.cursor/skills/plan/templates/integration-traceability-matrix.md b/.cursor/skills/plan/templates/traceability-matrix.md similarity index 82% rename from .cursor/skills/plan/templates/integration-traceability-matrix.md rename to .cursor/skills/plan/templates/traceability-matrix.md index 05ccafa..e0192ac 100644 --- a/.cursor/skills/plan/templates/integration-traceability-matrix.md +++ b/.cursor/skills/plan/templates/traceability-matrix.md @@ -1,11 +1,11 @@ -# E2E Traceability Matrix Template +# Traceability Matrix Template -Save as `PLANS_DIR/integration_tests/traceability_matrix.md`. +Save as `DOCUMENT_DIR/tests/traceability-matrix.md`. --- ```markdown -# E2E Traceability Matrix +# Traceability Matrix ## Acceptance Criteria Coverage @@ -34,7 +34,7 @@ Save as `PLANS_DIR/integration_tests/traceability_matrix.md`. | Item | Reason Not Covered | Risk | Mitigation | |------|-------------------|------|-----------| -| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] | +| [AC/Restriction ID] | [why it cannot be tested at blackbox level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] | ``` --- @@ -44,4 +44,4 @@ Save as `PLANS_DIR/integration_tests/traceability_matrix.md`. - Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason. - Every restriction must appear in the matrix. 
- NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware"). -- Coverage percentage should be at least 75% for acceptance criteria at the E2E level. +- Coverage percentage should be at least 75% for acceptance criteria at the blackbox test level. diff --git a/.cursor/skills/problem/SKILL.md b/.cursor/skills/problem/SKILL.md index 030a2a1..570fa1e 100644 --- a/.cursor/skills/problem/SKILL.md +++ b/.cursor/skills/problem/SKILL.md @@ -46,7 +46,7 @@ The interview is complete when the AI can write ALL of these: | `problem.md` | Clear problem statement: what is being built, why, for whom, what it does | | `restrictions.md` | All constraints identified: hardware, software, environment, operational, regulatory, budget, timeline | | `acceptance_criteria.md` | Measurable success criteria with specific numeric targets grouped by category | -| `input_data/` | At least one reference data file or detailed data description document | +| `input_data/` | At least one reference data file or detailed data description document. Must include `expected_results.md` with input→output pairs for downstream test specification | | `security_approach.md` | (optional) Security requirements identified, or explicitly marked as not applicable | ## Interview Protocol @@ -187,6 +187,7 @@ At least one file. Options: - User provides actual data files (CSV, JSON, images, etc.) — save as-is - User describes data parameters — save as `data_parameters.md` - User provides URLs to data — save as `data_sources.md` with links and descriptions +- `expected_results.md` — expected outputs for given inputs (required by downstream test-spec skill). During the Acceptance Criteria dimension, probe for concrete input→output pairs and save them here. Format: use the template from `.cursor/skills/test-spec/templates/expected-results.md`. 
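The input→output pairs captured in `expected_results.md` become useful downstream precisely because they are structured: the test-spec skill can check that every acceptance criterion has at least one pair tracing to it. A hedged Python sketch — the keys, filenames, and IDs are invented for illustration:

```python
# Hypothetical records for the input→output pairs in expected_results.md.
# Field names and values are illustrative, not the template's actual schema.
pairs = [
    {"input": "sample_001.csv",
     "expected_output": "12 rows parsed, 0 errors",
     "traces_to": "AC-01"},
    {"input": "sample_corrupt.csv",
     "expected_output": "rejected with validation error",
     "traces_to": "AC-04"},
]

def uncovered_criteria(acceptance_ids, pairs):
    """Acceptance criteria that no recorded input→output pair traces to."""
    covered = {p["traces_to"] for p in pairs}
    return sorted(set(acceptance_ids) - covered)
```

A gap reported here during the interview is a prompt to probe the user for another concrete input→output pair.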
### security_approach.md (optional) diff --git a/.cursor/skills/refactor/SKILL.md b/.cursor/skills/refactor/SKILL.md index 7fe59b8..3acea10 100644 --- a/.cursor/skills/refactor/SKILL.md +++ b/.cursor/skills/refactor/SKILL.md @@ -34,8 +34,8 @@ Determine the operating mode based on invocation before any other logic runs. **Project mode** (no explicit input file provided): - PROBLEM_DIR: `_docs/00_problem/` - SOLUTION_DIR: `_docs/01_solution/` -- COMPONENTS_DIR: `_docs/02_components/` -- TESTS_DIR: `_docs/02_tests/` +- COMPONENTS_DIR: `_docs/02_document/components/` +- DOCUMENT_DIR: `_docs/02_document/` - REFACTOR_DIR: `_docs/04_refactoring/` - All existing guardrails apply. @@ -155,7 +155,7 @@ Store in PROBLEM_DIR. | Metric Category | What to Capture | |----------------|-----------------| -| **Coverage** | Overall, unit, integration, critical paths | +| **Coverage** | Overall, unit, blackbox, critical paths | | **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio | | **Code Smells** | Total, critical, major | | **Performance** | Response times (P50/P95/P99), CPU/memory, throughput | @@ -210,7 +210,7 @@ Write: Also copy to project standard locations if in project mode: - `SOLUTION_DIR/solution.md` -- `COMPONENTS_DIR/system_flows.md` +- `DOCUMENT_DIR/system_flows.md` **Self-verification**: - [ ] Every component in the codebase is documented @@ -276,14 +276,14 @@ Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`: #### 3a. 
Design Test Specs
 
-Coverage requirements (must meet before refactoring):
+Coverage requirements (must be met before refactoring — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds):
 - Minimum overall coverage: 75%
 - Critical path coverage: 90%
-- All public APIs must have integration tests
+- All public APIs must have blackbox tests
 - All error handling paths must be tested
 
 For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
-- Integration tests: summary, current behavior, input data, expected result, max expected time
+- Blackbox tests: summary, current behavior, input data, expected result, max expected time
 - Acceptance tests: summary, preconditions, steps with expected results
 - Coverage analysis: current %, target %, uncovered critical paths
 
@@ -297,7 +297,7 @@ For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_
 **Self-verification**:
 - [ ] Coverage requirements met (75% overall, 90% critical paths)
 - [ ] All tests pass on current codebase
-- [ ] All public APIs have integration tests
+- [ ] All public APIs have blackbox tests
 - [ ] Test data fixtures are configured
 
 **Save action**: Write test specs; implemented tests go into the project's test folder
 
@@ -332,7 +332,7 @@ Write `REFACTOR_DIR/coupling_analysis.md`:
 For each change in the decoupling strategy:
 1. Implement the change
-2. Run integration tests
+2. Run blackbox tests
 3. Fix any failures
 4.
Commit with descriptive message diff --git a/.cursor/skills/research/SKILL.md b/.cursor/skills/research/SKILL.md index 1b4c159..85fd5d7 100644 --- a/.cursor/skills/research/SKILL.md +++ b/.cursor/skills/research/SKILL.md @@ -1,5 +1,5 @@ --- -name: deep-research +name: research description: | Deep Research Methodology (8-Step Method) with two execution modes: - Mode A (Initial Research): Assess acceptance criteria, then research problem and produce solution draft @@ -13,6 +13,7 @@ description: | - "comparative analysis", "concept comparison", "technical comparison" category: build tags: [research, analysis, solution-design, comparison, decision-support] +disable-model-invocation: true --- # Deep Research (8-Step Method) @@ -42,257 +43,51 @@ Determine the operating mode based on invocation before any other logic runs. **Standalone mode** (explicit input file provided, e.g. `/research @some_doc.md`): - INPUT_FILE: the provided file (treated as problem description) -- OUTPUT_DIR: `_standalone/01_solution/` -- RESEARCH_DIR: `_standalone/00_research/` +- BASE_DIR: if specified by the caller, use it; otherwise default to `_standalone/` +- OUTPUT_DIR: `BASE_DIR/01_solution/` +- RESEARCH_DIR: `BASE_DIR/00_research/` - Guardrails relaxed: only INPUT_FILE must exist and be non-empty - `restrictions.md` and `acceptance_criteria.md` are optional — warn if absent, proceed if user confirms - Mode detection uses OUTPUT_DIR for `solution_draft*.md` scanning - Draft numbering works the same, scoped to OUTPUT_DIR -- **Final step**: after all research is complete, move INPUT_FILE into `_standalone/` +- **Final step**: after all research is complete, move INPUT_FILE into BASE_DIR Announce the detected mode and resolved paths to the user before proceeding. ## Project Integration -### Prerequisite Guardrails (BLOCKING) - -Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.** - -**Project mode:** -1. 
Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files -2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing** -3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing** -4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing** -5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing** -6. Read **all** files in INPUT_DIR to ground the investigation in the project context -7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist - -**Standalone mode:** -1. Check INPUT_FILE exists and is non-empty — **STOP if missing** -2. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms -3. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist - -### Mode Detection - -After guardrails pass, determine the execution mode: - -1. Scan OUTPUT_DIR for files matching `solution_draft*.md` -2. **No matches found** → **Mode A: Initial Research** -3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input) -4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts - -Inform the user which mode was detected and confirm before proceeding. - -### Solution Draft Numbering - -All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number: - -1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md` -2. Extract the highest existing number -3. Increment by 1 -4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`) - -Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`. 
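The draft-numbering rule above is simple enough to sketch directly. A minimal Python illustration — the skill itself operates on the filenames in OUTPUT_DIR; the function name is an assumption:

```python
import re

def next_draft_name(existing):
    """Next solution_draft##.md name, zero-padded to 2 digits.

    `existing` is the list of filenames already present in OUTPUT_DIR;
    non-matching files are ignored.
    """
    numbers = [
        int(m.group(1))
        for name in existing
        if (m := re.fullmatch(r"solution_draft(\d+)\.md", name))
    ]
    return f"solution_draft{max(numbers, default=0) + 1:02d}.md"
```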
- -### Working Directory & Intermediate Artifact Management - -#### Directory Structure - -At the start of research, **must** create a working directory under RESEARCH_DIR: - -``` -RESEARCH_DIR/ -├── 00_ac_assessment.md # Mode A Phase 1 output: AC & restrictions assessment -├── 00_question_decomposition.md # Step 0-1 output -├── 01_source_registry.md # Step 2 output: all consulted source links -├── 02_fact_cards.md # Step 3 output: extracted facts -├── 03_comparison_framework.md # Step 4 output: selected framework and populated data -├── 04_reasoning_chain.md # Step 6 output: fact → conclusion reasoning -├── 05_validation_log.md # Step 7 output: use-case validation results -└── raw/ # Raw source archive (optional) - ├── source_1.md - └── source_2.md -``` - -### Save Timing & Content - -| Step | Save immediately after completion | Filename | -|------|-----------------------------------|----------| -| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` | -| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` | -| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` | -| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` | -| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` | -| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` | -| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` | -| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` | - -### Save Principles - -1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end -2. **Incremental updates**: Same file can be updated multiple times; append or replace new content -3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report -4. 
**Enable recovery**: If research is interrupted, progress can be recovered from intermediate files +Read and follow `steps/00_project-integration.md` for prerequisite guardrails, mode detection, draft numbering, working directory setup, save timing, and output file inventory. ## Execution Flow ### Mode A: Initial Research -Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research. +Read and follow `steps/01_mode-a-initial-research.md`. -#### Phase 1: AC & Restrictions Assessment (BLOCKING) - -**Role**: Professional software architect - -A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them. - -**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode) - -**Task**: -1. Read all problem context files thoroughly -2. **ASK the user about every unclear aspect** — do not assume: - - Unclear problem boundaries → ask - - Ambiguous acceptance criteria values → ask - - Missing context (no `security_approach.md`, no `input_data/`) → ask what they have - - Conflicting restrictions → ask which takes priority -3. Research in internet **extensively** — use multiple search queries per question, rephrase, and search from different angles: - - How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values - - How critical is each criterion? Search for case studies where criteria were relaxed or tightened - - What domain-specific acceptance criteria are we missing? 
Search for industry standards, regulatory requirements, and best practices in the specific domain - - Impact of each criterion value on the whole system quality — search for research papers and engineering reports - - Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets - - Timeline implications — search for project timelines, development velocity reports, and comparable implementations - - What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports -4. Research restrictions from multiple perspectives: - - Are the restrictions realistic? Search for comparable projects that operated under similar constraints - - Should any be tightened or relaxed? Search for what constraints similar projects actually ended up with - - Are there additional restrictions we should add? Search for regulatory, compliance, and safety requirements in this domain - - What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned -5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources - -**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment. 
- -**📁 Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format: - -```markdown -# Acceptance Criteria Assessment - -## Acceptance Criteria - -| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status | -|-----------|-----------|-------------------|---------------------|--------| -| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed | - -## Restrictions Assessment - -| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status | -|-------------|-----------|-------------------|---------------------|--------| -| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed | - -## Key Findings -[Summary of critical findings] - -## Sources -[Key references used] -``` - -**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings. - ---- - -#### Phase 2: Problem Research & Solution Draft - -**Role**: Professional researcher and software architect - -Full 8-step research methodology. Produces the first solution draft. - -**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts - -**Task** (drives the 8-step engine): -1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors -2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems -3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1 -4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions -5. 
Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist? -6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues) -7. Include security considerations in each component analysis -8. Provide rough cost estimates for proposed solutions - -Be concise in formulating. The fewer words, the better, but do not miss any important details. - -**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md` - ---- - -#### Phase 3: Tech Stack Consolidation (OPTIONAL) - -**Role**: Software architect evaluating technology choices - -Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions. - -**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR - -**Task**: -1. Extract technology options from the solution draft's component comparison tables -2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability -3. Produce a tech stack summary with selection rationale -4. Assess risks and learning requirements per technology choice - -**📁 Save action**: Write `OUTPUT_DIR/tech_stack.md` with: -- Requirements analysis (functional, non-functional, constraints) -- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores -- Tech stack summary block -- Risk assessment and learning requirements tables - ---- - -#### Phase 4: Security Deep Dive (OPTIONAL) - -**Role**: Security architect - -Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification. - -**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context - -**Task**: -1. 
Build threat model: asset inventory, threat actors, attack vectors -2. Define security requirements and proposed controls per component (with risk level) -3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach - -**📁 Save action**: Write `OUTPUT_DIR/security_analysis.md` with: -- Threat model (assets, actors, vectors) -- Per-component security requirements and controls table -- Security controls summary +Phases: AC Assessment (BLOCKING) → Problem Research → Tech Stack (optional) → Security (optional). --- ### Mode B: Solution Assessment -Triggered when `solution_draft*.md` files exist in OUTPUT_DIR. +Read and follow `steps/02_mode-b-solution-assessment.md`. -**Role**: Professional software architect +--- -Full 8-step research methodology applied to assessing and improving an existing solution draft. +## Research Engine (8-Step Method) -**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR +The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode. -**Task** (drives the 8-step engine): -1. Read the existing solution draft thoroughly -2. Research in internet extensively — for each component/decision in the draft, search for: - - Known problems and limitations of the chosen approach - - What practitioners say about using it in production - - Better alternatives that may have emerged recently - - Common failure modes and edge cases - - How competitors/similar projects solve the same problem differently -3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure" -4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft -5. 
Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports -6. For each identified weak point, search for multiple solution approaches and compare them -7. Based on findings, form a new solution draft in the same format +**Investigation phase** (Steps 0–3.5): Read and follow `steps/03_engine-investigation.md`. +Covers: question classification, novelty sensitivity, question decomposition, perspective rotation, exhaustive web search, fact extraction, iterative deepening. -**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md` +**Analysis phase** (Steps 4–8): Read and follow `steps/04_engine-analysis.md`. +Covers: comparison framework, baseline alignment, reasoning chain, use-case validation, deliverable formatting. -**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions above. +## Solution Draft Output Templates + +- Mode A: `templates/solution_draft_mode_a.md` +- Mode B: `templates/solution_draft_mode_b.md` ## Escalation Rules @@ -316,389 +111,12 @@ When the user wants to: - Gather information and evidence for a decision - Assess or improve an existing solution draft -**Keywords**: -- "deep research", "deep dive", "in-depth analysis" -- "research this", "investigate", "look into" -- "assess solution", "review draft", "improve solution" -- "comparative analysis", "concept comparison", "technical comparison" - **Differentiation from other Skills**: - Needs a **visual knowledge graph** → use `research-to-diagram` - Needs **written output** (articles/tutorials) → use `wsy-writer` - Needs **material organization** → use `material-to-markdown` - Needs **research + solution draft** → use this Skill -## Research Engine (8-Step Method) - -The 8-step method is the core research engine used by both modes. 
Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode. - -### Step 0: Question Type Classification - -First, classify the research question type and select the corresponding strategy: - -| Question Type | Core Task | Focus Dimensions | -|---------------|-----------|------------------| -| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries | -| **Decision Support** | Weigh trade-offs | Cost, risk, benefit | -| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions | -| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain | -| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships | - -**Mode-specific classification**: - -| Mode / Phase | Typical Question Type | -|--------------|----------------------| -| Mode A Phase 1 | Knowledge Organization + Decision Support | -| Mode A Phase 2 | Decision Support | -| Mode B | Problem Diagnosis + Decision Support | - -### Step 0.5: Novelty Sensitivity Assessment (BLOCKING) - -Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy. - -**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md` - -Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages. - -**📁 Save action**: Append timeliness assessment to the end of `00_question_decomposition.md` - ---- - -### Step 1: Question Decomposition & Boundary Definition - -**Mode-specific sub-questions**: - -**Mode A Phase 2** (Initial Research — Problem & Solution): -- "What existing/competitor solutions address this problem?" 
-- "What are the component parts of this problem?" -- "For each component, what are the state-of-the-art solutions?" -- "What are the security considerations per component?" -- "What are the cost implications of each approach?" - -**Mode B** (Solution Assessment): -- "What are the weak points and potential problems in the existing draft?" -- "What are the security vulnerabilities in the proposed architecture?" -- "Where are the performance bottlenecks?" -- "What solutions exist for each identified issue?" - -**General sub-question patterns** (use when applicable): -- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism) -- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis) -- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions) -- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis) - -#### Perspective Rotation (MANDATORY) - -For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries. - -| Perspective | What it asks | Example queries | -|-------------|-------------|-----------------| -| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" | -| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" | -| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" | -| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? 
| "X criticism", "why not X", "X failures", "X disadvantages real world" | -| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" | -| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" | - -Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`. - -#### Question Explosion (MANDATORY) - -For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences. - -**Query variant strategies**: -- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy") -- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work" -- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z" -- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned" -- **Temporal**: "X 2025", "X latest developments", "X roadmap" -- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture" - -Record all planned queries in `00_question_decomposition.md` alongside each sub-question. - -**⚠️ Research Subject Boundary Definition (BLOCKING - must be explicit)**: - -When decomposing questions, you must explicitly define the **boundaries of the research subject**: - -| Dimension | Boundary to define | Example | -|-----------|--------------------|---------| -| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students | -| **Geography** | Which region is being studied? 
| Chinese universities vs US universities vs global | -| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture | -| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational | - -**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research. - -**📁 Save action**: -1. Read all files from INPUT_DIR to ground the research in the project context -2. Create working directory `RESEARCH_DIR/` -3. Write `00_question_decomposition.md`, including: - - Original question - - Active mode (A Phase 2 or B) and rationale - - Summary of relevant problem context from INPUT_DIR - - Classified question type and rationale - - **Research subject boundary definition** (population, geography, timeframe, level) - - List of decomposed sub-questions - - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale - - **Search query variants** for each sub-question (at least 3-5 per sub-question) -4. Write TodoWrite to track progress - -### Step 2: Source Tiering & Exhaustive Web Investigation - -Tier sources by authority, **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation. 
- -**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md` - -**Tool Usage**: -- Use `WebSearch` for broad searches; `WebFetch` to read specific pages -- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation -- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories) -- When citing web sources, include the URL and date accessed - -#### Exhaustive Search Requirements (MANDATORY) - -Do not stop at the first few results. The goal is to build a comprehensive evidence base. - -**Minimum search effort per sub-question**: -- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question) -- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion) -- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems - -**Search broadening strategies** (use when results are thin): -- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation" -- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums -- Try different geographies: search in English + search for European/Asian approaches if relevant -- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025" -- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report" - -**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated. 
- -**📁 Save action**: -For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`. - -### Step 3: Fact Extraction & Evidence Cards - -Transform sources into **verifiable fact cards**: - -```markdown -## Fact Cards - -### Fact 1 -- **Statement**: [specific fact description] -- **Source**: [link/document section] -- **Confidence**: High/Medium/Low - -### Fact 2 -... -``` - -**Key discipline**: -- Pin down facts first, then reason -- Distinguish "what officials said" from "what I infer" -- When conflicting information is found, annotate and preserve both sides -- Annotate confidence level: - - ✅ High: Explicitly stated in official documentation - - ⚠️ Medium: Mentioned in official blog but not formally documented - - ❓ Low: Inference or from unofficial sources - -**📁 Save action**: -For each extracted fact, **immediately** append to `02_fact_cards.md`: -```markdown -## Fact #[number] -- **Statement**: [specific fact description] -- **Source**: [Source #number] [link] -- **Phase**: [Phase 1 / Phase 2 / Assessment] -- **Target Audience**: [which group this fact applies to, inherited from source or further refined] -- **Confidence**: ✅/⚠️/❓ -- **Related Dimension**: [corresponding comparison dimension] -``` - -**⚠️ Target audience in fact statements**: -- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope** -- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who) -- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)" - -### Step 3.5: Iterative Deepening — Follow-Up Investigation - -After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings. 
- -**Process**: - -1. **Gap analysis**: Review fact cards and identify: - - Sub-questions with fewer than 3 high-confidence facts → need more searching - - Contradictions between sources → need tie-breaking evidence - - Perspectives (from Step 1) that have no or weak coverage → need targeted search - - Claims that rely only on L3/L4 sources → need L1/L2 verification - -2. **Follow-up question generation**: Based on initial findings, generate new questions: - - "Source X claims [fact] — is this consistent with other evidence?" - - "If [approach A] has [limitation], how do practitioners work around it?" - - "What are the second-order effects of [finding]?" - - "Who disagrees with [common finding] and why?" - - "What happened when [solution] was deployed at scale?" - -3. **Targeted deep-dive searches**: Execute follow-up searches focusing on: - - Specific claims that need verification - - Alternative viewpoints not yet represented - - Real-world case studies and experience reports - - Failure cases and edge conditions - - Recent developments that may change the picture - -4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` - -**Exit criteria**: Proceed to Step 4 when: -- Every sub-question has at least 3 facts with at least one from L1/L2 -- At least 3 perspectives from Step 1 have supporting evidence -- No unresolved contradictions remain (or they are explicitly documented as open questions) -- Follow-up searches are no longer producing new substantive information - -### Step 4: Build Comparison/Analysis Framework - -Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md` - -**📁 Save action**: -Write to `03_comparison_framework.md`: -```markdown -# Comparison Framework - -## Selected Framework Type -[Concept Comparison / Decision Support / ...] - -## Selected Dimensions -1. [Dimension 1] -2. 
[Dimension 2] -... - -## Initial Population -| Dimension | X | Y | Factual Basis | -|-----------|---|---|---------------| -| [Dimension 1] | [description] | [description] | Fact #1, #3 | -| ... | | | | -``` - -### Step 5: Reference Point Baseline Alignment - -Ensure all compared parties have clear, consistent definitions: - -**Checklist**: -- [ ] Is the reference point's definition stable/widely accepted? -- [ ] Does it need verification, or can domain common knowledge be used? -- [ ] Does the reader's understanding of the reference point match mine? -- [ ] Are there ambiguities that need to be clarified first? - -### Step 6: Fact-to-Conclusion Reasoning Chain - -Explicitly write out the "fact → comparison → conclusion" reasoning process: - -```markdown -## Reasoning Process - -### Regarding [Dimension Name] - -1. **Fact confirmation**: According to [source], X's mechanism is... -2. **Compare with reference**: While Y's mechanism is... -3. **Conclusion**: Therefore, the difference between X and Y on this dimension is... -``` - -**Key discipline**: -- Conclusions come from mechanism comparison, not "gut feelings" -- Every conclusion must be traceable to specific facts -- Uncertain conclusions must be annotated - -**📁 Save action**: -Write to `04_reasoning_chain.md`: -```markdown -# Reasoning Chain - -## Dimension 1: [Dimension Name] - -### Fact Confirmation -According to [Fact #X], X's mechanism is... - -### Reference Comparison -While Y's mechanism is... (Source: [Fact #Y]) - -### Conclusion -Therefore, the difference between X and Y on this dimension is... - -### Confidence -✅/⚠️/❓ + rationale - ---- -## Dimension 2: [Dimension Name] -... -``` - -### Step 7: Use-Case Validation (Sanity Check) - -Validate conclusions against a typical scenario: - -**Validation questions**: -- Based on my conclusions, how should this scenario be handled? -- Is that actually the case? -- Are there counterexamples that need to be addressed? 
- -**Review checklist**: -- [ ] Are draft conclusions consistent with Step 3 fact cards? -- [ ] Are there any important dimensions missed? -- [ ] Is there any over-extrapolation? -- [ ] Are conclusions actionable/verifiable? - -**📁 Save action**: -Write to `05_validation_log.md`: -```markdown -# Validation Log - -## Validation Scenario -[Scenario description] - -## Expected Based on Conclusions -If using X: [expected behavior] -If using Y: [expected behavior] - -## Actual Validation Results -[actual situation] - -## Counterexamples -[yes/no, describe if yes] - -## Review Checklist -- [x] Draft conclusions consistent with fact cards -- [x] No important dimensions missed -- [x] No over-extrapolation -- [ ] Issue found: [if any] - -## Conclusions Requiring Revision -[if any] -``` - -### Step 8: Deliverable Formatting - -Make the output **readable, traceable, and actionable**. - -**📁 Save action**: -Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode: -- Mode A: `templates/solution_draft_mode_a.md` -- Mode B: `templates/solution_draft_mode_b.md` - -Sources to integrate: -- Extract background from `00_question_decomposition.md` -- Reference key facts from `02_fact_cards.md` -- Organize conclusions from `04_reasoning_chain.md` -- Generate references from `01_source_registry.md` -- Supplement with use cases from `05_validation_log.md` -- For Mode A: include AC assessment from `00_ac_assessment.md` - -## Solution Draft Output Templates - -### Mode A: Initial Research Output - -Use template: `templates/solution_draft_mode_a.md` - -### Mode B: Solution Assessment Output - -Use template: `templates/solution_draft_mode_b.md` - ## Stakeholder Perspectives Adjust content depth based on audience: @@ -709,75 +127,6 @@ Adjust content depth based on audience: | **Implementers** | Specific mechanisms, how-to | Detailed, emphasize how to do it | | **Technical experts** | Details, boundary 
conditions, limitations | In-depth, emphasize accuracy | -## Output Files - -Default intermediate artifacts location: `RESEARCH_DIR/` - -**Required files** (automatically generated through the process): - -| File | Content | When Generated | -|------|---------|----------------| -| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion | -| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion | -| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 | -| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 | -| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion | -| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion | -| `05_validation_log.md` | Use-case validation and review | After Step 7 completion | -| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion | -| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) | -| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) | - -**Optional files**: -- `raw/*.md` - Raw source archives (saved when content is lengthy) - -## Methodology Quick Reference Card - -``` -┌──────────────────────────────────────────────────────────────────┐ -│ Deep Research — Mode-Aware 8-Step Method │ -├──────────────────────────────────────────────────────────────────┤ -│ CONTEXT: Resolve mode (project vs standalone) + set paths │ -│ GUARDRAILS: Check INPUT_DIR/INPUT_FILE exists + required files │ -│ MODE DETECT: solution_draft*.md in 01_solution? 
→ A or B │ -│ │ -│ MODE A: Initial Research │ -│ Phase 1: AC & Restrictions Assessment (BLOCKING) │ -│ Phase 2: Full 8-step → solution_draft##.md │ -│ Phase 3: Tech Stack Consolidation (OPTIONAL) → tech_stack.md │ -│ Phase 4: Security Deep Dive (OPTIONAL) → security_analysis.md │ -│ │ -│ MODE B: Solution Assessment │ -│ Read latest draft → Full 8-step → solution_draft##.md (N+1) │ -│ Optional: Phase 3 / Phase 4 on revised draft │ -│ │ -│ 8-STEP ENGINE: │ -│ 0. Classify question type → Select framework template │ -│ 0.5 Novelty sensitivity → Time windows for sources │ -│ 1. Decompose question → sub-questions + perspectives + queries │ -│ → Perspective Rotation (3+ viewpoints, MANDATORY) │ -│ → Question Explosion (3-5 query variants per sub-Q) │ -│ 2. Exhaustive web search → L1 > L2 > L3 > L4, broad coverage │ -│ → Execute ALL query variants, search until saturation │ -│ 3. Extract facts → Each with source, confidence level │ -│ 3.5 Iterative deepening → gaps, contradictions, follow-ups │ -│ → Keep searching until exit criteria met │ -│ 4. Build framework → Fixed dimensions, structured compare │ -│ 5. Align references → Ensure unified definitions │ -│ 6. Reasoning chain → Fact→Compare→Conclude, explicit │ -│ 7. Use-case validation → Sanity check, prevent armchairing │ -│ 8. Deliverable → solution_draft##.md (mode-specific format) │ -├──────────────────────────────────────────────────────────────────┤ -│ Key discipline: Ask don't assume · Facts before reasoning │ -│ Conclusions from mechanism, not gut feelings │ -│ Search broadly, from multiple perspectives, until saturation │ -└──────────────────────────────────────────────────────────────────┘ -``` - -## Usage Examples - -For detailed execution flow examples (Mode A initial, Mode B assessment, standalone, force override): Read `references/usage-examples.md` - ## Source Verifiability Requirements Every cited piece of external information must be directly verifiable by the user. 
All links must be publicly accessible (annotate `[login required]` if not), citations must include exact section/page/timestamp, and unverifiable information must be annotated `[limited source]`. Full checklist in `references/quality-checklists.md`. @@ -795,7 +144,7 @@ Before completing the solution draft, run through the checklists in `references/ When replying to the user after research is complete: -**✅ Should include**: +**Should include**: - Active mode used (A or B) and which optional phases were executed - One-sentence core conclusion - Key findings summary (3-5 points) @@ -803,7 +152,7 @@ When replying to the user after research is complete: - Paths to optional artifacts if produced: `tech_stack.md`, `security_analysis.md` - If there are significant uncertainties, annotate points requiring further verification -**❌ Must not include**: +**Must not include**: - Process file listings (e.g., `00_question_decomposition.md`, `01_source_registry.md`, etc.) - Detailed research step descriptions - Working directory structure display diff --git a/.cursor/skills/research/steps/00_project-integration.md b/.cursor/skills/research/steps/00_project-integration.md new file mode 100644 index 0000000..f94ef4f --- /dev/null +++ b/.cursor/skills/research/steps/00_project-integration.md @@ -0,0 +1,103 @@ +## Project Integration + +### Prerequisite Guardrails (BLOCKING) + +Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.** + +**Project mode:** +1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files +2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing** +3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing** +4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing** +5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing** +6. 
Read **all** files in INPUT_DIR to ground the investigation in the project context +7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist + +**Standalone mode:** +1. Check INPUT_FILE exists and is non-empty — **STOP if missing** +2. Resolve BASE_DIR: use the caller-specified directory if provided; otherwise default to `_standalone/` +3. Resolve OUTPUT_DIR (`BASE_DIR/01_solution/`) and RESEARCH_DIR (`BASE_DIR/00_research/`) +4. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms +5. Create BASE_DIR, OUTPUT_DIR, and RESEARCH_DIR if they don't exist + +### Mode Detection + +After guardrails pass, determine the execution mode: + +1. Scan OUTPUT_DIR for files matching `solution_draft*.md` +2. **No matches found** → **Mode A: Initial Research** +3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input) +4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts + +Inform the user which mode was detected and confirm before proceeding. + +### Solution Draft Numbering + +All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number: + +1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md` +2. Extract the highest existing number +3. Increment by 1 +4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`) + +Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`. 
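The numbering rule above can be sketched as a small helper. This is a sketch only — the skill prescribes the behavior, not an implementation; Python, the function name, and the directory argument are assumptions:

```python
import re
from pathlib import Path

def next_draft_name(output_dir: str) -> str:
    """Return the next zero-padded solution draft filename for OUTPUT_DIR."""
    pattern = re.compile(r"solution_draft(\d+)\.md")
    numbers = [
        int(m.group(1))
        for p in Path(output_dir).glob("solution_draft*.md")
        if (m := pattern.fullmatch(p.name))
    ]
    highest = max(numbers, default=0)  # no existing drafts -> next is 01
    return f"solution_draft{highest + 1:02d}.md"
```

Note that zero-padding to two digits keeps lexicographic filename order matching numeric order up to draft 99, which is what makes "use the highest-numbered draft" cheap to compute.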
+ +### Working Directory & Intermediate Artifact Management + +#### Directory Structure + +At the start of research, **must** create a working directory under RESEARCH_DIR: + +``` +RESEARCH_DIR/ +├── 00_ac_assessment.md # Mode A Phase 1 output: AC & restrictions assessment +├── 00_question_decomposition.md # Step 0-1 output +├── 01_source_registry.md # Step 2 output: all consulted source links +├── 02_fact_cards.md # Step 3 output: extracted facts +├── 03_comparison_framework.md # Step 4 output: selected framework and populated data +├── 04_reasoning_chain.md # Step 6 output: fact → conclusion reasoning +├── 05_validation_log.md # Step 7 output: use-case validation results +└── raw/ # Raw source archive (optional) + ├── source_1.md + └── source_2.md +``` + +### Save Timing & Content + +| Step | Save immediately after completion | Filename | +|------|-----------------------------------|----------| +| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` | +| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` | +| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` | +| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` | +| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` | +| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` | +| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` | +| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` | + +### Save Principles + +1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end +2. **Incremental updates**: Same file can be updated multiple times; append or replace new content +3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report +4. 
**Enable recovery**: If research is interrupted, progress can be recovered from intermediate files + +### Output Files + +**Required files** (automatically generated through the process): + +| File | Content | When Generated | +|------|---------|----------------| +| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion | +| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion | +| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 | +| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 | +| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion | +| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion | +| `05_validation_log.md` | Use-case validation and review | After Step 7 completion | +| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion | +| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) | +| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) | + +**Optional files**: +- `raw/*.md` - Raw source archives (saved when content is lengthy) diff --git a/.cursor/skills/research/steps/01_mode-a-initial-research.md b/.cursor/skills/research/steps/01_mode-a-initial-research.md new file mode 100644 index 0000000..88404cd --- /dev/null +++ b/.cursor/skills/research/steps/01_mode-a-initial-research.md @@ -0,0 +1,127 @@ +## Mode A: Initial Research + +Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research. + +### Phase 1: AC & Restrictions Assessment (BLOCKING) + +**Role**: Professional software architect + +A focused preliminary research pass **before** the main solution research. 
The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.
+
+**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)
+
+**Task**:
+1. Read all problem context files thoroughly
+2. **ASK the user about every unclear aspect** — do not assume:
+   - Unclear problem boundaries → ask
+   - Ambiguous acceptance criteria values → ask
+   - Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
+   - Conflicting restrictions → ask which takes priority
+3. Research the internet **extensively** — use multiple search queries per question, rephrase, and search from different angles:
+   - How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values
+   - How critical is each criterion? Search for case studies where criteria were relaxed or tightened
+   - What domain-specific acceptance criteria are we missing? Search for industry standards, regulatory requirements, and best practices in the specific domain
+   - Impact of each criterion value on the whole system quality — search for research papers and engineering reports
+   - Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets
+   - Timeline implications — search for project timelines, development velocity reports, and comparable implementations
+   - What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports
+4. Research restrictions from multiple perspectives:
+   - Are the restrictions realistic? Search for comparable projects that operated under similar constraints
+   - Should any be tightened or relaxed? Search for what constraints similar projects actually ended up with
+   - Are there additional restrictions we should add?
Search for regulatory, compliance, and safety requirements in this domain + - What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned +5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources + +**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment. + +**Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format: + +```markdown +# Acceptance Criteria Assessment + +## Acceptance Criteria + +| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status | +|-----------|-----------|-------------------|---------------------|--------| +| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed | + +## Restrictions Assessment + +| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status | +|-------------|-----------|-------------------|---------------------|--------| +| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed | + +## Key Findings +[Summary of critical findings] + +## Sources +[Key references used] +``` + +**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings. + +--- + +### Phase 2: Problem Research & Solution Draft + +**Role**: Professional researcher and software architect + +Full 8-step research methodology. Produces the first solution draft. + +**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts + +**Task** (drives the 8-step engine): +1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors +2. 
Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
+3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
+4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
+5. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
+6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
+7. Include security considerations in each component analysis
+8. Provide rough cost estimates for proposed solutions
+
+Be concise in your formulation: the fewer words the better, but do not omit any important detail.
+
+**Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
+
+---
+
+### Phase 3: Tech Stack Consolidation (OPTIONAL)
+
+**Role**: Software architect evaluating technology choices
+
+Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.
+
+**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR
+
+**Task**:
+1. Extract technology options from the solution draft's component comparison tables
+2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
+3. Produce a tech stack summary with selection rationale
+4.
Assess risks and learning requirements per technology choice + +**Save action**: Write `OUTPUT_DIR/tech_stack.md` with: +- Requirements analysis (functional, non-functional, constraints) +- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores +- Tech stack summary block +- Risk assessment and learning requirements tables + +--- + +### Phase 4: Security Deep Dive (OPTIONAL) + +**Role**: Security architect + +Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification. + +**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context + +**Task**: +1. Build threat model: asset inventory, threat actors, attack vectors +2. Define security requirements and proposed controls per component (with risk level) +3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach + +**Save action**: Write `OUTPUT_DIR/security_analysis.md` with: +- Threat model (assets, actors, vectors) +- Per-component security requirements and controls table +- Security controls summary diff --git a/.cursor/skills/research/steps/02_mode-b-solution-assessment.md b/.cursor/skills/research/steps/02_mode-b-solution-assessment.md new file mode 100644 index 0000000..d14d031 --- /dev/null +++ b/.cursor/skills/research/steps/02_mode-b-solution-assessment.md @@ -0,0 +1,27 @@ +## Mode B: Solution Assessment + +Triggered when `solution_draft*.md` files exist in OUTPUT_DIR. + +**Role**: Professional software architect + +Full 8-step research methodology applied to assessing and improving an existing solution draft. + +**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR + +**Task** (drives the 8-step engine): +1. Read the existing solution draft thoroughly +2. 
Research the internet extensively — for each component/decision in the draft, search for:
+   - Known problems and limitations of the chosen approach
+   - What practitioners say about using it in production
+   - Better alternatives that may have emerged recently
+   - Common failure modes and edge cases
+   - How competitors/similar projects solve the same problem differently
+3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
+4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
+5. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
+6. For each identified weak point, search for multiple solution approaches and compare them
+7. Based on findings, produce a new solution draft in the same format
+
+**Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
+
+**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`.
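The Mode A / Mode B trigger and the "use the highest-numbered draft" rule can be sketched together. A sketch under stated assumptions — the skill defines the rules in prose, so Python, the function name, and the return shape are illustrative only:

```python
from pathlib import Path

def detect_mode(output_dir: str, force_initial: bool = False):
    """Return ("A", None) or ("B", latest_draft) per the detection rules."""
    if force_initial:  # user explicitly asked to "research from scratch"
        return ("A", None)
    drafts = sorted(Path(output_dir).glob("solution_draft*.md"))
    if not drafts:
        return ("A", None)  # no existing drafts -> initial research
    return ("B", drafts[-1])  # highest-numbered draft feeds the assessment
```

Because draft numbers are zero-padded, the plain lexicographic sort is enough to pick the latest draft.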
diff --git a/.cursor/skills/research/steps/03_engine-investigation.md b/.cursor/skills/research/steps/03_engine-investigation.md new file mode 100644 index 0000000..733905d --- /dev/null +++ b/.cursor/skills/research/steps/03_engine-investigation.md @@ -0,0 +1,227 @@ +## Research Engine — Investigation Phase (Steps 0–3.5) + +### Step 0: Question Type Classification + +First, classify the research question type and select the corresponding strategy: + +| Question Type | Core Task | Focus Dimensions | +|---------------|-----------|------------------| +| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries | +| **Decision Support** | Weigh trade-offs | Cost, risk, benefit | +| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions | +| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain | +| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships | + +**Mode-specific classification**: + +| Mode / Phase | Typical Question Type | +|--------------|----------------------| +| Mode A Phase 1 | Knowledge Organization + Decision Support | +| Mode A Phase 2 | Decision Support | +| Mode B | Problem Diagnosis + Decision Support | + +### Step 0.5: Novelty Sensitivity Assessment (BLOCKING) + +Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy. + +**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md` + +Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages. 
+ +**Save action**: Append timeliness assessment to the end of `00_question_decomposition.md` + +--- + +### Step 1: Question Decomposition & Boundary Definition + +**Mode-specific sub-questions**: + +**Mode A Phase 2** (Initial Research — Problem & Solution): +- "What existing/competitor solutions address this problem?" +- "What are the component parts of this problem?" +- "For each component, what are the state-of-the-art solutions?" +- "What are the security considerations per component?" +- "What are the cost implications of each approach?" + +**Mode B** (Solution Assessment): +- "What are the weak points and potential problems in the existing draft?" +- "What are the security vulnerabilities in the proposed architecture?" +- "Where are the performance bottlenecks?" +- "What solutions exist for each identified issue?" + +**General sub-question patterns** (use when applicable): +- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism) +- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis) +- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions) +- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis) + +#### Perspective Rotation (MANDATORY) + +For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries. + +| Perspective | What it asks | Example queries | +|-------------|-------------|-----------------| +| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" | +| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" | +| **Business / Decision-maker** | What are the costs, ROI, strategic implications? 
| "X total cost of ownership", "X ROI case study", "X vs Y business comparison" | +| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" | +| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" | +| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" | + +Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`. + +#### Question Explosion (MANDATORY) + +For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences. + +**Query variant strategies**: +- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy") +- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work" +- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z" +- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned" +- **Temporal**: "X 2025", "X latest developments", "X roadmap" +- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture" + +Record all planned queries in `00_question_decomposition.md` alongside each sub-question. + +**Research Subject Boundary Definition (BLOCKING - must be explicit)**: + +When decomposing questions, you must explicitly define the **boundaries of the research subject**: + +| Dimension | Boundary to define | Example | +|-----------|--------------------|---------| +| **Population** | Which group is being studied? 
| University students vs K-12 vs vocational students vs all students |
+| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
+| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
+| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |
+
+**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.
+
+**Save action**:
+1. Read all files from INPUT_DIR to ground the research in the project context
+2. Create working directory `RESEARCH_DIR/`
+3. Write `00_question_decomposition.md`, including:
+   - Original question
+   - Active mode (A Phase 2 or B) and rationale
+   - Summary of relevant problem context from INPUT_DIR
+   - Classified question type and rationale
+   - **Research subject boundary definition** (population, geography, timeframe, level)
+   - List of decomposed sub-questions
+   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
+   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
+4. Write TodoWrite to track progress
+
+---
+
+### Step 2: Source Tiering & Exhaustive Web Investigation
+
+Tier sources by authority and **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary evidence and validation.
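The traceability rule lends itself to a mechanical pass over draft conclusions. A minimal sketch, using hypothetical structures (the skill itself operates on markdown artifacts, not code):

```python
def traceable(conclusion: dict) -> bool:
    """A conclusion is traceable if at least one cited source is tier L1 or L2."""
    return any(tier in ("L1", "L2") for tier in conclusion["source_tiers"])

# Hypothetical draft conclusions with the tiers of their cited sources.
draft_conclusions = [
    {"claim": "X supports feature F", "source_tiers": ["L1", "L4"]},
    {"claim": "Y is slower than X", "source_tiers": ["L3", "L4"]},
]

# Anything resting only on L3/L4 evidence needs L1/L2 verification
# before it can survive into the reasoning chain.
needs_verification = [c["claim"] for c in draft_conclusions if not traceable(c)]
```

Here the second claim would be flagged, since it cites only supplementary tiers.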
+ +**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md` + +**Tool Usage**: +- Use `WebSearch` for broad searches; `WebFetch` to read specific pages +- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation +- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories) +- When citing web sources, include the URL and date accessed + +#### Exhaustive Search Requirements (MANDATORY) + +Do not stop at the first few results. The goal is to build a comprehensive evidence base. + +**Minimum search effort per sub-question**: +- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question) +- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion) +- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems + +**Search broadening strategies** (use when results are thin): +- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation" +- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums +- Try different geographies: search in English + search for European/Asian approaches if relevant +- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025" +- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report" + +**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated. 
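The saturation rule above reduces to a simple check — a sketch where each list entry records how many genuinely new facts one search produced:

```python
def is_saturated(new_fact_counts: list[int], window: int = 3) -> bool:
    """Saturated when the last `window` searches produced no new information."""
    if len(new_fact_counts) < window:
        return False
    return all(count == 0 for count in new_fact_counts[-window:])
```

With fewer than three searches executed, the sub-question is never considered saturated; the rule only fires once three consecutive searches add nothing new.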
+ +**Save action**: +For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`. + +--- + +### Step 3: Fact Extraction & Evidence Cards + +Transform sources into **verifiable fact cards**: + +```markdown +## Fact Cards + +### Fact 1 +- **Statement**: [specific fact description] +- **Source**: [link/document section] +- **Confidence**: High/Medium/Low + +### Fact 2 +... +``` + +**Key discipline**: +- Pin down facts first, then reason +- Distinguish "what officials said" from "what I infer" +- When conflicting information is found, annotate and preserve both sides +- Annotate confidence level: + - ✅ High: Explicitly stated in official documentation + - ⚠️ Medium: Mentioned in official blog but not formally documented + - ❓ Low: Inference or from unofficial sources + +**Save action**: +For each extracted fact, **immediately** append to `02_fact_cards.md`: +```markdown +## Fact #[number] +- **Statement**: [specific fact description] +- **Source**: [Source #number] [link] +- **Phase**: [Phase 1 / Phase 2 / Assessment] +- **Target Audience**: [which group this fact applies to, inherited from source or further refined] +- **Confidence**: ✅/⚠️/❓ +- **Related Dimension**: [corresponding comparison dimension] +``` + +**Target audience in fact statements**: +- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope** +- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who) +- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)" + +--- + +### Step 3.5: Iterative Deepening — Follow-Up Investigation + +After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. 
This step ensures the research doesn't stop at surface-level findings. + +**Process**: + +1. **Gap analysis**: Review fact cards and identify: + - Sub-questions with fewer than 3 high-confidence facts → need more searching + - Contradictions between sources → need tie-breaking evidence + - Perspectives (from Step 1) that have no or weak coverage → need targeted search + - Claims that rely only on L3/L4 sources → need L1/L2 verification + +2. **Follow-up question generation**: Based on initial findings, generate new questions: + - "Source X claims [fact] — is this consistent with other evidence?" + - "If [approach A] has [limitation], how do practitioners work around it?" + - "What are the second-order effects of [finding]?" + - "Who disagrees with [common finding] and why?" + - "What happened when [solution] was deployed at scale?" + +3. **Targeted deep-dive searches**: Execute follow-up searches focusing on: + - Specific claims that need verification + - Alternative viewpoints not yet represented + - Real-world case studies and experience reports + - Failure cases and edge conditions + - Recent developments that may change the picture + +4. 
**Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` + +**Exit criteria**: Proceed to Step 4 when: +- Every sub-question has at least 3 facts with at least one from L1/L2 +- At least 3 perspectives from Step 1 have supporting evidence +- No unresolved contradictions remain (or they are explicitly documented as open questions) +- Follow-up searches are no longer producing new substantive information diff --git a/.cursor/skills/research/steps/04_engine-analysis.md b/.cursor/skills/research/steps/04_engine-analysis.md new file mode 100644 index 0000000..b06f7cd --- /dev/null +++ b/.cursor/skills/research/steps/04_engine-analysis.md @@ -0,0 +1,146 @@ +## Research Engine — Analysis Phase (Steps 4–8) + +### Step 4: Build Comparison/Analysis Framework + +Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md` + +**Save action**: +Write to `03_comparison_framework.md`: +```markdown +# Comparison Framework + +## Selected Framework Type +[Concept Comparison / Decision Support / ...] + +## Selected Dimensions +1. [Dimension 1] +2. [Dimension 2] +... + +## Initial Population +| Dimension | X | Y | Factual Basis | +|-----------|---|---|---------------| +| [Dimension 1] | [description] | [description] | Fact #1, #3 | +| ... | | | | +``` + +--- + +### Step 5: Reference Point Baseline Alignment + +Ensure all compared parties have clear, consistent definitions: + +**Checklist**: +- [ ] Is the reference point's definition stable/widely accepted? +- [ ] Does it need verification, or can domain common knowledge be used? +- [ ] Does the reader's understanding of the reference point match mine? +- [ ] Are there ambiguities that need to be clarified first? 
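Steps 4–5 can be made mechanical: every framework cell should cite fact cards, and every compared party should have a pinned definition before comparison begins. A sketch with hypothetical structures:

```python
# A framework cell pairs a short description with the fact cards backing it.
framework = {
    "Latency": {
        "X": {"summary": "sub-millisecond lookups", "facts": [1, 3]},
        "Y": {"summary": "millisecond-level lookups", "facts": [2]},
    },
}

# Step 5: every compared party needs a pinned, stable definition first.
definitions = {"X": "in-memory store, per v7 docs", "Y": "disk-backed store, per v2 docs"}

def framework_is_grounded(framework: dict, definitions: dict) -> bool:
    """Every cell must cite at least one fact; every party must be defined."""
    for cells in framework.values():
        for party, cell in cells.items():
            if party not in definitions or not cell["facts"]:
                return False
    return True
```

An empty `facts` list or an undefined party is exactly the failure mode the checklist above is guarding against.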
+ +--- + +### Step 6: Fact-to-Conclusion Reasoning Chain + +Explicitly write out the "fact → comparison → conclusion" reasoning process: + +```markdown +## Reasoning Process + +### Regarding [Dimension Name] + +1. **Fact confirmation**: According to [source], X's mechanism is... +2. **Compare with reference**: While Y's mechanism is... +3. **Conclusion**: Therefore, the difference between X and Y on this dimension is... +``` + +**Key discipline**: +- Conclusions come from mechanism comparison, not "gut feelings" +- Every conclusion must be traceable to specific facts +- Uncertain conclusions must be annotated + +**Save action**: +Write to `04_reasoning_chain.md`: +```markdown +# Reasoning Chain + +## Dimension 1: [Dimension Name] + +### Fact Confirmation +According to [Fact #X], X's mechanism is... + +### Reference Comparison +While Y's mechanism is... (Source: [Fact #Y]) + +### Conclusion +Therefore, the difference between X and Y on this dimension is... + +### Confidence +✅/⚠️/❓ + rationale + +--- +## Dimension 2: [Dimension Name] +... +``` + +--- + +### Step 7: Use-Case Validation (Sanity Check) + +Validate conclusions against a typical scenario: + +**Validation questions**: +- Based on my conclusions, how should this scenario be handled? +- Is that actually the case? +- Are there counterexamples that need to be addressed? + +**Review checklist**: +- [ ] Are draft conclusions consistent with Step 3 fact cards? +- [ ] Are there any important dimensions missed? +- [ ] Is there any over-extrapolation? +- [ ] Are conclusions actionable/verifiable? 
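The Step 6 discipline — every conclusion traceable to specific facts, with an annotated confidence mark — can be sketched as an invariant over reasoning-chain entries (hypothetical structure, since the skill writes markdown):

```python
CONFIDENCE_MARKS = {"✅", "⚠️", "❓"}

def chain_entry_ok(entry: dict, known_fact_ids: set[int]) -> bool:
    """An entry must cite at least one known fact and carry a confidence mark."""
    return (
        bool(entry["fact_ids"])
        and all(fid in known_fact_ids for fid in entry["fact_ids"])
        and entry["confidence"] in CONFIDENCE_MARKS
    )
```

An entry citing no facts, or a fact card that does not exist in `02_fact_cards.md`, fails the check — the "gut feeling" conclusion this step forbids.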
+ +**Save action**: +Write to `05_validation_log.md`: +```markdown +# Validation Log + +## Validation Scenario +[Scenario description] + +## Expected Based on Conclusions +If using X: [expected behavior] +If using Y: [expected behavior] + +## Actual Validation Results +[actual situation] + +## Counterexamples +[yes/no, describe if yes] + +## Review Checklist +- [x] Draft conclusions consistent with fact cards +- [x] No important dimensions missed +- [x] No over-extrapolation +- [ ] Issue found: [if any] + +## Conclusions Requiring Revision +[if any] +``` + +--- + +### Step 8: Deliverable Formatting + +Make the output **readable, traceable, and actionable**. + +**Save action**: +Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode: +- Mode A: `templates/solution_draft_mode_a.md` +- Mode B: `templates/solution_draft_mode_b.md` + +Sources to integrate: +- Extract background from `00_question_decomposition.md` +- Reference key facts from `02_fact_cards.md` +- Organize conclusions from `04_reasoning_chain.md` +- Generate references from `01_source_registry.md` +- Supplement with use cases from `05_validation_log.md` +- For Mode A: include AC assessment from `00_ac_assessment.md` diff --git a/.cursor/skills/retrospective/SKILL.md b/.cursor/skills/retrospective/SKILL.md index 0f04f25..3b5191a 100644 --- a/.cursor/skills/retrospective/SKILL.md +++ b/.cursor/skills/retrospective/SKILL.md @@ -4,7 +4,7 @@ description: | Collect metrics from implementation batch reports and code review findings, analyze trends across cycles, and produce improvement reports with actionable recommendations. 3-step workflow: collect metrics, analyze trends, produce report. - Outputs to _docs/05_metrics/. + Outputs to _docs/06_metrics/. 
Trigger phrases: - "retrospective", "retro", "run retro" - "metrics review", "feedback loop" @@ -31,7 +31,7 @@ Collect metrics from implementation artifacts, analyze trends across development Fixed paths: - IMPL_DIR: `_docs/03_implementation/` -- METRICS_DIR: `_docs/05_metrics/` +- METRICS_DIR: `_docs/06_metrics/` - TASKS_DIR: `_docs/02_tasks/` Announce the resolved paths to the user before proceeding. @@ -166,7 +166,7 @@ Present the report summary to the user. │ │ │ 1. Collect Metrics → parse batch reports, compute metrics │ │ 2. Analyze Trends → patterns, comparison, improvement areas │ -│ 3. Produce Report → _docs/05_metrics/retro_[date].md │ +│ 3. Produce Report → _docs/06_metrics/retro_[date].md │ ├────────────────────────────────────────────────────────────────┤ │ Principles: Data-driven · Actionable · Cumulative │ │ Non-judgmental · Save immediately │ diff --git a/.cursor/skills/rollback/SKILL.md b/.cursor/skills/rollback/SKILL.md deleted file mode 100644 index 064ef58..0000000 --- a/.cursor/skills/rollback/SKILL.md +++ /dev/null @@ -1,130 +0,0 @@ ---- -name: rollback -description: | - Revert implementation to a specific batch checkpoint using git revert, reset Jira ticket statuses, - verify rollback integrity with tests, and produce a rollback report. - Trigger phrases: - - "rollback", "revert", "revert batch" - - "undo implementation", "roll back to batch" -category: build -tags: [rollback, revert, recovery, implementation] -disable-model-invocation: true ---- - -# Implementation Rollback - -Revert the codebase to a specific batch checkpoint, reset Jira statuses for reverted tasks, and verify integrity. 
- -## Core Principles - -- **Preserve history**: always use `git revert`, never force-push -- **Verify after revert**: run the full test suite after every rollback -- **Update tracking**: reset Jira ticket statuses for all reverted tasks -- **Atomic rollback**: if rollback fails midway, stop and report — do not leave the codebase in a partial state -- **Ask, don't assume**: if the target batch is ambiguous, present options and ask - -## Context Resolution - -- IMPL_DIR: `_docs/03_implementation/` -- Batch reports: `IMPL_DIR/batch_*_report.md` - -## Prerequisite Checks (BLOCKING) - -1. IMPL_DIR exists and contains at least one `batch_*_report.md` — **STOP if missing** -2. Git working tree is clean (no uncommitted changes) — **STOP if dirty**, ask user to commit or stash - -## Input - -- User specifies a target batch number or commit hash -- If not specified, present the list of available batch checkpoints and ask - -## Workflow - -### Step 1: Identify Checkpoints - -1. Read all `batch_*_report.md` files from IMPL_DIR -2. Extract: batch number, date, tasks included, commit hash, code review verdict -3. Present batch list to user - -**BLOCKING**: User must confirm which batch to roll back to. - -### Step 2: Revert Commits - -1. Determine which commits need to be reverted (all commits after the target batch) -2. For each commit in reverse chronological order: - - Run `git revert --no-edit` - - If merge conflicts occur: present conflicts and ask user for resolution -3. If any revert fails and cannot be resolved, abort the rollback sequence with `git revert --abort` and report - -### Step 3: Verify Integrity - -1. Run the full test suite -2. If tests fail: report failures to user, ask how to proceed (fix or abort) -3. If tests pass: continue - -### Step 4: Update Jira - -1. Identify all tasks from reverted batches -2. Reset each task's Jira ticket status to "To Do" via Jira MCP - -### Step 5: Finalize - -1. 
Commit with message: `[ROLLBACK] Reverted to batch [N]: [task list]` -2. Write rollback report to `IMPL_DIR/rollback_report.md` - -## Output - -Write `_docs/03_implementation/rollback_report.md`: - -```markdown -# Rollback Report - -**Date**: [YYYY-MM-DD] -**Target**: Batch [N] (commit [hash]) -**Reverted Batches**: [list] - -## Reverted Tasks - -| Task | Batch | Status Before | Status After | -|------|-------|--------------|-------------| -| [JIRA-ID] | [batch #] | In Testing | To Do | - -## Test Results -- [pass/fail count] - -## Jira Updates -- [list of ticket transitions] - -## Notes -- [any conflicts, manual steps, or issues encountered] -``` - -## Escalation Rules - -| Situation | Action | -|-----------|--------| -| No batch reports exist | **STOP** — nothing to roll back | -| Uncommitted changes in working tree | **STOP** — ask user to commit or stash | -| Merge conflicts during revert | **ASK user** for resolution | -| Tests fail after rollback | **ASK user** — fix or abort | -| Rollback fails midway | Abort with `git revert --abort`, report to user | - -## Methodology Quick Reference - -``` -┌────────────────────────────────────────────────────────────────┐ -│ Rollback (5-Step Method) │ -├────────────────────────────────────────────────────────────────┤ -│ PREREQ: batch reports exist, clean working tree │ -│ │ -│ 1. Identify Checkpoints → present batch list │ -│ [BLOCKING: user confirms target batch] │ -│ 2. Revert Commits → git revert per commit │ -│ 3. Verify Integrity → run full test suite │ -│ 4. Update Jira → reset statuses to "To Do" │ -│ 5. 
Finalize → commit + rollback_report.md │ -├────────────────────────────────────────────────────────────────┤ -│ Principles: Preserve history · Verify after revert │ -│ Atomic rollback · Ask don't assume │ -└────────────────────────────────────────────────────────────────┘ -``` diff --git a/.cursor/skills/security/SKILL.md b/.cursor/skills/security/SKILL.md index 5be5701..1e35084 100644 --- a/.cursor/skills/security/SKILL.md +++ b/.cursor/skills/security/SKILL.md @@ -1,300 +1,347 @@ --- -name: security-testing -description: "Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices." -category: specialized-testing -priority: critical -tokenEstimate: 1200 -agents: [qe-security-scanner, qe-api-contract-validator, qe-quality-analyzer] -implementation_status: optimized -optimization_version: 1.0 -last_optimized: 2025-12-02 -dependencies: [] -quick_reference_card: true -tags: [security, owasp, sast, dast, vulnerabilities, auth, injection] -trust_tier: 3 -validation: - schema_path: schemas/output.json - validator_path: scripts/validate-config.json - eval_path: evals/security-testing.yaml +name: security +description: | + OWASP-based security audit skill. Analyzes codebase for vulnerabilities across dependency scanning, + static analysis, OWASP Top 10 review, and secrets detection. Produces a structured security report + with severity-ranked findings and remediation guidance. + Can be invoked standalone or as part of the autopilot flow (optional step before deploy). + Trigger phrases: + - "security audit", "security scan", "OWASP review" + - "vulnerability scan", "security check" + - "check for vulnerabilities", "pentest" +category: review +tags: [security, owasp, sast, vulnerabilities, auth, injection, secrets] +disable-model-invocation: true --- -# Security Testing +# Security Audit - -When testing security or conducting audits: -1. TEST OWASP Top 10 vulnerabilities systematically -2. 
VALIDATE authentication and authorization on every endpoint -3. SCAN dependencies for known vulnerabilities (npm audit) -4. CHECK for injection attacks (SQL, XSS, command) -5. VERIFY secrets aren't exposed in code/logs +Analyze the codebase for security vulnerabilities using OWASP principles. Produces a structured report with severity-ranked findings, remediation suggestions, and a security checklist verdict. -**Quick Security Checks:** -- Access control → Test horizontal/vertical privilege escalation -- Crypto → Verify password hashing, HTTPS, no sensitive data exposed -- Injection → Test SQL injection, XSS, command injection -- Auth → Test weak passwords, session fixation, MFA enforcement -- Config → Check error messages don't leak info +## Core Principles -**Critical Success Factors:** -- Think like an attacker, build like a defender -- Security is built in, not added at the end -- Test continuously in CI/CD, not just before release - +- **OWASP-driven**: use the current OWASP Top 10 as the primary framework — verify the latest version at https://owasp.org/www-project-top-ten/ at audit start +- **Evidence-based**: every finding must reference a specific file, line, or configuration +- **Severity-ranked**: findings sorted Critical > High > Medium > Low +- **Actionable**: every finding includes a concrete remediation suggestion +- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work +- **Complement, don't duplicate**: the `/code-review` skill does a lightweight security quick-scan; this skill goes deeper -## Quick Reference Card +## Context Resolution -### When to Use -- Security audits and penetration testing -- Testing authentication/authorization -- Validating input sanitization -- Reviewing security configuration +**Project mode** (default): +- PROBLEM_DIR: `_docs/00_problem/` +- SOLUTION_DIR: `_docs/01_solution/` +- DOCUMENT_DIR: `_docs/02_document/` +- SECURITY_DIR: `_docs/05_security/` -### OWASP Top 10 -Use the most 
recent **stable** version of the OWASP Top 10. At the start of each security audit, research the current version at https://owasp.org/www-project-top-ten/ and test against all listed categories. Do not rely on a hardcoded list — the OWASP Top 10 is updated periodically and the current version must be verified.
+**Standalone mode** (explicit target provided, e.g. `/security @src/api/`):
+- TARGET: the provided path
+- SECURITY_DIR: `_standalone/security/`
-### Tools
-| Type | Tool | Purpose |
-|------|------|---------|
-| SAST | SonarQube, Semgrep | Static code analysis |
-| DAST | OWASP ZAP, Burp | Dynamic scanning |
-| Deps | npm audit, Snyk | Dependency vulnerabilities |
-| Secrets | git-secrets, TruffleHog | Secret scanning |
+Announce the detected mode and resolved paths to the user before proceeding.
-### Agent Coordination
-- `qe-security-scanner`: Multi-layer SAST/DAST scanning
-- `qe-api-contract-validator`: API security testing
-- `qe-quality-analyzer`: Security code review
+## Prerequisite Checks
+
+1. Codebase must contain source code files — **STOP if empty**
+2. Create SECURITY_DIR if it does not exist
+3. If SECURITY_DIR already contains artifacts, ask user: **resume, overwrite, or skip?**
+4. If `_docs/00_problem/security_approach.md` exists, read it for project-specific security requirements
+
+## Progress Tracking
+
+At the start of execution, create a TodoWrite with all phases (1 through 5). Update status as each phase completes.
+
+## Workflow
+
+### Phase 1: Dependency Scan
+
+**Role**: Security analyst
+**Goal**: Identify known vulnerabilities in project dependencies
+**Constraints**: Scan only — no code changes
+
+1. Detect the project's package manager(s): `requirements.txt`, `package.json`, `Cargo.toml`, `*.csproj`, `go.mod`
+2. Run the appropriate audit tool:
+   - Python: `pip-audit` or `safety check`
+   - Node: `npm audit`
+   - Rust: `cargo audit`
+   - .NET: `dotnet list package --vulnerable`
+   - Go: `govulncheck`
+3.
If no audit tool is available, manually inspect dependency files for known CVEs using WebSearch +4. Record findings with CVE IDs, affected packages, severity, and recommended upgrade versions + +**Self-verification**: +- [ ] All package manifests scanned +- [ ] Each finding has a CVE ID or advisory reference +- [ ] Upgrade paths identified for Critical/High findings + +**Save action**: Write `SECURITY_DIR/dependency_scan.md` --- -## Key Vulnerability Tests +### Phase 2: Static Analysis (SAST) -### 1. Broken Access Control -```javascript -// Horizontal escalation - User A accessing User B's data -test('user cannot access another user\'s order', async () => { - const userAToken = await login('userA'); - const userBOrder = await createOrder('userB'); +**Role**: Security engineer +**Goal**: Identify code-level vulnerabilities through static analysis +**Constraints**: Analysis only — no code changes - const response = await api.get(`/orders/${userBOrder.id}`, { - headers: { Authorization: `Bearer ${userAToken}` } - }); - expect(response.status).toBe(403); -}); +Scan the codebase for these vulnerability patterns: -// Vertical escalation - Regular user accessing admin -test('regular user cannot access admin', async () => { - const userToken = await login('regularUser'); - expect((await api.get('/admin/users', { - headers: { Authorization: `Bearer ${userToken}` } - })).status).toBe(403); -}); -``` +**Injection**: +- SQL injection via string interpolation or concatenation +- Command injection (subprocess with shell=True, exec, eval, os.system) +- XSS via unsanitized user input in HTML output +- Template injection -### 2. 
Injection Attacks -```javascript -// SQL Injection -test('prevents SQL injection', async () => { - const malicious = "' OR '1'='1"; - const response = await api.get(`/products?search=${malicious}`); - expect(response.body.length).toBeLessThan(100); // Not all products -}); +**Authentication & Authorization**: +- Hardcoded credentials, API keys, passwords, tokens +- Missing authentication checks on endpoints +- Missing authorization checks (horizontal/vertical escalation paths) +- Weak password validation rules -// XSS -test('sanitizes HTML output', async () => { - const xss = ''; - await api.post('/comments', { text: xss }); +**Cryptographic Failures**: +- Plaintext password storage (no hashing) +- Weak hashing algorithms (MD5, SHA1 for passwords) +- Hardcoded encryption keys or salts +- Missing TLS/HTTPS enforcement - const html = (await api.get('/comments')).body; - expect(html).toContain('<script>'); - expect(html).not.toContain('` for Tailwind +- `