Refactor annotation tool from WPF desktop app to .NET API

Replace the WPF desktop application (Azaion.Suite, Azaion.Annotator, Azaion.Common, Azaion.Inference, Azaion.Loader, Azaion.LoaderUI, Azaion.Dataset, Azaion.Test) with a standalone .NET Web API in src/.

Made-with: Cursor
2026-04-22 06:56:29 +00:00 · 2026-03-25 04:40:03 +02:00
parent e7ea5a8ded
commit 9e7dc290db
367 changed files with 8840 additions and 16583 deletions
@@ -0,0 +1,179 @@
+## Developer TODO (Project Mode)
+
+### BUILD (green-field or new features)
+
+```
+1. Create _docs/00_problem/                      — describe what you're building
+   - problem.md                                   (required)
+   - restrictions.md                              (required)
+   - acceptance_criteria.md                       (required)
+   - security_approach.md                         (optional)
+
+2. /research                                     — produces solution drafts in _docs/01_solution/
+   Run multiple times: Mode A → draft, Mode B → assess & revise
+   Finalize as solution.md
+
+3. /plan                                         — architecture, components, risks, tests → _docs/02_plans/
+
+4. /decompose                                    — feature specs, implementation order → _docs/02_tasks/
+
+5. /implement-initial                            — scaffold project from initial_structure.md (once)
+
+6. /implement-wave                               — implement next wave of features (repeat per wave)
+
+7. /implement-code-review                        — review implemented code (after each wave or at the end)
+
+8. /implement-black-box-tests                    — E2E tests via Docker consumer app (after all waves)
+
+9. commit & push
+```
+
+### SHIP (deploy and operate)
+
+```
+10. /implement-cicd                              — validate/enhance CI/CD pipeline
+11. /deploy                                      — deployment strategy per environment
+12. /observability                               — monitoring, logging, alerting plan
+```
+
+### EVOLVE (maintenance and improvement)
+
+```
+13. /refactor                                    — structured refactoring (skill, 6-phase workflow)
+```
+
+## Implementation Flow
+
+### `/implement-initial`
+
+Reads `_docs/02_tasks/<topic>/initial_structure.md` and scaffolds the project skeleton: folder structure, shared models, interfaces, stubs, .gitignore, .env.example, CI/CD config, DB migrations setup, test structure.
+
+Run once after decompose.
+
+### `/implement-wave`
+
+Reads `SUMMARY.md` and `cross_dependencies.md` from `_docs/02_tasks/<topic>/`.
+
+1. Detects which features are already implemented
+2. Identifies the next wave (phase) of independent features
+3. Presents the wave for confirmation (blocks until user confirms)
+4. Launches parallel `implementer` subagents (max 4 concurrent; same-component features run sequentially)
+5. Runs tests, reports results
+6. Suggests commit
+
+Repeat `/implement-wave` until all phases are done.
+
+### `/implement-code-review`
+
+Reviews implemented code against specs. Reports issues by type (Bug/Security/Performance/Style/Debt) with priorities and suggested fixes.
+
+### `/implement-black-box-tests`
+
+Reads `_docs/02_plans/<topic>/e2e_test_infrastructure.md` (produced by the plan skill). Builds a separate Docker-based consumer app that exercises the system as a black box: no internal imports and no direct DB access. Runs the E2E scenarios and produces a CSV test report.
+
+Run after all waves are done.
+
+### `/implement-cicd`
+
+Reviews existing CI/CD pipeline configuration, validates all stages work, optimizes performance (parallelization, caching), ensures quality gates are enforced (coverage, linting, security scanning).
+
+Run after `/implement-initial` or after all waves.
+
+### `/deploy`
+
+Defines deployment strategy per environment: deployment procedures, rollback procedures, health checks, deployment checklist. Outputs `_docs/02_components/deployment_strategy.md`.
+
+Run before first production release.
+
+### `/observability`
+
+Plans logging strategy, metrics collection, distributed tracing, alerting rules, and dashboards. Outputs `_docs/02_components/observability_plan.md`.
+
+Run before first production release.
+
+### Commit
+
+After each wave or review, run a standard `git add . && git commit`. The wave command suggests a commit message.
+
+## Available Skills
+
+| Skill | Triggers | Purpose |
+|-------|----------|---------|
+| **research** | "research", "investigate", "assess solution" | 8-step research → solution drafts |
+| **plan** | "plan", "decompose solution" | Architecture, components, risks, tests, epics |
+| **decompose** | "decompose", "task decomposition" | Feature specs + implementation order |
+| **refactor** | "refactor", "refactoring", "improve code" | 6-phase structured refactoring workflow |
+| **security** | "security audit", "OWASP" | OWASP-based security testing |
+
+## Project Folder Structure
+
+```
+_docs/
+├── 00_problem/
+│   ├── problem.md
+│   ├── restrictions.md
+│   ├── acceptance_criteria.md
+│   └── security_approach.md
+├── 01_solution/
+│   ├── solution_draft01.md
+│   ├── solution_draft02.md
+│   ├── solution.md
+│   ├── tech_stack.md
+│   └── security_analysis.md
+├── 01_research/
+│   └── <topic>/
+├── 02_plans/
+│   └── <topic>/
+│       ├── architecture.md
+│       ├── system-flows.md
+│       ├── e2e_test_infrastructure.md
+│       ├── components/
+│       └── FINAL_report.md
+├── 02_tasks/
+│   └── <topic>/
+│       ├── initial_structure.md
+│       ├── cross_dependencies.md
+│       ├── SUMMARY.md
+│       └── [##]_[component]/
+│           └── [##].[##]_feature_[name].md
+└── 04_refactoring/
+    ├── baseline_metrics.md
+    ├── discovery/
+    ├── analysis/
+    ├── test_specs/
+    ├── coupling_analysis.md
+    ├── execution_log.md
+    ├── hardening/
+    └── FINAL_report.md
+```
+
+## Implementation Tools
+
+| Tool | Type | Purpose |
+|------|------|---------|
+| `implementer` | Subagent | Implements a single feature from its spec. Launched by implement-wave. |
+| `/implement-initial` | Command | Scaffolds project skeleton from `initial_structure.md`. Run once. |
+| `/implement-wave` | Command | Detects next wave, launches parallel implementers. Repeatable. |
+| `/implement-code-review` | Command | Reviews code against specs. |
+| `/implement-black-box-tests` | Command | E2E tests via Docker consumer app. After all waves. |
+| `/implement-cicd` | Command | Validate and enhance CI/CD pipeline. |
+| `/deploy` | Command | Plan deployment strategy per environment. |
+| `/observability` | Command | Plan logging, metrics, tracing, alerting. |
+
+## Standalone Mode (Reference)
+
+Any skill can run in standalone mode by passing an explicit file:
+
+```
+/research @my_problem.md
+/plan @my_design.md
+/decompose @some_spec.md
+/refactor @some_component.md
+```
+
+Output goes to `_standalone/<topic>/` (git-ignored) instead of `_docs/`. Standalone mode relaxes guardrails — only the provided file is required; restrictions and acceptance criteria are optional.
+
+Single component decompose is also supported:
+
+```
+/decompose @_docs/02_plans/<topic>/components/03_parser/description.md
+```
@@ -0,0 +1,49 @@
+---
+name: implementer
+description: |
+  Implements a single feature from its spec file. Use when implementing features from _docs/02_tasks/.
+  Reads the feature spec, analyzes the codebase, implements the feature with tests, and verifies acceptance criteria.
+---
+
+You are a professional software developer implementing a single feature.
+
+## Input
+
+You receive a path to a feature spec file (e.g., `_docs/02_tasks/<topic>/[##]_[name]/[##].[##]_feature_[name].md`).
+
+## Context
+
+Read these files for project context:
+- `_docs/00_problem/problem.md`
+- `_docs/00_problem/restrictions.md`
+- `_docs/00_problem/acceptance_criteria.md`
+- `_docs/01_solution/solution.md`
+
+## Process
+
+1. Read the feature spec thoroughly — understand acceptance criteria, scope, constraints
+2. Analyze the existing codebase: conventions, patterns, related code, shared interfaces
+3. Research best implementation approaches for the tech stack if needed
+4. If the feature has a dependency on an unimplemented component, create a temporary mock
+5. Implement the feature following existing code conventions
+6. Implement error handling per the project's defined strategy
+7. Implement unit tests (use //Arrange //Act //Assert comments)
+8. Implement integration tests: review the existing tests, then extend them or create new ones
+9. Run all tests, fix any failures
+10. Verify the implementation satisfies every acceptance criterion from the spec
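+
+For illustration only, a unit test laid out in Arrange/Act/Assert style might look like the Python sketch below (Python uses `#` where the spec shows `//`). The `apply_discount` function is hypothetical and stands in for real feature code.
+
+```python
+import unittest
+
+
+def apply_discount(price: float, percent: float) -> float:
+    # Hypothetical feature code under test
+    if not 0 <= percent <= 100:
+        raise ValueError("percent must be between 0 and 100")
+    return round(price * (1 - percent / 100), 2)
+
+
+class ApplyDiscountTests(unittest.TestCase):
+    def test_applies_percentage(self):
+        # Arrange
+        price, percent = 200.0, 15
+        # Act
+        result = apply_discount(price, percent)
+        # Assert
+        self.assertEqual(result, 170.0)
+
+    def test_rejects_out_of_range_percent(self):
+        # Arrange
+        price, percent = 50.0, 120
+        # Act / Assert
+        with self.assertRaises(ValueError):
+            apply_discount(price, percent)
+
+
+if __name__ == "__main__":
+    unittest.main()
+```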
+
+## After completion
+
+Report:
+- What was implemented
+- Which acceptance criteria are satisfied
+- Test results (passed/failed)
+- Any mocks created for unimplemented dependencies
+- Any concerns or deviations from the spec
+
+## Principles
+
+- Follow SOLID, KISS, DRY
+- Dumb code, smart data
+- No unnecessary comments or logs (log only exceptions)
+- Ask if requirements are ambiguous — do not assume
@@ -0,0 +1,71 @@
+# Deployment Strategy Planning
+
+## Initial data:
+ - Problem description: `@_docs/00_problem/problem.md`
+ - Restrictions: `@_docs/00_problem/restrictions.md`
+ - Full Solution Description: `@_docs/01_solution/solution.md`
+ - Components: `@_docs/02_components`
+ - Environment Strategy: `@_docs/00_templates/environment_strategy.md`
+
+## Role
+  You are a DevOps/Platform engineer
+
+## Task
+ - Define deployment strategy for each environment
+ - Plan deployment procedures and automation
+ - Define rollback procedures
+ - Establish deployment verification steps
+ - Document manual intervention points
+
+## Output
+
+### Deployment Architecture
+ - Infrastructure diagram (where components run)
+ - Network topology
+ - Load balancing strategy
+ - Container/VM configuration
+
+### Deployment Procedures
+
+#### Staging Deployment
+ - Trigger conditions
+ - Pre-deployment checks
+ - Deployment steps
+ - Post-deployment verification
+ - Smoke tests to run
+
+#### Production Deployment
+ - Approval workflow
+ - Deployment window
+ - Pre-deployment checks
+ - Deployment steps (blue-green, rolling, canary)
+ - Post-deployment verification
+ - Smoke tests to run
+
+### Rollback Procedures
+ - Rollback trigger criteria
+ - Rollback steps per environment
+ - Data rollback considerations
+ - Communication plan during rollback
+
+### Health Checks
+ - Liveness probe configuration
+ - Readiness probe configuration
+ - Custom health endpoints
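+
+A minimal sketch of separate liveness and readiness endpoints, using only the Python standard library. The paths `/health/live` and `/health/ready`, port 8080, and the `check_dependencies` stub are assumptions; the real service may well be .NET, per the tech-stack rules. Liveness only answers "is the process up?", while readiness also requires dependencies such as the database, so that (for example under Kubernetes) a failing readiness probe removes the instance from load balancing instead of restarting it.
+
+```python
+import json
+from http.server import BaseHTTPRequestHandler, HTTPServer
+from typing import Dict, Tuple
+
+
+def check_dependencies() -> Dict[str, bool]:
+    # Hypothetical: replace with real checks, e.g. opening a database connection
+    return {"database": True}
+
+
+def health_response(path: str, checks: Dict[str, bool]) -> Tuple[int, dict]:
+    """Map a probe path to an HTTP status code and a JSON-serializable body."""
+    if path == "/health/live":
+        # Liveness: the process is running and can answer requests
+        return 200, {"status": "alive"}
+    if path == "/health/ready":
+        # Readiness: every dependency must be available before taking traffic
+        ready = all(checks.values())
+        return (200 if ready else 503), {"status": "ready" if ready else "not_ready", "checks": checks}
+    return 404, {"status": "not_found"}
+
+
+class HealthHandler(BaseHTTPRequestHandler):
+    def do_GET(self) -> None:
+        # Only the readiness probe pays the cost of checking dependencies
+        checks = check_dependencies() if self.path == "/health/ready" else {}
+        status, body = health_response(self.path, checks)
+        payload = json.dumps(body).encode("utf-8")
+        self.send_response(status)
+        self.send_header("Content-Type", "application/json")
+        self.send_header("Content-Length", str(len(payload)))
+        self.end_headers()
+        self.wfile.write(payload)
+
+
+if __name__ == "__main__":
+    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
+```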
+
+### Deployment Checklist
+ - [ ] All tests pass in CI
+ - [ ] Security scan clean
+ - [ ] Database migrations reviewed
+ - [ ] Feature flags configured
+ - [ ] Monitoring alerts configured
+ - [ ] Rollback plan documented
+ - [ ] Stakeholders notified
+
+Save the output to `_docs/02_components/deployment_strategy.md`
+
+## Notes
+ - Prefer automated deployments over manual
+ - Zero-downtime deployments for production
+ - Always have a rollback plan
+ - Ask questions about infrastructure constraints
@@ -0,0 +1,45 @@
+# Implement E2E Black-Box Tests
+
+Build a separate Docker-based consumer application that exercises the main system as a black box, validating end-to-end use cases.
+
+## Input
+- E2E test infrastructure spec: `_docs/02_plans/<topic>/e2e_test_infrastructure.md` (produced by Step 4b of the plan skill)
+
+## Context
+- Problem description: `@_docs/00_problem/problem.md`
+- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
+- Solution: `@_docs/01_solution/solution.md`
+- Architecture: `@_docs/02_plans/<topic>/architecture.md`
+
+## Role
+You are a professional QA engineer and developer
+
+## Task
+- Read the E2E test infrastructure spec thoroughly
+- Build the Docker test environment:
+  - Create docker-compose.yml with all services (system under test, test DB, consumer app, dependency mocks)
+  - Configure networks and volumes per spec
+- Implement the consumer application:
+  - Separate project/folder that communicates with the main system only through its public interfaces
+  - No internal imports from the main system, no direct DB access
+  - Use the tech stack and entry point defined in the spec
+- Implement each E2E test scenario from the spec:
+  - Check existing E2E tests; update if a similar test already exists
+  - Prepare seed data and fixtures per the test data management section
+  - Implement teardown/cleanup procedures
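+- Run the full E2E suite via `docker compose up --exit-code-from <consumer-service>`, so that all containers stop when the consumer app finishes and the command returns the consumer's exit code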
+- If tests fail:
+  - Fix issues iteratively until all pass
+  - If a failure is caused by missing external data, API access, or environment config, ask the user
+- Ensure the E2E suite integrates into the CI pipeline per the spec
+- Produce a CSV test report (test ID, name, execution time, result, error message) at the output path defined in the spec
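+
+One way to produce the CSV report is sketched below in Python. The column names and the `E2EResult` type are assumptions; the spec only fixes the fields (test ID, name, execution time, result, error message) and leaves the output path to `e2e_test_infrastructure.md`.
+
+```python
+import csv
+import io
+from dataclasses import dataclass
+from typing import Iterable
+
+COLUMNS = ["test_id", "name", "execution_time_s", "result", "error_message"]
+
+
+@dataclass
+class E2EResult:
+    test_id: str
+    name: str
+    execution_time_s: float
+    result: str  # e.g. "passed" or "failed"
+    error_message: str = ""
+
+
+def render_report(results: Iterable[E2EResult]) -> str:
+    buffer = io.StringIO()
+    # csv.writer quotes fields that contain commas or quotes, such as long error messages
+    writer = csv.writer(buffer, lineterminator="\n")
+    writer.writerow(COLUMNS)
+    for r in results:
+        writer.writerow([r.test_id, r.name, f"{r.execution_time_s:.3f}", r.result, r.error_message])
+    return buffer.getvalue()
+
+
+def write_report(path: str, results: Iterable[E2EResult]) -> None:
+    with open(path, "w", newline="", encoding="utf-8") as fh:
+        fh.write(render_report(results))
+```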
+
+## Safety Rules
+- The consumer app must treat the main system as a true black box
+- Never import internal modules or access the main system's database directly
+- Docker environment must be self-contained — no host dependencies beyond Docker itself
+- If external services need mocking, implement mock/stub services as Docker containers
+
+## Notes
+- Ask questions if the spec is ambiguous or incomplete
+- If `e2e_test_infrastructure.md` is missing, stop and ask the user to run the plan skill first
@@ -0,0 +1,64 @@
+# CI/CD Pipeline Validation & Enhancement
+
+## Initial data:
+ - Problem description: `@_docs/00_problem/problem.md`
+ - Restrictions: `@_docs/00_problem/restrictions.md`
+ - Full Solution Description: `@_docs/01_solution/solution.md`
+ - Components: `@_docs/02_components`
+ - Environment Strategy: `@_docs/00_templates/environment_strategy.md`
+
+## Role
+  You are a DevOps engineer
+
+## Task
+ - Review existing CI/CD pipeline configuration
+ - Validate all stages are working correctly
+ - Optimize pipeline performance (parallelization, caching)
+ - Ensure test coverage gates are enforced
+ - Verify security scanning is properly configured
+ - Add missing quality gates
+
+## Checklist
+
+### Pipeline Health
+ - [ ] All stages execute successfully
+ - [ ] Build time is acceptable (<10 min for most projects)
+ - [ ] Caching is properly configured (dependencies, build artifacts)
+ - [ ] Parallel execution where possible
+
+### Quality Gates
+ - [ ] Code coverage threshold enforced (minimum 75%)
+ - [ ] Linting errors block merge
+ - [ ] Security vulnerabilities block merge (critical/high)
+ - [ ] All tests must pass
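+
+As a sketch of enforcing the 75% threshold, the Python script below reads a Cobertura-format XML report (coverage.py can write one with `coverage xml`, and coverlet can produce one for .NET) and exits non-zero when line coverage is below the gate. The script itself and how it is wired into the pipeline are assumptions; many tools also have a built-in option, such as coverage.py's `--fail-under`.
+
+```python
+import sys
+import xml.etree.ElementTree as ET
+
+THRESHOLD = 0.75  # minimum line coverage from the quality-gate checklist
+
+
+def line_coverage(cobertura_xml) -> float:
+    # Cobertura stores overall line coverage as a 0..1 fraction on the root <coverage> element
+    root = ET.fromstring(cobertura_xml)
+    return float(root.attrib["line-rate"])
+
+
+def passes_gate(cobertura_xml, threshold: float = THRESHOLD) -> bool:
+    return line_coverage(cobertura_xml) >= threshold
+
+
+if __name__ == "__main__":
+    with open(sys.argv[1], "rb") as fh:
+        report = fh.read()
+    rate = line_coverage(report)
+    print(f"line coverage {rate:.1%}, required {THRESHOLD:.0%}")
+    # A non-zero exit code fails the CI job
+    sys.exit(0 if rate >= THRESHOLD else 1)
+```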
+
+### Environment Deployments
+ - [ ] Staging deployment works on merge to the stage branch
+ - [ ] Environment variables properly configured per environment
+ - [ ] Secrets are securely managed (not in code)
+ - [ ] Rollback procedure documented
+
+### Monitoring
+ - [ ] Build notifications configured (Slack, email, etc.)
+ - [ ] Failed build alerts
+ - [ ] Deployment success/failure notifications
+
+## Output
+
+### Pipeline Status Report
+ - Current pipeline configuration summary
+ - Issues found and fixes applied
+ - Performance metrics (build times)
+
+### Recommended Improvements
+ - Short-term improvements
+ - Long-term optimizations
+
+### Quality Gate Configuration
+ - Thresholds configured
+ - Enforcement rules
+
+## Notes
+ - Do not break existing functionality
+ - Test changes in a separate branch first
+ - Document any manual steps required
@@ -0,0 +1,38 @@
+# Code Review
+
+## Initial data:
+ - Problem description: `@_docs/00_problem/problem.md`
+ - Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
+ - Security approach: `@_docs/00_problem/security_approach.md`
+ - Full Solution Description: `@_docs/01_solution/solution.md`
+ - Components: `@_docs/02_components`
+
+## Role
+  You are a senior software engineer performing code review
+
+## Task
+ - Review implemented code against component specifications
+ - Check code quality: readability, maintainability, SOLID principles
+ - Check error handling consistency
+ - Check logging implementation
+ - Check security requirements are met
+ - Check test coverage is adequate
+ - Identify code smells and technical debt
+
+## Output
+ ### Issues Found
+  For each issue:
+  - File/Location
+  - Issue type (Bug/Security/Performance/Style/Debt)
+  - Description
+  - Suggested fix
+  - Priority (High/Medium/Low)
+
+ ### Summary
+  - Total issues by type
+  - Blocking issues that must be fixed
+  - Recommended improvements
+
+## Notes
+ - Can also use Cursor's built-in review feature
+ - Focus on critical issues first
@@ -0,0 +1,53 @@
+# Implement Initial Structure
+
+## Input
+- Structure plan: `_docs/02_tasks/<topic>/initial_structure.md` (produced by the decompose skill)
+
+## Context
+- Problem description: `@_docs/00_problem/problem.md`
+- Restrictions: `@_docs/00_problem/restrictions.md`
+- Solution: `@_docs/01_solution/solution.md`
+
+## Role
+You are a professional software architect
+
+## Task
+- Read the structure plan in `initial_structure.md` carefully
+- Execute the plan — create the project skeleton:
+  - DTOs and shared models
+  - Component interfaces
+  - Empty implementations (stubs)
+  - Helpers — empty implementations or interfaces
+- Add a .gitignore appropriate for the project's language/framework
+- Add a .env.example listing the required environment variables
+- Configure the CI/CD pipeline per the structure plan stages
+- Apply the environment strategy (dev, staging, production) per the structure plan
+- Add database migration setup if applicable
+- Add a README.md describing the project based on the solution
+- Create the test folder structure per the structure plan
+- Recommend branch protection rules
+
+## Example
+The structure should roughly look like this (varies by tech stack):
+  - .gitignore
+  - .env.example
+  - .github/workflows/ (or .gitlab-ci.yml or azure-pipelines.yml)
+  - api/
+  - components/
+    - component1_folder/
+    - component2_folder/
+  - db/
+    - migrations/
+  - helpers/
+  - models/
+  - tests/
+    - unit/
+    - integration/
+      - test_data/
+
+Semantically coherent components may have their own project or subfolder. Common interfaces can be in a shared layer or per-component — follow language conventions.
+
+## Notes
+- Follow SOLID, KISS, DRY
+- Follow conventions of the project's programming language
+- Ask as many questions as needed
@@ -0,0 +1,62 @@
+# Implement Next Wave
+
+Identify the next batch of independent features and implement them in parallel using the implementer subagent.
+
+## Prerequisites
+- Project scaffolded (`/implement-initial` completed)
+- `_docs/02_tasks/<topic>/SUMMARY.md` exists
+- `_docs/02_tasks/<topic>/cross_dependencies.md` exists
+
+## Wave Sizing
+- One wave = one phase from SUMMARY.md (features whose dependencies are all satisfied)
+- Max 4 subagents run concurrently; features in the same component run sequentially
+- If a phase has more than 8 features or more than 20 complexity points, suggest splitting into smaller waves and let the user cherry-pick which features to include
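+
+The sizing rules above can be sketched in Python as follows. This assumes the dependency graph from `cross_dependencies.md` has already been parsed into `Feature` objects (a hypothetical type) and derives the ready set directly from it, rather than reading phases from `SUMMARY.md`.
+
+```python
+from dataclasses import dataclass, field
+from typing import List, Set
+
+MAX_FEATURES = 8
+MAX_POINTS = 20
+
+
+@dataclass
+class Feature:
+    id: str  # e.g. "01.02" = component 01, feature 02
+    component: str
+    points: int
+    depends_on: Set[str] = field(default_factory=set)
+
+
+def next_wave(features: List[Feature], done: Set[str]) -> List[Feature]:
+    """Unimplemented features whose dependencies are all implemented."""
+    return [f for f in features if f.id not in done and f.depends_on <= done]
+
+
+def needs_split(wave: List[Feature]) -> bool:
+    """True when the user should be asked to cherry-pick a smaller wave."""
+    return len(wave) > MAX_FEATURES or sum(f.points for f in wave) > MAX_POINTS
+```
+
+Because a feature is ready only once everything in `depends_on` is done, this also respects the rule of never launching features whose dependencies are not yet implemented.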
+
+## Task
+
+1. **Read the implementation plan**
+   - Read `SUMMARY.md` for the phased implementation order
+   - Read `cross_dependencies.md` for the dependency graph
+
+2. **Detect current progress**
+   - Analyze the codebase to determine which features are already implemented
+   - Match implemented code against feature specs in `_docs/02_tasks/<topic>/`
+   - Identify the next incomplete wave/phase from the implementation order
+
+3. **Present the wave**
+   - List all features in this wave with their complexity points
+   - Show which component each feature belongs to
+   - Show the total number of features and the estimated total complexity
+   - If the phase exceeds 8 features or 20 complexity points, recommend splitting and let the user select a subset
+   - **BLOCKING**: Do NOT proceed until user confirms
+
+4. **Launch parallel implementation**
+   - For each feature in the wave, launch an `implementer` subagent in the background
+   - Each subagent receives the path to its feature spec file
+   - Features in different components can run in parallel (at most 4 subagents at a time)
+   - Features within the same component should run sequentially to avoid file conflicts
+
+5. **Monitor and report**
+   - Wait for all subagents to complete
+   - Collect results from each: what was implemented, test results, any issues
+   - Run the full test suite
+   - Report summary:
+     - Features completed successfully
+     - Features that failed or need manual attention
+     - Test results (passed/failed/skipped)
+     - Any mocks created for future-wave dependencies
+
+6. **Post-wave actions**
+   - Suggest: `git add . && git commit` with a wave-level commit message
+   - If all features passed: "Ready for next wave. Run `/implement-wave` again."
+   - If some failed: "Fix the failing features before proceeding to the next wave."
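+
+Steps 4 and 5 amount to a scheduling rule: one sequential lane per component, at most 4 lanes running at once, and no automatic retries. A Python sketch under those assumptions follows; `implement` is a hypothetical callable standing in for launching an `implementer` subagent. Stopping a lane after its first failure is a design choice of this sketch, so that later features in the same component are not built on top of a broken one.
+
+```python
+from collections import defaultdict
+from concurrent.futures import ThreadPoolExecutor
+from typing import Callable, Dict, List, Tuple
+
+MAX_CONCURRENT = 4
+
+
+def run_wave(wave: List[Tuple[str, str]], implement: Callable[[str], bool]) -> Dict[str, str]:
+    """wave: (component, spec_path) pairs in implementation order.
+    Returns spec_path -> "passed" | "failed" | "skipped"."""
+    # One lane per component: features of the same component never run concurrently
+    lanes: Dict[str, List[str]] = defaultdict(list)
+    for component, spec_path in wave:
+        lanes[component].append(spec_path)
+
+    def run_lane(specs: List[str]) -> Dict[str, str]:
+        outcome: Dict[str, str] = {}
+        failed = False
+        for spec in specs:
+            if failed:
+                outcome[spec] = "skipped"
+                continue
+            # No automatic retry: the failure is recorded and reported to the user
+            ok = implement(spec)
+            outcome[spec] = "passed" if ok else "failed"
+            failed = not ok
+        return outcome
+
+    results: Dict[str, str] = {}
+    # At most MAX_CONCURRENT lanes (and therefore subagents) run at the same time
+    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
+        for lane_outcome in pool.map(run_lane, lanes.values()):
+            results.update(lane_outcome)
+    return results
+```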
+
+## Safety Rules
+- Never launch features whose dependencies are not yet implemented
+- Features within the same component run sequentially, not in parallel
+- If a subagent fails, do NOT retry automatically: report the failure and let the user decide
+- Always run tests after the wave completes, before suggesting commit
+
+## Notes
+- Ask questions if the implementation order is ambiguous
+- If SUMMARY.md or cross_dependencies.md is missing, stop and ask the user to run the decompose skill first
@@ -0,0 +1,122 @@
+# Observability Planning
+
+## Initial data:
+ - Problem description: `@_docs/00_problem/problem.md`
+ - Full Solution Description: `@_docs/01_solution/solution.md`
+ - Components: `@_docs/02_components`
+ - Deployment Strategy: `@_docs/02_components/deployment_strategy.md`
+
+## Role
+  You are a Site Reliability Engineer (SRE)
+
+## Task
+ - Define logging strategy across all components
+ - Plan metrics collection and dashboards
+ - Design distributed tracing (if applicable)
+ - Establish alerting rules
+ - Document incident response procedures
+
+## Output
+
+### Logging Strategy
+
+#### Log Levels
+| Level | Usage | Example |
+|-------|-------|---------|
+| ERROR | Exceptions, failures requiring attention | Database connection failed |
+| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
+| INFO | Significant business events | User registered, Order placed |
+| DEBUG | Detailed diagnostic information | Request payload, Query params |
+
+#### Log Format
+```json
+{
+  "timestamp": "ISO8601",
+  "level": "INFO",
+  "service": "service-name",
+  "correlation_id": "uuid",
+  "message": "Event description",
+  "context": {}
+}
+```
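+
+A minimal sketch of emitting this format with Python's standard `logging` module. The service name and the use of `extra=` to carry `correlation_id` and `context` are assumptions. Python names the warning level `WARNING`, so the formatter maps it to the `WARN` used in the log-level table.
+
+```python
+import json
+import logging
+import uuid
+from datetime import datetime, timezone
+
+SERVICE_NAME = "annotations-api"  # hypothetical service name
+
+
+class JsonFormatter(logging.Formatter):
+    def format(self, record: logging.LogRecord) -> str:
+        level = "WARN" if record.levelname == "WARNING" else record.levelname
+        entry = {
+            # record.created is a Unix timestamp; render it as ISO 8601 in UTC
+            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
+            "level": level,
+            "service": SERVICE_NAME,
+            "correlation_id": getattr(record, "correlation_id", None),
+            "message": record.getMessage(),
+            "context": getattr(record, "context", {}),
+        }
+        return json.dumps(entry)
+
+
+def build_logger(name: str = "app") -> logging.Logger:
+    logger = logging.getLogger(name)
+    handler = logging.StreamHandler()
+    handler.setFormatter(JsonFormatter())
+    logger.addHandler(handler)
+    logger.setLevel(logging.INFO)
+    return logger
+
+
+if __name__ == "__main__":
+    log = build_logger()
+    log.info("Order placed", extra={"correlation_id": str(uuid.uuid4()), "context": {"order_id": 42}})
+```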
+
+#### Log Storage
+- Development: Console/file
+- Staging: Centralized (ELK, CloudWatch, etc.)
+- Production: Centralized with retention policy
+
+### Metrics
+
+#### System Metrics
+- CPU usage
+- Memory usage
+- Disk I/O
+- Network I/O
+
+#### Application Metrics
+| Metric | Type | Description |
+|--------|------|-------------|
+| request_count | Counter | Total requests |
+| request_duration | Histogram | Response time |
+| error_count | Counter | Failed requests |
+| active_connections | Gauge | Current connections |
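+
+To make the three metric types concrete, here is a stdlib-only Python sketch of their semantics. In a real service these would normally come from a metrics library (for example a Prometheus or OpenTelemetry client) rather than hand-written classes. The bucket bounds are arbitrary assumptions, and this histogram keeps per-bucket counts, whereas Prometheus exposes them cumulatively.
+
+```python
+import bisect
+import threading
+from typing import List
+
+
+class Counter:
+    """Only ever increases, e.g. request_count, error_count."""
+
+    def __init__(self) -> None:
+        self.value = 0
+        self._lock = threading.Lock()
+
+    def inc(self, amount: int = 1) -> None:
+        if amount < 0:
+            raise ValueError("a counter cannot decrease")
+        with self._lock:
+            self.value += amount
+
+
+class Gauge:
+    """Can go up and down, e.g. active_connections."""
+
+    def __init__(self) -> None:
+        self.value = 0
+        self._lock = threading.Lock()
+
+    def add(self, delta: int) -> None:
+        with self._lock:
+            self.value += delta
+
+
+class Histogram:
+    """Counts observations per bucket, e.g. request_duration in seconds."""
+
+    def __init__(self, bounds: List[float]) -> None:
+        self.bounds = sorted(bounds)
+        self.counts = [0] * (len(self.bounds) + 1)  # final slot collects values above the last bound
+        self.sum = 0.0
+        self._lock = threading.Lock()
+
+    def observe(self, value: float) -> None:
+        # bisect_left puts a value equal to a bound into that bound's bucket ("less than or equal")
+        index = bisect.bisect_left(self.bounds, value)
+        with self._lock:
+            self.counts[index] += 1
+            self.sum += value
+```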
+
+#### Business Metrics
+- [Define based on acceptance criteria]
+
+### Distributed Tracing
+
+#### Trace Context
+- Correlation ID propagation
+- Span naming conventions
+- Sampling strategy
+
+#### Integration Points
+- HTTP headers
+- Message queue metadata
+- Database query tagging
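+
+The HTTP-header integration point can be sketched in Python with `contextvars`, so the ID follows the request through function calls and async tasks. The header name `X-Correlation-ID` is a common convention assumed here; the W3C Trace Context `traceparent` header is the standardized alternative when a tracing library is in use.
+
+```python
+import contextvars
+import uuid
+from typing import Dict, Mapping, Optional
+
+CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name
+_correlation_id = contextvars.ContextVar("correlation_id", default="")
+
+
+def bind_incoming(headers: Mapping[str, str]) -> str:
+    """Reuse the caller's correlation ID, or start a new one at the system edge."""
+    # HTTP header names are case-insensitive
+    lowered = {key.lower(): value for key, value in headers.items()}
+    cid = lowered.get(CORRELATION_HEADER.lower()) or str(uuid.uuid4())
+    _correlation_id.set(cid)
+    return cid
+
+
+def outgoing_headers(extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
+    """Headers (or message-queue metadata) to attach to a downstream call."""
+    headers = dict(extra or {})
+    headers[CORRELATION_HEADER] = _correlation_id.get() or str(uuid.uuid4())
+    return headers
+```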
+
+### Alerting
+
+#### Alert Categories
+| Severity | Response Time | Examples |
+|----------|---------------|----------|
+| Critical | 5 min | Service down, Data loss |
+| High | 30 min | High error rate, Performance degradation |
+| Medium | 4 hours | Elevated latency, Disk usage high |
+| Low | Next business day | Non-critical warnings |
+
+#### Alert Rules
+```yaml
+alerts:
+  - name: high_error_rate
+    condition: error_rate > 5%
+    duration: 5m
+    severity: high
+    
+  - name: service_down
+    condition: health_check_failed
+    duration: 1m
+    severity: critical
+```
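+
+The `duration` field means the condition must hold continuously for that long before the alert fires, much like the `for:` clause in Prometheus alerting rules. A Python sketch of evaluating one rule against sampled values, assuming time-ordered `(unix_seconds, value)` samples and expressing `5%` as `0.05`; `AlertRule` and `fires_at` are hypothetical names.
+
+```python
+from dataclasses import dataclass
+from typing import List, Optional, Tuple
+
+
+@dataclass
+class AlertRule:
+    name: str
+    threshold: float  # e.g. 0.05 for "error_rate > 5%"
+    duration_s: int   # e.g. 300 for "5m"
+    severity: str
+
+
+def fires_at(rule: AlertRule, samples: List[Tuple[int, float]]) -> Optional[int]:
+    """samples: (unix_seconds, value) in time order. Returns when the alert fires, or None."""
+    breach_start: Optional[int] = None
+    for ts, value in samples:
+        if value > rule.threshold:
+            if breach_start is None:
+                breach_start = ts  # condition just became true ("pending")
+            if ts - breach_start >= rule.duration_s:
+                return ts  # held for the whole duration ("firing")
+        else:
+            breach_start = None  # condition cleared; restart the clock
+    return None
+```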
+
+### Dashboards
+
+#### Operations Dashboard
+- Service health status
+- Request rate and error rate
+- Response time percentiles
+- Resource utilization
+
+#### Business Dashboard
+- Key business metrics
+- User activity
+- Transaction volumes
+
+Save the output to `_docs/02_components/observability_plan.md`
+
+## Notes
+ - Follow the principle: "If it's not monitored, it's not in production"
+ - Balance verbosity with cost
+ - Ensure PII is not logged
+ - Plan for log rotation and retention
@@ -0,0 +1,22 @@
+---
+description: Coding rules
+alwaysApply: true
+---
+# Coding preferences
+- Always prefer the simplest solution
+- Generate concise code
+- Do not put comments in the code
+- Do not add logs unless they record an exception or were explicitly requested
+- Do not add code annotations unless they were explicitly requested
+- Write code that takes the different environments into account: development and production
+- Make only the changes that were requested, or changes you are confident are well understood and related to the request
+- Use mock data only in tests; never mock data in the dev or prod environment
+- When you add a new library or dependency, use the same version that other parts of the codebase already use
+
+- Focus on the areas of code relevant to the task
+- Do not touch code that is unrelated to the task
+- Always think about what other methods and areas of code might be affected by the code changes
+- When you think you are done with changes, run tests and make sure they are not broken
+- Do not rename any databases or tables or table columns without confirmation. Avoid such renaming if possible.
+- Do not create diagrams unless I ask explicitly
+- Never commit binaries: create and keep .gitignore up to date, and delete binaries once the task is done
@@ -0,0 +1,9 @@
+---
+description: Techstack
+alwaysApply: true
+---
+# Tech Stack
+- Use a PostgreSQL database
+- For the backend, prefer .NET or Python depending on the task; Rust is an option for more specialized needs
+- For the frontend, use React with Tailwind CSS (or plain CSS for simple projects)
+- Document APIs with OpenAPI
@@ -0,0 +1,281 @@
+---
+name: decompose
+description: |
+  Decompose planned components into atomic implementable features with bootstrap structure plan.
+  4-step workflow: bootstrap structure plan, feature decomposition, cross-component verification, and Jira task creation.
+  Supports project mode (_docs/ structure), single component mode, and standalone mode (@file.md).
+  Trigger phrases:
+  - "decompose", "decompose features", "feature decomposition"
+  - "task decomposition", "break down components"
+  - "prepare for implementation"
+disable-model-invocation: true
+---
+
+# Feature Decomposition
+
+Decompose planned components into atomic, implementable feature specs with a bootstrap structure plan through a systematic workflow.
+
+## Core Principles
+
+- **Atomic features**: each feature does one thing; if it exceeds 5 complexity points, split it
+- **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
+- **Save immediately**: write artifacts to disk after each component; never accumulate unsaved work
+- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
+- **Plan, don't code**: this workflow produces documents and Jira tasks, never implementation code
+
+## Context Resolution
+
+Determine the operating mode based on invocation before any other logic runs.
+
+**Full project mode** (no explicit input file provided):
+- PLANS_DIR: `_docs/02_plans/`
+- TASKS_DIR: `_docs/02_tasks/`
+- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, PLANS_DIR
+- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (cross-verification) + Step 4 (Jira)
+
+**Single component mode** (provided file is within `_docs/02_plans/` and inside a `components/` subdirectory):
+- PLANS_DIR: `_docs/02_plans/`
+- TASKS_DIR: `_docs/02_tasks/`
+- Derive `<topic>`, component number, and component name from the file path
+- Ask user for the parent Epic ID
+- Runs Step 2 (that component only) + Step 4 (Jira)
+- Overwrites existing feature files in that component's TASKS_DIR subdirectory
+
+**Standalone mode** (explicit input file provided, not within `_docs/02_plans/`):
+- INPUT_FILE: the provided file (treated as a component spec)
+- Derive `<topic>` from the input filename (without extension)
+- TASKS_DIR: `_standalone/<topic>/tasks/`
+- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
+- Ask user for the parent Epic ID
+- Runs Step 2 (that component only) + Step 4 (Jira)
+
+Announce the detected mode and resolved paths to the user before proceeding.
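+
+The mode detection above can be sketched as a small Python function. It assumes paths are relative to the repository root and may carry Cursor's leading `@`, and that a component spec sits at `_docs/02_plans/<topic>/components/<##>_<name>/<file>`; the function name and returned keys are illustrative. The rules do not cover a file that is under `_docs/02_plans/` but not inside a component folder, so this sketch raises an error and leaves that case to the user.
+
+```python
+from pathlib import PurePosixPath
+from typing import Optional
+
+PLANS_DIR = "_docs/02_plans/"
+TASKS_DIR = "_docs/02_tasks/"
+
+
+def resolve_mode(input_file: Optional[str] = None) -> dict:
+    if input_file is None:
+        return {"mode": "full", "plans_dir": PLANS_DIR, "tasks_dir": TASKS_DIR}
+
+    path = PurePosixPath(input_file.lstrip("@"))
+    parts = path.parts
+    if parts[:2] != ("_docs", "02_plans"):
+        # Standalone: <topic> is the input filename without its extension
+        return {"mode": "standalone", "input_file": str(path),
+                "tasks_dir": f"_standalone/{path.stem}/tasks/"}
+
+    # Assumed shape: _docs/02_plans/<topic>/components/<##>_<name>/<file>
+    if len(parts) >= 6 and parts[3] == "components":
+        number, _, name = parts[4].partition("_")
+        return {"mode": "single_component", "plans_dir": PLANS_DIR, "tasks_dir": TASKS_DIR,
+                "topic": parts[2], "component_number": number, "component_name": name}
+
+    raise ValueError(f"{input_file} is under {PLANS_DIR} but not inside a component folder")
+```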
+
+## Input Specification
+
+### Required Files
+
+**Full project mode:**
+
+| File | Purpose |
+|------|---------|
+| `_docs/00_problem/problem.md` | Problem description and context |
+| `_docs/00_problem/restrictions.md` | Constraints and limitations (if available) |
+| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria (if available) |
+| `_docs/01_solution/solution.md` | Finalized solution |
+| `PLANS_DIR/<topic>/architecture.md` | Architecture from plan skill |
+| `PLANS_DIR/<topic>/system-flows.md` | System flows from plan skill |
+| `PLANS_DIR/<topic>/components/[##]_[name]/description.md` | Component specs from plan skill |
+
+**Single component mode:**
+
+| File | Purpose |
+|------|---------|
+| The provided component `description.md` | Component spec to decompose |
+| Corresponding `tests.md` in the same directory (if available) | Test specs for context |
+
+**Standalone mode:**
+
+| File | Purpose |
+|------|---------|
+| INPUT_FILE (the provided file) | Component spec to decompose |
+
+### Prerequisite Checks (BLOCKING)
+
+**Full project mode:**
+1. At least one `<topic>/` directory exists under PLANS_DIR with `architecture.md` and `components/` — **STOP if missing**
+2. If multiple topics exist, ask user which one to decompose
+3. Create TASKS_DIR if it does not exist
+4. If `TASKS_DIR/<topic>/` already exists, ask user: **resume from last checkpoint or start fresh?**
+
+**Single component mode:**
+1. The provided component file exists and is non-empty — **STOP if missing**
+2. Create the component's subdirectory under TASKS_DIR if it does not exist
+
+**Standalone mode:**
+1. INPUT_FILE exists and is non-empty — **STOP if missing**
+2. Create TASKS_DIR if it does not exist
+
+## Artifact Management
+
+### Directory Structure
+
+```
+TASKS_DIR/<topic>/
+├── initial_structure.md                        (Step 1, full mode only)
+├── cross_dependencies.md                       (Step 3, full mode only)
+├── SUMMARY.md                                  (final)
+├── [##]_[component_name]/
+│   ├── [##].[##]_feature_[feature_name].md
+│   ├── [##].[##]_feature_[feature_name].md
+│   └── ...
+├── [##]_[component_name]/
+│   └── ...
+└── ...
+```
+
+### Save Timing
+
+| Step | Save immediately after | Filename |
+|------|------------------------|----------|
+| Step 1 | Bootstrap structure plan complete | `initial_structure.md` |
+| Step 2 | Each component decomposed | `[##]_[name]/[##].[##]_feature_[feature_name].md` |
+| Step 3 | Cross-component verification complete | `cross_dependencies.md` |
+| Step 4 | Jira tasks created | Jira via MCP |
+| Final | All steps complete | `SUMMARY.md` |
+
+### Resumability
+
+If `TASKS_DIR/<topic>/` already contains artifacts:
+
+1. List existing files and match them to the save timing table
+2. Identify the last completed component based on which feature files exist
+3. Resume from the next incomplete component
+4. Inform the user which components are being skipped
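+
+These steps can be sketched in Python over the paths found under `TASKS_DIR/<topic>/`. Based on the save timing above, where feature files are written once a component is decomposed, the sketch assumes that a component folder containing at least one `[##].[##]_feature_[name].md` file is complete; the function names are hypothetical.
+
+```python
+import re
+from pathlib import Path, PurePosixPath
+from typing import Iterable, List, Optional
+
+COMPONENT_DIR = re.compile(r"^\d{2}_.+$")                    # [##]_[component_name]
+FEATURE_FILE = re.compile(r"^\d{2}\.\d{2}_feature_.+\.md$")  # [##].[##]_feature_[name].md
+
+
+def completed_components(relative_paths: Iterable[str]) -> List[str]:
+    """Component folders that already contain at least one feature spec."""
+    done = set()
+    for rel in relative_paths:
+        parts = PurePosixPath(rel).parts
+        if len(parts) == 2 and COMPONENT_DIR.match(parts[0]) and FEATURE_FILE.match(parts[1]):
+            done.add(parts[0])
+    return sorted(done)
+
+
+def next_component(planned: List[str], done: List[str]) -> Optional[str]:
+    """First planned component not yet decomposed, or None when all are done."""
+    return next((name for name in planned if name not in done), None)
+
+
+def scan(topic_dir: Path) -> List[str]:
+    # Relative POSIX paths of every markdown file under TASKS_DIR/<topic>/
+    return [p.relative_to(topic_dir).as_posix() for p in topic_dir.rglob("*.md")]
+```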
+
+## Progress Tracking
+
+At the start of execution, create a TodoWrite with all applicable steps. Update status as each step/component completes.
+
+## Workflow
+
+### Step 1: Bootstrap Structure Plan (full project mode only)
+
+**Role**: Professional software architect
+**Goal**: Produce `initial_structure.md` describing the project skeleton for implementation
+**Constraints**: This is a plan document, not code. The `implement-initial` command executes it.
+
+1. Read architecture.md, all component specs, and system-flows.md from PLANS_DIR
+2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/`
+3. Research best implementation patterns for the identified tech stack
+4. Document the structure plan using `templates/initial-structure.md`
+
+**Self-verification**:
+- [ ] All components have corresponding folders in the layout
+- [ ] All inter-component interfaces have DTOs defined
+- [ ] CI/CD stages cover build, lint, test, security, deploy
+- [ ] Environment strategy covers dev, staging, production
+- [ ] Test structure includes unit and integration test locations
+
+**Save action**: Write `initial_structure.md`
+
+**BLOCKING**: Present structure plan summary to user. Do NOT proceed until user confirms.
+
+---
+
+### Step 2: Feature Decomposition (all modes)
+
+**Role**: Professional software architect
+**Goal**: Decompose each component into atomic, implementable feature specs
+**Constraints**: Behavioral specs only — describe what, not how. No implementation code.
+
+For each component (or the single provided component):
+
+1. Read the component's `description.md` and `tests.md` (if available)
+2. Decompose into atomic features; create only 1 feature if the component is simple or atomic
+3. Split into multiple features only when it is necessary and would be easier to implement
+4. Do not create features of other components — only features of the current component
+5. Each feature should be atomic: it exposes either no APIs or only a set of semantically connected APIs
+6. Write each feature spec using `templates/feature-spec.md`
+7. Estimate complexity per feature (1, 2, 3, 5 points); no feature should exceed 5 points — split if it does
+8. Note feature dependencies (within component and cross-component)
+
+**Self-verification** (per component):
+- [ ] Every feature is atomic (single concern)
+- [ ] No feature exceeds 5 complexity points
+- [ ] Feature dependencies are noted
+- [ ] Features cover all interfaces defined in the component spec
+- [ ] No features duplicate work from other components
+
+**Save action**: Write each `[##]_[name]/[##].[##]_feature_[feature_name].md`
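If each feature lists the component interfaces it covers, the coverage check in the self-verification list reduces to set arithmetic. Below is a minimal sketch. The input shape is hypothetical because the specs themselves are prose.

```python
from __future__ import annotations


def check_coverage(spec_interfaces: set[str], features: dict[str, set[str]]) -> dict[str, list[str]]:
    """Compare a component's declared interfaces against its feature specs.

    `features` maps a feature file name to the interfaces it covers.
    Returns interfaces no feature covers, and interfaces claimed by several features.
    """
    owners: dict[str, list[str]] = {}
    for feature, interfaces in features.items():
        for iface in interfaces:
            owners.setdefault(iface, []).append(feature)
    return {
        "uncovered": sorted(spec_interfaces - set(owners)),
        "shared": sorted(i for i, fs in owners.items() if len(fs) > 1),
    }
```

An interface in `shared` is not necessarily an error. It does suggest that two features may not be atomic.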
+
+---
+
+### Step 3: Cross-Component Verification (full project mode only)
+
+**Role**: Professional software architect and analyst
+**Goal**: Verify feature consistency across all components
+**Constraints**: Review step — fix gaps found, do not add new features
+
+1. Verify feature dependencies across all components are consistent
+2. Check no gaps: every interface in architecture.md has features covering it
+3. Check no overlaps: features don't duplicate work across components
+4. Produce dependency matrix showing cross-component feature dependencies
+5. Determine recommended implementation order based on dependencies
+
+**Self-verification**:
+- [ ] Every architecture interface is covered by at least one feature
+- [ ] No circular feature dependencies across components
+- [ ] Cross-component dependencies are explicitly noted in affected feature specs
+
+**Save action**: Write `cross_dependencies.md`
+
+**BLOCKING**: Present cross-component summary to user. Do NOT proceed until user confirms.
+
+---
+
+### Step 4: Jira Tasks (all modes)
+
+**Role**: Professional product manager
+**Goal**: Create Jira tasks from feature specs under the appropriate parent epics
+**Constraints**: Be concise — fewer words with the same meaning is better
+
+1. For each feature spec, create a Jira task following the parsing rules and field mapping from `gen_jira_task_and_branch.md` (skip branch creation and file renaming — those happen during implementation)
+2. In full mode: search Jira for epics matching component names/labels to find parent epic IDs
+3. In single component mode: use the Epic ID obtained during context resolution
+4. In standalone mode: use the Epic ID obtained during context resolution
+5. Do NOT create git branches or rename files — that happens during implementation
+
+**Self-verification**:
+- [ ] Every feature has a corresponding Jira task
+- [ ] Every task is linked to the correct parent epic
+- [ ] Task descriptions match feature spec content
+
+**Save action**: Jira tasks created via MCP
+
+---
+
+## Summary Report
+
+After all steps complete, write `SUMMARY.md` using `templates/summary.md` as structure.
+
+## Common Mistakes
+
+- **Coding during decomposition**: this workflow produces specs, never code
+- **Over-splitting**: don't create many features if the component is simple — 1 feature is fine
+- **Features exceeding 5 points**: split them; no feature should be too complex for a single task
+- **Cross-component features**: each feature belongs to exactly one component
+- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
+- **Creating git branches**: branch creation is an implementation concern, not a decomposition one
+
+## Escalation Rules
+
+| Situation | Action |
+|-----------|--------|
+| Ambiguous component boundaries | ASK user |
+| Feature complexity exceeds 5 points after splitting | ASK user |
+| Missing component specs in PLANS_DIR | ASK user |
+| Cross-component dependency conflict | ASK user |
+| Jira epic not found for a component | ASK user for Epic ID |
+| Component naming | PROCEED, confirm at next BLOCKING gate |
+
+## Methodology Quick Reference
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│            Feature Decomposition (4-Step Method)               │
+├────────────────────────────────────────────────────────────────┤
+│ CONTEXT: Resolve mode (full / single component / standalone)   │
+│ 1. Bootstrap Structure  → initial_structure.md (full only)     │
+│    [BLOCKING: user confirms structure]                         │
+│ 2. Feature Decompose    → [##]_[name]/[##].[##]_feature_*      │
+│ 3. Cross-Verification   → cross_dependencies.md (full only)    │
+│    [BLOCKING: user confirms dependencies]                      │
+│ 4. Jira Tasks           → Jira via MCP                         │
+│    ─────────────────────────────────────────────────           │
+│    Summary → SUMMARY.md                                        │
+├────────────────────────────────────────────────────────────────┤
+│ Principles: Atomic features · Behavioral specs · Save now      │
+│             Ask don't assume · Plan don't code                 │
+└────────────────────────────────────────────────────────────────┘
+```
@@ -0,0 +1,108 @@
+# Feature Specification Template
+
+Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
+Save as `TASKS_DIR/<topic>/[##]_[component_name]/[##].[##]_feature_[feature_name].md`.
+
+---
+
+```markdown
+# [Feature Name]
+
+**Status**: Draft | **Date**: [YYYY-MM-DD] | **Feature**: [Brief Feature Description]
+**Complexity**: [1|2|3|5] points
+**Dependencies**: [List features this feature depends on, or "None"]
+**Component**: [##]_[component_name]
+
+## Problem
+
+Clear, concise statement of the problem users are facing.
+
+## Outcome
+
+- Measurable or observable goal 1
+- Measurable or observable goal 2
+
+## Scope
+
+### Included
+- What's in scope for this feature
+
+### Excluded
+- Explicitly what's NOT in scope
+
+## Acceptance Criteria
+
+**AC-1: [Title]**
+Given [precondition]
+When [action]
+Then [expected result]
+
+**AC-2: [Title]**
+Given [precondition]
+When [action]
+Then [expected result]
+
+## Non-Functional Requirements
+
+**Performance**
+- [requirement if relevant]
+
+**Compatibility**
+- [requirement if relevant]
+
+**Reliability**
+- [requirement if relevant]
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | [test subject] | [expected result] |
+
+## Integration Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | [setup] | [test subject] | [expected behavior] | [NFR if any] |
+
+## Constraints
+
+- [Architectural pattern constraint if critical]
+- [Technical limitation]
+- [Integration requirement]
+
+## Risks & Mitigation
+
+**Risk 1: [Title]**
+- *Risk*: [Description]
+- *Mitigation*: [Approach]
+```
+
+---
+
+## Complexity Points Guide
+
+- 1 point: Trivial, self-contained, no dependencies
+- 2 points: Non-trivial, low complexity, minimal coordination
+- 3 points: Multi-step, moderate complexity, potential alignment needed
+- 5 points: Difficult, interconnected logic, medium-high risk
+- 8 points: Too complex — split into smaller features
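Since 4 is not on the scale and 8 means "split", every estimate must land on 1, 2, 3, or 5. Below is a minimal sketch. Rounding an off-scale estimate up to the next allowed value is an assumption; the guide does not state that rule.

```python
ALLOWED_POINTS = (1, 2, 3, 5)


def normalize_points(estimate: int) -> int:
    """Round a raw estimate up to the next allowed value on the 1/2/3/5 scale.

    Raises ValueError when the feature is too large for a single task.
    """
    if estimate < 1:
        raise ValueError("estimate must be at least 1 point")
    for points in ALLOWED_POINTS:
        if estimate <= points:
            return points
    raise ValueError(f"{estimate} points: too complex, split into smaller features")
```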
+
+## Output Guidelines
+
+**DO:**
+- Focus on behavior and user experience
+- Use clear, simple language
+- Keep acceptance criteria testable (Gherkin format)
+- Include realistic scope boundaries
+- Write from the user's perspective
+- Include complexity estimation
+- Note dependencies on other features
+
+**DON'T:**
+- Include implementation details (file paths, classes, methods)
+- Prescribe technical solutions or libraries
+- Add architectural diagrams or code examples
+- Specify exact API endpoints or data structures
+- Include step-by-step implementation instructions
+- Add "how to build" guidance
@@ -0,0 +1,113 @@
+# Initial Structure Plan Template
+
+Use this template for the bootstrap structure plan. Save as `TASKS_DIR/<topic>/initial_structure.md`.
+
+---
+
+```markdown
+# Initial Project Structure Plan
+
+**Date**: [YYYY-MM-DD]
+**Tech Stack**: [language, framework, database, etc.]
+**Source**: architecture.md, component specs from _docs/02_plans/<topic>/
+
+## Project Folder Layout
+
+~~~
+project-root/
+├── [folder structure based on tech stack and components]
+└── ...
+~~~
+
+### Layout Rationale
+
+[Brief explanation of why this structure was chosen — language conventions, framework patterns, etc.]
+
+## DTOs and Interfaces
+
+### Shared DTOs
+
+| DTO Name | Used By Components | Fields Summary |
+|----------|-------------------|---------------|
+| [name] | [component list] | [key fields] |
+
+### Component Interfaces
+
+| Component | Interface | Methods | Exposed To |
+|-----------|-----------|---------|-----------|
+| [##]_[name] | [InterfaceName] | [method list] | [consumers] |
+
+## CI/CD Pipeline
+
+| Stage | Purpose | Trigger |
+|-------|---------|---------|
+| Build | Compile/bundle the application | Every push |
+| Lint / Static Analysis | Code quality and style checks | Every push |
+| Unit Tests | Run unit test suite | Every push |
+| Integration Tests | Run integration test suite | Every push |
+| Security Scan | SAST / dependency check | Every push |
+| Deploy to Staging | Deploy to staging environment | Merge to staging branch |
+
+### Pipeline Configuration Notes
+
+[Framework-specific notes: CI tool, runners, caching, parallelism, etc.]
+
+## Environment Strategy
+
+| Environment | Purpose | Configuration Notes |
+|-------------|---------|-------------------|
+| Development | Local development | [local DB, mock services, debug flags] |
+| Staging | Pre-production testing | [staging DB, staging services, production-like config] |
+| Production | Live system | [production DB, real services, optimized config] |
+
+### Environment Variables
+
+| Variable | Dev | Staging | Production | Description |
+|----------|-----|---------|------------|-------------|
+| [VAR_NAME] | [value/source] | [value/source] | [value/source] | [purpose] |
+
+## Database Migration Approach
+
+**Migration tool**: [tool name]
+**Strategy**: [migration strategy — e.g., versioned scripts, ORM migrations]
+
+### Initial Schema
+
+[Key tables/collections that need to be created, referencing component data access patterns]
+
+## Test Structure
+
+~~~
+tests/
+├── unit/
+│   ├── [component_1]/
+│   ├── [component_2]/
+│   └── ...
+├── integration/
+│   ├── test_data/
+│   └── [test files]
+└── ...
+~~~
+
+### Test Configuration Notes
+
+[Test runner, fixtures, test data management, isolation strategy]
+
+## Implementation Order
+
+| Order | Component | Reason |
+|-------|-----------|--------|
+| 1 | [##]_[name] | [why first — foundational, no dependencies] |
+| 2 | [##]_[name] | [depends on #1] |
+| ... | ... | ... |
+```
+
+---
+
+## Guidance Notes
+
+- This is a PLAN document, not code. The `implement-initial` command executes it.
+- Focus on structure and organization decisions, not implementation details.
+- Reference component specs for interface and DTO details — don't repeat everything.
+- The folder layout should follow conventions of the identified tech stack.
+- Environment strategy should account for secrets management and configuration.
@@ -0,0 +1,59 @@
+# Decomposition Summary Template
+
+Use this template after all steps complete. Save as `TASKS_DIR/<topic>/SUMMARY.md`.
+
+---
+
+```markdown
+# Decomposition Summary
+
+**Date**: [YYYY-MM-DD]
+**Topic**: [topic name]
+**Total Components**: [N]
+**Total Features**: [N]
+**Total Complexity Points**: [N]
+
+## Component Breakdown
+
+| # | Component | Features | Total Points | Jira Epic |
+|---|-----------|----------|-------------|-----------|
+| 01 | [name] | [count] | [sum] | [EPIC-ID] |
+| 02 | [name] | [count] | [sum] | [EPIC-ID] |
+| ... | ... | ... | ... | ... |
+
+## Feature List
+
+| Component | Feature | Complexity | Jira Task | Dependencies |
+|-----------|---------|-----------|-----------|-------------|
+| [##]_[name] | [##].[##]_feature_[name] | [points] | [TASK-ID] | [deps or "None"] |
+| ... | ... | ... | ... | ... |
+
+## Implementation Order
+
+Recommended sequence based on dependency analysis:
+
+| Phase | Components / Features | Rationale |
+|-------|----------------------|-----------|
+| 1 | [list] | [foundational, no dependencies] |
+| 2 | [list] | [depends on phase 1] |
+| 3 | [list] | [depends on phase 1-2] |
+| ... | ... | ... |
+
+### Parallelization Opportunities
+
+[Features/components that can be implemented concurrently within each phase]
+
+## Cross-Component Dependencies
+
+| From (Feature) | To (Feature) | Dependency Type |
+|----------------|-------------|-----------------|
+| [comp.feature] | [comp.feature] | [data / API / event] |
+| ... | ... | ... |
+
+## Artifacts Produced
+
+- `initial_structure.md` — project skeleton plan
+- `cross_dependencies.md` — dependency matrix
+- `[##]_[name]/[##].[##]_feature_*.md` — feature specs per component
+- Jira tasks created under respective epics
+```
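The header totals and the Component Breakdown columns follow directly from the feature list. Below is a minimal sketch. It assumes the (component, feature, points) tuples have already been parsed from each spec's **Complexity** line.

```python
from __future__ import annotations

from collections import defaultdict


def summarize(features: list[tuple[str, str, int]]) -> dict:
    """Compute SUMMARY.md header counts and per-component (features, points) rows.

    `features` holds (component, feature, points) tuples.
    """
    per_component: dict[str, list[int]] = defaultdict(list)
    for component, _feature, points in features:
        per_component[component].append(points)
    return {
        "total_components": len(per_component),
        "total_features": len(features),
        "total_points": sum(points for _, _, points in features),
        "breakdown": {c: (len(ps), sum(ps)) for c, ps in sorted(per_component.items())},
    }
```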
@@ -0,0 +1,393 @@
+---
+name: plan
+description: |
+  Decompose a solution into architecture, system flows, components, tests, and Jira epics.
+  Systematic 5-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
+  Supports project mode (_docs/ + _docs/02_plans/ structure) and standalone mode (@file.md).
+  Trigger phrases:
+  - "plan", "decompose solution", "architecture planning"
+  - "break down the solution", "create planning documents"
+  - "component decomposition", "solution analysis"
+disable-model-invocation: true
+---
+
+# Solution Planning
+
+Decompose a problem and solution into architecture, system flows, components, tests, and Jira epics through a systematic 5-step workflow.
+
+## Core Principles
+
+- **Single Responsibility**: each component does one thing well; do not spread related logic across components
+- **Dumb code, smart data**: keep logic simple, push complexity into data structures and configuration
+- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
+- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
+- **Plan, don't code**: this workflow produces documents and specs, never implementation code
+
+## Context Resolution
+
+Determine the operating mode based on invocation before any other logic runs.
+
+**Project mode** (no explicit input file provided):
+- PROBLEM_FILE: `_docs/00_problem/problem.md`
+- SOLUTION_FILE: `_docs/01_solution/solution.md`
+- PLANS_DIR: `_docs/02_plans/`
+- All existing guardrails apply as-is.
+
+**Standalone mode** (explicit input file provided, e.g. `/plan @some_doc.md`):
+- INPUT_FILE: the provided file (treated as combined problem + solution context)
+- Derive `<topic>` from the input filename (without extension)
+- PLANS_DIR: `_standalone/<topic>/plans/`
+- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
+- `acceptance_criteria.md` and `restrictions.md` are optional — warn if absent
+
+Announce the detected mode and resolved paths to the user before proceeding.
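The mode resolution above can be sketched as a pure function. The sketch assumes the `@` prefix has already been stripped from the invocation. The dictionary keys are illustrative.

```python
from __future__ import annotations

from pathlib import Path


def resolve_context(input_file: str | None = None) -> dict:
    """Resolve operating mode and paths before any other logic runs."""
    if input_file is None:
        return {
            "mode": "project",
            "problem_file": Path("_docs/00_problem/problem.md"),
            "solution_file": Path("_docs/01_solution/solution.md"),
            "plans_dir": Path("_docs/02_plans"),
        }
    topic = Path(input_file).stem  # input filename without extension
    return {
        "mode": "standalone",
        "input_file": Path(input_file),
        "topic": topic,
        "plans_dir": Path("_standalone") / topic / "plans",
    }
```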
+
+## Input Specification
+
+### Required Files
+
+**Project mode:**
+
+| File | Purpose |
+|------|---------|
+| PROBLEM_FILE (`_docs/00_problem/problem.md`) | Problem description and context |
+| `_docs/00_problem/input_data/` | Reference data examples (if available) |
+| `_docs/00_problem/restrictions.md` | Constraints and limitations (if available) |
+| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria (if available) |
+| SOLUTION_FILE (`_docs/01_solution/solution.md`) | Solution draft to decompose |
+
+**Standalone mode:**
+
+| File | Purpose |
+|------|---------|
+| INPUT_FILE (the provided file) | Combined problem + solution context |
+
+### Prerequisite Checks (BLOCKING)
+
+**Project mode:**
+1. PROBLEM_FILE exists and is non-empty — **STOP if missing**
+2. SOLUTION_FILE exists and is non-empty — **STOP if missing**
+3. Create PLANS_DIR if it does not exist
+4. If `PLANS_DIR/<topic>/` already exists, ask user: **resume from last checkpoint or start fresh?**
+
+**Standalone mode:**
+1. INPUT_FILE exists and is non-empty — **STOP if missing**
+2. Warn if no `restrictions.md` or `acceptance_criteria.md` provided alongside INPUT_FILE
+3. Create PLANS_DIR if it does not exist
+4. If `PLANS_DIR/<topic>/` already exists, ask user: **resume from last checkpoint or start fresh?**
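Checks 1 to 3 can be sketched as follows, under three assumptions. Raising `FileNotFoundError` stands in for **STOP**. "Non-empty" is approximated as a file size above zero. The resume-or-fresh question is left to the caller.

```python
from __future__ import annotations

from pathlib import Path


def check_prerequisites(required: list[Path], optional: list[Path], plans_dir: Path) -> list[str]:
    """Apply the blocking checks and return warnings for absent optional files.

    Raises FileNotFoundError (i.e. STOP) if a required file is missing or empty.
    """
    for path in required:
        if not path.is_file() or path.stat().st_size == 0:
            raise FileNotFoundError(f"STOP: {path} is missing or empty")
    plans_dir.mkdir(parents=True, exist_ok=True)  # create PLANS_DIR if absent
    return [f"warning: {p} not found" for p in optional if not p.is_file()]
```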
+
+## Artifact Management
+
+### Directory Structure
+
+At the start of planning, create a topic-named working directory under PLANS_DIR:
+
+```
+PLANS_DIR/<topic>/
+├── architecture.md
+├── system-flows.md
+├── risk_mitigations.md
+├── risk_mitigations_02.md          (iterative, ## as sequence)
+├── components/
+│   ├── 01_[name]/
+│   │   ├── description.md
+│   │   └── tests.md
+│   ├── 02_[name]/
+│   │   ├── description.md
+│   │   └── tests.md
+│   └── ...
+├── common-helpers/
+│   ├── 01_helper_[name].md
+│   ├── 02_helper_[name].md
+│   └── ...
+├── e2e_test_infrastructure.md
+├── diagrams/
+│   ├── components.drawio
+│   └── flows/
+│       ├── flow_[name].md          (Mermaid)
+│       └── ...
+└── FINAL_report.md
+```
+
+### Save Timing
+
+| Step | Save immediately after | Filename |
+|------|------------------------|----------|
+| Step 1 | Architecture analysis complete | `architecture.md` |
+| Step 1 | System flows documented | `system-flows.md` |
+| Step 2 | Each component analyzed | `components/[##]_[name]/description.md` |
+| Step 2 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
+| Step 2 | Diagrams generated | `diagrams/` |
+| Step 3 | Risk assessment complete | `risk_mitigations.md` |
+| Step 4 | Tests written per component | `components/[##]_[name]/tests.md` |
+| Step 4b | E2E test infrastructure spec | `e2e_test_infrastructure.md` |
+| Step 5 | Epics created in Jira | Jira via MCP |
+| Final | All steps complete | `FINAL_report.md` |
+
+### Save Principles
+
+1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
+2. **Incremental updates**: same file can be updated multiple times; append or replace
+3. **Preserve process**: keep all intermediate files even after integration into final report
+4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)
+
+### Resumability
+
+If `PLANS_DIR/<topic>/` already contains artifacts:
+
+1. List existing files and match them to the save timing table above
+2. Identify the last completed step based on which artifacts exist
+3. Resume from the next incomplete step
+4. Inform the user which steps are being skipped
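Below is a rough sketch of step detection from on-disk artifacts. It treats a step as complete when at least one file matches each of its patterns, which is coarser than the per-component save timing. Step 5 writes to Jira rather than to disk, so the sketch cannot detect it. It returns Step 5 once every on-disk step is present, and Jira still has to be checked by hand.

```python
from __future__ import annotations

from pathlib import Path

# Ordered (step, glob patterns) pairs mirroring the Save Timing table.
STEP_ARTIFACTS = [
    ("1", ["architecture.md", "system-flows.md"]),
    ("2", ["components/*/description.md"]),
    ("3", ["risk_mitigations.md"]),
    ("4", ["components/*/tests.md"]),
    ("4b", ["e2e_test_infrastructure.md"]),
]


def next_step(topic_dir: Path) -> str:
    """Return the first step whose artifacts are not all present on disk."""
    for step, patterns in STEP_ARTIFACTS:
        if not all(any(topic_dir.glob(pattern)) for pattern in patterns):
            return step
    return "5"  # only the Jira step remains; verify it manually
```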
+
+## Progress Tracking
+
+At the start of execution, create a TodoWrite with all steps (1 through 5, including 4b). Update status as each step completes.
+
+## Workflow
+
+### Step 1: Solution Analysis
+
+**Role**: Professional software architect
+**Goal**: Produce `architecture.md` and `system-flows.md` from the solution draft
+**Constraints**: No code, no component-level detail yet; focus on system-level view
+
+1. Read all input files thoroughly
+2. Research unknown or questionable topics via internet; ask user about ambiguities
+3. Document architecture using `templates/architecture.md` as structure
+4. Document system flows using `templates/system-flows.md` as structure
+
+**Self-verification**:
+- [ ] Architecture covers all capabilities mentioned in solution.md
+- [ ] System flows cover all main user/system interactions
+- [ ] No contradictions with problem.md or restrictions.md
+- [ ] Technology choices are justified
+
+**Save action**: Write `architecture.md` and `system-flows.md`
+
+**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
+
+---
+
+### Step 2: Component Decomposition
+
+**Role**: Professional software architect
+**Goal**: Decompose the architecture into components with detailed specs
+**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
+
+1. Identify components from the architecture; think about separation, reusability, and communication patterns
+2. If additional components are needed (data preparation, shared helpers), create them
+3. For each component, write a spec using `templates/component-spec.md` as structure
+4. Generate diagrams:
+   - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
+   - Mermaid flowchart per main control flow
+5. Components may share common logic. Place logic reused by multiple components in the `common-helpers/` folder.
+
+**Self-verification**:
+- [ ] Each component has a single, clear responsibility
+- [ ] No functionality is spread across multiple components
+- [ ] All inter-component interfaces are defined (who calls whom, with what)
+- [ ] Component dependency graph has no circular dependencies
+- [ ] All components from architecture.md are accounted for
+
+**Save action**: Write:
+ - each component `components/[##]_[name]/description.md`
+ - each common helper `common-helpers/[##]_helper_[name].md`
+ - diagrams `diagrams/`
+
+**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
+
+---
+
+### Step 3: Architecture Review & Risk Assessment
+
+**Role**: Professional software architect and analyst
+**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
+**Constraints**: This is a review step — fix problems found, do not add new features
+
+#### 3a. Evaluator Pass (re-read ALL artifacts)
+
+Review checklist:
+- [ ] All components follow Single Responsibility Principle
+- [ ] All components follow dumb code / smart data principle
+- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
+- [ ] No circular dependencies in the dependency graph
+- [ ] No missing interactions between components
+- [ ] No over-engineering — is there a simpler decomposition?
+- [ ] Security considerations addressed in component design
+- [ ] Performance bottlenecks identified
+- [ ] API contracts are consistent across components
+
+Fix any issues found before proceeding to risk identification.
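The interface-consistency item can be checked mechanically if each call edge records the DTO the caller sends. Below is a minimal sketch over a hypothetical, simplified model that maps each component to its methods and each method to its input and output DTO names.

```python
from __future__ import annotations

# Hypothetical model: component -> method -> (input DTO, output DTO).
Interfaces = dict[str, dict[str, tuple[str, str]]]


def check_calls(interfaces: Interfaces, calls: list[tuple[str, str, str, str]]) -> list[str]:
    """Validate (caller, callee, method, sent_dto) edges against callee interfaces."""
    problems = []
    for caller, callee, method, sent in calls:
        declared = interfaces.get(callee, {}).get(method)
        if declared is None:
            problems.append(f"{caller} calls undefined {callee}.{method}")
        elif declared[0] != sent:
            problems.append(f"{caller} sends {sent} but {callee}.{method} expects {declared[0]}")
    return problems
```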
+
+#### 3b. Risk Identification
+
+1. Identify technical and project risks
+2. Assess probability and impact using `templates/risk-register.md`
+3. Define mitigation strategies
+4. Apply mitigations to architecture, flows, and component documents where applicable
+
+**Self-verification**:
+- [ ] Every High/Critical risk has a concrete mitigation strategy
+- [ ] Mitigations are reflected in the relevant component or architecture docs
+- [ ] No new risks introduced by the mitigations themselves
+
+**Save action**: Write `risk_mitigations.md`
+
+**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.
+
+**Iterative**: If user requests another round, repeat Step 3 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
+
+---
+
+### Step 4: Test Specifications
+
+**Role**: Professional Quality Assurance Engineer
+**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage
+**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.
+
+1. For each component, write tests using `templates/test-spec.md` as structure
+2. Cover all 4 types: integration, performance, security, acceptance
+3. Include test data management (setup, teardown, isolation)
+4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
+
+**Self-verification**:
+- [ ] Every acceptance criterion has at least one test covering it
+- [ ] Test inputs are realistic and well-defined
+- [ ] Expected results are specific and measurable
+- [ ] No component is left without tests
+
+**Save action**: Write each `components/[##]_[name]/tests.md`
+
+---
+
+### Step 4b: E2E Black-Box Test Infrastructure
+
+**Role**: Professional Quality Assurance Engineer
+**Goal**: Specify a separate consumer application and Docker environment for black-box end-to-end testing of the main system
+**Constraints**: Spec only — no test code. Consumer must treat the main system as a black box (no internal imports, no direct DB access).
+
+1. Define Docker environment: services (system under test, test DB, consumer app, dependencies), networks, volumes
+2. Specify consumer application: tech stack, entry point, communication interfaces with the main system
+3. Define E2E test scenarios from acceptance criteria — focus on critical end-to-end use cases that cross component boundaries
+4. Specify test data management: seed data, isolation strategy, external dependency mocks
+5. Define CI/CD integration: when to run, gate behavior, timeout
+6. Define reporting format (CSV: test ID, name, execution time, result, error message)
+
+Use `templates/e2e-test-infrastructure.md` as structure.
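Below is a minimal sketch of the CSV report from item 6, using Python's `csv` module. The workflow only names the fields. The column names, the seconds unit, and the `E2EResult` record are assumptions.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass
from typing import TextIO


@dataclass
class E2EResult:
    test_id: str
    name: str
    execution_time_s: float  # assumed unit: seconds
    result: str  # e.g. "PASS" or "FAIL"
    error_message: str = ""


def write_report(results: list[E2EResult], out: TextIO) -> None:
    """Write one CSV row per E2E test, preceded by a header row."""
    writer = csv.writer(out)
    writer.writerow(["test_id", "name", "execution_time", "result", "error_message"])
    for r in results:
        writer.writerow([r.test_id, r.name, f"{r.execution_time_s:.3f}", r.result, r.error_message])
```

Any text stream works as `out`, for example an `open("report.csv", "w", newline="")` handle. The `csv` docs recommend `newline=""` for files.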
+
+**Self-verification**:
+- [ ] Critical acceptance criteria are covered by at least one E2E scenario
+- [ ] Consumer app has no direct access to system internals
+- [ ] Docker environment is self-contained (`docker compose up` sufficient)
+- [ ] External dependencies have mock/stub services defined
+
+**Save action**: Write `e2e_test_infrastructure.md`
+
+---
+
+### Step 5: Jira Epics
+
+**Role**: Professional product manager
+**Goal**: Create Jira epics from components, ordered by dependency
+**Constraints**: Be concise — fewer words with the same meaning is better
+
+1. Generate Jira Epics from components using Jira MCP, structured per `templates/epic-spec.md`
+2. Order epics by dependency (which must be done first)
+3. Include effort estimation per epic (T-shirt size or story points range)
+4. Ensure each epic has clear acceptance criteria cross-referenced with component specs
+5. Generate updated draw.io diagram showing component-to-epic mapping
+
+**Self-verification**:
+- [ ] Every component maps to exactly one epic
+- [ ] Dependency order is respected (no epic depends on a later one)
+- [ ] Acceptance criteria are measurable
+- [ ] Effort estimates are realistic
+
+**Save action**: Epics created in Jira via MCP
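The "no epic depends on a later one" check can be sketched as a comparison of list positions. The epic names and the dependency map are illustrative inputs; in practice they would come from the component specs.

```python
from __future__ import annotations


def order_violations(epic_order: list[str], depends_on: dict[str, set[str]]) -> list[str]:
    """Report epics that depend on an epic not scheduled before them."""
    position = {epic: i for i, epic in enumerate(epic_order)}
    problems = []
    for epic in epic_order:
        for dep in sorted(depends_on.get(epic, set())):
            if dep not in position:
                problems.append(f"{epic} depends on unknown epic {dep}")
            elif position[dep] >= position[epic]:
                problems.append(f"{epic} depends on later epic {dep}")
    return problems
```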
+
+---
+
+## Quality Checklist (before FINAL_report.md)
+
+Before writing the final report, verify ALL of the following:
+
+### Architecture
+- [ ] Covers all capabilities from solution.md
+- [ ] Technology choices are justified
+- [ ] Deployment model is defined
+
+### Components
+- [ ] Every component follows SRP
+- [ ] No circular dependencies
+- [ ] All inter-component interfaces are defined and consistent
+- [ ] No orphan components (unused by any flow)
+
+### Risks
+- [ ] All High/Critical risks have mitigations
+- [ ] Mitigations are reflected in component/architecture docs
+- [ ] User has confirmed risk assessment is sufficient
+
+### Tests
+- [ ] Every acceptance criterion is covered by at least one test
+- [ ] All 4 test types are represented per component (where applicable)
+- [ ] Test data management is defined
+
+### E2E Test Infrastructure
+- [ ] Critical use cases covered by E2E scenarios
+- [ ] Docker environment is self-contained
+- [ ] Consumer app treats main system as black box
+- [ ] CI/CD integration and reporting defined
+
+### Epics
+- [ ] Every component maps to an epic
+- [ ] Dependency order is correct
+- [ ] Acceptance criteria are measurable
+
+**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
+
+## Common Mistakes
+
+- **Coding during planning**: this workflow produces documents, never code
+- **Multi-responsibility components**: if a component does two things, split it
+- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
+- **Diagrams without data**: generate diagrams only after the underlying structure is documented
+- **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
+- **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
+- **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register
+
+## Escalation Rules
+
+| Situation | Action |
+|-----------|--------|
+| Ambiguous requirements | ASK user |
+| Missing acceptance criteria | ASK user |
+| Technology choice with multiple valid options | ASK user |
+| Component naming | PROCEED, confirm at next BLOCKING gate |
+| File structure within templates | PROCEED |
+| Contradictions between input files | ASK user |
+| Risk mitigation requires architecture change | ASK user |
+
+## Methodology Quick Reference
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│                Solution Planning (5-Step Method)               │
+├────────────────────────────────────────────────────────────────┤
+│ CONTEXT: Resolve mode (project vs standalone) + set paths      │
+│ 1. Solution Analysis     → architecture.md, system-flows.md    │
+│    [BLOCKING: user confirms architecture]                      │
+│ 2. Component Decompose   → components/[##]_[name]/description  │
+│    [BLOCKING: user confirms decomposition]                     │
+│ 3. Review & Risk Assess  → risk_mitigations.md                 │
+│    [BLOCKING: user confirms risks, iterative]                  │
+│ 4. Test Specifications   → components/[##]_[name]/tests.md     │
+│ 4b.E2E Test Infra        → e2e_test_infrastructure.md          │
+│ 5. Jira Epics            → Jira via MCP                        │
+│    ─────────────────────────────────────────────────           │
+│    Quality Checklist → FINAL_report.md                         │
+├────────────────────────────────────────────────────────────────┤
+│ Principles: SRP · Dumb code/smart data · Save immediately      │
+│             Ask don't assume · Plan don't code                 │
+└────────────────────────────────────────────────────────────────┘
+```
@@ -0,0 +1,128 @@
+# Architecture Document Template
+
+Use this template for the architecture document. Save as `PLANS_DIR/<topic>/architecture.md` (`_docs/02_plans/<topic>/architecture.md` in project mode).
+
+---
+
+```markdown
+# [System Name] — Architecture
+
+## 1. System Context
+
+**Problem being solved**: [One paragraph summarizing the problem from problem.md]
+
+**System boundaries**: [What is inside the system vs. external]
+
+**External systems**:
+
+| System | Integration Type | Direction | Purpose |
+|--------|-----------------|-----------|---------|
+| [name] | REST / Queue / DB / File | Inbound / Outbound / Both | [why] |
+
+## 2. Technology Stack
+
+| Layer | Technology | Version | Rationale |
+|-------|-----------|---------|-----------|
+| Language | | | |
+| Framework | | | |
+| Database | | | |
+| Cache | | | |
+| Message Queue | | | |
+| Hosting | | | |
+| CI/CD | | | |
+
+**Key constraints from restrictions.md**:
+- [Constraint 1 and how it affects technology choices]
+- [Constraint 2]
+
+## 3. Deployment Model
+
+**Environments**: Development, Staging, Production
+
+**Infrastructure**:
+- [Cloud provider / On-prem / Hybrid]
+- [Container orchestration if applicable]
+- [Scaling strategy: horizontal / vertical / auto]
+
+**Environment-specific configuration**:
+
+| Config | Development | Staging | Production |
+|--------|-------------|---------|------------|
+| Database | [local/docker] | [staging equivalent] | [managed service] |
+| Secrets | [.env file] | [staging equivalent] | [secret manager] |
+| Logging | [console] | [staging equivalent] | [centralized] |
+
+## 4. Data Model Overview
+
+> High-level data model covering the entire system. Detailed per-component models go in component specs.
+
+**Core entities**:
+
+| Entity | Description | Owned By Component |
+|--------|-------------|--------------------|
+| [entity] | [what it represents] | [component ##] |
+
+**Key relationships**:
+- [Entity A] → [Entity B]: [relationship description]
+
+**Data flow summary**:
+- [Source] → [Transform] → [Destination]: [what data and why]
+
+## 5. Integration Points
+
+### Internal Communication
+
+| From | To | Protocol | Pattern | Notes |
+|------|----|----------|---------|-------|
+| [component] | [component] | Sync REST / Async Queue / Direct call | Request-Response / Event / Command | |
+
+### External Integrations
+
+| External System | Protocol | Auth | Rate Limits | Failure Mode |
+|----------------|----------|------|-------------|--------------|
+| [system] | [REST/gRPC/etc] | [API key/OAuth/etc] | [limits] | [retry/circuit breaker/fallback] |
+
+## 6. Non-Functional Requirements
+
+| Requirement | Target | Measurement | Priority |
+|------------|--------|-------------|----------|
+| Availability | [e.g., 99.9%] | [how measured] | High/Medium/Low |
+| Latency (p95) | [e.g., <200ms] | [endpoint/operation] | |
+| Throughput | [e.g., 1000 req/s] | [peak/sustained] | |
+| Data retention | [e.g., 90 days] | [which data] | |
+| Recovery (RPO/RTO) | [e.g., RPO 1hr, RTO 4hr] | | |
+| Scalability | [e.g., 10x current load] | [timeline] | |
+
+## 7. Security Architecture
+
+**Authentication**: [mechanism — JWT / session / API key]
+
+**Authorization**: [RBAC / ABAC / per-resource]
+
+**Data protection**:
+- At rest: [encryption method]
+- In transit: [TLS version]
+- Secrets management: [tool/approach]
+
+**Audit logging**: [what is logged, where, retention]
+
+## 8. Key Architectural Decisions
+
+Record significant decisions that shaped the architecture.
+
+### ADR-001: [Decision Title]
+
+**Context**: [Why this decision was needed]
+
+**Decision**: [What was decided]
+
+**Alternatives considered**:
+1. [Alternative 1] — rejected because [reason]
+2. [Alternative 2] — rejected because [reason]
+
+**Consequences**: [Trade-offs accepted]
+
+### ADR-002: [Decision Title]
+
+...
+```
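The environment-specific configuration table maps onto a small settings loader. The sketch below is one possible approach, not part of the template. It assumes a hypothetical `APP_ENV` variable selects the environment, that staging behaves like production, and that the secret manager injects a hypothetical `DATABASE_URL` into the process environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    env: str
    database_url: str
    log_format: str  # "console" locally, "json" for centralized logging


def load_settings(environ=None):
    """Build settings for the environment named by the hypothetical APP_ENV variable."""
    environ = os.environ if environ is None else environ
    env = environ.get("APP_ENV", "development")
    if env == "development":
        # Local/docker database; the default URL is illustrative only.
        url = environ.get("DATABASE_URL", "postgresql://localhost:5432/app_dev")
        return Settings(env, url, "console")
    if env in ("staging", "production"):
        # Managed service: the secret manager must inject DATABASE_URL, so fail fast if absent.
        url = environ.get("DATABASE_URL")
        if not url:
            raise RuntimeError(f"DATABASE_URL must be provided in {env}")
        return Settings(env, url, "json")
    raise ValueError(f"unknown APP_ENV: {env!r}")
```

Failing at startup when a production secret is missing is usually cheaper to diagnose than an application that boots and then loses its first database connection.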
@@ -0,0 +1,156 @@
+# Component Specification Template
+
+Use this template for each component. Save as `components/[##]_[name]/description.md`.
+
+---
+
+```markdown
+# [Component Name]
+
+## 1. High-Level Overview
+
+**Purpose**: [One sentence: what this component does and its role in the system]
+
+**Architectural Pattern**: [e.g., Repository, Event-driven, Pipeline, Facade, etc.]
+
+**Upstream dependencies**: [Components that this component calls or consumes from]
+
+**Downstream consumers**: [Components that call or consume from this component]
+
+## 2. Internal Interfaces
+
+For each interface this component exposes internally:
+
+### Interface: [InterfaceName]
+
+| Method | Input | Output | Async | Error Types |
+|--------|-------|--------|-------|-------------|
+| `method_name` | `InputDTO` | `OutputDTO` | Yes/No | `ErrorType1`, `ErrorType2` |
+
+**Input DTOs**:
+~~~
+[DTO name]:
+  field_1: type (required/optional) — description
+  field_2: type (required/optional) — description
+~~~
+
+**Output DTOs**:
+~~~
+[DTO name]:
+  field_1: type — description
+  field_2: type — description
+~~~
+
+## 3. External API Specification
+
+> Include this section only if the component exposes an external HTTP/gRPC API.
+> Skip if the component is internal-only.
+
+| Endpoint | Method | Auth | Rate Limit | Description |
+|----------|--------|------|------------|-------------|
+| `/api/v1/...` | GET/POST/PUT/DELETE | Required/Public | X req/min | Brief description |
+
+**Request/Response schemas**: define per endpoint using OpenAPI-style notation.
+
+**Example request/response**:
+~~~jsonc
+// Request
+{ }
+
+// Response
+{ }
+~~~
+
+## 4. Data Access Patterns
+
+### Queries
+
+| Query | Frequency | Hot Path | Index Needed |
+|-------|-----------|----------|--------------|
+| [describe query] | High/Medium/Low | Yes/No | Yes/No |
+
+### Caching Strategy
+
+| Data | Cache Type | TTL | Invalidation |
+|------|-----------|-----|-------------|
+| [data item] | In-memory / Redis / None | [duration] | [trigger] |
+
+### Storage Estimates
+
+| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
+|-----------------|---------------------|----------|------------|-------------|
+| [table_name] | | | | /month |
+
+### Data Management
+
+**Seed data**: [Required seed data and how to load it]
+
+**Rollback**: [Rollback procedure for this component's data changes]
+
+## 5. Implementation Details
+
+**Algorithmic Complexity**: [Big O for critical methods — only if non-trivial]
+
+**State Management**: [Local state / Global state / Stateless — explain how state is handled]
+
+**Key Dependencies**: [External libraries and their purpose]
+
+| Library | Version | Purpose |
+|---------|---------|---------|
+| [name] | [version] | [why needed] |
+
+**Error Handling Strategy**:
+- [How errors are caught, propagated, and reported]
+- [Retry policy if applicable]
+- [Circuit breaker if applicable]
+
+## 6. Extensions and Helpers
+
+> List any shared utilities this component needs that should live in a `helpers/` folder.
+
+| Helper | Purpose | Used By |
+|--------|---------|---------|
+| [helper_name] | [what it does] | [list of components] |
+
+## 7. Caveats & Edge Cases
+
+**Known limitations**:
+- [Limitation 1]
+
+**Potential race conditions**:
+- [Race condition scenario, if any]
+
+**Performance bottlenecks**:
+- [Bottleneck description and mitigation approach]
+
+## 8. Dependency Graph
+
+**Must be implemented after**: [list of component numbers/names]
+
+**Can be implemented in parallel with**: [list of component numbers/names]
+
+**Blocks**: [list of components that depend on this one]
+
+## 9. Logging Strategy
+
+| Log Level | When | Example |
+|-----------|------|---------|
+| ERROR | Unrecoverable failures | `Failed to process order {id}: {error}` |
+| WARN | Recoverable issues | `Retry attempt {n} for {operation}` |
+| INFO | Key business events | `Order {id} created by user {uid}` |
+| DEBUG | Development diagnostics | `Query returned {n} rows in {ms}ms` |
+
+**Log format**: [structured JSON / plaintext — match system standard]
+
+**Log storage**: [stdout / file / centralized logging service]
+```
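For the "structured JSON" log format option in section 9, a standard-library sketch could look like the one below. The formatter name and the `fields` key used with `extra=` are illustrative assumptions, not a project convention.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "time": self.formatTime(record),
        }
        # Structured fields passed as extra={"fields": {...}} are merged into the object.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name, stream=None, level=logging.INFO):
    """Create a logger that writes JSON lines to `stream` (stderr when None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate plaintext output via the root logger
    return logger
```

Usage: `make_logger("orders").info("Order %s created by user %s", order_id, uid, extra={"fields": {"order_id": order_id}})` emits the INFO example from the table as a single JSON line.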
+
+---
+
+## Guidance Notes
+
+- **Section 3 (External API)**: skip entirely for internal-only components. Include for any component that exposes HTTP endpoints, WebSocket connections, or gRPC services.
+- **Section 4 (Storage Estimates)**: critical for components that manage persistent data. Skip for stateless components.
+- **Section 5 (Algorithmic Complexity)**: only document if the algorithm is non-trivial (O(n^2) or worse, recursive, etc.). Simple CRUD operations don't need this.
+- **Section 6 (Helpers)**: if the helper is used by only one component, keep it inside that component. Only extract to `helpers/` if shared by 2+ components.
+- **Section 8 (Dependency Graph)**: this is essential for determining implementation order. Be precise about what "depends on" means — data dependency, API dependency, or shared infrastructure.
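Turning the "Must be implemented after" entries into an implementation order is a topological sort. The sketch below uses Kahn's algorithm, grouped by level, so that each phase contains components that can be built in parallel. The input shape (component name mapped to its prerequisites) is an assumption.

```python
def implementation_phases(depends_on):
    """Group components into phases where each phase depends only on earlier phases.

    depends_on maps a component to the components it "must be implemented after".
    """
    # Every component mentioned anywhere becomes a node, even if it has no entry of its own.
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)
    remaining = {n: set(depends_on.get(n, ())) for n in nodes}
    phases = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        phases.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return phases
```

A cycle is reported rather than silently broken, because it usually means the decomposition itself needs revisiting.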
@@ -0,0 +1,141 @@
+# E2E Black-Box Test Infrastructure Template
+
+Describes a separate consumer application that tests the main system as a black box.
+Save as `PLANS_DIR/<topic>/e2e_test_infrastructure.md`.
+
+---
+
+```markdown
+# E2E Test Infrastructure
+
+## Overview
+
+**System under test**: [main system name and entry points — API URLs, message queues, etc.]
+
+**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
+
+## Docker Environment
+
+### Services
+
+| Service | Image / Build | Purpose | Ports |
+|---------|--------------|---------|-------|
+| system-under-test | [main app image or build context] | The main system being tested | [ports] |
+| test-db | [postgres/mysql/etc.] | Database for the main system | [ports] |
+| e2e-consumer | [build context for consumer app] | Black-box test runner | — |
+| [dependency] | [image] | [purpose — cache, queue, etc.] | [ports] |
+
+### Networks
+
+| Network | Services | Purpose |
+|---------|----------|---------|
+| e2e-net | all | Isolated test network |
+
+### Volumes
+
+| Volume | Mounted to | Purpose |
+|--------|-----------|---------|
+| [name] | [service:path] | [test data, DB persistence, etc.] |
+
+### docker-compose structure
+
+~~~yaml
+# Outline only — not runnable code
+services:
+  system-under-test:
+    # main system
+  test-db:
+    # database
+  e2e-consumer:
+    # consumer test app
+    depends_on:
+      - system-under-test
+~~~
+
+## Consumer Application
+
+**Tech stack**: [language, framework, test runner]
+
+**Entry point**: [how it starts — e.g., pytest, jest, custom runner]
+
+### Communication with system under test
+
+| Interface | Protocol | Endpoint / Topic | Authentication |
+|-----------|----------|-----------------|----------------|
+| [API name] | [HTTP/gRPC/AMQP/etc.] | [URL or topic] | [method] |
+
+### What the consumer does NOT have access to
+
+- No direct database access to the main system
+- No internal module imports
+- No shared memory or file system with the main system
+
+## E2E Test Scenarios
+
+### Acceptance Criteria Traceability
+
+| AC ID | Acceptance Criterion | E2E Test IDs | Coverage |
+|-------|---------------------|-------------|----------|
+| AC-01 | [criterion] | E2E-01 | Covered |
+| AC-02 | [criterion] | E2E-02, E2E-03 | Covered |
+| AC-03 | [criterion] | — | NOT COVERED — [reason] |
+
+### E2E-01: [Scenario Name]
+
+**Summary**: [One sentence: what end-to-end use case this validates]
+
+**Traces to**: AC-01
+
+**Preconditions**:
+- [System state required before test]
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | [call / send] | [response / event] |
+| 2 | [call / send] | [response / event] |
+
+**Max execution time**: [e.g., 10s]
+
+---
+
+### E2E-02: [Scenario Name]
+
+(repeat structure)
+
+---
+
+## Test Data Management
+
+**Seed data**:
+
+| Data Set | Description | How Loaded | Cleanup |
+|----------|-------------|-----------|---------|
+| [name] | [what it contains] | [SQL script / API call / fixture file] | [how removed after test] |
+
+**Isolation strategy**: [e.g., each test run gets a fresh DB via container restart, or transactions are rolled back, or namespaced data]
+
+**External dependencies**: [any external APIs that need mocking or sandbox environments]
+
+## CI/CD Integration
+
+**When to run**: [e.g., on PR merge to dev, nightly, before production deploy]
+
+**Pipeline stage**: [where in the CI pipeline this fits]
+
+**Gate behavior**: [block merge / warning only / manual approval]
+
+**Timeout**: [max total suite duration before considered failed]
+
+## Reporting
+
+**Format**: CSV
+
+**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
+
+**Output path**: [where the CSV is written — e.g., ./e2e-results/report.csv]
+```
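The CSV reporting format above could be produced by a small wrapper in a Python consumer app. The sketch below assumes each scenario is a zero-argument callable that raises on failure. The function names are hypothetical, and SKIP handling is left out for brevity.

```python
import csv
import time
from pathlib import Path

COLUMNS = ["Test ID", "Test Name", "Execution Time (ms)", "Result", "Error Message"]


def run_and_record(test_id, name, fn):
    """Run one scenario and return a report row. Any exception marks it FAIL."""
    start = time.perf_counter()
    try:
        fn()
        result, error = "PASS", ""
    except Exception as exc:  # a failing scenario must not abort the whole suite
        result, error = "FAIL", f"{type(exc).__name__}: {exc}"
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return [test_id, name, elapsed_ms, result, error]


def write_report(rows, path):
    """Write the header plus one row per scenario, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```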
+
+---
+
+## Guidance Notes
+
+- Every E2E test MUST trace to at least one acceptance criterion. If it doesn't, question whether it's needed.
+- The consumer app must treat the main system as a true black box — no internal imports, no direct DB queries against the main system's database.
+- Keep the number of E2E tests focused on critical use cases. Exhaustive testing belongs in per-component tests (Step 4).
+- Docker environment should be self-contained — `docker compose up` must be sufficient to run the full suite.
+- If the main system requires external services (payment gateways, third-party APIs), define mock/stub services in the Docker environment.
@@ -0,0 +1,127 @@
+# Jira Epic Template
+
+Use this template for each Jira epic. Create epics via Jira MCP.
+
+---
+
+```markdown
+## Epic: [Component Name] — [Outcome]
+
+**Example**: Data Ingestion — Near-real-time pipeline
+
+### Epic Summary
+
+[1-2 sentences: what we are building + why it matters]
+
+### Problem / Context
+
+[Current state, pain points, constraints, business opportunities.
+Link to architecture.md and relevant component spec.]
+
+### Scope
+
+**In Scope**:
+- [Capability 1 — describe what, not how]
+- [Capability 2]
+- [Capability 3]
+
+**Out of Scope**:
+- [Explicit exclusion 1 — prevents scope creep]
+- [Explicit exclusion 2]
+
+### Assumptions
+
+- [System design assumption]
+- [Data structure assumption]
+- [Infrastructure assumption]
+
+### Dependencies
+
+**Epic dependencies** (must be completed first):
+- [Epic name / ID]
+
+**External dependencies**:
+- [Services, hardware, environments, certificates, data sources]
+
+### Effort Estimation
+
+**T-shirt size**: S / M / L / XL
+
+**Story points range**: [min]-[max]
+
+### Users / Consumers
+
+| Type | Who | Key Use Cases |
+|------|-----|--------------|
+| Internal | [team/role] | [use case] |
+| External | [user type] | [use case] |
+| System | [service name] | [integration point] |
+
+### Requirements
+
+**Functional**:
+- [API expectations, events, data handling]
+- [Idempotency, retry behavior]
+
+**Non-functional**:
+- [Availability, latency, throughput targets]
+- [Scalability, processing limits, data retention]
+
+**Security / Compliance**:
+- [Authentication, encryption, secrets management]
+- [Logging, audit trail]
+- [SOC2 / ISO / GDPR if applicable]
+
+### Design & Architecture
+
+- Architecture doc: `_docs/02_plans/<topic>/architecture.md`
+- Component spec: `_docs/02_plans/<topic>/components/[##]_[name]/description.md`
+- System flows: `_docs/02_plans/<topic>/system-flows.md`
+
+### Definition of Done
+
+- [ ] All in-scope capabilities implemented
+- [ ] Automated tests pass (unit + integration + e2e)
+- [ ] Minimum coverage threshold met (75%)
+- [ ] Runbooks written (if applicable)
+- [ ] Documentation updated
+
+### Acceptance Criteria
+
+| # | Criterion | Measurable Condition |
+|---|-----------|---------------------|
+| 1 | [criterion] | [how to verify] |
+| 2 | [criterion] | [how to verify] |
+
+### Risks & Mitigations
+
+| # | Risk | Mitigation | Owner |
+|---|------|------------|-------|
+| 1 | [top risk] | [mitigation] | [owner] |
+| 2 | | | |
+| 3 | | | |
+
+### Labels
+
+- `component:[name]`
+- `env:prod` / `env:stg`
+- `type:platform` / `type:data` / `type:integration`
+
+### Child Issues
+
+| Type | Title | Points |
+|------|-------|--------|
+| Spike | [research/investigation task] | [1-3] |
+| Task | [implementation task] | [1-5] |
+| Task | [implementation task] | [1-5] |
+| Enabler | [infrastructure/setup task] | [1-3] |
+```
+
+---
+
+## Guidance Notes
+
+- Be concise. Fewer words with the same meaning = better epic.
+- Capabilities in scope are "what", not "how" — avoid describing implementation details.
+- Dependency order matters: epics that must be done first should be listed earlier in the backlog.
+- Every epic maps to exactly one component. If a component is too large for one epic, split the component first.
+- Story points for child issues follow the project scale (1, 2, 3, 5, 8), but no child issue may exceed 5 points: split anything larger.
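The point rules in the last guidance note are easy to check mechanically before epics are pushed to Jira. The validator below is a minimal sketch that works on a hypothetical list of `(title, points)` pairs rather than on Jira MCP objects.

```python
ALLOWED_POINTS = {1, 2, 3, 5, 8}  # project scale from the guidance notes
MAX_POINTS = 5  # child issues above this must be split


def check_child_issues(issues):
    """Return problems for (title, points) pairs. An empty list means the epic is OK."""
    problems = []
    for title, points in issues:
        if points not in ALLOWED_POINTS:
            problems.append(f"{title}: {points} is not on the 1/2/3/5/8 scale")
        elif points > MAX_POINTS:
            problems.append(f"{title}: {points} points, split into issues of at most {MAX_POINTS}")
    return problems
```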
@@ -0,0 +1,104 @@
+# Final Planning Report Template
+
+Use this template after completing all 5 steps and the quality checklist. Save as `_docs/02_plans/<topic>/FINAL_report.md`.
+
+---
+
+```markdown
+# [System Name] — Planning Report
+
+## Executive Summary
+
+[2-3 sentences: what was planned, the core architectural approach, and the key outcome (number of components, epics, estimated effort)]
+
+## Problem Statement
+
+[Brief restatement from problem.md — transformed, not copy-pasted]
+
+## Architecture Overview
+
+[Key architectural decisions and technology stack summary. Reference `architecture.md` for full details.]
+
+**Technology stack**: [language, framework, database, hosting — one line]
+
+**Deployment**: [environment strategy — one line]
+
+## Component Summary
+
+| # | Component | Purpose | Dependencies | Epic |
+|---|-----------|---------|-------------|------|
+| 01 | [name] | [one-line purpose] | — | [Jira ID] |
+| 02 | [name] | [one-line purpose] | 01 | [Jira ID] |
+| ... | | | | |
+
+**Implementation order** (based on dependency graph):
+1. [Phase 1: components that can start immediately]
+2. [Phase 2: components that depend on Phase 1]
+3. [Phase 3: ...]
+
+## System Flows
+
+| Flow | Description | Key Components |
+|------|-------------|---------------|
+| [name] | [one-line summary] | [component list] |
+
+[Reference `system-flows.md` for full diagrams and details.]
+
+## Risk Summary
+
+| Level | Count | Key Risks |
+|-------|-------|-----------|
+| Critical | [N] | [brief list] |
+| High | [N] | [brief list] |
+| Medium | [N] | — |
+| Low | [N] | — |
+
+**Iterations completed**: [N]
+
+**All Critical/High risks mitigated**: Yes / No — [details if No]
+
+[Reference `risk_mitigations.md` for full register.]
+
+## Test Coverage
+
+| Component | Integration | Performance | Security | Acceptance | AC Coverage |
+|-----------|-------------|-------------|----------|------------|-------------|
+| [name] | [N tests] | [N tests] | [N tests] | [N tests] | [X/Y ACs] |
+| ... | | | | | |
+
+**Overall acceptance criteria coverage**: [X / Y total ACs covered] ([percentage]%)
+
+## Epic Roadmap
+
+| Order | Epic | Component | Effort | Dependencies |
+|-------|------|-----------|--------|-------------|
+| 1 | [Jira ID]: [name] | [component] | [S/M/L/XL] | — |
+| 2 | [Jira ID]: [name] | [component] | [S/M/L/XL] | Epic 1 |
+| ... | | | | |
+
+**Total estimated effort**: [sum or range]
+
+## Key Decisions Made
+
+| # | Decision | Rationale | Alternatives Rejected |
+|---|----------|-----------|----------------------|
+| 1 | [decision] | [why] | [what was rejected] |
+| 2 | | | |
+
+## Open Questions
+
+| # | Question | Impact | Assigned To |
+|---|----------|--------|-------------|
+| 1 | [unresolved question] | [what it blocks or affects] | [who should answer] |
+
+## Artifact Index
+
+| File | Description |
+|------|-------------|
+| `architecture.md` | System architecture |
+| `system-flows.md` | System flows and diagrams |
+| `components/01_[name]/description.md` | Component spec |
+| `components/01_[name]/tests.md` | Test spec |
+| `risk_mitigations.md` | Risk register |
+| `diagrams/components.drawio` | Component diagram |
+| `diagrams/flows/flow_[name].md` | Flow diagrams |
+```
@@ -0,0 +1,99 @@
+# Risk Register Template
+
+Use this template for risk assessment. Save as `_docs/02_plans/<topic>/risk_mitigations.md`.
+Subsequent iterations: `risk_mitigations_02.md`, `risk_mitigations_03.md`, etc.
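The iteration naming rule (`risk_mitigations.md`, then `_02`, `_03`, ...) can be derived from the files already on disk. The helper below is a hypothetical sketch; it picks the next free name after the highest existing iteration.

```python
import re
from pathlib import Path


def next_risk_file(plan_dir):
    """Return the path for the next risk-assessment iteration in plan_dir."""
    plan_dir = Path(plan_dir)
    first = plan_dir / "risk_mitigations.md"
    if not first.exists():
        return first  # iteration 1 has no numeric suffix
    highest = 1
    for p in plan_dir.glob("risk_mitigations_*.md"):
        m = re.fullmatch(r"risk_mitigations_(\d+)\.md", p.name)
        if m:
            highest = max(highest, int(m.group(1)))
    return plan_dir / f"risk_mitigations_{highest + 1:02d}.md"
```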
+
+---
+
+```markdown
+# Risk Assessment — [Topic] — Iteration [##]
+
+## Risk Scoring Matrix
+
+|  | Low Impact | Medium Impact | High Impact |
+|--|------------|---------------|-------------|
+| **High Probability** | Medium | High | Critical |
+| **Medium Probability** | Low | Medium | High |
+| **Low Probability** | Low | Low | Medium |
+
+## Required Actions by Risk Level
+
+| Level | Action Required |
+|-------|----------------|
+| Low | Accepted, monitored quarterly |
+| Medium | Mitigation plan required before implementation |
+| High | Mitigation + contingency plan required, reviewed weekly |
+| Critical | Must be resolved before proceeding to the next planning step |
+
+## Risk Register
+
+| ID | Risk | Category | Probability | Impact | Score | Mitigation | Owner | Status |
+|----|------|----------|-------------|--------|-------|------------|-------|--------|
+| R01 | [risk description] | [category] | High/Med/Low | High/Med/Low | Critical/High/Med/Low | [mitigation strategy] | [owner] | Open/Mitigated/Accepted |
+| R02 | | | | | | | | |
+
+## Risk Categories
+
+### Technical Risks
+- Technology choices may not meet requirements
+- Integration complexity underestimated
+- Performance targets unachievable
+- Security vulnerabilities in design
+- Data model cannot support future requirements
+
+### Schedule Risks
+- Dependencies delayed
+- Scope creep from ambiguous requirements
+- Underestimated complexity
+
+### Resource Risks
+- Key person dependency
+- Team lacks experience with chosen technology
+- Infrastructure not available in time
+
+### External Risks
+- Third-party API changes or deprecation
+- Vendor reliability or pricing changes
+- Regulatory or compliance changes
+- Data source availability
+
+## Detailed Risk Analysis
+
+### R01: [Risk Title]
+
+**Description**: [Detailed description of the risk]
+
+**Trigger conditions**: [What would cause this risk to materialize]
+
+**Affected components**: [List of components impacted]
+
+**Mitigation strategy**:
+1. [Action 1]
+2. [Action 2]
+
+**Contingency plan**: [What to do if mitigation fails]
+
+**Residual risk after mitigation**: [Low/Medium/High]
+
+**Documents updated**: [List architecture/component docs that were updated to reflect this mitigation]
+
+---
+
+### R02: [Risk Title]
+
+(repeat structure above)
+
+## Architecture/Component Changes Applied
+
+| Risk ID | Document Modified | Change Description |
+|---------|------------------|--------------------|
+| R01 | `architecture.md` §3 | [what changed] |
+| R01 | `components/02_[name]/description.md` §5 | [what changed] |
+
+## Summary
+
+**Total risks identified**: [N]
+
+**Critical**: [N] | **High**: [N] | **Medium**: [N] | **Low**: [N]
+
+**Risks mitigated this iteration**: [N]
+
+**Risks requiring user decision**: [list]
+```
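The scoring matrix and the Critical gate can be encoded directly, which keeps the Score column and the Summary counts consistent. The sketch below uses a hypothetical dict-per-row representation of the register. It also assumes that "resolved" means status `Mitigated`, so an `Accepted` Critical risk still blocks.

```python
from collections import Counter

# (probability, impact) -> score, copied from the Risk Scoring Matrix.
SCORE = {
    ("High", "Low"): "Medium", ("High", "Medium"): "High", ("High", "High"): "Critical",
    ("Medium", "Low"): "Low", ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
    ("Low", "Low"): "Low", ("Low", "Medium"): "Low", ("Low", "High"): "Medium",
}


def score(probability, impact):
    """Score one risk. The register abbreviates Medium as "Med", so normalize first."""
    norm = {"Med": "Medium"}
    return SCORE[(norm.get(probability, probability), norm.get(impact, impact))]


def summary(register):
    """Count risks per level for the Summary section."""
    return Counter(score(r["probability"], r["impact"]) for r in register)


def blocking_risks(register):
    """IDs of Critical risks not yet Mitigated. Any result blocks the next planning step."""
    return [r["id"] for r in register
            if score(r["probability"], r["impact"]) == "Critical" and r["status"] != "Mitigated"]
```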
@@ -0,0 +1,108 @@
+# System Flows Template
+
+Use this template for the system flows document. Save as `_docs/02_plans/<topic>/system-flows.md`.
+Individual flow diagrams go in `_docs/02_plans/<topic>/diagrams/flows/flow_[name].md`.
+
+---
+
+```markdown
+# [System Name] — System Flows
+
+## Flow Inventory
+
+| # | Flow Name | Trigger | Primary Components | Criticality |
+|---|-----------|---------|-------------------|-------------|
+| F1 | [name] | [user action / scheduled / event] | [component list] | High/Medium/Low |
+| F2 | [name] | | | |
+| ... | | | | |
+
+## Flow Dependencies
+
+| Flow | Depends On | Shares Data With |
+|------|-----------|-----------------|
+| F1 | — | F2 (via [entity]) |
+| F2 | F1 must complete first | F3 |
+
+---
+
+## Flow F1: [Flow Name]
+
+### Description
+
+[1-2 sentences: what this flow does, who triggers it, what the outcome is]
+
+### Preconditions
+
+- [Condition 1]
+- [Condition 2]
+
+### Sequence Diagram
+
+~~~mermaid
+sequenceDiagram
+    participant User
+    participant ComponentA
+    participant ComponentB
+    participant Database
+
+    User->>ComponentA: [action]
+    ComponentA->>ComponentB: [call with params]
+    ComponentB->>Database: [query/write]
+    Database-->>ComponentB: [result]
+    ComponentB-->>ComponentA: [response]
+    ComponentA-->>User: [result]
+~~~
+
+### Flowchart
+
+~~~mermaid
+flowchart TD
+    start([Trigger]) --> step1[Step description]
+    step1 --> checkCondition{Condition?}
+    checkCondition -->|Yes| step2[Step description]
+    checkCondition -->|No| step3[Step description]
+    step2 --> result([Result])
+    step3 --> result
+~~~
+
+### Data Flow
+
+| Step | From | To | Data | Format |
+|------|------|----|------|--------|
+| 1 | [source] | [destination] | [what data] | [DTO/event/etc] |
+| 2 | | | | |
+
+### Error Scenarios
+
+| Error | Where | Detection | Recovery |
+|-------|-------|-----------|----------|
+| [error type] | [which step] | [how detected] | [what happens] |
+
+### Performance Expectations
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| End-to-end latency | [target] | [conditions] |
+| Throughput | [target] | [peak/sustained] |
+
+---
+
+## Flow F2: [Flow Name]
+
+(repeat structure above)
+```
+
+---
+
+## Mermaid Diagram Conventions
+
+Follow these conventions for consistency across all flow diagrams:
+
+- **Participants**: use component names matching `components/[##]_[name]`
+- **Node IDs**: camelCase, no spaces (e.g., `validateInput`, `saveOrder`)
+- **Decision nodes**: use `{Question?}` format
+- **Start/End**: use `([label])` stadium shape
+- **External systems**: use `[[label]]` subroutine shape
+- **Subgraphs**: group by component or bounded context
+- **No styling**: do not add colors or CSS classes — let the renderer theme handle it
+- **Edge labels**: wrap special characters in quotes (e.g., `-->|"O(n) check"|`)
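If flow diagrams are generated rather than hand-written, the node-ID and edge-label conventions can be enforced at generation time. The helper below is hypothetical. It escapes embedded double quotes with Mermaid's `#quot;` entity code and rejects a bare lowercase `end`, which Mermaid's flowchart documentation warns will break parsing.

```python
import re

# camelCase: lowercase first letter, then letters/digits only (no spaces, dashes, underscores).
CAMEL_CASE = re.compile(r"[a-z][a-zA-Z0-9]*")


def edge(src, dst, label=None):
    """Render one flowchart edge, enforcing camelCase node IDs and quoted labels."""
    for node in (src, dst):
        if not CAMEL_CASE.fullmatch(node):
            raise ValueError(f"node ID {node!r} is not camelCase")
        if node == "end":
            raise ValueError('"end" is reserved in Mermaid flowcharts')
    if label is None:
        return f"{src} --> {dst}"
    # Quoting protects labels containing (), [], {} and similar characters.
    escaped = label.replace('"', "#quot;")
    return f'{src} -->|"{escaped}"| {dst}'
```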
@@ -0,0 +1,172 @@
+# Test Specification Template
+
+Use this template for each component's test spec. Save as `components/[##]_[name]/tests.md`.
+
+---
+
+```markdown
+# Test Specification — [Component Name]
+
+## Acceptance Criteria Traceability
+
+| AC ID | Acceptance Criterion | Test IDs | Coverage |
+|-------|---------------------|----------|----------|
+| AC-01 | [criterion from acceptance_criteria.md] | IT-01, AT-01 | Covered |
+| AC-02 | [criterion] | PT-01 | Covered |
+| AC-03 | [criterion] | — | NOT COVERED — [reason] |
+| AC-04 | [criterion] | ST-01 | Covered |
+
+---
+
+## Integration Tests
+
+### IT-01: [Test Name]
+
+**Summary**: [One sentence: what this test verifies]
+
+**Traces to**: AC-01
+
+**Description**: [Detailed test scenario]
+
+**Input data**:
+~~~
+[specific input data for this test]
+~~~
+
+**Expected result**:
+~~~
+[specific expected output or state]
+~~~
+
+**Max execution time**: [e.g., 5s]
+
+**Dependencies**: [other components/services that must be running]
+
+---
+
+### IT-02: [Test Name]
+
+(repeat structure)
+
+---
+
+## Performance Tests
+
+### PT-01: [Test Name]
+
+**Summary**: [One sentence: what performance aspect is tested]
+
+**Traces to**: AC-02
+
+**Load scenario**:
+- Concurrent users: [N]
+- Request rate: [N req/s]
+- Duration: [N minutes]
+- Ramp-up: [strategy]
+
+**Expected results**:
+
+| Metric | Target | Failure Threshold |
+|--------|--------|-------------------|
+| Latency (p50) | [target] | [max] |
+| Latency (p95) | [target] | [max] |
+| Latency (p99) | [target] | [max] |
+| Throughput | [target req/s] | [min req/s] |
+| Error rate | [target %] | [max %] |
+
+**Resource limits**:
+- CPU: [max %]
+- Memory: [max MB/GB]
+- Database connections: [max pool size]
+
+---
+
+### PT-02: [Test Name]
+
+(repeat structure)
+
+---
+
+## Security Tests
+
+### ST-01: [Test Name]
+
+**Summary**: [One sentence: what security aspect is tested]
+
+**Traces to**: AC-04
+
+**Attack vector**: [e.g., SQL injection on search endpoint, privilege escalation via direct ID access]
+
+**Test procedure**:
+1. [Step 1]
+2. [Step 2]
+
+**Expected behavior**: [what the system should do — reject, sanitize, log, etc.]
+
+**Pass criteria**: [specific measurable condition]
+
+**Fail criteria**: [what constitutes a failure]
+
+---
+
+### ST-02: [Test Name]
+
+(repeat structure)
+
+---
+
+## Acceptance Tests
+
+### AT-01: [Test Name]
+
+**Summary**: [One sentence: what user-facing behavior is verified]
+
+**Traces to**: AC-01
+
+**Preconditions**:
+- [Precondition 1]
+- [Precondition 2]
+
+**Steps**:
+
+| Step | Action | Expected Result |
+|------|--------|-----------------|
+| 1 | [user action] | [expected outcome] |
+| 2 | [user action] | [expected outcome] |
+| 3 | [user action] | [expected outcome] |
+
+---
+
+### AT-02: [Test Name]
+
+(repeat structure)
+
+---
+
+## Test Data Management
+
+**Required test data**:
+
+| Data Set | Description | Source | Size |
+|----------|-------------|--------|------|
+| [name] | [what it contains] | [generated / fixture / copy of prod subset] | [approx size] |
+
+**Setup procedure**:
+1. [How to prepare the test environment]
+2. [How to load test data]
+
+**Teardown procedure**:
+1. [How to clean up after tests]
+2. [How to restore initial state]
+
+**Data isolation strategy**: [How tests are isolated from each other — separate DB, transactions, namespacing]
+```
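The p50/p95/p99 rows in a performance test's "Expected results" table can be checked against collected latency samples. The sketch below uses the nearest-rank percentile definition; load tools may interpolate differently, so treat it as illustrative. The threshold dict shape is also an assumption.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms, thresholds_ms):
    """Compare percentiles with their failure thresholds. Returns only the failing metrics."""
    failures = {}
    for p, limit in thresholds_ms.items():
        value = percentile(samples_ms, p)
        if value > limit:
            failures[f"p{p}"] = value
    return failures
```

Usage: `check_latency(samples, {50: 100, 95: 250, 99: 400})` returns `{}` when every percentile is within its failure threshold.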
+
+---
+
+## Guidance Notes
+
+- Every test MUST trace back to at least one acceptance criterion (AC-XX). If a test doesn't trace to any, question whether it's needed.
+- If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
+- Performance test targets should come from the NFR section in `architecture.md`.
+- Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
+- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
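The first two guidance notes can be checked automatically across a test spec. The function below is a minimal sketch over a hypothetical mapping of test IDs to the AC IDs they trace to. It flags untraced tests, references to unknown ACs, and uncovered ACs, and it also computes the coverage percentage that the final report asks for.

```python
def traceability_report(acceptance_criteria, tests):
    """Check the mapping between acceptance criteria and tests.

    acceptance_criteria: list of AC IDs, e.g. ["AC-01", "AC-02"]
    tests: dict of test ID -> list of AC IDs it traces to
    """
    untraced = sorted(t for t, acs in tests.items() if not acs)
    referenced = {ac for acs in tests.values() for ac in acs}
    unknown = sorted(referenced - set(acceptance_criteria))
    not_covered = [ac for ac in acceptance_criteria if ac not in referenced]
    covered = len(acceptance_criteria) - len(not_covered)
    pct = 100 * covered / len(acceptance_criteria) if acceptance_criteria else 0.0
    return {"untraced_tests": untraced, "unknown_acs": unknown,
            "not_covered": not_covered, "coverage_pct": round(pct, 1)}
```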
@@ -0,0 +1,470 @@
+---
+name: refactor
+description: |
+  Structured refactoring workflow (6-phase method) with three execution modes:
+  - Full Refactoring: all 6 phases — baseline, discovery, analysis, safety net, execution, hardening
+  - Targeted Refactoring: skip discovery if docs exist, focus on a specific component/area
+  - Quick Assessment: phases 0-2 only, outputs a refactoring plan without execution
+  Supports project mode (_docs/ structure) and standalone mode (@file.md).
+  Trigger phrases:
+  - "refactor", "refactoring", "improve code"
+  - "analyze coupling", "decoupling", "technical debt"
+  - "refactoring assessment", "code quality improvement"
+disable-model-invocation: true
+---
+
+# Structured Refactoring (6-Phase Method)
+
+Transform existing codebases through a systematic refactoring workflow: capture baseline, document current state, research improvements, build safety net, execute changes, and harden.
+
+## Core Principles
+
+- **Preserve behavior first**: never refactor without a passing test suite
+- **Measure before and after**: every change must be justified by metrics
+- **Small incremental changes**: commit frequently, never break tests
+- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
+- **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user
+
+## Context Resolution
+
+Determine the operating mode based on invocation before any other logic runs.
+
+**Project mode** (no explicit input file provided):
+- PROBLEM_DIR: `_docs/00_problem/`
+- SOLUTION_DIR: `_docs/01_solution/`
+- COMPONENTS_DIR: `_docs/02_components/`
+- TESTS_DIR: `_docs/02_tests/`
+- REFACTOR_DIR: `_docs/04_refactoring/`
+- All existing guardrails apply.
+
+**Standalone mode** (explicit input file provided, e.g. `/refactor @some_component.md`):
+- INPUT_FILE: the provided file (treated as component/area description)
+- Derive `<topic>` from the input filename (without extension)
+- REFACTOR_DIR: `_standalone/<topic>/refactoring/`
+- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
+- `acceptance_criteria.md` is optional — warn if absent
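The context-resolution rules amount to a small path resolver. In the sketch below, the function name and return shape are hypothetical. The paths are copied from the two mode descriptions, and the leading `@` from the `/refactor @file.md` invocation is stripped before deriving `<topic>`.

```python
from pathlib import Path


def resolve_context(input_file=None):
    """Return the mode and resolved paths. An explicit input file means standalone mode."""
    if input_file is None:
        docs = Path("_docs")
        return {
            "mode": "project",
            "PROBLEM_DIR": docs / "00_problem",
            "SOLUTION_DIR": docs / "01_solution",
            "COMPONENTS_DIR": docs / "02_components",
            "TESTS_DIR": docs / "02_tests",
            "REFACTOR_DIR": docs / "04_refactoring",
        }
    path = Path(input_file.lstrip("@"))
    topic = path.stem  # input filename without extension
    return {
        "mode": "standalone",
        "INPUT_FILE": path,
        "REFACTOR_DIR": Path("_standalone") / topic / "refactoring",
    }
```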
+
+Announce the detected mode and resolved paths to the user before proceeding.
+
+## Mode Detection
+
+After context resolution, determine the execution mode:
+
+1. **User explicitly says** "quick assessment" or "just assess" → **Quick Assessment**
+2. **User explicitly says** "refactor [component/file/area]" with a specific target → **Targeted Refactoring**
+3. **Default** → **Full Refactoring**
+
+| Mode | Phases Executed | When to Use |
+|------|----------------|-------------|
+| **Full Refactoring** | 0 → 1 → 2 → 3 → 4 → 5 | Complete refactoring of a system or major area |
+| **Targeted Refactoring** | 0 → (skip 1 if docs exist) → 2 → 3 → 4 → 5 | Refactor a specific component or area |
+| **Quick Assessment** | 0 → 1 → 2 | Produce a refactoring roadmap without executing changes |
+
+Inform the user which mode was detected and confirm before proceeding.
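+
+The three rules can be sketched as a keyword check. This is only an illustration built on assumptions: `detectMode` and its regex are hypothetical, and in practice the agent judges intent instead of matching strings.
+
+```javascript
+// Phases per mode, taken from the table above
+const PHASES = {
+  full: [0, 1, 2, 3, 4, 5],
+  targeted: [0, 1, 2, 3, 4, 5], // Phase 1 is skipped later if docs exist
+  quick: [0, 1, 2],
+};
+
+// Hypothetical heuristic applying the detection rules in order
+function detectMode(request) {
+  const text = request.toLowerCase();
+  let mode = 'full'; // rule 3: default
+  if (text.includes('quick assessment') || text.includes('just assess')) {
+    mode = 'quick'; // rule 1
+  } else if (/\brefactor\s+\S+/.test(text)) {
+    mode = 'targeted'; // rule 2: "refactor <target>"
+  }
+  return { mode, phases: PHASES[mode] };
+}
+
+console.log(detectMode('refactor src/auth').mode); // targeted
+```
+
+A request such as "refactor the whole system" would be misread as Targeted by this heuristic. This is one reason to confirm the detected mode with the user.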
+
+## Prerequisite Checks (BLOCKING)
+
+**Project mode:**
+1. PROBLEM_DIR exists with `problem.md` (or `problem_description.md`) — **STOP if missing**, ask user to create it
+2. If `acceptance_criteria.md` is missing: **warn** and ask whether to proceed
+3. Create REFACTOR_DIR if it does not exist
+4. If REFACTOR_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
+
+**Standalone mode:**
+1. INPUT_FILE exists and is non-empty — **STOP if missing**
+2. Warn if no `acceptance_criteria.md` provided
+3. Create REFACTOR_DIR if it does not exist
+
+## Artifact Management
+
+### Directory Structure
+
+```
+REFACTOR_DIR/
+├── baseline_metrics.md          (Phase 0)
+├── discovery/
+│   ├── components/
+│   │   └── [##]_[name].md       (Phase 1)
+│   ├── solution.md              (Phase 1)
+│   └── system_flows.md          (Phase 1)
+├── analysis/
+│   ├── research_findings.md     (Phase 2)
+│   └── refactoring_roadmap.md   (Phase 2)
+├── test_specs/
+│   └── [##]_[test_name].md      (Phase 3)
+├── coupling_analysis.md         (Phase 4)
+├── execution_log.md             (Phase 4)
+├── hardening/
+│   ├── technical_debt.md        (Phase 5)
+│   ├── performance.md           (Phase 5)
+│   └── security.md              (Phase 5)
+└── FINAL_report.md              (after all phases)
+```
+
+### Save Timing
+
+| Phase | Save immediately after | Filename |
+|-------|------------------------|----------|
+| Phase 0 | Baseline captured | `baseline_metrics.md` |
+| Phase 1 | Each component documented | `discovery/components/[##]_[name].md` |
+| Phase 1 | Solution synthesized | `discovery/solution.md`, `discovery/system_flows.md` |
+| Phase 2 | Research complete | `analysis/research_findings.md` |
+| Phase 2 | Roadmap produced | `analysis/refactoring_roadmap.md` |
+| Phase 3 | Test specs written | `test_specs/[##]_[test_name].md` |
+| Phase 4 | Coupling analyzed | `coupling_analysis.md` |
+| Phase 4 | Execution complete | `execution_log.md` |
+| Phase 5 | Each hardening track | `hardening/<track>.md` |
+| Final | All phases done | `FINAL_report.md` |
+
+### Resumability
+
+If REFACTOR_DIR already contains artifacts:
+
+1. List existing files and match them against the Save Timing table
+2. Identify the last completed phase based on which artifacts exist
+3. Resume from the next incomplete phase
+4. Inform the user which phases are being skipped
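+
+One way to implement the resume check assumes that a single artifact from the Save Timing table marks each phase as complete. The choice of marker per phase is an assumption. A real check should also re-run the tests before trusting the Phase 3 marker, because of the Phase 3 gate. `nextPhase` is a hypothetical helper.
+
+```javascript
+// Hypothetical markers: the artifact whose presence means a phase finished.
+// Phase 5 is left out because its hardening tracks are optional.
+const PHASE_MARKERS = [
+  { phase: 0, file: 'baseline_metrics.md' },
+  { phase: 1, file: 'discovery/solution.md' },
+  { phase: 2, file: 'analysis/refactoring_roadmap.md' },
+  { phase: 3, prefix: 'test_specs/' },
+  { phase: 4, file: 'execution_log.md' },
+];
+
+// existingFiles: paths relative to REFACTOR_DIR, e.g. from a recursive listing
+function nextPhase(existingFiles) {
+  let lastCompleted = -1;
+  for (const marker of PHASE_MARKERS) {
+    const done = marker.file
+      ? existingFiles.includes(marker.file)
+      : existingFiles.some((f) => f.startsWith(marker.prefix));
+    if (!done) break; // phases run in order, so stop at the first gap
+    lastCompleted = marker.phase;
+  }
+  return { lastCompleted, resumeAt: lastCompleted + 1 };
+}
+
+console.log(nextPhase(['baseline_metrics.md', 'discovery/components/01_api.md']));
+// { lastCompleted: 0, resumeAt: 1 }
+```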
+
+## Progress Tracking
+
+At the start of execution, create a TodoWrite list with all applicable phases. Update each phase's status as it completes.
+
+## Workflow
+
+### Phase 0: Context & Baseline
+
+**Role**: Software engineer preparing for refactoring
+**Goal**: Collect refactoring goals and capture baseline metrics
+**Constraints**: Measurement only — no code changes
+
+#### 0a. Collect Goals
+
+If PROBLEM_DIR files do not yet exist, help the user create them:
+
+1. `problem.md` — what the system currently does, what changes are needed, pain points
+2. `acceptance_criteria.md` — success criteria for the refactoring
+3. `security_approach.md` — security requirements (if applicable)
+
+Store in PROBLEM_DIR.
+
+#### 0b. Capture Baseline
+
+1. Read problem description and acceptance criteria
+2. Measure current system metrics using project-appropriate tools:
+
+| Metric Category | What to Capture |
+|----------------|-----------------|
+| **Coverage** | Overall, unit, integration, critical paths |
+| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
+| **Code Smells** | Total, critical, major |
+| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
+| **Dependencies** | Total count, outdated, security vulnerabilities |
+| **Build** | Build time, test execution time, deployment time |
+
+3. Create functionality inventory: all features/endpoints with status and coverage
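+
+For the P50/P95/P99 latency figures, a nearest-rank percentile over raw request timings is one simple, reproducible definition. Many tools interpolate instead, so record which definition the baseline used. Here is a minimal sketch with hypothetical helper names:
+
+```javascript
+// Nearest-rank percentile: the smallest sample such that at least p% of
+// samples are <= it. Integer math (p * n / 100) avoids float rounding issues.
+function percentile(samples, p) {
+  if (samples.length === 0) throw new Error('no samples');
+  const sorted = [...samples].sort((a, b) => a - b);
+  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
+  return sorted[Math.max(rank, 1) - 1];
+}
+
+function latencySummary(samplesMs) {
+  return {
+    p50: percentile(samplesMs, 50),
+    p95: percentile(samplesMs, 95),
+    p99: percentile(samplesMs, 99),
+  };
+}
+
+// Example: 100 request timings of 1..100 ms
+const timings = Array.from({ length: 100 }, (_, i) => i + 1);
+console.log(latencySummary(timings)); // { p50: 50, p95: 95, p99: 99 }
+```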
+
+**Self-verification**:
+- [ ] All metric categories measured (or noted as N/A with reason)
+- [ ] Functionality inventory is complete
+- [ ] Measurements are reproducible
+
+**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
+
+**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
+
+---
+
+### Phase 1: Discovery
+
+**Role**: Principal software architect
+**Goal**: Generate documentation from existing code and form solution description
+**Constraints**: Document what exists, not what should be. No code changes.
+
+**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
+
+#### 1a. Document Components
+
+For each component in the codebase:
+
+1. Analyze project structure, directories, files
+2. Go file by file, analyze each method
+3. Analyze connections between components
+
+Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
+- Purpose and architectural patterns
+- Mermaid diagrams for logic flows
+- API reference table (name, description, input, output)
+- Implementation details: algorithmic complexity, state management, dependencies
+- Caveats, edge cases, known limitations
+
+#### 1b. Synthesize Solution & Flows
+
+1. Review all generated component documentation
+2. Synthesize into a cohesive solution description
+3. Create flow diagrams showing component interactions
+
+Write:
+- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
+- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
+
+Also copy to project standard locations if in project mode:
+- `SOLUTION_DIR/solution.md`
+- `COMPONENTS_DIR/system_flows.md`
+
+**Self-verification**:
+- [ ] Every component in the codebase is documented
+- [ ] Solution description covers all components
+- [ ] Flow diagrams cover all major use cases
+- [ ] Mermaid diagrams are syntactically correct
+
+**Save action**: Write discovery artifacts
+
+**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
+
+---
+
+### Phase 2: Analysis
+
+**Role**: Researcher and software architect
+**Goal**: Research improvements and produce a refactoring roadmap
+**Constraints**: Analysis only — no code changes
+
+#### 2a. Deep Research
+
+1. Analyze current implementation patterns
+2. Research modern approaches for similar systems
+3. Identify what could be done differently
+4. Suggest improvements based on state-of-the-art practices
+
+Write `REFACTOR_DIR/analysis/research_findings.md`:
+- Current state analysis: patterns used, strengths, weaknesses
+- Alternative approaches per component: current vs alternative, pros/cons, migration effort
+- Prioritized recommendations: quick wins + strategic improvements
+
+#### 2b. Solution Assessment
+
+1. Assess current implementation against acceptance criteria
+2. Identify weak points in codebase, map to specific code areas
+3. Perform gap analysis: acceptance criteria vs current state
+4. Prioritize changes by impact and effort
+
+Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
+- Weak points assessment: location, description, impact, proposed solution
+- Gap analysis: what's missing, what needs improvement
+- Phased roadmap (roadmap phases, separate from the workflow phases): Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
+
+**Self-verification**:
+- [ ] All acceptance criteria are addressed in gap analysis
+- [ ] Recommendations are grounded in actual code, not abstract
+- [ ] Roadmap phases are prioritized by impact
+- [ ] Quick wins are identified separately
+
+**Save action**: Write analysis artifacts
+
+**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
+
+**Quick Assessment mode stops here.** Present a final summary and write `FINAL_report.md` containing the Phase 0-2 results.
+
+---
+
+### Phase 3: Safety Net
+
+**Role**: QA engineer and developer
+**Goal**: Design and implement tests that capture current behavior before refactoring
+**Constraints**: Tests must all pass on the current codebase before proceeding
+
+#### 3a. Design Test Specs
+
+Coverage requirements (must be met before refactoring):
+- Minimum overall coverage: 75%
+- Critical path coverage: 90%
+- All public APIs must have integration tests
+- All error handling paths must be tested
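+
+Once the numbers have been extracted from the project's coverage tool, the gate can be checked mechanically. The sketch below assumes a hypothetical `report` shape: percentages plus lists of untested public APIs and error paths.
+
+```javascript
+// Thresholds from the coverage requirements above (percentages)
+const COVERAGE_GATE = { overall: 75, criticalPaths: 90 };
+
+// Hypothetical gate check; returns every reason the gate fails
+function checkCoverageGate(report) {
+  const failures = [];
+  if (report.overall < COVERAGE_GATE.overall) {
+    failures.push(`overall ${report.overall}% < ${COVERAGE_GATE.overall}%`);
+  }
+  if (report.criticalPaths < COVERAGE_GATE.criticalPaths) {
+    failures.push(`critical paths ${report.criticalPaths}% < ${COVERAGE_GATE.criticalPaths}%`);
+  }
+  for (const api of report.publicApisWithoutIntegrationTests ?? []) {
+    failures.push(`no integration test: ${api}`);
+  }
+  for (const handler of report.untestedErrorPaths ?? []) {
+    failures.push(`untested error path: ${handler}`);
+  }
+  return { passed: failures.length === 0, failures };
+}
+```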
+
+For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
+- Integration tests: summary, current behavior, input data, expected result, max expected time
+- Acceptance tests: summary, preconditions, steps with expected results
+- Coverage analysis: current %, target %, uncovered critical paths
+
+#### 3b. Implement Tests
+
+1. Set up the test environment and infrastructure if they do not already exist
+2. Implement each test from specs
+3. Run tests, verify all pass on current codebase
+4. Document any discovered issues
+
+**Self-verification**:
+- [ ] Coverage requirements met (75% overall, 90% critical paths)
+- [ ] All tests pass on current codebase
+- [ ] All public APIs have integration tests
+- [ ] Test data fixtures are configured
+
+**Save action**: Write test specs; implemented tests go into the project's test folder
+
+**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
+
+---
+
+### Phase 4: Execution
+
+**Role**: Software architect and developer
+**Goal**: Analyze coupling and execute decoupling changes
+**Constraints**: Small incremental changes; tests must stay green after every change
+
+#### 4a. Analyze Coupling
+
+1. Analyze coupling between components/modules
+2. Map dependencies (direct and transitive)
+3. Identify circular dependencies
+4. Form decoupling strategy
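+
+Step 3 is a standard cycle search. The sketch below runs a three-color depth-first search over a graph given as `module -> imported modules`. That input format and `findCycles` are assumptions. The search reports one cycle per back edge and does not enumerate every elementary cycle.
+
+```javascript
+// Colors: WHITE = unvisited, GRAY = on the current DFS path, BLACK = finished
+function findCycles(graph) {
+  const WHITE = 0, GRAY = 1, BLACK = 2;
+  const color = new Map();
+  const stack = [];
+  const cycles = [];
+
+  function visit(node) {
+    color.set(node, GRAY);
+    stack.push(node);
+    for (const dep of graph[node] ?? []) {
+      const c = color.get(dep) ?? WHITE;
+      if (c === GRAY) {
+        // Back edge: the cycle is the path slice from dep to the current node
+        cycles.push([...stack.slice(stack.indexOf(dep)), dep]);
+      } else if (c === WHITE) {
+        visit(dep);
+      }
+    }
+    stack.pop();
+    color.set(node, BLACK);
+  }
+
+  for (const node of Object.keys(graph)) {
+    if ((color.get(node) ?? WHITE) === WHITE) visit(node);
+  }
+  return cycles;
+}
+
+const deps = {
+  api: ['services'],
+  services: ['repository', 'api'], // services -> api closes a cycle
+  repository: [],
+};
+console.log(findCycles(deps)); // [ [ 'api', 'services', 'api' ] ]
+```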
+
+Write `REFACTOR_DIR/coupling_analysis.md`:
+- Dependency graph (Mermaid)
+- Coupling metrics per component
+- Problem areas: components involved, coupling type, severity, impact
+- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates
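+
+For the per-component coupling metrics, one widely used choice is Robert C. Martin's set: afferent coupling (Ca), efferent coupling (Ce), and instability I = Ce / (Ca + Ce). Below is a sketch over a hypothetical `module -> dependencies` graph format. `couplingMetrics` is an illustrative helper name.
+
+```javascript
+// Ca = components depending on this one; Ce = components this one depends on;
+// instability runs from 0 (stable, only depended upon) to 1 (depends only outward)
+function couplingMetrics(graph) {
+  const names = new Set(Object.keys(graph));
+  for (const deps of Object.values(graph)) deps.forEach((d) => names.add(d));
+
+  const metrics = {};
+  for (const name of names) metrics[name] = { ca: 0, ce: 0, instability: 0 };
+
+  for (const [from, deps] of Object.entries(graph)) {
+    for (const to of new Set(deps)) { // count each edge once
+      metrics[from].ce += 1;
+      metrics[to].ca += 1;
+    }
+  }
+  for (const m of Object.values(metrics)) {
+    const total = m.ca + m.ce;
+    m.instability = total === 0 ? 0 : m.ce / total;
+  }
+  return metrics;
+}
+
+const example = couplingMetrics({ api: ['services'], services: ['repository'], repository: [] });
+console.log(example.services); // { ca: 1, ce: 1, instability: 0.5 }
+```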
+
+**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.
+
+#### 4b. Execute Decoupling
+
+For each change in the decoupling strategy:
+
+1. Implement the change
+2. Run integration tests
+3. Fix any failures
+4. Commit with descriptive message
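+
+This loop can be sketched as a driver that stops at the first red test run. `applyChange`, `runTests`, and `commit` are hypothetical injected functions; in real use they might wrap an edit, the project's test command, and `git commit`. The returned `log` carries the per-change fields that `execution_log.md` records.
+
+```javascript
+// Hypothetical driver: one change at a time, never stacking changes on red tests
+async function executeDecoupling(changes, { applyChange, runTests, commit }) {
+  const log = [];
+  for (const change of changes) {
+    await applyChange(change);
+    const result = await runTests();
+    log.push({ change: change.description, files: change.files, testsPassed: result.passed });
+    if (!result.passed) {
+      // "Fix any failures" needs a decision by the agent or user; stop here
+      return { completed: false, failedAt: change.description, log };
+    }
+    await commit(`refactor: ${change.description}`);
+  }
+  return { completed: true, log };
+}
+```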
+
+Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.
+
+Write `REFACTOR_DIR/execution_log.md`:
+- Change description, files affected, test status per change
+- Before/after metrics comparison against baseline
+
+**Self-verification**:
+- [ ] All tests still pass after execution
+- [ ] No circular dependencies remain (or they are reduced as planned)
+- [ ] Code smells addressed
+- [ ] Metrics improved compared to baseline
+
+**Save action**: Write execution artifacts
+
+**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.
+
+---
+
+### Phase 5: Hardening (Optional, Parallel Tracks)
+
+**Role**: Varies per track
+**Goal**: Address technical debt, performance, and security
+**Constraints**: Each track is optional; user picks which to run
+
+Present the three tracks and let the user choose which to execute:
+
+#### Track A: Technical Debt
+
+**Role**: Technical debt analyst
+
+1. Identify and categorize debt items: design, code, test, documentation
+2. Assess each: location, description, impact, effort, interest (cost of not fixing)
+3. Prioritize: quick wins → strategic debt → tolerable debt
+4. Create actionable plan with prevention measures
+
+Write `REFACTOR_DIR/hardening/technical_debt.md`
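+
+The prioritization in step 3 can be sketched as a triage over 1-5 scores. The bucket thresholds and the item shape are assumptions to tune per project: impact of at least 3 counts as significant, and effort of at most 2 counts as cheap. Interest breaks ties within a bucket.
+
+```javascript
+// Hypothetical triage; scores are 1 (low) to 5 (high)
+const BUCKET_ORDER = { quickWin: 0, strategic: 1, tolerable: 2 };
+
+function bucketOf(item) {
+  if (item.impact < 3) return 'tolerable'; // low impact: acceptable for now
+  return item.effort <= 2 ? 'quickWin' : 'strategic';
+}
+
+function prioritizeDebt(items) {
+  return items
+    .map((item) => ({ ...item, bucket: bucketOf(item) }))
+    // Bucket order first; within a bucket, higher interest (cost of not fixing) first
+    .sort((a, b) => BUCKET_ORDER[a.bucket] - BUCKET_ORDER[b.bucket] || b.interest - a.interest);
+}
+```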
+
+#### Track B: Performance Optimization
+
+**Role**: Performance engineer
+
+1. Profile current performance, identify bottlenecks
+2. For each bottleneck: location, symptom, root cause, impact
+3. Propose optimizations with expected improvement and risk
+4. Implement one at a time, benchmark after each change
+5. Verify tests still pass
+
+Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks
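+
+Here is a minimal micro-benchmark sketch for step 4. It warms up, times repeated runs with Node's monotonic `process.hrtime.bigint()`, and compares medians before and after a change. It is only illustrative. Figures such as P95 latency or throughput should come from a dedicated benchmarking or load-testing tool. `benchmark` and the two example workloads are hypothetical.
+
+```javascript
+// Warm up, then time `runs` calls and report the (upper) median and minimum in ms
+function benchmark(fn, { warmup = 5, runs = 30 } = {}) {
+  for (let i = 0; i < warmup; i++) fn();
+  const samples = [];
+  for (let i = 0; i < runs; i++) {
+    const start = process.hrtime.bigint();
+    fn();
+    samples.push(Number(process.hrtime.bigint() - start) / 1e6); // ns -> ms
+  }
+  samples.sort((a, b) => a - b);
+  return { runs, medianMs: samples[Math.floor(runs / 2)], minMs: samples[0] };
+}
+
+// Compare a baseline implementation against an optimized candidate
+const data = Array.from({ length: 10_000 }, (_, i) => i);
+const before = benchmark(() => data.filter((x) => x % 2 === 0).map((x) => x * 2));
+const after = benchmark(() => {
+  const out = [];
+  for (const x of data) if (x % 2 === 0) out.push(x * 2);
+  return out;
+});
+console.log(`median before ${before.medianMs.toFixed(3)} ms, after ${after.medianMs.toFixed(3)} ms`);
+```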
+
+#### Track C: Security Review
+
+**Role**: Security engineer
+
+1. Review code against OWASP Top 10
+2. Verify security requirements from `security_approach.md` are met
+3. Check: authentication, authorization, input validation, output encoding, encryption, logging
+
+Write `REFACTOR_DIR/hardening/security.md`:
+- Vulnerability assessment: location, type, severity, exploit scenario, fix
+- Security controls review
+- Compliance check against `security_approach.md`
+- Recommendations: critical fixes, improvements, hardening
+
+**Self-verification** (per track):
+- [ ] All findings are grounded in actual code
+- [ ] Recommendations are actionable with effort estimates
+- [ ] All tests still pass after any changes
+
+**Save action**: Write hardening artifacts
+
+---
+
+## Final Report
+
+After all executed phases complete, write `REFACTOR_DIR/FINAL_report.md`:
+
+- Refactoring mode used and phases executed
+- Baseline metrics vs final metrics comparison
+- Changes made summary
+- Remaining items (deferred to future)
+- Lessons learned
+
+## Escalation Rules
+
+| Situation | Action |
+|-----------|--------|
+| Unclear refactoring scope | **ASK user** |
+| Ambiguous acceptance criteria | **ASK user** |
+| Tests failing before refactoring | **ASK user** — fix tests or fix code? |
+| Coupling change risks breaking external contracts | **ASK user** |
+| Performance optimization vs readability trade-off | **ASK user** |
+| Missing baseline metrics (no test suite, no CI) | **WARN user**, suggest building safety net first |
+| Security vulnerability found during refactoring | **WARN user** immediately, don't defer |
+
+## Trigger Conditions
+
+When the user wants to:
+- Improve existing code structure or quality
+- Reduce technical debt or coupling
+- Prepare codebase for new features
+- Assess code health before major changes
+
+**Keywords**: "refactor", "refactoring", "improve code", "reduce coupling", "technical debt", "code quality", "decoupling"
+
+## Methodology Quick Reference
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│           Structured Refactoring (6-Phase Method)              │
+├────────────────────────────────────────────────────────────────┤
+│ CONTEXT: Resolve mode (project vs standalone) + set paths      │
+│ MODE: Full / Targeted / Quick Assessment                       │
+│                                                                │
+│ 0. Context & Baseline  → baseline_metrics.md                   │
+│    [BLOCKING: user confirms baseline]                          │
+│ 1. Discovery           → discovery/ (components, solution)     │
+│    [BLOCKING: user confirms documentation]                     │
+│ 2. Analysis            → analysis/ (research, roadmap)         │
+│    [BLOCKING: user confirms roadmap]                           │
+│    ── Quick Assessment stops here ──                           │
+│ 3. Safety Net          → test_specs/ + implemented tests       │
+│    [GATE: all tests must pass]                                 │
+│ 4. Execution           → coupling_analysis, execution_log      │
+│    [BLOCKING: user confirms changes]                           │
+│ 5. Hardening           → hardening/ (debt, perf, security)     │
+│    [optional, user picks tracks]                               │
+│    ─────────────────────────────────────────────────           │
+│    FINAL_report.md                                             │
+├────────────────────────────────────────────────────────────────┤
+│ Principles: Preserve behavior · Measure before/after           │
+│             Small changes · Save immediately · Ask don't assume│
+└────────────────────────────────────────────────────────────────┘
+```
@@ -0,0 +1,37 @@
+# Solution Draft
+
+## Product Solution Description
+[Short description of the proposed solution. Brief component interaction diagram.]
+
+## Existing/Competitor Solutions Analysis
+[Analysis of existing solutions for similar problems, if any.]
+
+## Architecture
+
+[Architecture solution that meets restrictions and acceptance criteria.]
+
+### Component: [Component Name]
+
+| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
+|----------|-------|-----------|-------------|-------------|----------|------|-----|
+| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
+| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
+
+[Repeat per component]
+
+## Testing Strategy
+
+### Integration / Functional Tests
+- [Test 1]
+- [Test 2]
+
+### Non-Functional Tests
+- [Performance test 1]
+- [Security test 1]
+
+## References
+[All cited source links]
+
+## Related Artifacts
+- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
+- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,40 @@
+# Solution Draft
+
+## Assessment Findings
+
+| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
+|------------------------|----------------------------------------------|-------------|
+| [old] | [weak point] | [new] |
+
+## Product Solution Description
+[Short description. Brief component interaction diagram. Written as if from scratch — no "updated" markers.]
+
+## Architecture
+
+[Architecture solution that meets restrictions and acceptance criteria.]
+
+### Component: [Component Name]
+
+| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
+|----------|-------|-----------|-------------|-------------|----------|------------|-----|
+| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
+| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
+
+[Repeat per component]
+
+## Testing Strategy
+
+### Integration / Functional Tests
+- [Test 1]
+- [Test 2]
+
+### Non-Functional Tests
+- [Performance test 1]
+- [Security test 1]
+
+## References
+[All cited source links]
+
+## Related Artifacts
+- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
+- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,311 @@
+---
+name: security-testing
+description: "Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices."
+category: specialized-testing
+priority: critical
+tokenEstimate: 1200
+agents: [qe-security-scanner, qe-api-contract-validator, qe-quality-analyzer]
+implementation_status: optimized
+optimization_version: 1.0
+last_optimized: 2025-12-02
+dependencies: []
+quick_reference_card: true
+tags: [security, owasp, sast, dast, vulnerabilities, auth, injection]
+trust_tier: 3
+validation:
+  schema_path: schemas/output.json
+  validator_path: scripts/validate-config.json
+  eval_path: evals/security-testing.yaml
+---
+
+# Security Testing
+
+<default_to_action>
+When testing security or conducting audits:
+1. TEST OWASP Top 10 vulnerabilities systematically
+2. VALIDATE authentication and authorization on every endpoint
+3. SCAN dependencies for known vulnerabilities (npm audit)
+4. CHECK for injection attacks (SQL, XSS, command)
+5. VERIFY secrets aren't exposed in code/logs
+
+**Quick Security Checks:**
+- Access control → Test horizontal/vertical privilege escalation
+- Crypto → Verify password hashing, HTTPS, no sensitive data exposed
+- Injection → Test SQL injection, XSS, command injection
+- Auth → Test weak passwords, session fixation, MFA enforcement
+- Config → Check error messages don't leak info
+
+**Critical Success Factors:**
+- Think like an attacker, build like a defender
+- Security is built in, not added at the end
+- Test continuously in CI/CD, not just before release
+</default_to_action>
+
+## Quick Reference Card
+
+### When to Use
+- Security audits and penetration testing
+- Testing authentication/authorization
+- Validating input sanitization
+- Reviewing security configuration
+
+### OWASP Top 10 (2021)
+| # | Vulnerability | Key Test |
+|---|---------------|----------|
+| 1 | Broken Access Control | User A accessing User B's data |
+| 2 | Cryptographic Failures | Plaintext passwords, HTTP |
+| 3 | Injection | SQL/XSS/command injection |
+| 4 | Insecure Design | Rate limiting, session timeout |
+| 5 | Security Misconfiguration | Verbose errors, exposed /admin |
+| 6 | Vulnerable Components | npm audit, outdated packages |
+| 7 | Auth Failures | Weak passwords, no MFA |
+| 8 | Integrity Failures | Unsigned updates, malware |
+| 9 | Logging Failures | No audit trail for breaches |
+| 10 | SSRF | Server fetching internal URLs |
+
+### Tools
+| Type | Tool | Purpose |
+|------|------|---------|
+| SAST | SonarQube, Semgrep | Static code analysis |
+| DAST | OWASP ZAP, Burp | Dynamic scanning |
+| Deps | npm audit, Snyk | Dependency vulnerabilities |
+| Secrets | git-secrets, TruffleHog | Secret scanning |
+
+### Agent Coordination
+- `qe-security-scanner`: Multi-layer SAST/DAST scanning
+- `qe-api-contract-validator`: API security testing
+- `qe-quality-analyzer`: Security code review
+
+---
+
+## Key Vulnerability Tests
+
+### 1. Broken Access Control
+```javascript
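+// `login`, `createOrder` and `api` are project test helpers (not shown);
+// `api` must resolve rather than throw on 4xx, e.g. axios with validateStatus: () => true
+// Horizontal escalation - User A accessing User B's data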
+test('user cannot access another user\'s order', async () => {
+  const userAToken = await login('userA');
+  const userBOrder = await createOrder('userB');
+
+  const response = await api.get(`/orders/${userBOrder.id}`, {
+    headers: { Authorization: `Bearer ${userAToken}` }
+  });
+  expect(response.status).toBe(403);
+});
+
+// Vertical escalation - Regular user accessing admin
+test('regular user cannot access admin', async () => {
+  const userToken = await login('regularUser');
+  expect((await api.get('/admin/users', {
+    headers: { Authorization: `Bearer ${userToken}` }
+  })).status).toBe(403);
+});
+```
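+
+These tests pass only if the server enforces ownership and roles on every request. Below is a minimal deny-by-default sketch of such a check. The `user` and `ownerId` shapes, the role names, and the `authorize` helper are hypothetical. The route comments show where a handler would call it.
+
+```javascript
+// Hypothetical role ranking; unknown roles rank 0 and are denied
+const ROLE_RANK = { user: 1, admin: 2 };
+
+// Returns the HTTP status a handler should use; deny by default
+function authorize(user, { requiredRole = 'user', ownerId = null } = {}) {
+  if (!user) return 401; // not authenticated
+  if ((ROLE_RANK[user.role] ?? 0) < ROLE_RANK[requiredRole]) return 403; // vertical escalation
+  // Horizontal escalation: non-admins may only access their own resources
+  if (ownerId !== null && user.role !== 'admin' && user.id !== ownerId) return 403;
+  return 200;
+}
+
+// GET /orders/:id   -> authorize(req.user, { ownerId: order.userId })
+// GET /admin/users  -> authorize(req.user, { requiredRole: 'admin' })
+console.log(authorize({ id: 'A', role: 'user' }, { ownerId: 'B' })); // 403
+```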
+
+### 2. Injection Attacks
+```javascript
+// SQL Injection
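+test('prevents SQL injection', async () => {
+  const malicious = "' OR '1'='1";
+  // Encode the payload so it reaches the server intact as one query parameter
+  const response = await api.get(`/products?search=${encodeURIComponent(malicious)}`);
+  expect(response.status).not.toBe(500); // a DB syntax error also signals injectable SQL
+  expect(response.body.length).toBeLessThan(100); // assumes 100+ seeded products; not all returned
+});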
+
+// XSS
+test('sanitizes HTML output', async () => {
+  const xss = '<script>alert("XSS")</script>';
+  await api.post('/comments', { text: xss });
+
+  const html = (await api.get('/comments')).body;
+  expect(html).toContain('&lt;script&gt;');
+  expect(html).not.toContain('<script>');
+});
+```
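+
+On the server side, the XSS test expects output encoding. Below is an illustrative sketch of HTML-context escaping. In a real app, prefer the template engine's auto-escaping. Attribute, URL, and JavaScript contexts need different encoders. `escapeHtml` is a hypothetical helper name.
+
+```javascript
+// Characters that must be encoded in HTML text and quoted-attribute contexts
+const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
+
+function escapeHtml(text) {
+  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
+}
+
+console.log(escapeHtml('<script>alert("XSS")</script>'));
+// &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
+```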
+
+### 3. Cryptographic Failures
+```javascript
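+test('passwords are hashed', async () => {
+  // Create the user through the app (hypothetical /register endpoint) so its hashing code runs
+  await api.post('/register', { email: 'test@example.com', password: 'MyPassword123' });
+  const user = await db.users.findByEmail('test@example.com');
+
+  expect(user.password).not.toBe('MyPassword123');
+  expect(user.password).toMatch(/^\$2[aby]\$\d{2}\$/); // bcrypt
+});
+
+test('no sensitive data in API response', async () => {
+  const token = await login('userA'); // unauthenticated, a 401 would pass vacuously
+  const response = await api.get('/users/me', {
+    headers: { Authorization: `Bearer ${token}` }
+  });
+  expect(response.status).toBe(200);
+  expect(response.body).not.toHaveProperty('password');
+  expect(response.body).not.toHaveProperty('ssn');
+});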
+```
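+
+Node.js has no built-in bcrypt, so this sketch substitutes the standard library's scrypt, which the Security Checklist also lists. The `salt:hash` hex storage format is an assumption. A value stored this way would not match the bcrypt regex in the test above.
+
+```javascript
+const crypto = require('node:crypto');
+
+const KEY_LENGTH = 64; // derived key length in bytes
+
+function hashPassword(password) {
+  const salt = crypto.randomBytes(16); // unique salt per password
+  const hash = crypto.scryptSync(password, salt, KEY_LENGTH);
+  return `${salt.toString('hex')}:${hash.toString('hex')}`;
+}
+
+function verifyPassword(password, stored) {
+  const [saltHex, hashHex = ''] = stored.split(':');
+  const expected = Buffer.from(hashHex, 'hex');
+  if (expected.length !== KEY_LENGTH) return false; // malformed record
+  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), KEY_LENGTH);
+  // Constant-time comparison avoids leaking how many bytes matched
+  return crypto.timingSafeEqual(actual, expected);
+}
+```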
+
+### 4. Security Misconfiguration
+```javascript
+test('errors don\'t leak sensitive info', async () => {
+  const response = await api.post('/login', { email: 'nonexistent@test.com', password: 'wrong' });
+  expect(response.body.error).toBe('Invalid credentials'); // Generic message
+});
+
+test('sensitive endpoints not exposed', async () => {
+  const endpoints = ['/debug', '/.env', '/.git', '/admin'];
+  for (const ep of endpoints) {
+    expect((await fetch(`https://example.com${ep}`)).status).not.toBe(200);
+  }
+});
+```
+
+### 5. Rate Limiting
+```javascript
+test('rate limiting prevents brute force', async () => {
+  const responses = [];
+  for (let i = 0; i < 20; i++) {
+    responses.push(await api.post('/login', { email: 'test@example.com', password: 'wrong' }));
+  }
+  expect(responses.filter(r => r.status === 429).length).toBeGreaterThan(0);
+});
+```
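+
+The brute-force test expects a 429 once attempts exceed a limit. Below is a minimal in-memory fixed-window limiter. The class name, the limit of 5 per 15 minutes, and keying by client IP plus account are assumptions. The limiter works only within a single process; multiple instances would need a shared store.
+
+```javascript
+// At most `limit` attempts per key per window; the caller answers 429 when false
+class FixedWindowLimiter {
+  constructor({ limit, windowMs, now = () => Date.now() }) {
+    this.limit = limit;
+    this.windowMs = windowMs;
+    this.now = now; // injectable clock for tests
+    this.windows = new Map(); // key -> { start, count }
+  }
+
+  attempt(key) {
+    const t = this.now();
+    let w = this.windows.get(key);
+    if (!w || t - w.start >= this.windowMs) {
+      w = { start: t, count: 0 }; // start a fresh window
+      this.windows.set(key, w);
+    }
+    w.count += 1;
+    return w.count <= this.limit;
+  }
+}
+
+const loginLimiter = new FixedWindowLimiter({ limit: 5, windowMs: 15 * 60 * 1000 });
+// Key by client IP plus target account (an assumed policy)
+const allowed = loginLimiter.attempt('203.0.113.7|test@example.com');
+console.log(allowed); // true
+```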
+
+---
+
+## Security Checklist
+
+### Authentication
+- [ ] Strong password requirements (12+ chars)
+- [ ] Password hashing (bcrypt, scrypt, Argon2)
+- [ ] MFA for sensitive operations
+- [ ] Account lockout after failed attempts
+- [ ] Session ID changes after login
+- [ ] Session timeout
+
+### Authorization
+- [ ] Check authorization on every request
+- [ ] Least privilege principle
+- [ ] No horizontal escalation
+- [ ] No vertical escalation
+
+### Data Protection
+- [ ] HTTPS everywhere
+- [ ] Encrypted at rest
+- [ ] Secrets not in code/logs
+- [ ] PII compliance (GDPR)
+
+### Input Validation
+- [ ] Server-side validation
+- [ ] Parameterized queries (no SQL injection)
+- [ ] Output encoding (no XSS)
+- [ ] Rate limiting
+
+---
+
+## CI/CD Integration
+
+```yaml
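+# GitHub Actions (job inside a workflow file)
+jobs:
+  security-checks:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # full history for the secret scan
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Dependency audit
+        run: npm audit --audit-level=high
+
+      - name: SAST scan
+        run: npm run sast  # project-defined script, e.g. wrapping Semgrep
+
+      - name: Secret scan
+        uses: trufflesecurity/trufflehog@main
+
+      - name: DAST scan
+        if: github.ref == 'refs/heads/main'
+        # owasp/zap2docker-stable is deprecated; ZAP images are now published on GHCR
+        run: docker run --rm -t ghcr.io/zaproxy/zaproxy:stable zap-baseline.py -t https://staging.example.com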
+```
+
+**Pre-commit hooks:**
+```bash
+#!/bin/sh
+set -e  # abort the commit if any check fails, not only the last one
+git-secrets --scan
+npm run lint:security
+```
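+
+The toy pattern-based scanner below illustrates what a pre-commit secret scan looks for. It is a hypothetical illustration and not how git-secrets or TruffleHog are implemented. Real tools ship much larger rule sets, and TruffleHog also verifies candidate credentials. The AWS key ID and private-key header formats are well known; the generic credential pattern is a loose heuristic.
+
+```javascript
+const SECRET_PATTERNS = [
+  { name: 'AWS access key ID', re: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/ },
+  { name: 'Private key block', re: /-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----/ },
+  // Heuristic: a secret-looking name assigned a quoted literal of 8+ chars
+  { name: 'Hard-coded credential', re: /(?:api[_-]?key|secret|password)\s*[:=]\s*['"][^'"]{8,}['"]/i },
+];
+
+// Returns one finding per (line, rule) match
+function scanText(text, source = '<input>') {
+  const findings = [];
+  text.split('\n').forEach((line, i) => {
+    for (const { name, re } of SECRET_PATTERNS) {
+      if (re.test(line)) findings.push({ source, line: i + 1, rule: name });
+    }
+  });
+  return findings;
+}
+
+// CLI use: node scan-secrets.js <files...>; a non-zero exit blocks the commit
+if (require.main === module) {
+  const fs = require('node:fs');
+  const findings = process.argv.slice(2).flatMap((file) => scanText(fs.readFileSync(file, 'utf8'), file));
+  findings.forEach((f) => console.error(`${f.source}:${f.line} ${f.rule}`));
+  if (findings.length > 0) process.exitCode = 1;
+}
+```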
+
+---
+
+## Agent-Assisted Security Testing
+
+```typescript
+// Comprehensive multi-layer scan
+await Task("Security Scan", {
+  target: 'src/',
+  layers: { sast: true, dast: true, dependencies: true, secrets: true },
+  severity: ['critical', 'high', 'medium']
+}, "qe-security-scanner");
+
+// OWASP Top 10 testing
+await Task("OWASP Scan", {
+  categories: ['broken-access-control', 'injection', 'cryptographic-failures'],
+  depth: 'comprehensive'
+}, "qe-security-scanner");
+
+// Validate fix
+await Task("Validate Fix", {
+  vulnerability: 'CVE-2024-12345',
+  expectedResolution: 'upgrade package to v2.0.0',
+  retestAfterFix: true
+}, "qe-security-scanner");
+```
+
+---
+
+## Agent Coordination Hints
+
+### Memory Namespace
+```
+aqe/security/
+├── scans/*           - Scan results
+├── vulnerabilities/* - Found vulnerabilities
+├── fixes/*           - Remediation tracking
+└── compliance/*      - Compliance status
+```
+
+### Fleet Coordination
+```typescript
+const securityFleet = await FleetManager.coordinate({
+  strategy: 'security-testing',
+  agents: [
+    'qe-security-scanner',
+    'qe-api-contract-validator',
+    'qe-quality-analyzer',
+    'qe-deployment-readiness'
+  ],
+  topology: 'parallel'
+});
+```
+
+---
+
+## Common Mistakes
+
+### ❌ Security by Obscurity
+Hiding admin at `/super-secret-admin` → **Use proper auth**
+
+### ❌ Client-Side Validation Only
+JavaScript validation can be bypassed → **Always validate server-side**
+
+### ❌ Trusting User Input
+Assuming input is safe → **Sanitize, validate, escape all input**
+
+### ❌ Hardcoded Secrets
+API keys in code → **Environment variables, secret management**
+
+---
+
+## Related Skills
+- [agentic-quality-engineering](../agentic-quality-engineering/) - Security with agents
+- [api-testing-patterns](../api-testing-patterns/) - API security testing
+- [compliance-testing](../compliance-testing/) - GDPR, HIPAA, SOC2
+
+---
+
+## Remember
+
+**Think like an attacker:** What would you try to break? Test that.
+**Build like a defender:** Assume input is malicious until proven otherwise.
+**Test continuously:** Security testing is ongoing, not one-time.
+
+**With Agents:** Agents automate vulnerability scanning, track remediation, and validate fixes. Use agents to maintain security posture at scale.
@@ -0,0 +1,789 @@
+# =============================================================================
+# AQE Skill Evaluation Test Suite: Security Testing v1.0.0
+# =============================================================================
+#
+# Comprehensive evaluation suite for the security-testing skill per ADR-056.
+# Tests OWASP Top 10 2021 detection, severity classification, remediation
+# quality, and cross-model consistency.
+#
+# Schema: .claude/skills/.validation/schemas/skill-eval.schema.json
+# Validator: .claude/skills/security-testing/scripts/validate-config.json
+#
+# Coverage:
+# - OWASP A01:2021 - Broken Access Control
+# - OWASP A02:2021 - Cryptographic Failures
+# - OWASP A03:2021 - Injection (SQL, XSS, Command)
+# - OWASP A07:2021 - Identification and Authentication Failures
+# - Negative tests (no false positives on secure code)
+#
+# =============================================================================
+
+skill: security-testing
+version: 1.0.0
+description: >
+  Comprehensive evaluation suite for the security-testing skill.
+  Tests OWASP Top 10 2021 detection capabilities, CWE classification accuracy,
+  CVSS scoring, severity classification, and remediation quality.
+  Supports multi-model testing and integrates with ReasoningBank for
+  continuous improvement.
+
+# =============================================================================
+# Multi-Model Configuration
+# =============================================================================
+
+models_to_test:
+  - claude-3.5-sonnet    # Primary model (high accuracy expected)
+  - claude-3-haiku       # Fast model (minimum quality threshold)
+  - gpt-4o               # Cross-vendor validation
+
+# =============================================================================
+# MCP Integration Configuration
+# =============================================================================
+
+mcp_integration:
+  enabled: true
+  namespace: skill-validation
+
+  # Query existing security patterns before running evals
+  query_patterns: true
+
+  # Track each test outcome for learning feedback loop
+  track_outcomes: true
+
+  # Store successful patterns after evals complete
+  store_patterns: true
+
+  # Share learning with fleet coordinator agents
+  share_learning: true
+
+  # Update quality gate with validation metrics
+  update_quality_gate: true
+
+  # Target agents for learning distribution
+  target_agents:
+    - qe-learning-coordinator
+    - qe-queen-coordinator
+    - qe-security-scanner
+    - qe-security-auditor
+
+# =============================================================================
+# ReasoningBank Learning Configuration
+# =============================================================================
+
+learning:
+  store_success_patterns: true
+  store_failure_patterns: true
+  pattern_ttl_days: 90
+  min_confidence_to_store: 0.7
+  cross_model_comparison: true
+
+# =============================================================================
+# Result Format Configuration
+# =============================================================================
+
+result_format:
+  json_output: true
+  markdown_report: true
+  include_raw_output: false
+  include_timing: true
+  include_token_usage: true
+
+# =============================================================================
+# Environment Setup
+# =============================================================================
+
+setup:
+  required_tools:
+    - jq       # JSON parsing (required)
+    - npm      # Dependency audit (optional but recommended)
+
+  environment_variables:
+    SECURITY_SCAN_DEPTH: "deep"
+    OWASP_ENABLED: "true"
+    SEVERITY_THRESHOLD: "medium"
+
+  fixtures:
+    - name: vulnerable_express_app
+      path: fixtures/vulnerable-express-app.js
+      content: |
+        const express = require('express');
+        const app = express();
+
+        // SQL Injection vulnerability
+        app.get('/user/:id', (req, res) => {
+          const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
+          db.query(query);
+        });
+
+        // XSS vulnerability
+        app.get('/profile', (req, res) => {
+          res.send(`<h1>Hello ${req.query.name}</h1>`);
+        });
+
+        // Path Traversal vulnerability
+        app.get('/file', (req, res) => {
+          const path = './uploads/' + req.query.filename;
+          res.sendFile(path);
+        });
+
+# =============================================================================
+# TEST CASES
+# =============================================================================
+
+test_cases:
+  # ---------------------------------------------------------------------------
+  # CATEGORY: SQL Injection (OWASP A03:2021)
+  # ---------------------------------------------------------------------------
+
+  - id: tc001_sql_injection_string_concat
+    description: "Detect SQL injection via template-literal string interpolation in Node.js"
+    category: injection
+    priority: critical
+
+    input:
+      code: |
+        const express = require('express');
+        const mysql = require('mysql');
+        const app = express();
+
+        app.get('/api/users/:id', (req, res) => {
+          const userId = req.params.id;
+          const query = `SELECT * FROM users WHERE id = ${userId}`;
+          db.query(query, (err, results) => {
+            res.json(results);
+          });
+        });
+      context:
+        language: javascript
+        framework: express
+        environment: production
+
+    expected_output:
+      must_contain:
+        - "SQL injection"
+        - "parameterized"
+      must_not_contain:
+        - "no vulnerabilities"
+        - "is secure"
+      must_match_regex:
+        - "CWE-89|CWE-564"
+        - "A03:20(21|25)"
+      severity_classification: critical
+      finding_count:
+        min: 1
+        max: 3
+      recommendation_count:
+        min: 1
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.8
+      reasoning_quality_min: 0.7
+      grading_rubric:
+        completeness: 0.3
+        accuracy: 0.5
+        actionability: 0.2
+
+    timeout_ms: 30000
+
+  - id: tc002_sql_injection_parameterized_safe
+    description: "Verify parameterized queries are NOT flagged as vulnerable"
+    category: injection
+    priority: high
+
+    input:
+      code: |
+        app.get('/api/users/:id', (req, res) => {
+          const userId = parseInt(req.params.id, 10);
+          db.query('SELECT * FROM users WHERE id = ?', [userId], (err, results) => {
+            res.json(results);
+          });
+        });
+      context:
+        language: javascript
+        framework: express
+
+    expected_output:
+      must_contain:
+        - "parameterized"
+        - "secure"
+      must_not_contain:
+        - "SQL injection"
+        - "critical"
+        - "vulnerable"
+      severity_classification: info
+      finding_count:
+        max: 1
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+      allow_partial: true
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Cross-Site Scripting (OWASP A03:2021)
+  # ---------------------------------------------------------------------------
+
+  - id: tc003_xss_reflected_html_output
+    description: "Detect reflected XSS in unescaped HTML output"
+    category: injection
+    priority: critical
+
+    input:
+      code: |
+        app.get('/profile', (req, res) => {
+          const name = req.query.name;
+          res.send(`
+            <html>
+              <body>
+                <h1>Welcome, ${name}!</h1>
+                <p>Your profile has been loaded.</p>
+              </body>
+            </html>
+          `);
+        });
+      context:
+        language: javascript
+        framework: express
+
+    expected_output:
+      must_contain:
+        - "XSS"
+        - "cross-site scripting"
+        - "sanitize"
+        - "escape"
+      must_match_regex:
+        - "CWE-79"
+      severity_classification: high
+      finding_count:
+        min: 1
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.8
+      reasoning_quality_min: 0.75
+
+  - id: tc004_xss_dom_based_innerhtml
+    description: "Detect DOM-based XSS via innerHTML assignment"
+    category: injection
+    priority: high
+
+    input:
+      code: |
+        // Client-side JavaScript
+        const params = new URLSearchParams(window.location.search);
+        const message = params.get('msg');
+        document.getElementById('output').innerHTML = message;
+      context:
+        language: javascript
+        framework: vanilla
+        environment: production
+
+    expected_output:
+      must_contain:
+        - "DOM"
+        - "XSS"
+        - "innerHTML"
+        - "textContent"
+      must_match_regex:
+        - "CWE-79"
+      severity_classification: high
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Authentication Failures (OWASP A07:2021)
+  # ---------------------------------------------------------------------------
+
+  - id: tc005_hardcoded_credentials
+    description: "Detect hardcoded credentials and API keys"
+    category: authentication
+    priority: critical
+
+    input:
+      code: |
+        const ADMIN_PASSWORD = 'admin123';
+        const API_KEY = 'sk-1234567890abcdef';
+        const DATABASE_URL = 'postgres://admin:password123@localhost/db';
+
+        app.post('/login', (req, res) => {
+          if (req.body.password === ADMIN_PASSWORD) {
+            req.session.isAdmin = true;
+            res.send('Login successful');
+          }
+        });
+      context:
+        language: javascript
+        framework: express
+
+    expected_output:
+      must_contain:
+        - "hardcoded"
+        - "credentials"
+        - "secret"
+        - "environment variable"
+      must_match_regex:
+        - "CWE-798|CWE-259"
+      severity_classification: critical
+      finding_count:
+        min: 2
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.8
+      reasoning_quality_min: 0.8
+
+  - id: tc006_weak_password_hashing
+    description: "Detect weak password hashing algorithms (MD5, SHA1)"
+    category: authentication
+    priority: high
+
+    input:
+      code: |
+        const crypto = require('crypto');
+
+        function hashPassword(password) {
+          return crypto.createHash('md5').update(password).digest('hex');
+        }
+
+        function verifyPassword(password, hash) {
+          return hashPassword(password) === hash;
+        }
+      context:
+        language: javascript
+        framework: nodejs
+
+    expected_output:
+      must_contain:
+        - "MD5"
+        - "weak"
+        - "bcrypt"
+        - "argon2"
+      must_match_regex:
+        - "CWE-327|CWE-328|CWE-916"
+      severity_classification: high
+      finding_count:
+        min: 1
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.8
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Broken Access Control (OWASP A01:2021)
+  # ---------------------------------------------------------------------------
+
+  - id: tc007_idor_missing_authorization
+    description: "Detect IDOR vulnerability with missing authorization check"
+    category: authorization
+    priority: critical
+
+    input:
+      code: |
+        app.get('/api/users/:id/profile', (req, res) => {
+          // No authorization check - any user can access any profile
+          const userId = req.params.id;
+          db.query('SELECT * FROM profiles WHERE user_id = ?', [userId])
+            .then(profile => res.json(profile));
+        });
+
+        app.delete('/api/users/:id', (req, res) => {
+          // No check if requesting user owns this account
+          db.query('DELETE FROM users WHERE id = ?', [req.params.id]);
+          res.send('User deleted');
+        });
+      context:
+        language: javascript
+        framework: express
+
+    expected_output:
+      must_contain:
+        - "authorization"
+        - "access control"
+        - "IDOR"
+        - "ownership"
+      must_match_regex:
+        - "CWE-639|CWE-284|CWE-862"
+        - "A01:2021"
+      severity_classification: critical
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Cryptographic Failures (OWASP A02:2021)
+  # ---------------------------------------------------------------------------
+
+  - id: tc008_weak_encryption_des
+    description: "Detect use of weak encryption algorithms (DES, RC4)"
+    category: cryptography
+    priority: high
+
+    input:
+      code: |
+        const crypto = require('crypto');
+
+        function encryptData(data, key) {
+          const cipher = crypto.createCipher('des', key);
+          return cipher.update(data, 'utf8', 'hex') + cipher.final('hex');
+        }
+
+        function decryptData(data, key) {
+          const decipher = crypto.createDecipher('des', key);
+          return decipher.update(data, 'hex', 'utf8') + decipher.final('utf8');
+        }
+      context:
+        language: javascript
+        framework: nodejs
+
+    expected_output:
+      must_contain:
+        - "DES"
+        - "weak"
+        - "deprecated"
+        - "AES"
+      must_match_regex:
+        - "CWE-327|CWE-328"
+        - "A02:2021"
+      severity_classification: high
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+  - id: tc009_plaintext_password_storage
+    description: "Detect plaintext password storage"
+    category: cryptography
+    priority: critical
+
+    input:
+      code: |
+        class User {
+          constructor(email, password) {
+            this.email = email;
+            this.password = password;  // Stored in plaintext!
+          }
+
+          save() {
+            db.query('INSERT INTO users (email, password) VALUES (?, ?)',
+                     [this.email, this.password]);
+          }
+        }
+      context:
+        language: javascript
+        framework: nodejs
+
+    expected_output:
+      must_contain:
+        - "plaintext"
+        - "password"
+        - "hash"
+        - "bcrypt"
+      must_match_regex:
+        - "CWE-256|CWE-312"
+        - "A02:2021"
+      severity_classification: critical
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.8
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Path Traversal (Related to A01:2021)
+  # ---------------------------------------------------------------------------
+
+  - id: tc010_path_traversal_file_access
+    description: "Detect path traversal vulnerability in file access"
+    category: injection
+    priority: critical
+
+    input:
+      code: |
+        const fs = require('fs');
+
+        app.get('/download', (req, res) => {
+          const filename = req.query.file;
+          const filepath = './uploads/' + filename;
+          res.sendFile(filepath);
+        });
+
+        app.get('/read/:name', (req, res) => {
+          const content = fs.readFileSync('./data/' + req.params.name);
+          res.send(content);
+        });
+      context:
+        language: javascript
+        framework: express
+
+    expected_output:
+      must_contain:
+        - "path traversal"
+        - "directory traversal"
+        - "../"
+        - "sanitize"
+      must_match_regex:
+        - "CWE-22|CWE-23"
+      severity_classification: critical
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Negative Tests (No False Positives)
+  # ---------------------------------------------------------------------------
+
+  - id: tc011_secure_code_no_false_positives
+    description: "Verify secure code is NOT flagged as vulnerable"
+    category: negative
+    priority: critical
+
+    input:
+      code: |
+        const express = require('express');
+        const helmet = require('helmet');
+        const rateLimit = require('express-rate-limit');
+        const bcrypt = require('bcrypt');
+        const validator = require('validator');
+
+        const app = express();
+        app.use(helmet());
+        app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));
+
+        app.post('/api/users', async (req, res) => {
+          const { email, password } = req.body;
+
+          // Input validation
+          if (!validator.isEmail(email)) {
+            return res.status(400).json({ error: 'Invalid email' });
+          }
+
+          // Secure password hashing
+          const hashedPassword = await bcrypt.hash(password, 12);
+
+          // Parameterized query
+          await db.query(
+            'INSERT INTO users (email, password) VALUES ($1, $2)',
+            [email, hashedPassword]
+          );
+
+          res.status(201).json({ message: 'User created' });
+        });
+      context:
+        language: javascript
+        framework: express
+        environment: production
+
+    expected_output:
+      must_contain:
+        - "secure"
+        - "best practice"
+      must_not_contain:
+        - "SQL injection"
+        - "XSS"
+        - "critical vulnerability"
+        - "high severity"
+      finding_count:
+        max: 2  # Allow informational findings only
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.6
+      allow_partial: true
+
+  - id: tc012_secure_auth_implementation
+    description: "Verify secure authentication is recognized as safe"
+    category: negative
+    priority: high
+
+    input:
+      code: |
+        const bcrypt = require('bcrypt');
+        const jwt = require('jsonwebtoken');
+
+        async function login(email, password) {
+          const user = await User.findByEmail(email);
+          if (!user) {
+            return { error: 'Invalid credentials' };
+          }
+
+          const match = await bcrypt.compare(password, user.passwordHash);
+          if (!match) {
+            return { error: 'Invalid credentials' };
+          }
+
+          const token = jwt.sign(
+            { userId: user.id },
+            process.env.JWT_SECRET,
+            { expiresIn: '1h' }
+          );
+
+          return { token };
+        }
+      context:
+        language: javascript
+        framework: nodejs
+
+    expected_output:
+      must_contain:
+        - "bcrypt"
+        - "jwt"
+        - "secure"
+      must_not_contain:
+        - "vulnerable"
+        - "critical"
+        - "hardcoded"
+      severity_classification: info
+
+    validation:
+      schema_check: true
+      allow_partial: true
+
+  # ---------------------------------------------------------------------------
+  # CATEGORY: Python Security (Multi-language Support)
+  # ---------------------------------------------------------------------------
+
+  - id: tc013_python_sql_injection
+    description: "Detect SQL injection in Python Flask application"
+    category: injection
+    priority: critical
+
+    input:
+      code: |
+        from flask import Flask, request
+        import sqlite3
+
+        app = Flask(__name__)
+
+        @app.route('/user')
+        def get_user():
+            user_id = request.args.get('id')
+            conn = sqlite3.connect('users.db')
+            cursor = conn.cursor()
+            cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
+            return str(cursor.fetchone())
+      context:
+        language: python
+        framework: flask
+
+    expected_output:
+      must_contain:
+        - "SQL injection"
+        - "parameterized"
+        - "f-string"
+      must_match_regex:
+        - "CWE-89"
+      severity_classification: critical
+      finding_count:
+        min: 1
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+  - id: tc014_python_ssti_jinja
+    description: "Detect Server-Side Template Injection in Jinja2"
+    category: injection
+    priority: critical
+
+    input:
+      code: |
+        from flask import Flask, request, render_template_string
+
+        app = Flask(__name__)
+
+        @app.route('/render')
+        def render():
+            template = request.args.get('template')
+            return render_template_string(template)
+      context:
+        language: python
+        framework: flask
+
+    expected_output:
+      must_contain:
+        - "SSTI"
+        - "template injection"
+        - "render_template_string"
+        - "Jinja2"
+      must_match_regex:
+        - "CWE-94|CWE-1336"
+      severity_classification: critical
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+  - id: tc015_python_pickle_deserialization
+    description: "Detect insecure deserialization with pickle"
+    category: injection
+    priority: critical
+
+    input:
+      code: |
+        import pickle
+        from flask import Flask, request
+
+        app = Flask(__name__)
+
+        @app.route('/load')
+        def load_data():
+            data = request.get_data()
+            obj = pickle.loads(data)
+            return str(obj)
+      context:
+        language: python
+        framework: flask
+
+    expected_output:
+      must_contain:
+        - "pickle"
+        - "deserialization"
+        - "untrusted"
+        - "RCE"
+      must_match_regex:
+        - "CWE-502"
+        - "A08:2021"
+      severity_classification: critical
+
+    validation:
+      schema_check: true
+      keyword_match_threshold: 0.7
+
+# =============================================================================
+# SUCCESS CRITERIA
+# =============================================================================
+
+success_criteria:
+  # Overall pass rate (90% of tests must pass)
+  pass_rate: 0.9
+
+  # Critical tests must ALL pass (100%)
+  critical_pass_rate: 1.0
+
+  # Average reasoning quality score
+  avg_reasoning_quality: 0.75
+
+  # Maximum suite execution time (5 minutes)
+  max_execution_time_ms: 300000
+
+  # Maximum variance between model results (15%)
+  cross_model_variance: 0.15
+
+# =============================================================================
+# METADATA
+# =============================================================================
+
+metadata:
+  author: "qe-security-auditor"
+  created: "2026-02-02"
+  last_updated: "2026-02-02"
+  coverage_target: >
+    OWASP Top 10 2021: A01 (Broken Access Control, including path traversal),
+    A02 (Cryptographic Failures), A03 (Injection: SQL, XSS, SSTI),
+    A07 (Identification and Authentication Failures), A08 (Software and Data
+    Integrity Failures: insecure deserialization). Covers Node.js/Express,
+    client-side JavaScript, and Python Flask code. 15 test cases with a 90%
+    pass-rate requirement and a 100% critical pass rate.
@@ -0,0 +1,879 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://agentic-qe.dev/schemas/security-testing-output.json",
+  "title": "AQE Security Testing Skill Output Schema",
+  "description": "Schema for security-testing skill output validation. Extends the base skill-output template with OWASP Top 10 categories, CWE identifiers, and CVSS scoring.",
+  "type": "object",
+  "required": ["skillName", "version", "timestamp", "status", "trustTier", "output"],
+  "properties": {
+    "skillName": {
+      "type": "string",
+      "const": "security-testing",
+      "description": "Must be 'security-testing'"
+    },
+    "version": {
+      "type": "string",
+      "pattern": "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$",
+      "description": "Semantic version of the skill"
+    },
+    "timestamp": {
+      "type": "string",
+      "format": "date-time",
+      "description": "ISO 8601 timestamp of output generation"
+    },
+    "status": {
+      "type": "string",
+      "enum": ["success", "partial", "failed", "skipped"],
+      "description": "Overall execution status"
+    },
+    "trustTier": {
+      "type": "integer",
+      "const": 3,
+      "description": "Trust tier 3 indicates full validation with eval suite"
+    },
+    "output": {
+      "type": "object",
+      "required": ["summary", "findings", "owaspCategories"],
+      "properties": {
+        "summary": {
+          "type": "string",
+          "minLength": 50,
+          "maxLength": 2000,
+          "description": "Human-readable summary of security findings"
+        },
+        "score": {
+          "$ref": "#/$defs/securityScore",
+          "description": "Overall security score"
+        },
+        "findings": {
+          "type": "array",
+          "items": {
+            "$ref": "#/$defs/securityFinding"
+          },
+          "maxItems": 500,
+          "description": "List of security vulnerabilities discovered"
+        },
+        "recommendations": {
+          "type": "array",
+          "items": {
+            "$ref": "#/$defs/securityRecommendation"
+          },
+          "maxItems": 100,
+          "description": "Prioritized remediation recommendations with code examples"
+        },
+        "metrics": {
+          "$ref": "#/$defs/securityMetrics",
+          "description": "Security scan metrics and statistics"
+        },
+        "owaspCategories": {
+          "$ref": "#/$defs/owaspCategoryBreakdown",
+          "description": "OWASP Top 10 2021 category breakdown"
+        },
+        "artifacts": {
+          "type": "array",
+          "items": {
+            "$ref": "#/$defs/artifact"
+          },
+          "maxItems": 50,
+          "description": "Generated security reports and scan artifacts"
+        },
+        "timeline": {
+          "type": "array",
+          "items": {
+            "$ref": "#/$defs/timelineEvent"
+          },
+          "description": "Scan execution timeline"
+        },
+        "scanConfiguration": {
+          "$ref": "#/$defs/scanConfiguration",
+          "description": "Configuration used for the security scan"
+        }
+      }
+    },
+    "metadata": {
+      "$ref": "#/$defs/metadata"
+    },
+    "validation": {
+      "$ref": "#/$defs/validationResult"
+    },
+    "learning": {
+      "$ref": "#/$defs/learningData"
+    }
+  },
+  "$defs": {
+    "securityScore": {
+      "type": "object",
+      "required": ["value", "max"],
+      "properties": {
+        "value": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 100,
+          "description": "Security score (0=critical issues, 100=no issues)"
+        },
+        "max": {
+          "type": "number",
+          "const": 100,
+          "description": "Maximum score is always 100"
+        },
+        "grade": {
+          "type": "string",
+          "pattern": "^[A-DF][+-]?$",
+          "description": "Letter grade: A (90-100), B (80-89), C (70-79), D (60-69), F (<60)"
+        },
+        "trend": {
+          "type": "string",
+          "enum": ["improving", "stable", "declining", "unknown"],
+          "description": "Trend compared to previous scans"
+        },
+        "riskLevel": {
+          "type": "string",
+          "enum": ["critical", "high", "medium", "low", "minimal"],
+          "description": "Overall risk level assessment"
+        }
+      }
+    },
+    "securityFinding": {
+      "type": "object",
+      "required": ["id", "title", "severity", "owasp"],
+      "properties": {
+        "id": {
+          "type": "string",
+          "pattern": "^SEC-\\d{3,6}$",
+          "description": "Unique finding identifier (e.g., SEC-001)"
+        },
+        "title": {
+          "type": "string",
+          "minLength": 10,
+          "maxLength": 200,
+          "description": "Finding title describing the vulnerability"
+        },
+        "description": {
+          "type": "string",
+          "maxLength": 2000,
+          "description": "Detailed description of the vulnerability"
+        },
+        "severity": {
+          "type": "string",
+          "enum": ["critical", "high", "medium", "low", "info"],
+          "description": "Severity: critical (CVSS 9.0-10.0), high (7.0-8.9), medium (4.0-6.9), low (0.1-3.9), info (0)"
+        },
+        "owasp": {
+          "type": "string",
+          "pattern": "^A(0[1-9]|10):20(21|25)$",
+          "description": "OWASP Top 10 category (e.g., A01:2021, A03:2025)"
+        },
+        "owaspCategory": {
+          "type": "string",
+          "enum": [
+            "A01:2021-Broken-Access-Control",
+            "A02:2021-Cryptographic-Failures",
+            "A03:2021-Injection",
+            "A04:2021-Insecure-Design",
+            "A05:2021-Security-Misconfiguration",
+            "A06:2021-Vulnerable-Components",
+            "A07:2021-Identification-Authentication-Failures",
+            "A08:2021-Software-Data-Integrity-Failures",
+            "A09:2021-Security-Logging-Monitoring-Failures",
+            "A10:2021-Server-Side-Request-Forgery"
+          ],
+          "description": "Full OWASP category name"
+        },
+        "cwe": {
+          "type": "string",
+          "pattern": "^CWE-\\d{1,4}$",
+          "description": "CWE identifier (e.g., CWE-79 for XSS, CWE-89 for SQLi)"
+        },
+        "cvss": {
+          "type": "object",
+          "properties": {
+            "score": {
+              "type": "number",
+              "minimum": 0,
+              "maximum": 10,
+              "description": "CVSS v3.1 base score"
+            },
+            "vector": {
+              "type": "string",
+              "pattern": "^CVSS:3\\.1/AV:[NALP]/AC:[LH]/PR:[NLH]/UI:[NR]/S:[UC]/C:[NLH]/I:[NLH]/A:[NLH]$",
+              "description": "CVSS v3.1 vector string"
+            },
+            "severity": {
+              "type": "string",
+              "enum": ["None", "Low", "Medium", "High", "Critical"],
+              "description": "CVSS severity rating"
+            }
+          }
+        },
+        "location": {
+          "$ref": "#/$defs/location",
+          "description": "Location of the vulnerability"
+        },
+        "evidence": {
+          "type": "string",
+          "maxLength": 5000,
+          "description": "Evidence: code snippet, request/response, or PoC"
+        },
+        "remediation": {
+          "type": "string",
+          "maxLength": 2000,
+          "description": "Specific fix instructions for this finding"
+        },
+        "references": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "required": ["title", "url"],
+            "properties": {
+              "title": { "type": "string" },
+              "url": { "type": "string", "format": "uri" }
+            }
+          },
+          "maxItems": 10,
+          "description": "External references (OWASP, CWE, CVE, etc.)"
+        },
+        "falsePositive": {
+          "type": "boolean",
+          "default": false,
+          "description": "Potential false positive flag"
+        },
+        "confidence": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 1,
+          "description": "Confidence in finding accuracy (0.0-1.0)"
+        },
+        "exploitability": {
+          "type": "string",
+          "enum": ["trivial", "easy", "moderate", "difficult", "theoretical"],
+          "description": "How easily this vulnerability can be exploited"
+        },
+        "affectedVersions": {
+          "type": "array",
+          "items": { "type": "string" },
+          "description": "Affected package/library versions for dependency vulnerabilities"
+        },
+        "cve": {
+          "type": "string",
+          "pattern": "^CVE-\\d{4}-\\d{4,}$",
+          "description": "CVE identifier if applicable"
+        }
+      }
+    },
+    "securityRecommendation": {
+      "type": "object",
+      "required": ["id", "title", "priority", "owaspCategories"],
+      "properties": {
+        "id": {
+          "type": "string",
+          "pattern": "^REC-\\d{3,6}$",
+          "description": "Unique recommendation identifier"
+        },
+        "title": {
+          "type": "string",
+          "minLength": 10,
+          "maxLength": 200,
+          "description": "Recommendation title"
+        },
+        "description": {
+          "type": "string",
+          "maxLength": 2000,
+          "description": "Detailed recommendation description"
+        },
+        "priority": {
+          "type": "string",
+          "enum": ["critical", "high", "medium", "low"],
+          "description": "Remediation priority"
+        },
+        "effort": {
+          "type": "string",
+          "enum": ["trivial", "low", "medium", "high", "major"],
+          "description": "Estimated effort: trivial(<1hr), low(1-4hr), medium(1-3d), high(1-2wk), major(>2wk)"
+        },
+        "impact": {
+          "type": "integer",
+          "minimum": 1,
+          "maximum": 10,
+          "description": "Security impact if implemented (1-10)"
+        },
+        "relatedFindings": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "pattern": "^SEC-\\d{3,6}$"
+          },
+          "description": "IDs of findings this addresses"
+        },
+        "owaspCategories": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "pattern": "^A(0[1-9]|10):20(21|25)$"
+          },
+          "description": "OWASP categories this recommendation addresses"
+        },
+        "codeExample": {
+          "type": "object",
+          "properties": {
+            "before": {
+              "type": "string",
+              "maxLength": 2000,
+              "description": "Vulnerable code example"
+            },
+            "after": {
+              "type": "string",
+              "maxLength": 2000,
+              "description": "Secure code example"
+            },
+            "language": {
+              "type": "string",
+              "description": "Programming language"
+            }
+          },
+          "description": "Before/after code examples for remediation"
+        },
+        "resources": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "required": ["title", "url"],
+            "properties": {
+              "title": { "type": "string" },
+              "url": { "type": "string", "format": "uri" }
+            }
+          },
+          "maxItems": 10,
+          "description": "External resources and documentation"
+        },
+        "automatable": {
+          "type": "boolean",
+          "description": "Whether this fix can be applied automatically"
+        },
+        "fixCommand": {
+          "type": "string",
+          "description": "CLI command that applies the fix, if automatable"
+        }
+      }
+    },
+    "owaspCategoryBreakdown": {
+      "type": "object",
+      "description": "OWASP Top 10 2021 category scores and findings",
+      "properties": {
+        "A01:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A01:2021 - Broken Access Control"
+        },
+        "A02:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A02:2021 - Cryptographic Failures"
+        },
+        "A03:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A03:2021 - Injection"
+        },
+        "A04:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A04:2021 - Insecure Design"
+        },
+        "A05:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A05:2021 - Security Misconfiguration"
+        },
+        "A06:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A06:2021 - Vulnerable and Outdated Components"
+        },
+        "A07:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A07:2021 - Identification and Authentication Failures"
+        },
+        "A08:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A08:2021 - Software and Data Integrity Failures"
+        },
+        "A09:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A09:2021 - Security Logging and Monitoring Failures"
+        },
+        "A10:2021": {
+          "$ref": "#/$defs/owaspCategoryScore",
+          "description": "A10:2021 - Server-Side Request Forgery (SSRF)"
+        }
+      },
+      "additionalProperties": false
+    },
+    "owaspCategoryScore": {
+      "type": "object",
+      "required": ["tested", "score"],
+      "properties": {
+        "tested": {
+          "type": "boolean",
+          "description": "Whether this category was tested"
+        },
+        "score": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 100,
+          "description": "Category score (100 = no issues, 0 = critical)"
+        },
+        "grade": {
+          "type": "string",
+          "pattern": "^[A-F][+-]?$",
+          "description": "Letter grade for this category"
+        },
+        "findingCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Number of findings in this category"
+        },
+        "criticalCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Number of critical findings"
+        },
+        "highCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Number of high severity findings"
+        },
+        "status": {
+          "type": "string",
+          "enum": ["pass", "fail", "warn", "skip"],
+          "description": "Category status"
+        },
+        "description": {
+          "type": "string",
+          "description": "Category description and context"
+        },
+        "cwes": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "pattern": "^CWE-\\d{1,4}$"
+          },
+          "description": "CWEs found in this category"
+        }
+      }
+    },
+    "securityMetrics": {
+      "type": "object",
+      "properties": {
+        "totalFindings": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Total vulnerabilities found"
+        },
+        "criticalCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Critical severity findings"
+        },
+        "highCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "High severity findings"
+        },
+        "mediumCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Medium severity findings"
+        },
+        "lowCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Low severity findings"
+        },
+        "infoCount": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Informational findings"
+        },
+        "filesScanned": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Number of files analyzed"
+        },
+        "linesOfCode": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Lines of code scanned"
+        },
+        "dependenciesChecked": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Number of dependencies checked"
+        },
+        "owaspCategoriesTested": {
+          "type": "integer",
+          "minimum": 0,
+          "maximum": 10,
+          "description": "OWASP Top 10 categories tested"
+        },
+        "owaspCategoriesPassed": {
+          "type": "integer",
+          "minimum": 0,
+          "maximum": 10,
+          "description": "OWASP Top 10 categories with no findings"
+        },
+        "uniqueCwes": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Unique CWE identifiers found"
+        },
+        "falsePositiveRate": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 1,
+          "description": "Estimated false positive rate (0.0-1.0)"
+        },
+        "scanDurationMs": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Total scan duration in milliseconds"
+        },
+        "coverage": {
+          "type": "object",
+          "properties": {
+            "sast": {
+              "type": "boolean",
+              "description": "Static analysis performed"
+            },
+            "dast": {
+              "type": "boolean",
+              "description": "Dynamic analysis performed"
+            },
+            "dependencies": {
+              "type": "boolean",
+              "description": "Dependency scan performed"
+            },
+            "secrets": {
+              "type": "boolean",
+              "description": "Secret scanning performed"
+            },
+            "configuration": {
+              "type": "boolean",
+              "description": "Configuration review performed"
+            }
+          },
+          "description": "Scan coverage indicators"
+        }
+      }
+    },
+    "scanConfiguration": {
+      "type": "object",
+      "properties": {
+        "target": {
+          "type": "string",
+          "description": "Scan target (file path, URL, or package)"
+        },
+        "targetType": {
+          "type": "string",
+          "enum": ["source", "url", "package", "container", "infrastructure"],
+          "description": "Type of target being scanned"
+        },
+        "scanTypes": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "enum": ["sast", "dast", "dependency", "secret", "configuration", "container", "iac"]
+          },
+          "description": "Types of scans performed"
+        },
+        "severity": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "enum": ["critical", "high", "medium", "low", "info"]
+          },
+          "description": "Severity levels included in scan"
+        },
+        "owaspCategories": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "pattern": "^A(0[1-9]|10):20(21|25)$"
+          },
+          "description": "OWASP categories tested"
+        },
+        "tools": {
+          "type": "array",
+          "items": { "type": "string" },
+          "description": "Security tools used"
+        },
+        "excludePatterns": {
+          "type": "array",
+          "items": { "type": "string" },
+          "description": "File patterns excluded from scan"
+        },
+        "rulesets": {
+          "type": "array",
+          "items": { "type": "string" },
+          "description": "Security rulesets applied"
+        }
+      }
+    },
+    "location": {
+      "type": "object",
+      "properties": {
+        "file": {
+          "type": "string",
+          "maxLength": 500,
+          "description": "File path relative to project root"
+        },
+        "line": {
+          "type": "integer",
+          "minimum": 1,
+          "description": "Line number"
+        },
+        "column": {
+          "type": "integer",
+          "minimum": 1,
+          "description": "Column number"
+        },
+        "endLine": {
+          "type": "integer",
+          "minimum": 1,
+          "description": "End line for multi-line findings"
+        },
+        "endColumn": {
+          "type": "integer",
+          "minimum": 1,
+          "description": "End column"
+        },
+        "url": {
+          "type": "string",
+          "format": "uri",
+          "description": "URL for web-based findings"
+        },
+        "endpoint": {
+          "type": "string",
+          "description": "API endpoint path"
+        },
+        "method": {
+          "type": "string",
+          "enum": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"],
+          "description": "HTTP method for API findings"
+        },
+        "parameter": {
+          "type": "string",
+          "description": "Vulnerable parameter name"
+        },
+        "component": {
+          "type": "string",
+          "description": "Affected component or module"
+        }
+      }
+    },
+    "artifact": {
+      "type": "object",
+      "required": ["type", "path"],
+      "properties": {
+        "type": {
+          "type": "string",
+          "enum": ["report", "sarif", "data", "log", "evidence"],
+          "description": "Artifact type"
+        },
+        "path": {
+          "type": "string",
+          "maxLength": 500,
+          "description": "Path to artifact"
+        },
+        "format": {
+          "type": "string",
+          "enum": ["json", "sarif", "html", "md", "txt", "xml", "csv"],
+          "description": "Artifact format"
+        },
+        "description": {
+          "type": "string",
+          "maxLength": 500,
+          "description": "Artifact description"
+        },
+        "sizeBytes": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "File size in bytes"
+        },
+        "checksum": {
+          "type": "string",
+          "pattern": "^sha256:[a-f0-9]{64}$",
+          "description": "SHA-256 checksum"
+        }
+      }
+    },
+    "timelineEvent": {
+      "type": "object",
+      "required": ["timestamp", "event"],
+      "properties": {
+        "timestamp": {
+          "type": "string",
+          "format": "date-time",
+          "description": "Event timestamp"
+        },
+        "event": {
+          "type": "string",
+          "maxLength": 200,
+          "description": "Event description"
+        },
+        "type": {
+          "type": "string",
+          "enum": ["start", "checkpoint", "warning", "error", "complete"],
+          "description": "Event type"
+        },
+        "durationMs": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Milliseconds elapsed since the previous event"
+        },
+        "phase": {
+          "type": "string",
+          "enum": ["initialization", "sast", "dast", "dependency", "secret", "reporting"],
+          "description": "Scan phase"
+        }
+      }
+    },
+    "metadata": {
+      "type": "object",
+      "properties": {
+        "executionTimeMs": {
+          "type": "integer",
+          "minimum": 0,
+          "maximum": 3600000,
+          "description": "Execution time in milliseconds"
+        },
+        "toolsUsed": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "enum": ["semgrep", "npm-audit", "trivy", "owasp-zap", "bandit", "gosec", "eslint-security", "snyk", "gitleaks", "trufflehog", "bearer"]
+          },
+          "uniqueItems": true,
+          "description": "Security tools used"
+        },
+        "agentId": {
+          "type": "string",
+          "pattern": "^qe-[a-z][a-z0-9-]*$",
+          "description": "Agent ID (e.g., qe-security-scanner)"
+        },
+        "modelUsed": {
+          "type": "string",
+          "description": "LLM model used for analysis"
+        },
+        "inputHash": {
+          "type": "string",
+          "pattern": "^[a-f0-9]{64}$",
+          "description": "SHA-256 hash of input"
+        },
+        "targetUrl": {
+          "type": "string",
+          "format": "uri",
+          "description": "Target URL if applicable"
+        },
+        "targetPath": {
+          "type": "string",
+          "description": "Target path if applicable"
+        },
+        "environment": {
+          "type": "string",
+          "enum": ["development", "staging", "production", "ci"],
+          "description": "Execution environment"
+        },
+        "retryCount": {
+          "type": "integer",
+          "minimum": 0,
+          "maximum": 10,
+          "description": "Number of retries"
+        }
+      }
+    },
+    "validationResult": {
+      "type": "object",
+      "properties": {
+        "schemaValid": {
+          "type": "boolean",
+          "description": "Passes JSON schema validation"
+        },
+        "contentValid": {
+          "type": "boolean",
+          "description": "Passes content validation"
+        },
+        "confidence": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 1,
+          "description": "Confidence score"
+        },
+        "warnings": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "maxLength": 500
+          },
+          "maxItems": 20,
+          "description": "Validation warnings"
+        },
+        "errors": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "maxLength": 500
+          },
+          "maxItems": 20,
+          "description": "Validation errors"
+        },
+        "validatorVersion": {
+          "type": "string",
+          "pattern": "^\\d+\\.\\d+\\.\\d+$",
+          "description": "Validator version"
+        }
+      }
+    },
+    "learningData": {
+      "type": "object",
+      "properties": {
+        "patternsDetected": {
+          "type": "array",
+          "items": {
+            "type": "string",
+            "maxLength": 200
+          },
+          "maxItems": 20,
+          "description": "Security patterns detected (e.g., sql-injection-string-concat)"
+        },
+        "reward": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 1,
+          "description": "Reward signal for learning (0.0-1.0)"
+        },
+        "feedbackLoop": {
+          "type": "object",
+          "properties": {
+            "previousRunId": {
+              "type": "string",
+              "format": "uuid",
+              "description": "Previous run ID for comparison"
+            },
+            "improvement": {
+              "type": "number",
+              "minimum": -1,
+              "maximum": 1,
+              "description": "Improvement over previous run"
+            }
+          }
+        },
+        "newVulnerabilityPatterns": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "pattern": { "type": "string" },
+              "cwe": { "type": "string" },
+              "confidence": { "type": "number" }
+            }
+          },
+          "description": "New vulnerability patterns learned"
+        }
+      }
+    }
+  }
+}
@@ -0,0 +1,45 @@
+{
+  "skillName": "security-testing",
+  "skillVersion": "1.0.0",
+  "requiredTools": [
+    "jq"
+  ],
+  "optionalTools": [
+    "npm",
+    "semgrep",
+    "trivy",
+    "ajv",
+    "jsonschema",
+    "python3"
+  ],
+  "schemaPath": "schemas/output.json",
+  "requiredFields": [
+    "skillName",
+    "status",
+    "output",
+    "output.summary",
+    "output.findings",
+    "output.owaspCategories"
+  ],
+  "requiredNonEmptyFields": [
+    "output.summary"
+  ],
+  "mustContainTerms": [
+    "OWASP",
+    "security",
+    "vulnerability"
+  ],
+  "mustNotContainTerms": [
+    "TODO",
+    "placeholder",
+    "FIXME"
+  ],
+  "enumValidations": {
+    ".status": [
+      "success",
+      "partial",
+      "failed",
+      "skipped"
+    ]
+  }
+}