Update .gitignore to include additional file types and directories for Python projects, enhancing environment management and build artifacts exclusion.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-20 21:28:16 +02:00
parent 9e5b0f2cc2
commit 7556f3b012
65 changed files with 9165 additions and 7 deletions
---
name: autopilot
description: |
Auto-chaining orchestrator that drives the full BUILD-SHIP workflow from problem gathering through deployment.
Detects current project state from _docs/ folder, resumes from where it left off, and flows through
problem → research → plan → decompose → implement → deploy without manual skill invocation.
Maximizes work per conversation by auto-transitioning between skills.
Trigger phrases:
- "autopilot", "auto", "start", "continue"
- "what's next", "where am I", "project status"
category: meta
tags: [orchestrator, workflow, auto-chain, state-machine, meta-skill]
disable-model-invocation: true
---
# Autopilot Orchestrator
Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autopilot` once — the engine handles sequencing, transitions, and re-entry.
## Core Principles
- **Auto-chain**: when a skill completes, immediately start the next one — no pause between skills
- **Only pause at decision points**: BLOCKING gates inside sub-skills are the natural pause points; do not add artificial stops between steps
- **State from disk**: all progress is persisted to `_docs/_autopilot_state.md` and cross-checked against `_docs/` folder structure
- **Rich re-entry**: on every invocation, read the state file for full context before continuing
- **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
- **Sound on pause**: follow `.cursor/rules/human-input-sound.mdc` — play a notification sound before every pause that requires human input
- **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically
- **Jira MCP required**: steps that create Jira artifacts (Plan Step 6, Decompose) must have authenticated Jira MCP — never skip or substitute with local files
## Jira MCP Authentication
Several workflow steps create Jira artifacts (epics, tasks, links). The Jira MCP server must be authenticated **before** any step that writes to Jira.
### Steps That Require Jira MCP
| Step | Sub-Step | Jira Action |
|------|----------|-------------|
| 2 (Plan) | Step 6 — Jira Epics | Create epics for each component |
| 3 (Decompose) | Step 13 — All tasks | Create Jira ticket per task, link to epic |
### Authentication Gate
Before entering **Step 2 (Plan)** or **Step 3 (Decompose)** for the first time, the autopilot must:
1. Call `mcp_auth` on the Jira MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** authentication → **STOP**. Present using Choose format:
```
══════════════════════════════════════
BLOCKER: Jira MCP authentication required
══════════════════════════════════════
A) Authenticate now (retry mcp_auth)
B) Pause autopilot — resume after configuring Jira MCP
══════════════════════════════════════
Note: Jira integration is mandatory. Plan and Decompose
steps create epics and tasks that drive implementation.
Local-only workarounds are not acceptable.
══════════════════════════════════════
```
Do NOT offer a "skip Jira" or "save locally" option. The workflow depends on Jira IDs for task referencing, dependency tracking, and implementation batching.
### Re-Authentication
If Jira MCP was already authenticated in a previous invocation (verify by listing available Jira tools beyond `mcp_auth`), skip the auth gate.
## User Interaction Protocol
Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:
- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)
### When to Ask (MUST ask)
- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs
### When NOT to Ask (auto-transition)
- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer
### Choice Format
Always present decisions in this format:
```
══════════════════════════════════════
DECISION REQUIRED: [brief context]
══════════════════════════════════════
A) [Option A — short description]
B) [Option B — short description]
C) [Option C — short description, if applicable]
D) [Option D — short description, if applicable]
══════════════════════════════════════
Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```
Rules:
1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive
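The rules above can be enforced mechanically when building a decision prompt. A minimal sketch; `render_choice` and `Option` are illustrative names, not part of this skill's spec:

```python
# Minimal renderer for the Choose A/B/C/D format defined above.
# Option labels and the 2-4 option bound mirror the rules in this section.
from dataclasses import dataclass

BAR = "═" * 38

@dataclass
class Option:
    label: str        # "A", "B", "C", or "D"
    description: str  # one-line description, per Rule 3

def render_choice(context: str, options: list[Option],
                  recommendation: str, reason: str) -> str:
    if not 2 <= len(options) <= 4:
        raise ValueError("provide 2-4 concrete options")
    lines = [BAR, f"DECISION REQUIRED: {context}", BAR]
    lines += [f"{o.label}) {o.description}" for o in options]
    lines += [BAR, f"Recommendation: {recommendation} — {reason}", BAR]
    return "\n".join(lines)
```

Playing the notification sound and recording the decision (Rules 5 and 6) would wrap around this call.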
## State File: `_docs/_autopilot_state.md`
The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.
### Format
```markdown
# Autopilot State
## Current Step
step: [0-5 or "done"]
name: [Problem / Research / Plan / Decompose / Implement / Deploy / Done]
status: [not_started / in_progress / completed]
sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]
## Step ↔ SubStep Reference
| Step | Name | Sub-Skill | Internal SubSteps |
|------|------------|------------------------|------------------------------------------|
| 0 | Problem | problem/SKILL.md | Phases 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phases 1–4 · Mode B: Steps 0–8 |
| 2 | Plan | plan/SKILL.md | Steps 1–6 |
| 3 | Decompose | decompose/SKILL.md | Steps 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Deploy | deploy/SKILL.md | Steps 1–7 |
When updating `Current Step`, always write it as:
step: N ← autopilot step (0–5)
sub_step: M ← sub-skill's own internal step/phase number + name
Example:
step: 2
name: Plan
status: in_progress
sub_step: 4 — Architecture Review & Risk Assessment
## Completed Steps
| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| 0 | Problem | [date] | [one-line summary] |
| 1 | Research | [date] | [N drafts, final approach summary] |
| 2 | Plan | [date] | [N components, architecture summary] |
| 3 | Decompose | [date] | [N tasks, total complexity points] |
| 4 | Implement | [date] | [N batches, pass/fail summary] |
| 5 | Deploy | [date] | [artifacts produced] |
## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision 2: e.g. "6 research rounds, final draft: solution_draft06.md"]
- [decision N]
## Last Session
date: [date]
ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
reason: [completed step / session boundary / user paused / context limit]
notes: [any context for next session, e.g. "User asked to revisit risk assessment"]
## Blockers
- [blocker 1, if any]
- [none]
```
### State File Rules
1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
3. **Read** the state file as the first action on every invocation — before folder scanning
4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 2 but `_docs/02_plans/architecture.md` already exists), trust the folder structure and update the state file to match
5. **Never delete** the state file. It accumulates history across the entire project lifecycle
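Rule 4's reconciliation reduces to a small decision; a sketch, where `resolve_step` and its arguments are illustrative names and the scanned step comes from the folder scan defined later in this skill:

```python
# Sketch of State File Rule 4: when the state file and the folder scan
# disagree, the folder structure wins and the state file must be rewritten.
from typing import Optional

def resolve_step(state_step: Optional[int], scanned_step: int) -> tuple[int, bool]:
    """Return (resolved_step, state_file_needs_update)."""
    if state_step is None:             # no state file yet (Rule 1 will create it)
        return scanned_step, True
    if state_step != scanned_step:     # inconsistent: trust the folders (Rule 4)
        return scanned_step, True
    return state_step, False
```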
## Execution Entry Point
Every invocation of this skill follows the same sequence:
```
1. Read _docs/_autopilot_state.md (if exists)
2. Cross-check state file against _docs/ folder structure
3. Resolve current step (state file + folder scan)
4. Present Status Summary (from state file context)
5. Enter Execution Loop:
a. Read and execute the current skill's SKILL.md
b. When skill completes → update state file
c. Re-detect next step
d. If next skill is ready → auto-chain (go to 5a with next skill)
e. If session boundary reached → update state file with session notes → suggest new conversation
f. If all steps done → update state file → report completion
```
## State Detection
Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning.
### Folder Scan Rules (fallback)
Scan `_docs/` to determine the current workflow position. Check rules in order — first match wins.
### Detection Rules
**Step 0 — Problem Gathering**
Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty:
- `problem.md`
- `restrictions.md`
- `acceptance_criteria.md`
- `input_data/` (must contain at least one file)
Action: Read and execute `.cursor/skills/problem/SKILL.md`
---
**Step 1 — Research (Initial)**
Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files
Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A)
---
**Step 1b — Research Decision**
Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_plans/architecture.md` does not exist
Action: Present the current research state to the user:
- How many solution drafts exist
- Whether tech_stack.md and security_analysis.md exist
- One-line summary from the latest draft
Then present using the **Choose format**:
```
══════════════════════════════════════
DECISION REQUIRED: Research complete — next action?
══════════════════════════════════════
A) Run another research round (Mode B assessment)
B) Proceed to planning with current draft
══════════════════════════════════════
Recommendation: [A or B] — [reason based on draft quality]
══════════════════════════════════════
```
- If user picks A → Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B)
- If user picks B → auto-chain to Step 2 (Plan)
---
**Step 2 — Plan**
Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_plans/architecture.md` does not exist
Action:
1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself
2. Read and execute `.cursor/skills/plan/SKILL.md`
If `_docs/02_plans/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it.
---
**Step 3 — Decompose**
Condition: `_docs/02_plans/` contains `architecture.md` AND `_docs/02_plans/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`)
Action: Read and execute `.cursor/skills/decompose/SKILL.md`
If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.
---
**Step 4 — Implement**
Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
Action: Read and execute `.cursor/skills/implement/SKILL.md`
If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.
---
**Step 5 — Deploy**
Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND `_docs/04_deploy/` does not exist or is incomplete
Action: Read and execute `.cursor/skills/deploy/SKILL.md`
---
**Done**
Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md)
Action: Report project completion with summary.
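The detection rules above amount to a first-match-wins chain over `_docs/` contents. A condensed sketch that omits some sub-conditions (e.g. the `components/` check in Step 3 and the per-artifact deploy check); the function name is illustrative:

```python
# First-match-wins fallback scan over _docs/, per the Detection Rules above.
from pathlib import Path

def detect_step(docs: Path) -> str:
    problem = docs / "00_problem"
    inputs = problem / "input_data"
    required = ["problem.md", "restrictions.md", "acceptance_criteria.md"]
    problem_done = (
        problem.is_dir()
        and all((problem / f).exists() and (problem / f).stat().st_size > 0
                for f in required)
        and inputs.is_dir() and any(inputs.iterdir())
    )
    if not problem_done:
        return "Step 0 — Problem"
    solution = docs / "01_solution"
    drafts = list(solution.glob("solution_draft*.md")) if solution.is_dir() else []
    if not drafts:
        return "Step 1 — Research (Mode A)"
    architecture = docs / "02_plans" / "architecture.md"
    if not architecture.exists():
        if not (solution / "solution.md").exists():
            return "Step 1b — Research Decision"
        return "Step 2 — Plan"
    tasks_dir = docs / "02_tasks"
    tasks = ([p for p in tasks_dir.glob("*.md") if p.name != "_dependencies_table.md"]
             if tasks_dir.is_dir() else [])
    if not tasks:
        return "Step 3 — Decompose"
    if not (docs / "03_implementation" / "FINAL_implementation_report.md").exists():
        return "Step 4 — Implement"
    if not (docs / "04_deploy").is_dir():
        return "Step 5 — Deploy"
    return "Done"
```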
## Status Summary
On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback).
Format:
```
═══════════════════════════════════════════════════
AUTOPILOT STATUS
═══════════════════════════════════════════════════
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5 Deploy [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```
For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section
## Auto-Chain Rules
After a skill completes, apply these rules:
| Completed Step | Next Action |
|---------------|-------------|
| Problem Gathering | Auto-chain → Research (Mode A) |
| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) |
| Research Decision → proceed | Auto-chain → Plan |
| Plan | Auto-chain → Decompose |
| Decompose | **Session boundary** — suggest new conversation before Implement |
| Implement | Auto-chain → Deploy |
| Deploy | Report completion |
### Session Boundary: Decompose → Implement
After decompose completes, **do not auto-chain to implement**. Instead:
1. Update state file: mark Decompose as completed, set current step to 4 (Implement) with status `not_started`
2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
3. Present a summary: number of tasks, estimated batches, total complexity points
4. Use Choose format:
```
══════════════════════════════════════
DECISION REQUIRED: Decompose complete — start implementation?
══════════════════════════════════════
A) Start a new conversation for implementation (recommended for context freshness)
B) Continue implementation in this conversation
══════════════════════════════════════
Recommendation: A — implementation is the longest phase, fresh context helps
══════════════════════════════════════
```
This is the only hard session boundary. All other transitions auto-chain.
## Skill Delegation
For each step, the delegation pattern is:
1. Update state file: set `step` to the autopilot step number (0–5), status to `in_progress`, set `sub_step` to the sub-skill's current internal step/phase number and name
2. Announce: "Starting [Skill Name]..."
3. Read the skill file: `.cursor/skills/[name]/SKILL.md`
4. Execute the skill's workflow exactly as written, including:
- All BLOCKING gates (present to user, wait for confirmation)
- All self-verification checklists
- All save actions
- All escalation rules
- Update `sub_step` in the state file each time the sub-skill advances to a new internal step/phase
5. When the skill's workflow is fully complete:
- Update state file: mark step as `completed`, record date, write one-line key outcome
- Add any key decisions made during this step to the `Key Decisions` section
- Return to the auto-chain rules
Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer.
## Re-Entry Protocol
When the user invokes `/autopilot` and work already exists:
1. Read `_docs/_autopilot_state.md`
2. Cross-check against `_docs/` folder structure
3. Present Status Summary with context from state file (key decisions, last session, blockers)
4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery
5. Continue execution from detected state
## Error Handling
All error situations that require user input MUST use the **Choose A / B / C / D** format.
| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |
## Trigger Conditions
This skill activates when the user wants to:
- Start a new project from scratch
- Continue an in-progress project
- Check project status
- Let the AI guide them through the full workflow
**Keywords**: "autopilot", "auto", "start", "continue", "what's next", "where am I", "project status"
**Differentiation**:
- User wants only research → use `/research` directly
- User wants only planning → use `/plan` directly
- User wants the full guided workflow → use `/autopilot`
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Autopilot (Auto-Chain Orchestrator) │
├────────────────────────────────────────────────────────────────┤
│ EVERY INVOCATION: │
│ 1. State Detection (scan _docs/) │
│ 2. Status Summary (show progress) │
│ 3. Execute current skill │
│ 4. Auto-chain to next skill (loop) │
│ │
│ WORKFLOW: │
│ Step 0 Problem → .cursor/skills/problem/SKILL.md │
│ ↓ auto-chain │
│ Step 1 Research → .cursor/skills/research/SKILL.md │
│ ↓ auto-chain (ask: another round?) │
│ Step 2 Plan → .cursor/skills/plan/SKILL.md │
│ ↓ auto-chain │
│ Step 3 Decompose → .cursor/skills/decompose/SKILL.md │
│ ↓ SESSION BOUNDARY (suggest new conversation) │
│ Step 4 Implement → .cursor/skills/implement/SKILL.md │
│ ↓ auto-chain │
│ Step 5 Deploy → .cursor/skills/deploy/SKILL.md │
│ ↓ │
│ DONE │
│ │
│ STATE FILE: _docs/_autopilot_state.md │
│ FALLBACK: _docs/ folder structure scan │
│ PAUSE POINTS: sub-skill BLOCKING gates only │
│ SESSION BREAK: after Decompose (before Implement) │
│ USER INPUT: Choose A/B/C/D format at genuine decisions only │
│ AUTO-TRANSITION: when path is unambiguous, don't ask │
├────────────────────────────────────────────────────────────────┤
│ Principles: Auto-chain · State to file · Rich re-entry │
│ Delegate don't duplicate · Pause at decisions only │
│ Minimize interruptions · Choose format for decisions │
└────────────────────────────────────────────────────────────────┘
```
---
name: code-review
description: |
Multi-phase code review against task specs with structured findings output.
6-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency.
Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict.
Invoked by /implement skill after each batch, or manually.
Trigger phrases:
- "code review", "review code", "review implementation"
- "check code quality", "review against specs"
category: review
tags: [code-review, quality, security-scan, performance, SOLID]
disable-model-invocation: true
---
# Code Review
Multi-phase code review that verifies implementation against task specs, checks code quality, and produces structured findings.
## Core Principles
- **Understand intent first**: read the task specs before reviewing code — know what it should do before judging how
- **Structured output**: every finding has severity, category, location, description, and suggestion
- **Deduplicate**: same issue at the same location is reported once using `{file}:{line}:{title}` as key
- **Severity-ranked**: findings sorted Critical > High > Medium > Low
- **Verdict-driven**: clear PASS/FAIL/PASS_WITH_WARNINGS drives automation decisions
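The dedup key and severity ordering can be sketched directly; a minimal illustration in which the `Finding` shape is assumed, not mandated by this skill:

```python
# Sketch of finding deduplication and severity ranking.
# The {file}:{line}:{title} key and Critical > High > Medium > Low
# order come from this skill's Core Principles.
from dataclasses import dataclass

SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

@dataclass(frozen=True)
class Finding:
    severity: str
    category: str
    file: str
    line: int
    title: str

def dedupe_and_rank(findings: list[Finding]) -> list[Finding]:
    seen: dict[str, Finding] = {}
    for f in findings:
        key = f"{f.file}:{f.line}:{f.title}"
        seen.setdefault(key, f)  # same issue at same location: report once
    return sorted(seen.values(), key=lambda f: SEVERITY_RANK[f.severity])
```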
## Input
- List of task spec files that were just implemented (paths to `[JIRA-ID]_[short_name].md`)
- Changed files (detected via `git diff` or provided by the `/implement` skill)
- Project context: `_docs/00_problem/restrictions.md`, `_docs/01_solution/solution.md`
## Phase 1: Context Loading
Before reviewing code, build understanding of intent:
1. Read each task spec — acceptance criteria, scope, constraints, dependencies
2. Read project restrictions and solution overview
3. Map which changed files correspond to which task specs
4. Understand what the code is supposed to do before judging how it does it
## Phase 2: Spec Compliance Review
For each task, verify implementation satisfies every acceptance criterion:
- Walk through each AC (Given/When/Then) and trace it in the code
- Check that unit tests cover each AC
- Check that integration tests exist where specified in the task spec
- Flag any AC that is not demonstrably satisfied as a **Spec-Gap** finding (severity: High)
- Flag any scope creep (implementation beyond what the spec asked for) as a **Scope** finding (severity: Low)
## Phase 3: Code Quality Review
Check implemented code against quality standards:
- **SOLID principles** — single responsibility, open/closed, Liskov, interface segregation, dependency inversion
- **Error handling** — consistent strategy, no bare catch/except, meaningful error messages
- **Naming** — clear intent, follows project conventions
- **Complexity** — functions longer than 50 lines or cyclomatic complexity > 10
- **DRY** — duplicated logic across files
- **Test quality** — tests assert meaningful behavior, not just "no error thrown"
- **Dead code** — unused imports, unreachable branches
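The 50-line threshold above can be checked mechanically with the standard `ast` module. A rough sketch for Python sources only; cyclomatic complexity is deliberately left out here, since approximating it by counting branch nodes would be an assumption rather than the formal metric:

```python
# Flag functions whose source span exceeds the complexity threshold
# named in Phase 3 (functions longer than 50 lines).
import ast

def long_functions(source: str, max_lines: int = 50) -> list[str]:
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                flagged.append(f"{node.name}: {length} lines")
    return flagged
```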
## Phase 4: Security Quick-Scan
Lightweight security checks (defer deep analysis to the `/security` skill):
- SQL injection via string interpolation
- Command injection (subprocess with shell=True, exec, eval)
- Hardcoded secrets, API keys, passwords
- Missing input validation on external inputs
- Sensitive data in logs or error messages
- Insecure deserialization
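A grep-style pass catches the most obvious of these. A sketch with a few illustrative patterns; the regexes here are assumptions, not a complete rule set, and a real quick-scan would lean on a proper tool such as bandit or semgrep:

```python
# Lightweight pattern scan for a few of the Phase 4 checks.
# Patterns are illustrative only and will miss many real cases.
import re

QUICK_PATTERNS = {
    "possible SQL injection (f-string query)": re.compile(
        r'(execute|query)\(\s*f["\']', re.IGNORECASE),
    "command injection risk (shell=True)": re.compile(r"shell\s*=\s*True"),
    "hardcoded secret": re.compile(
        r'(api_key|password|secret|token)\s*=\s*["\'][^"\']+["\']', re.IGNORECASE),
}

def quick_scan(path: str, text: str) -> list[str]:
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for title, pattern in QUICK_PATTERNS.items():
            if pattern.search(line):
                hits.append(f"{path}:{lineno}: {title}")
    return hits
```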
## Phase 5: Performance Scan
Check for common performance anti-patterns:
- O(n^2) or worse algorithms where O(n) is possible
- N+1 query patterns
- Unbounded data fetching (missing pagination/limits)
- Blocking I/O in async contexts
- Unnecessary memory copies or allocations in hot paths
## Phase 6: Cross-Task Consistency
When multiple tasks were implemented in the same batch:
- Interfaces between tasks are compatible (method signatures, DTOs match)
- No conflicting patterns (e.g., one task uses repository pattern, another does raw SQL)
- Shared code is not duplicated across task implementations
- Dependencies declared in task specs are properly wired
## Output Format
Produce a structured report with findings deduplicated and sorted by severity:
```markdown
# Code Review Report
**Batch**: [task list]
**Date**: [YYYY-MM-DD]
**Verdict**: PASS | PASS_WITH_WARNINGS | FAIL
## Findings
| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Critical | Security | src/api/auth.py:42 | SQL injection via f-string |
| 2 | High | Spec-Gap | src/service/orders.py | AC-3 not satisfied |
### Finding Details
**F1: SQL injection via f-string** (Critical / Security)
- Location: `src/api/auth.py:42`
- Description: User input interpolated directly into SQL query
- Suggestion: Use parameterized query via bind parameters
- Task: 04_auth_service
**F2: AC-3 not satisfied** (High / Spec-Gap)
- Location: `src/service/orders.py`
- Description: AC-3 requires order total recalculation on item removal, but no such logic exists
- Suggestion: Add recalculation in remove_item() method
- Task: 07_order_processing
```
## Severity Definitions
| Severity | Meaning | Blocks? |
|----------|---------|---------|
| Critical | Security vulnerability, data loss, crash | Yes — verdict FAIL |
| High | Spec gap, logic bug, broken test | Yes — verdict FAIL |
| Medium | Performance issue, maintainability concern, missing validation | No — verdict PASS_WITH_WARNINGS |
| Low | Style, minor improvement, scope creep | No — verdict PASS_WITH_WARNINGS |
## Category Values
Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
## Verdict Logic
- **FAIL**: any Critical or High finding exists
- **PASS_WITH_WARNINGS**: only Medium or Low findings
- **PASS**: no findings
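Combined with the severity table, the verdict reduces to one small function; a sketch using the severity strings defined above:

```python
# Verdict logic: any Critical/High finding fails the review;
# Medium/Low findings downgrade PASS to PASS_WITH_WARNINGS.
def verdict(severities: list[str]) -> str:
    present = set(severities)
    if present & {"Critical", "High"}:
        return "FAIL"
    if present & {"Medium", "Low"}:
        return "PASS_WITH_WARNINGS"
    return "PASS"
```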
## Integration with /implement
The `/implement` skill invokes this skill after each batch completes:
1. Collects changed files from all implementer agents in the batch
2. Passes task spec paths + changed files to this skill
3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
4. If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)
---
name: decompose
description: |
Decompose planned components into atomic implementable tasks with bootstrap structure plan.
4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
Supports full decomposition (_docs/ structure) and single component mode.
Trigger phrases:
- "decompose", "decompose features", "feature decomposition"
- "task decomposition", "break down components"
- "prepare for implementation"
category: build
tags: [decomposition, tasks, dependencies, jira, implementation-prep]
disable-model-invocation: true
---
# Task Decomposition
Decompose planned components into atomic, implementable task specs through a systematic workflow, starting from a bootstrap structure plan. All tasks are named with their Jira ticket ID prefix in a flat directory.
## Core Principles
- **Atomic tasks**: each task does one thing; if it exceeds 5 complexity points, split it
- **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
- **Flat structure**: all tasks are Jira-ID-prefixed files in TASKS_DIR — no component subdirectories
- **Save immediately**: write artifacts to disk after each task; never accumulate unsaved work
- **Jira inline**: create Jira ticket immediately after writing each task file
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Plan, don't code**: this workflow produces documents and Jira tasks, never implementation code
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Default** (no explicit input file provided):
- PLANS_DIR: `_docs/02_plans/`
- TASKS_DIR: `_docs/02_tasks/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, PLANS_DIR
- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (integration tests) + Step 4 (cross-verification)
**Single component mode** (provided file is within `_docs/02_plans/` and inside a `components/` subdirectory):
- PLANS_DIR: `_docs/02_plans/`
- TASKS_DIR: `_docs/02_tasks/`
- Derive component number and component name from the file path
- Ask user for the parent Epic ID
- Runs Step 2 (that component only, appending to existing task numbering)
Announce the detected mode and resolved paths to the user before proceeding.
## Input Specification
### Required Files
**Default:**
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/system-flows.md` | System flows from plan skill |
| `PLANS_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
| `PLANS_DIR/integration_tests/` | Integration test specs from plan skill |
**Single component mode:**
| File | Purpose |
|------|---------|
| The provided component `description.md` | Component spec to decompose |
| Corresponding `tests.md` in the same directory (if available) | Test specs for context |
### Prerequisite Checks (BLOCKING)
**Default:**
1. PLANS_DIR contains `architecture.md` and `components/` — **STOP if missing**
2. Create TASKS_DIR if it does not exist
3. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
**Single component mode:**
1. The provided component file exists and is non-empty — **STOP if missing**
## Artifact Management
### Directory Structure
```
TASKS_DIR/
├── [JIRA-ID]_initial_structure.md
├── [JIRA-ID]_[short_name].md
├── [JIRA-ID]_[short_name].md
├── ...
└── _dependencies_table.md
```
**Naming convention**: Each task file is initially saved with a temporary numeric prefix (`[##]_[short_name].md`). After creating the Jira ticket, rename the file to use the Jira ticket ID as prefix (`[JIRA-ID]_[short_name].md`). For example: `01_initial_structure.md` → `AZ-42_initial_structure.md`.
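The rename step could look like the following sketch. The Jira ticket creation itself (an MCP call) is not modeled here, and `rename_after_jira` is an illustrative name:

```python
# Sketch of the post-Jira rename: swap the temporary numeric prefix for the
# Jira ID and update the task name referenced inside the file.
from pathlib import Path
import re

def rename_after_jira(task_file: Path, jira_id: str) -> Path:
    # 01_initial_structure.md -> initial_structure.md
    short_name = re.sub(r"^\d+_", "", task_file.name)
    target = task_file.with_name(f"{jira_id}_{short_name}")
    # Update the **Task** field inside the file to match the new filename.
    text = task_file.read_text()
    task_file.write_text(text.replace(task_file.stem, target.stem))
    return task_file.rename(target)
```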
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 4 | Cross-task verification complete | `_dependencies_table.md` |
### Resumability
If TASKS_DIR already contains task files:
1. List existing `*_*.md` files (excluding `_dependencies_table.md`) and count them
2. Resume numbering from the next number (for temporary numeric prefix before Jira rename)
3. Inform the user which tasks already exist and are being skipped
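Resuming the numeric sequence can be sketched as follows; files already renamed to Jira IDs still count toward the total, and the function name is illustrative:

```python
# Sketch of the resumability rule: count existing task files
# (excluding the dependencies table) and continue from the next number.
from pathlib import Path

def next_task_number(tasks_dir: Path) -> int:
    existing = [p for p in tasks_dir.glob("*_*.md")
                if p.name != "_dependencies_table.md"]
    return len(existing) + 1
```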
## Progress Tracking
At the start of execution, create a TodoWrite with all applicable steps. Update status as each step/component completes.
## Workflow
### Step 1: Bootstrap Structure Plan (default mode only)
**Role**: Professional software architect
**Goal**: Produce `01_initial_structure.md` — the first task describing the project skeleton
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.
1. Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from PLANS_DIR
2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/`
3. Research best implementation patterns for the identified tech stack
4. Document the structure plan using `templates/initial-structure-task.md`
The bootstrap structure plan must include:
- Project folder layout with all component directories
- Shared models, interfaces, and DTOs
- Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
- `docker-compose.yml` for local development (all components + database + dependencies)
- `docker-compose.test.yml` for integration test environment (black-box test runner)
- `.dockerignore`
- CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
- Database migration setup and initial seed data scripts
- Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
- Environment variable documentation (`.env.example`)
- Test structure with unit and integration test locations
**Self-verification**:
- [ ] All components have corresponding folders in the layout
- [ ] All inter-component interfaces have DTOs defined
- [ ] Dockerfile defined for each component
- [ ] `docker-compose.yml` covers all components and dependencies
- [ ] `docker-compose.test.yml` enables black-box integration testing
- [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
- [ ] Database migration setup included
- [ ] Health check endpoints specified for each service
- [ ] Structured logging configuration included
- [ ] `.env.example` with all required environment variables
- [ ] Environment strategy covers dev, staging, production
- [ ] Test structure includes unit and integration test locations
**Save action**: Write `01_initial_structure.md` (temporary numeric name)
**Jira action**: Create a Jira ticket for this task under the "Bootstrap & Initial Structure" epic. Write the Jira ticket ID and Epic ID back into the task header.
**Rename action**: Rename the file from `01_initial_structure.md` to `[JIRA-ID]_initial_structure.md` (e.g., `AZ-42_initial_structure.md`). Update the **Task** field inside the file to match the new filename.
**BLOCKING**: Present structure plan summary to user. Do NOT proceed until user confirms.
---
### Step 2: Task Decomposition (all modes)
**Role**: Professional software architect
**Goal**: Decompose each component into atomic, implementable task specs — numbered sequentially starting from 02
**Constraints**: Behavioral specs only — describe what, not how. No implementation code.
**Numbering**: Tasks are numbered sequentially across all components in dependency order. Start from 02 (01 is initial_structure). In single component mode, start from the next available number in TASKS_DIR.
**Component ordering**: Process components in dependency order — foundational components first (shared models, database), then components that depend on them.
For each component (or the single provided component):
1. Read the component's `description.md` and `tests.md` (if available)
2. Decompose into atomic tasks; create only 1 task if the component is simple or atomic
3. Split into multiple tasks only when splitting is genuinely necessary and makes implementation easier
4. Do not create tasks for other components — only tasks for the current component
5. Each task should be atomic, exposing either no APIs or a small set of semantically connected APIs
6. Write each task spec using `templates/task.md`
7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
8. Note task dependencies (referencing Jira IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
9. **Immediately after writing each task file**: create a Jira ticket, link it to the component's epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
**Self-verification** (per component):
- [ ] Every task is atomic (single concern)
- [ ] No task exceeds 5 complexity points
- [ ] Task dependencies reference correct Jira IDs
- [ ] Tasks cover all interfaces defined in the component spec
- [ ] No tasks duplicate work from other components
- [ ] Every task has a Jira ticket linked to the correct epic
**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename the file to `[JIRA-ID]_[short_name].md`. Update the **Task** field inside the file to match the new filename. Update **Dependencies** references in the file to use Jira IDs of the dependency tasks.
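The write-then-rename flow could be sketched like this. The header field names (`**Task**`, `**Jira**`, `**Epic**`) follow the task template in this skill; adjust the patterns if the template changes:

```python
import re
from pathlib import Path

def rename_to_jira(task_path: str, jira_id: str, epic_id: str) -> Path:
    """Rename 02_user_auth.md -> AZ-43_user_auth.md and update its header.

    Rewrites the **Task**, **Jira**, and **Epic** fields so the file
    content always matches the new filename.
    """
    old = Path(task_path)
    short_name = re.sub(r"^\d+_", "", old.stem)  # drop the temporary numeric prefix
    new = old.with_name(f"{jira_id}_{short_name}{old.suffix}")
    text = old.read_text(encoding="utf-8")
    text = re.sub(r"^\*\*Task\*\*: .*$", f"**Task**: {jira_id}_{short_name}",
                  text, count=1, flags=re.MULTILINE)
    text = re.sub(r"^\*\*Jira\*\*: .*$", f"**Jira**: {jira_id}",
                  text, count=1, flags=re.MULTILINE)
    text = re.sub(r"^\*\*Epic\*\*: .*$", f"**Epic**: {epic_id}",
                  text, count=1, flags=re.MULTILINE)
    new.write_text(text, encoding="utf-8")
    old.unlink()
    return new
```

Dependency references inside other task files still need a separate pass, since they point at the renamed task by its new Jira-prefixed name.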
---
### Step 3: Integration Test Task Decomposition (default mode only)
**Role**: Professional Quality Assurance Engineer
**Goal**: Decompose integration test specs into atomic, implementable task specs
**Constraints**: Behavioral specs only — describe what, not how. No test code.
**Numbering**: Continue sequential numbering from where Step 2 left off.
1. Read all test specs from `PLANS_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
4. Dependencies: integration test tasks depend on the component implementation tasks they exercise
5. Write each task spec using `templates/task.md`
6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
8. **Immediately after writing each task file**: create a Jira ticket under the "Integration Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
**Self-verification**:
- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
- [ ] No task exceeds 5 complexity points
- [ ] Dependencies correctly reference the component tasks being tested
- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic
**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
---
### Step 4: Cross-Task Verification (default mode only)
**Role**: Professional software architect and analyst
**Goal**: Verify task consistency and produce `_dependencies_table.md`
**Constraints**: Review step — fix gaps found, do not add new tasks
1. Verify task dependencies across all tasks are consistent
2. Check no gaps: every interface in architecture.md has tasks covering it
3. Check no overlaps: tasks don't duplicate work across components
4. Check no circular dependencies in the task graph
5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`
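The no-circular-dependencies check (point 4) can be sketched with a Kahn-style reduction. The dependency map shape mirrors the Dependencies column of `_dependencies_table.md`:

```python
def find_cycle(deps: dict[str, list[str]]) -> list[str]:
    """Return task IDs stuck in a dependency cycle, or [] if acyclic.

    `deps` maps each task ID to the IDs it depends on. Tasks with no
    unresolved dependencies are repeatedly removed; anything left over
    participates in a cycle.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    changed = True
    while changed:
        changed = False
        ready = [t for t, d in remaining.items() if not d]
        for t in ready:
            del remaining[t]
            for d in remaining.values():
                d.discard(t)
            changed = True
    return sorted(remaining)
```

An empty result also yields a valid execution order as a side effect, which is what the `/implement` skill needs to compute parallel batches.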
**Self-verification**:
- [ ] Every architecture interface is covered by at least one task
- [ ] No circular dependencies in the task graph
- [ ] Cross-component dependencies are explicitly noted in affected task specs
- [ ] `_dependencies_table.md` contains every task with correct dependencies
**Save action**: Write `_dependencies_table.md`
**BLOCKING**: Present dependency summary to user. Do NOT proceed until user confirms.
---
## Common Mistakes
- **Coding during decomposition**: this workflow produces specs, never code
- **Over-splitting**: don't create many tasks if the component is simple — 1 task is fine
- **Tasks exceeding 5 points**: split them; no task should be too complex for a single implementer
- **Cross-component tasks**: each task belongs to exactly one component
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Creating git branches**: branch creation is an implementation concern, not a decomposition one
- **Creating component subdirectories**: all tasks go flat in TASKS_DIR
- **Forgetting Jira**: every task must have a Jira ticket created inline — do not defer to a separate step
- **Forgetting to rename**: after Jira ticket creation, always rename the file from numeric prefix to Jira ID prefix
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Ambiguous component boundaries | ASK user |
| Task complexity exceeds 5 points after splitting | ASK user |
| Missing component specs in PLANS_DIR | ASK user |
| Cross-component dependency conflict | ASK user |
| Jira epic not found for a component | ASK user for Epic ID |
| Uncertainty over task naming | PROCEED with a sensible name, confirm at next BLOCKING gate |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Task Decomposition (4-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (default / single component) │
│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │
│ [BLOCKING: user confirms structure] │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │
│ 4. Cross-Verification → _dependencies_table.md │
│ [BLOCKING: user confirms dependencies] │
├────────────────────────────────────────────────────────────────┤
│ Principles: Atomic tasks · Behavioral specs · Flat structure │
│ Jira inline · Rename to Jira ID · Save now · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,31 @@
# Dependencies Table Template
Use this template after cross-task verification. Save as `TASKS_DIR/_dependencies_table.md`.
---
```markdown
# Dependencies Table
**Date**: [YYYY-MM-DD]
**Total Tasks**: [N]
**Total Complexity Points**: [N]
| Task | Name | Complexity | Dependencies | Epic |
|------|------|-----------|-------------|------|
| [JIRA-ID] | initial_structure | [points] | None | [EPIC-ID] |
| [JIRA-ID] | [short_name] | [points] | [JIRA-ID] | [EPIC-ID] |
| [JIRA-ID] | [short_name] | [points] | [JIRA-ID] | [EPIC-ID] |
| [JIRA-ID] | [short_name] | [points] | [JIRA-ID], [JIRA-ID] | [EPIC-ID] |
| ... | ... | ... | ... | ... |
```
---
## Guidelines
- Every task from TASKS_DIR must appear in this table
- Dependencies column lists Jira IDs (e.g., "AZ-43, AZ-44") or "None"
- No circular dependencies allowed
- Tasks should be listed in recommended execution order
- The `/implement` skill reads this table to compute parallel batches
@@ -0,0 +1,135 @@
# Initial Structure Task Template
Use this template for the bootstrap structure plan. Save as `TASKS_DIR/01_initial_structure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_initial_structure.md` after Jira ticket creation.
---
```markdown
# Initial Project Structure
**Task**: [JIRA-ID]_initial_structure
**Name**: Initial Structure
**Description**: Scaffold the project skeleton — folders, shared models, interfaces, stubs, CI/CD, DB migrations, test structure
**Complexity**: [3|5] points
**Dependencies**: None
**Component**: Bootstrap
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
## Project Folder Layout
```
project-root/
├── [folder structure based on tech stack and components]
└── ...
```
### Layout Rationale
[Brief explanation of why this structure was chosen — language conventions, framework patterns, etc.]
## DTOs and Interfaces
### Shared DTOs
| DTO Name | Used By Components | Fields Summary |
|----------|-------------------|---------------|
| [name] | [component list] | [key fields] |
### Component Interfaces
| Component | Interface | Methods | Exposed To |
|-----------|-----------|---------|-----------|
| [name] | [InterfaceName] | [method list] | [consumers] |
## CI/CD Pipeline
| Stage | Purpose | Trigger |
|-------|---------|---------|
| Build | Compile/bundle the application | Every push |
| Lint / Static Analysis | Code quality and style checks | Every push |
| Unit Tests | Run unit test suite | Every push |
| Integration Tests | Run integration test suite | Every push |
| Security Scan | SAST / dependency check | Every push |
| Deploy to Staging | Deploy to staging environment | Merge to staging branch |
### Pipeline Configuration Notes
[Framework-specific notes: CI tool, runners, caching, parallelism, etc.]
## Environment Strategy
| Environment | Purpose | Configuration Notes |
|-------------|---------|-------------------|
| Development | Local development | [local DB, mock services, debug flags] |
| Staging | Pre-production testing | [staging DB, staging services, production-like config] |
| Production | Live system | [production DB, real services, optimized config] |
### Environment Variables
| Variable | Dev | Staging | Production | Description |
|----------|-----|---------|------------|-------------|
| [VAR_NAME] | [value/source] | [value/source] | [value/source] | [purpose] |
## Database Migration Approach
**Migration tool**: [tool name]
**Strategy**: [migration strategy — e.g., versioned scripts, ORM migrations]
### Initial Schema
[Key tables/collections that need to be created, referencing component data access patterns]
## Test Structure
```
tests/
├── unit/
│ ├── [component_1]/
│ ├── [component_2]/
│ └── ...
├── integration/
│ ├── test_data/
│ └── [test files]
└── ...
```
### Test Configuration Notes
[Test runner, fixtures, test data management, isolation strategy]
## Implementation Order
| Order | Component | Reason |
|-------|-----------|--------|
| 1 | [name] | [why first — foundational, no dependencies] |
| 2 | [name] | [depends on #1] |
| ... | ... | ... |
## Acceptance Criteria
**AC-1: Project scaffolded**
Given the structure plan above
When the implementer executes this task
Then all folders, stubs, and configuration files exist
**AC-2: Tests runnable**
Given the scaffolded project
When the test suite is executed
Then all stub tests pass (even if they only assert true)
**AC-3: CI/CD configured**
Given the scaffolded project
When CI pipeline runs
Then build, lint, and test stages complete successfully
```
---
## Guidance Notes
- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on structure and organization decisions, not implementation details.
- Reference component specs for interface and DTO details — don't repeat everything.
- The folder layout should follow conventions of the identified tech stack.
- Environment strategy should account for secrets management and configuration.
@@ -0,0 +1,113 @@
# Task Specification Template
Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[JIRA-ID]_[short_name].md` after Jira ticket creation.
---
```markdown
# [Feature Name]
**Task**: [JIRA-ID]_[short_name]
**Name**: [short human name]
**Description**: [one-line description of what this task delivers]
**Complexity**: [1|2|3|5] points
**Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
**Component**: [component name for context]
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
## Problem
Clear, concise statement of the problem users are facing.
## Outcome
- Measurable or observable goal 1
- Measurable or observable goal 2
- ...
## Scope
### Included
- What's in scope for this task
### Excluded
- Explicitly what's NOT in scope
## Acceptance Criteria
**AC-1: [Title]**
Given [precondition]
When [action]
Then [expected result]
**AC-2: [Title]**
Given [precondition]
When [action]
Then [expected result]
## Non-Functional Requirements
**Performance**
- [requirement if relevant]
**Compatibility**
- [requirement if relevant]
**Reliability**
- [requirement if relevant]
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | [test subject] | [expected result] |
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | [setup] | [test subject] | [expected behavior] | [NFR if any] |
## Constraints
- [Architectural pattern constraint if critical]
- [Technical limitation]
- [Integration requirement]
## Risks & Mitigation
**Risk 1: [Title]**
- *Risk*: [Description]
- *Mitigation*: [Approach]
```
---
## Complexity Points Guide
- 1 point: Trivial, self-contained, no dependencies
- 2 points: Non-trivial, low complexity, minimal coordination
- 3 points: Multi-step, moderate complexity, potential alignment needed
- 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: Too complex — split into smaller tasks
## Output Guidelines
**DO:**
- Focus on behavior and user experience
- Use clear, simple language
- Keep acceptance criteria testable (Gherkin format)
- Include realistic scope boundaries
- Write from the user's perspective
- Include complexity estimation
- Reference dependencies by Jira ID (e.g., AZ-43_shared_models)
**DON'T:**
- Include implementation details (file paths, classes, methods)
- Prescribe technical solutions or libraries
- Add architectural diagrams or code examples
- Specify exact API endpoints or data structures
- Include step-by-step implementation instructions
- Add "how to build" guidance
@@ -0,0 +1,491 @@
---
name: deploy
description: |
Comprehensive deployment skill covering status check, env setup, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts.
7-step workflow: Status & env check, Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures, deployment scripts.
Uses _docs/04_deploy/ structure.
Trigger phrases:
- "deploy", "deployment", "deployment strategy"
- "CI/CD", "pipeline", "containerize"
- "observability", "monitoring", "logging"
- "dockerize", "docker compose"
category: ship
tags: [deployment, docker, ci-cd, observability, monitoring, containerization, scripts]
disable-model-invocation: true
---
# Deployment Planning
Plan and document the full deployment lifecycle: check deployment status and environment requirements, containerize the application, define CI/CD pipelines, configure environments, set up observability, document deployment procedures, and generate deployment scripts.
## Core Principles
- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code (except deployment scripts in Step 7)
## Context Resolution
Fixed paths:
- PLANS_DIR: `_docs/02_plans/`
- DEPLOY_DIR: `_docs/04_deploy/`
- REPORTS_DIR: `_docs/04_deploy/reports/`
- SCRIPTS_DIR: `scripts/`
- ARCHITECTURE: `_docs/02_plans/architecture.md`
- COMPONENTS_DIR: `_docs/02_plans/components/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/components/` | Component specs |
### Prerequisite Checks (BLOCKING)
1. `architecture.md` exists — **STOP if missing**, run `/plan` first
2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing**
3. Create DEPLOY_DIR, REPORTS_DIR, and SCRIPTS_DIR if they do not exist
4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
```
DEPLOY_DIR/
├── containerization.md
├── ci_cd_pipeline.md
├── environment_strategy.md
├── observability.md
├── deployment_procedures.md
├── deploy_scripts.md
└── reports/
└── deploy_status_report.md
SCRIPTS_DIR/ (project root)
├── deploy.sh
├── pull-images.sh
├── start-services.sh
├── stop-services.sh
└── health-check.sh
.env (project root, git-ignored)
.env.example (project root, committed)
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Status check & env setup complete | `reports/deploy_status_report.md` + `.env` + `.env.example` |
| Step 2 | Containerization plan complete | `containerization.md` |
| Step 3 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
| Step 4 | Environment strategy documented | `environment_strategy.md` |
| Step 5 | Observability plan complete | `observability.md` |
| Step 6 | Deployment procedures documented | `deployment_procedures.md` |
| Step 7 | Deployment scripts created | `deploy_scripts.md` + scripts in `SCRIPTS_DIR/` |
### Resumability
If DEPLOY_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed step
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 7). Update status as each step completes.
## Workflow
### Step 1: Deployment Status & Environment Setup
**Role**: DevOps / Platform engineer
**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files
**Constraints**: Must complete before any other step
1. Read architecture.md, all component specs, and restrictions.md
2. Assess deployment readiness:
- List all components and their current state (planned / implemented / tested)
- Identify external dependencies (databases, APIs, message queues, cloud services)
- Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
- Check if any deployment blockers exist
3. Identify all required environment variables by scanning:
- Component specs for configuration needs
- Database connection requirements
- External API endpoints and credentials
- Feature flags and runtime configuration
- Container registry credentials
- Cloud provider credentials
- Monitoring/logging service endpoints
4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
7. Produce a deployment status report summarizing readiness, blockers, and required setup
**Self-verification**:
- [ ] All components assessed for deployment readiness
- [ ] External dependencies catalogued
- [ ] Infrastructure prerequisites identified
- [ ] All required environment variables discovered
- [ ] `.env.example` created with placeholder values
- [ ] `.env` created with safe development defaults
- [ ] `.gitignore` updated to exclude `.env`
- [ ] Status report written to `reports/deploy_status_report.md`
**Save action**: Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`, create `.env` and `.env.example` in project root
**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.
---
### Step 2: Containerization
**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and integration test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
2. Read restrictions.md for infrastructure constraints
3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
4. For each component, define:
- Base image (pinned version, prefer alpine/distroless for production)
- Build stages (dependency install, build, production)
- Non-root user configuration
- Health check endpoint and command
- Exposed ports
- `.dockerignore` contents
5. Define `docker-compose.yml` for local development:
- All application components
- Database (Postgres) with named volume
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env`)
6. Define `docker-compose.test.yml` for integration tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
- All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
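The tagging strategy above could be computed like this. A Python sketch (the registry and project names are placeholders; assumes `git` is available where CI tags are computed):

```python
import subprocess

def image_tag(registry: str, project: str, component: str,
              local: bool = False) -> str:
    """Build an image reference per the tagging strategy.

    CI images are tagged with the short git SHA; `latest` is reserved
    for local development only.
    """
    if local:
        tag = "latest"
    else:
        tag = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return f"{registry}/{project}/{component}:{tag}"
```

Tagging with the immutable SHA is what makes the `--rollback` path in Step 7's `deploy.sh` possible: redeploying is just pulling the previous tag.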
**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box integration testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
---
### Step 3: CI/CD Pipeline
**Role**: DevOps engineer
**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
**Constraints**: Pipeline definition only — produce YAML specification, not implementation
1. Read architecture.md for tech stack and deployment targets
2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
4. Define pipeline stages:
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
6. Define parallelization: which stages can run concurrently
7. Define notifications: build failures, deployment status, security alerts
**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured
**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
---
### Step 4: Environment Strategy
**Role**: Platform engineer
**Goal**: Define environment configuration, secrets management, and environment parity
**Constraints**: Strategy document — no secrets or credentials in output
1. Define environments:
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
| **Production** | Live system | Full infrastructure | Real data |
2. Define environment variable management:
- Reference `.env.example` created in Step 1
- Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
- Validation: fail fast on missing required variables at startup
3. Define secrets management:
- Never commit secrets to version control
- Development: `.env` files (git-ignored)
- Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
- Rotation policy
4. Define database management per environment:
- Development: Docker Postgres with named volume, seed data
- Staging: managed Postgres, migrations applied via CI/CD
- Production: managed Postgres, migrations require approval
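The fail-fast validation mentioned under environment variable management could look like this. The variable names are illustrative, not taken from the component specs:

```python
import os
import sys

REQUIRED_VARS = ["DATABASE_URL", "JWT_SECRET", "LOG_LEVEL"]  # illustrative names

def validate_env(required: list[str] = REQUIRED_VARS) -> None:
    """Exit at startup if any required environment variable is missing or empty."""
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        print(f"Missing required environment variables: {', '.join(missing)}",
              file=sys.stderr)
        sys.exit(1)
```

Calling this first thing in each service's entry point surfaces misconfiguration immediately instead of as a confusing runtime failure minutes later.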
**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment
**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
---
### Step 5: Observability
**Role**: Site Reliability Engineer (SRE)
**Goal**: Define logging, metrics, tracing, and alerting strategy
**Constraints**: Strategy document — describe what to implement, not how to wire it
1. Read architecture.md and component specs for service boundaries
2. Research observability best practices for the tech stack
**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
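A minimal sketch of the logging requirements above (stdout-only, the required field set; passing `service`, `correlation_id`, and `context` via `LogRecord` attributes is an assumed convention, not from the specs):

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with the required structured fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        })

def get_logger(service: str) -> logging.Logger:
    logger = logging.getLogger(service)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)  # stdout, never files
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

PII scrubbing is deliberately not shown here; it belongs in whatever builds the `context` dict before logging.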
**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
**Alerting**:
| Severity | Response Time | Condition Examples |
|----------|---------------|-------------------|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |
**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed
**Save action**: Write `observability.md` using `templates/observability.md`
---
### Step 6: Deployment Procedures
**Role**: DevOps / Platform engineer
**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
**Constraints**: Procedures document — no implementation
1. Define deployment strategy:
- Preferred pattern: blue-green / rolling / canary (choose based on architecture)
- Zero-downtime requirement for production
- Graceful shutdown: 30-second grace period for in-flight requests
- Database migration ordering: migrate before deploy, backward-compatible only
2. Define health checks:
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
3. Define rollback procedures:
- Trigger criteria: health check failures, error rate spike, critical alert
- Rollback steps: redeploy previous image tag, verify health, rollback database if needed
- Communication: notify stakeholders during rollback
- Post-mortem: required after every production rollback
4. Define deployment checklist:
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured
- [ ] Health check endpoints responding
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified
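The startup and readiness probes in the health check table above can be sketched as a retry loop. The endpoint, timings, and function name below are assumptions drawn from this step, not a fixed implementation:

```shell
#!/bin/bash
# Sketch of a startup probe loop: run a check command until it succeeds,
# giving up after a maximum number of attempts. Names are illustrative.
wait_ready() {
  local max_attempts=$1 interval=$2
  shift 2
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "not ready after ${max_attempts} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "ready after ${attempt} attempt(s)"
}

# Assumed usage for the startup probe (30 attempts, 5s interval):
#   wait_ready 30 5 curl -fsS http://localhost:8080/health/ready
```

A real health-check script would call this once per service and aggregate the exit codes.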
**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete
**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
---
### Step 7: Deployment Scripts
**Role**: DevOps / Platform engineer
**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine
**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.
1. Read containerization.md and deployment_procedures.md from previous steps
2. Read `.env.example` for required variables
3. Create the following scripts in `SCRIPTS_DIR/`:
**`deploy.sh`** — Main deployment orchestrator:
- Validates that required environment variables are set (sources `.env` if present)
- Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
- Exits with non-zero code on any failure
- Supports `--rollback` flag to redeploy previous image tags
**`pull-images.sh`** — Pull Docker images to target machine:
- Reads image list and tags from environment or config
- Authenticates with container registry
- Pulls all required images
- Verifies image integrity (digest check)
**`start-services.sh`** — Start services on target machine:
- Runs `docker compose up -d` or individual `docker run` commands
- Applies environment variables from `.env`
- Configures networks and volumes
- Waits for containers to reach healthy state
**`stop-services.sh`** — Graceful shutdown:
- Stops services with graceful shutdown period
- Saves current image tags for rollback reference
- Cleans up orphaned containers/networks
**`health-check.sh`** — Verify deployment health:
- Checks all health endpoints
- Reports status per service
- Returns non-zero if any service is unhealthy
4. All scripts must:
   - Use Bash with strict mode (`#!/bin/bash` and `set -euo pipefail`); avoid non-portable constructs
- Source `.env` from project root or accept env vars from the environment
- Include usage/help output (`--help` flag)
- Be idempotent where possible
- Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)
5. Document all scripts in `deploy_scripts.md`
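A minimal `deploy.sh` skeleton meeting these requirements might look like the following. The script names, env var names, and flag handling are assumptions from this step to adapt, not a final implementation:

```shell
#!/bin/bash
# Sketch of deploy.sh: validate environment, then run the pipeline.
# Script and variable names mirror this step's plan and are assumptions.
set -euo pipefail

usage() { echo "Usage: deploy.sh [--rollback] [--help]"; }

# Fail fast when any named variable is unset or empty
require_env() {
  local missing=0 var
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "Missing required env var: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

main() {
  local rollback=0
  case "${1:-}" in
    --help) usage; return 0 ;;
    --rollback) rollback=1 ;;
  esac
  if [ -f .env ]; then
    set -a; . ./.env; set +a
  fi
  require_env DEPLOY_HOST REGISTRY_URL
  # A fuller version would redeploy the saved previous tags when rollback=1.
  ROLLBACK="$rollback" ./pull-images.sh
  ./stop-services.sh
  ./start-services.sh
  ./health-check.sh
}
# Call main "$@" when running as a standalone script.
```

The `require_env` check is what makes the script exit non-zero before any partial deployment starts.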
**Self-verification**:
- [ ] All five scripts created and executable
- [ ] Scripts source environment variables correctly
- [ ] `deploy.sh` orchestrates the full flow
- [ ] `pull-images.sh` handles registry auth and image pull
- [ ] `start-services.sh` starts containers with correct config
- [ ] `stop-services.sh` handles graceful shutdown
- [ ] `health-check.sh` validates all endpoints
- [ ] Rollback supported via `deploy.sh --rollback`
- [ ] Scripts work for remote deployment via SSH (DEPLOY_HOST)
- [ ] `deploy_scripts.md` documents all scripts
**Save action**: Write scripts to `SCRIPTS_DIR/`, write `deploy_scripts.md` using `templates/deploy_scripts.md`
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unknown cloud provider or hosting | **ASK user** |
| Container registry not specified | **ASK user** |
| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
| Secret manager not chosen | **ASK user** |
| Deployment pattern trade-offs | **ASK user** with recommendation |
| Missing architecture.md | **STOP** — run `/plan` first |
| Remote target machine details unknown | **ASK user** for SSH access, OS, and specs |
## Common Mistakes
- **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts)
- **Hardcoding secrets**: never include real credentials in deployment documents or scripts
- **Ignoring integration test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: pin base image versions and deploy by immutable tag (git SHA or semver); `:latest` is acceptable only for local dev
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
- **Committing `.env`**: only `.env.example` goes to version control; `.env` must be in `.gitignore`
- **Non-portable scripts**: deployment scripts must work across environments; avoid hardcoded paths
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Deployment Planning (7-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: architecture.md + component specs exist │
│ │
│ 1. Status & Env → reports/deploy_status_report.md │
│ + .env + .env.example │
│ [BLOCKING: user confirms status & env vars] │
│ 2. Containerization → containerization.md │
│ [BLOCKING: user confirms Docker plan] │
│ 3. CI/CD Pipeline → ci_cd_pipeline.md │
│ 4. Environment → environment_strategy.md │
│ 5. Observability → observability.md │
│ 6. Procedures → deployment_procedures.md │
│ [BLOCKING: user confirms deployment plan] │
│ 7. Scripts → deploy_scripts.md + scripts/ │
├────────────────────────────────────────────────────────────────┤
│ Principles: Docker-first · IaC · Observability built-in │
│ Environment parity · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,87 @@
# CI/CD Pipeline Template
Save as `_docs/04_deploy/ci_cd_pipeline.md`.
---
```markdown
# [System Name] — CI/CD Pipeline
## Pipeline Overview
| Stage | Trigger | Quality Gate |
|-------|---------|-------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | 75%+ coverage, all tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy Staging | After push | Health checks pass |
| Smoke Tests | After staging deploy | Critical paths pass |
| Deploy Production | Manual approval | Health checks pass |
## Stage Details
### Lint
- [Language-specific linters and formatters]
- Runs in parallel per language
### Test
- Unit tests: [framework and command]
- Integration tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact
### Security
- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
- SAST scan: [tool, e.g., Semgrep / SonarQube]
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings
### Build
- Docker images built using multi-stage Dockerfiles
- Tagged with git SHA: `<registry>/<component>:<sha>`
- Build cache: Docker layer cache via CI cache action
### Push
- Registry: [container registry URL]
- Authentication: [method]
### Deploy Staging
- Deployment method: [docker compose / Kubernetes / cloud service]
- Pre-deploy: run database migrations
- Post-deploy: verify health check endpoints
- Automated rollback on health check failure
### Smoke Tests
- Subset of integration tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]
### Deploy Production
- Requires manual approval via [mechanism]
- Deployment strategy: [blue-green / rolling / canary]
- Pre-deploy: database migration review
- Post-deploy: health checks + monitoring for 15 min
## Caching Strategy
| Cache | Key | Restore Keys |
|-------|-----|-------------|
| Dependencies | [lockfile hash] | [partial match] |
| Docker layers | [Dockerfile hash] | [partial match] |
| Build artifacts | [source hash] | [partial match] |
## Parallelization
[Diagram or description of which stages run concurrently]
## Notifications
| Event | Channel | Recipients |
|-------|---------|-----------|
| Build failure | [Slack/email] | [team] |
| Security alert | [Slack/email] | [team + security] |
| Deploy success | [Slack] | [team] |
| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
```
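The git-SHA tagging described in the Build stage can be sketched as a small helper. The registry and component values below are placeholders, not fixed names:

```shell
#!/bin/bash
# Sketch: compose the image reference used by the Build stage,
# in the `<registry>/<component>:<sha>` form described above.
image_tag() {
  # args: registry component sha-or-version
  printf '%s/%s:%s' "$1" "$2" "$3"
}

# In CI the last argument would typically come from:
#   git rev-parse --short HEAD
```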
@@ -0,0 +1,94 @@
# Containerization Plan Template
Save as `_docs/04_deploy/containerization.md`.
---
```markdown
# [System Name] — Containerization
## Component Dockerfiles
### [Component Name]
| Property | Value |
|----------|-------|
| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
| Stages | [dependency install → build → production] |
| User | [non-root user name] |
| Health check | [endpoint and command] |
| Exposed ports | [port list] |
| Key build args | [if any] |
### [Repeat for each component]
## Docker Compose — Local Development
```yaml
# docker-compose.yml structure
services:
[component]:
build: ./[path]
ports: ["host:container"]
environment: [reference .env.dev]
depends_on: [dependencies with health condition]
healthcheck: [command, interval, timeout, retries]
db:
image: [postgres:version-alpine]
volumes: [named volume]
environment: [credentials from .env.dev]
healthcheck: [pg_isready]
volumes:
[named volumes]
networks:
[shared network]
```
## Docker Compose — Integration Tests
```yaml
# docker-compose.test.yml structure
services:
[app components under test]
test-runner:
build: ./tests/integration
depends_on: [app components with health condition]
environment: [test configuration]
# Exit code determines test pass/fail
db:
image: [postgres:version-alpine]
volumes: [seed data mount]
```
Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
| Local dev | `<component>:latest` | `api:latest` |
## .dockerignore
```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
*.md
.env*
docker-compose*.yml
```
```
@@ -0,0 +1,73 @@
# Deployment Status Report Template
Save as `_docs/04_deploy/reports/deploy_status_report.md`.
---
```markdown
# [System Name] — Deployment Status Report
## Deployment Readiness Summary
| Aspect | Status | Notes |
|--------|--------|-------|
| Architecture defined | ✅ / ❌ | |
| Component specs complete | ✅ / ❌ | |
| Infrastructure prerequisites met | ✅ / ❌ | |
| External dependencies identified | ✅ / ❌ | |
| Blockers | [count] | [summary] |
## Component Status
| Component | State | Docker-ready | Notes |
|-----------|-------|-------------|-------|
| [Component 1] | planned / implemented / tested | yes / no | |
| [Component 2] | planned / implemented / tested | yes / no | |
## External Dependencies
| Dependency | Type | Required For | Status |
|------------|------|-------------|--------|
| [e.g., PostgreSQL] | Database | Data persistence | [available / needs setup] |
| [e.g., Redis] | Cache | Session management | [available / needs setup] |
| [e.g., External API] | API | [purpose] | [available / needs setup] |
## Infrastructure Prerequisites
| Prerequisite | Status | Action Needed |
|-------------|--------|--------------|
| Container registry | [ready / not set up] | [action] |
| Cloud account | [ready / not set up] | [action] |
| DNS configuration | [ready / not set up] | [action] |
| SSL certificates | [ready / not set up] | [action] |
| CI/CD platform | [ready / not set up] | [action] |
| Secret manager | [ready / not set up] | [action] |
## Deployment Blockers
| Blocker | Severity | Resolution |
|---------|----------|-----------|
| [blocker description] | critical / high / medium | [resolution steps] |
## Required Environment Variables
| Variable | Purpose | Required In | Default (Dev) | Source (Staging/Prod) |
|----------|---------|------------|---------------|----------------------|
| `DATABASE_URL` | Postgres connection string | All components | `postgres://dev:dev@db:5432/app` | Secret manager |
| `DEPLOY_HOST` | Remote target machine | Deployment scripts | `localhost` | Environment |
| `REGISTRY_URL` | Container registry URL | CI/CD, deploy scripts | `localhost:5000` | Environment |
| `REGISTRY_USER` | Registry username | CI/CD, deploy scripts | — | Secret manager |
| `REGISTRY_PASS` | Registry password | CI/CD, deploy scripts | — | Secret manager |
| [add all required variables] | | | | |
## .env Files Created
- `.env.example` — committed to VCS, contains all variable names with placeholder values
- `.env` — git-ignored, contains development defaults
## Next Steps
1. [Resolve any blockers listed above]
2. [Set up missing infrastructure prerequisites]
3. [Proceed to containerization planning]
```
@@ -0,0 +1,103 @@
# Deployment Procedures Template
Save as `_docs/04_deploy/deployment_procedures.md`.
---
```markdown
# [System Name] — Deployment Procedures
## Deployment Strategy
**Pattern**: [blue-green / rolling / canary]
**Rationale**: [why this pattern fits the architecture]
**Zero-downtime**: required for production deployments
### Graceful Shutdown
- Grace period: 30 seconds for in-flight requests
- Sequence: stop accepting new requests → drain connections → shutdown
- Container orchestrator: `terminationGracePeriodSeconds: 40`
### Database Migration Ordering
- Migrations run **before** new code deploys
- All migrations must be backward-compatible (old code works with new schema)
- Irreversible migrations require explicit approval
## Health Checks
| Check | Type | Endpoint | Interval | Failure Threshold | Action |
|-------|------|----------|----------|-------------------|--------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate |
### Health Check Responses
- `/health/live`: returns 200 if process is running (no dependency checks)
- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable
## Staging Deployment
1. CI/CD builds and pushes Docker images tagged with git SHA
2. Run database migrations against staging
3. Deploy new images to staging environment
4. Wait for health checks to pass (readiness probe)
5. Run smoke tests against staging
6. If smoke tests fail: automatic rollback to previous image
## Production Deployment
1. **Approval**: manual approval required via [mechanism]
2. **Pre-deploy checks**:
- [ ] Staging smoke tests passed
- [ ] Security scan clean
- [ ] Database migration reviewed
- [ ] Monitoring alerts configured
- [ ] Rollback plan confirmed
3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
4. **Verify**: health checks pass, error rate stable, latency within baseline
5. **Monitor**: observe dashboards for 15 minutes post-deploy
6. **Finalize**: mark deployment as successful or trigger rollback
## Rollback Procedures
### Trigger Criteria
- Health check failures persist after deploy
- Error rate exceeds 5% for more than 5 minutes
- Critical alert fires within 15 minutes of deploy
- Manual decision by on-call engineer
### Rollback Steps
1. Redeploy previous Docker image tag (from CI/CD artifact)
2. Verify health checks pass
3. If database migration was applied:
- Run DOWN migration if reversible
- If irreversible: assess data impact, escalate if needed
4. Notify stakeholders
5. Schedule post-mortem within 24 hours
### Post-Mortem
Required after every production rollback:
- Timeline of events
- Root cause
- What went wrong
- Prevention measures
## Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Docker images built and pushed
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured for target environment
- [ ] Health check endpoints verified
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified of deployment window
- [ ] On-call engineer available during deployment
```
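The automatic trigger criteria above can be sketched as a predicate. The thresholds mirror the table; the integer-percent inputs and function name are assumptions, and a real check would query the monitoring system:

```shell
#!/bin/bash
# Sketch of the rollback trigger criteria. Inputs are assumed to be
# integers gathered from monitoring; names are illustrative.
should_rollback() {
  # args: error_rate_percent minutes_elevated health_checks_passing(0|1)
  local error_rate=$1 minutes=$2 healthy=$3
  if [ "$healthy" -eq 0 ]; then
    return 0            # health checks still failing after deploy
  fi
  if [ "$error_rate" -gt 5 ] && [ "$minutes" -ge 5 ]; then
    return 0            # error rate above 5% for 5+ minutes
  fi
  return 1              # no automatic trigger; on-call may still decide
}
```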
@@ -0,0 +1,61 @@
# Environment Strategy Template
Save as `_docs/04_deploy/environment_strategy.md`.
---
```markdown
# [System Name] — Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose | Seed data, mocked externals |
| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
| Production | Live system | [full infrastructure] | Real data |
## Environment Variables
### Required Variables
| Variable | Purpose | Dev Default | Staging/Prod Source |
|----------|---------|-------------|-------------------|
| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
| [add all required variables] | | | |
### `.env.example`
```env
# Copy to .env and fill in values
DATABASE_URL=postgres://user:pass@host:5432/dbname
# [all required variables with placeholder values]
```
### Variable Validation
All services validate required environment variables at startup and fail fast with a clear error message if any are missing.
## Secrets Management
| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file (git-ignored) | dotenv |
| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
Rotation policy: [frequency and procedure]
## Database Management
| Environment | Type | Migrations | Data |
|-------------|------|-----------|------|
| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
| Production | Managed Postgres | Applied via CI/CD with approval | Live data |
Migration rules:
- All migrations must be backward-compatible (support old and new code simultaneously)
- Reversible migrations required (DOWN/rollback script)
- Production migrations require review before apply
```
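The fail-fast startup validation can be sketched by treating `.env.example` as the source of truth for required variable names. The simple `NAME=value` parsing below is an assumption about the file format:

```shell
#!/bin/bash
# Sketch: every non-comment variable named in .env.example must be set
# in the current environment; report each missing one and fail.
validate_env() {
  local example=$1 missing=0 name
  while IFS='=' read -r name _; do
    case "$name" in ''|\#*) continue ;; esac
    if [ -z "${!name:-}" ]; then
      echo "Missing required env var: ${name}" >&2
      missing=1
    fi
  done < "$example"
  return "$missing"
}
```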
@@ -0,0 +1,132 @@
# Observability Template
Save as `_docs/04_deploy/observability.md`.
---
```markdown
# [System Name] — Observability
## Logging
### Format
Structured JSON to stdout/stderr. No file-based logging in containers.
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |
### Retention
| Environment | Destination | Retention |
|-------------|-------------|-----------|
| Development | Console | Session |
| Staging | [log aggregator] | 7 days |
| Production | [log aggregator] | 30 days |
### PII Rules
- Never log passwords, tokens, or session IDs
- Mask email addresses and personal identifiers
- Log user IDs (opaque) instead of usernames
## Metrics
### Endpoints
Every service exposes Prometheus-compatible metrics at `/metrics`.
### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `request_count` | Counter | Total HTTP requests by method, path, status |
| `request_duration_seconds` | Histogram | Response time by method, path |
| `error_count` | Counter | Failed requests by type |
| `active_connections` | Gauge | Current open connections |
### System Metrics
- CPU usage, Memory usage, Disk I/O, Network I/O
### Business Metrics
| Metric | Type | Description | Source |
|--------|------|-------------|--------|
| [from acceptance criteria] | | | |
Collection interval: 15 seconds
## Distributed Tracing
### Configuration
- SDK: OpenTelemetry
- Propagation: W3C Trace Context via HTTP headers
- Span naming: `<service>.<operation>`
### Sampling
| Environment | Rate | Rationale |
|-------------|------|-----------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Balance cost vs observability |
### Integration Points
- HTTP requests: automatic instrumentation
- Database queries: automatic instrumentation
- Message queues: manual span creation on publish/consume
## Alerting
| Severity | Response Time | Conditions |
|----------|---------------|-----------|
| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
| Low | Next business day | Non-critical warnings, deprecated API usage |
### Notification Channels
| Severity | Channel |
|----------|---------|
| Critical | [PagerDuty / phone] |
| High | [Slack + email] |
| Medium | [Slack] |
| Low | [Dashboard only] |
## Dashboards
### Operations Dashboard
- Service health status (up/down per component)
- Request rate and error rate
- Response time percentiles (P50, P95, P99)
- Resource utilization (CPU, memory per container)
- Active alerts
### Business Dashboard
- [Key business metrics from acceptance criteria]
- [User activity indicators]
- [Transaction volumes]
```
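The structured log format above can be sketched as a formatting helper. Field names and order follow the schema; the helper itself, and passing the timestamp as an argument to keep it deterministic, are illustrative choices:

```shell
#!/bin/bash
# Sketch: emit one structured log line matching the schema above.
# A real logger would generate the timestamp with:
#   date -u +%Y-%m-%dT%H:%M:%SZ
log_json() {
  # args: timestamp level service correlation_id message
  printf '{"timestamp":"%s","level":"%s","service":"%s","correlation_id":"%s","message":"%s","context":{}}\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```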
@@ -0,0 +1,177 @@
---
name: implement
description: |
Orchestrate task implementation with dependency-aware batching, parallel subagents, and integrated code review.
Reads flat task files and _dependencies_table.md from TASKS_DIR, computes execution batches via topological sort,
launches up to 4 implementer subagents in parallel, runs code-review skill after each batch, and loops until done.
Use after /decompose has produced task files.
Trigger phrases:
- "implement", "start implementation", "implement tasks"
- "run implementers", "execute tasks"
category: build
tags: [implementation, orchestration, batching, parallel, code-review]
disable-model-invocation: true
---
# Implementation Orchestrator
Orchestrate the implementation of all tasks produced by the `/decompose` skill. This skill is a **pure orchestrator** — it does NOT write implementation code itself. It reads task specs, computes execution order, delegates to `implementer` subagents, validates results via the `/code-review` skill, and escalates issues.
The `implementer` agent is the specialist that writes all the code — it receives a task spec, analyzes the codebase, implements the feature, writes tests, and verifies acceptance criteria.
## Core Principles
- **Orchestrate, don't implement**: this skill delegates all coding to `implementer` subagents
- **Dependency-aware batching**: tasks run only when all their dependencies are satisfied
- **Max 4 parallel agents**: never launch more than 4 implementer subagents simultaneously
- **File isolation**: no two parallel agents may write to the same file
- **Integrated review**: `/code-review` skill runs automatically after each batch
- **Auto-start**: batches launch immediately — no user confirmation before a batch
- **Gate on failure**: user confirmation is required only when code review returns FAIL
- **Commit and push per batch**: after each batch is confirmed, commit and push to remote
## Context Resolution
- TASKS_DIR: `_docs/02_tasks/`
- Task files: all `*.md` files in TASKS_DIR (excluding files starting with `_`)
- Dependency table: `TASKS_DIR/_dependencies_table.md`
## Prerequisite Checks (BLOCKING)
1. TASKS_DIR exists and contains at least one task file — **STOP if missing**
2. `_dependencies_table.md` exists — **STOP if missing**
3. At least one task is not yet completed — **STOP if all done**
## Algorithm
### 1. Parse
- Read all task `*.md` files from TASKS_DIR (excluding files starting with `_`)
- Read `_dependencies_table.md` — parse into a dependency graph (DAG)
- Validate: no circular dependencies, all referenced dependencies exist
### 2. Detect Progress
- Scan the codebase to determine which tasks are already completed
- Match implemented code against task acceptance criteria
- Mark completed tasks as done in the DAG
- Report progress to user: "X of Y tasks completed"
### 3. Compute Next Batch
- Topological sort remaining tasks
- Select tasks whose dependencies are ALL satisfied (completed)
- If a ready task depends on any task currently being worked on in this batch, it must wait for the next batch
- Cap the batch at 4 parallel agents
- If the batch would exceed 20 total complexity points, suggest splitting and let the user decide
### 4. Assign File Ownership
For each task in the batch:
- Parse the task spec's Component field and Scope section
- Map the component to directories/files in the project
- Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files)
- If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel
### 5. Update Jira Status → In Progress
For each task in the batch, transition its Jira ticket status to **In Progress** via Jira MCP before launching the implementer.
### 6. Launch Implementer Subagents
For each task in the batch, launch an `implementer` subagent with:
- Path to the task spec file
- List of files OWNED (exclusive write access)
- List of files READ-ONLY
- List of files FORBIDDEN
Launch all subagents immediately — no user confirmation.
### 7. Monitor
- Wait for all subagents to complete
- Collect structured status reports from each implementer
- If any implementer reports "Blocked", log the blocker and continue with others
### 8. Code Review
- Run `/code-review` skill on the batch's changed files + corresponding task specs
- The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL
### 9. Gate
- If verdict is **FAIL**: present findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.
- If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically.
### 10. Test
- Run the full test suite
- If failures: report to user with details
### 11. Commit and Push
- After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
- `git add` all changed files from the batch
- `git commit` with a message that includes ALL JIRA-IDs of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[JIRA-ID-1] [JIRA-ID-2] ... Summary of changes`
- `git push` to the remote branch
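The commit message format above can be sketched as a helper; the Jira IDs used here are placeholders:

```shell
#!/bin/bash
# Sketch: build "[JIRA-ID-1] [JIRA-ID-2] ... Summary" from a batch's tasks.
batch_commit_message() {
  local summary=$1 ids="" id
  shift
  for id in "$@"; do
    ids+="[${id}] "
  done
  printf '%s%s' "$ids" "$summary"
}
```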
### 12. Update Jira Status → In Testing
After the batch is committed and pushed, transition the Jira ticket status of each task in the batch to **In Testing** via Jira MCP.
### 13. Loop
- Go back to step 2 until all tasks are done
- When all tasks are complete, report final summary
## Batch Report Persistence
After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_report.md`. Create the directory if it doesn't exist. When all tasks are complete, produce `_docs/03_implementation/FINAL_implementation_report.md` with a summary of all batches.
## Batch Report
After each batch, produce a structured report:
```markdown
# Batch Report
**Batch**: [N]
**Tasks**: [list]
**Date**: [YYYY-MM-DD]
## Task Results
| Task | Status | Files Modified | Tests | Issues |
|------|--------|---------------|-------|--------|
| [JIRA-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
## Next Batch: [task list] or "All tasks complete"
```
## Stop Conditions and Escalation
| Situation | Action |
|-----------|--------|
| Implementer fails same approach 3+ times | Stop it, escalate to user |
| Task blocked on external dependency (not in task list) | Report and skip |
| File ownership conflict unresolvable | ASK user |
| Test failures exceed 50% of suite after a batch | Stop and escalate |
| All tasks complete | Report final summary, suggest final commit |
| `_dependencies_table.md` missing | STOP — run `/decompose` first |
## Recovery
Each batch commit serves as a rollback checkpoint. If recovery is needed:
- **Tests fail after a batch commit**: `git revert <batch-commit-hash>` using the hash from the batch report in `_docs/03_implementation/`
- **Resuming after interruption**: Read `_docs/03_implementation/batch_*_report.md` files to determine which batches completed, then continue from the next batch
- **Multiple consecutive batches fail**: Stop and escalate to user with links to batch reports and commit hashes
## Safety Rules
- Never launch tasks whose dependencies are not yet completed
- Never allow two parallel agents to write to the same file
- If a subagent fails, do NOT retry automatically — report and let user decide
- Always run tests after each batch completes
@@ -0,0 +1,31 @@
# Batching Algorithm Reference
## Topological Sort with Batch Grouping
The `/implement` skill uses a topological sort to determine execution order,
then groups tasks into batches for parallel execution.
## Algorithm
1. Build adjacency list from `_dependencies_table.md`
2. Compute in-degree for each task node
3. Initialize batch 0 with all nodes that have in-degree 0
4. For each batch:
a. Select up to 4 tasks from the ready set
b. Check file ownership — if two tasks would write the same file, defer one to the next batch
c. Launch selected tasks as parallel implementer subagents
d. When all complete, remove them from the graph and decrement in-degrees of dependents
e. Add newly zero-in-degree nodes to the next batch's ready set
5. Repeat until the graph is empty
## File Ownership Conflict Resolution
When two tasks in the same batch map to overlapping files:
- Prefer to run the lower-numbered task first (it's more foundational)
- Defer the higher-numbered task to the next batch
- If both have equal priority, ask the user
## Complexity Budget
Each batch should not exceed 20 total complexity points.
If it does, split the batch and let the user choose which tasks to include.
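The algorithm above can be sketched in Bash. The task names and dependencies below are illustrative; a production version would parse `_dependencies_table.md` and reject cycles before batching:

```shell
#!/bin/bash
# Sketch of Kahn-style batch grouping: deps maps each task to its
# space-separated prerequisites. Data is illustrative and assumed acyclic.
declare -A deps=( [T1]="" [T2]="T1" [T3]="T1" [T4]="T2 T3" [T5]="" )

compute_batches() {
  declare -A completed=()   # declare inside a function makes it local
  local batch_num=0 t d ok
  while [ "${#completed[@]}" -lt "${#deps[@]}" ]; do
    local ready=()
    for t in $(printf '%s\n' "${!deps[@]}" | sort); do
      if [ -n "${completed[$t]:-}" ]; then continue; fi
      ok=1
      for d in ${deps[$t]}; do
        if [ -z "${completed[$d]:-}" ]; then ok=0; fi
      done
      if [ "$ok" -eq 1 ]; then ready+=("$t"); fi
    done
    local batch=("${ready[@]:0:4}")   # cap at 4 parallel tasks
    echo "batch ${batch_num}: ${batch[*]}"
    for t in "${batch[@]}"; do completed[$t]=1; done
    batch_num=$((batch_num + 1))
  done
}

compute_batches   # prints: T1 T5, then T2 T3, then T4
```

File-ownership checks and the complexity budget would filter the `ready` set before the batch is cut to four.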
@@ -0,0 +1,36 @@
# Batch Report Template
Use this template after each implementation batch completes.
---
```markdown
# Batch Report
**Batch**: [N]
**Tasks**: [list of task names]
**Date**: [YYYY-MM-DD]
## Task Results
| Task | Status | Files Modified | Tests | Issues |
|------|--------|---------------|-------|--------|
| [JIRA-ID]_[name] | Done/Blocked/Partial | [count] files | [X/Y pass] | [count or None] |
## Code Review Verdict: [PASS / FAIL / PASS_WITH_WARNINGS]
[Link to code review report if FAIL or PASS_WITH_WARNINGS]
## Test Suite
- Total: [N] tests
- Passed: [N]
- Failed: [N]
- Skipped: [N]
## Commit
[Suggested commit message]
## Next Batch: [task list] or "All tasks complete"
```
@@ -0,0 +1,557 @@
---
name: plan
description: |
Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics.
Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
Uses _docs/ + _docs/02_plans/ structure.
Trigger phrases:
- "plan", "decompose solution", "architecture planning"
- "break down the solution", "create planning documents"
- "component decomposition", "solution analysis"
category: build
tags: [planning, architecture, components, testing, jira, epics]
disable-model-invocation: true
---
# Solution Planning
Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
## Core Principles
- **Single Responsibility**: each component does one thing well; do not spread related logic across components
- **Dumb code, smart data**: keep logic simple, push complexity into data structures and configuration
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Plan, don't code**: this workflow produces documents and specs, never implementation code
## Context Resolution
Fixed paths — no mode detection needed:
- PROBLEM_FILE: `_docs/00_problem/problem.md`
- SOLUTION_FILE: `_docs/01_solution/solution.md`
- PLANS_DIR: `_docs/02_plans/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples |
| `_docs/01_solution/solution.md` | Finalized solution to decompose |
### Prerequisite Checks (BLOCKING)
Run sequentially before any planning step:
**Prereq 1: Data Gate**
1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing**
3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing**
4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing**
All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop.
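The gate can be expressed as a small check; this is a sketch only, with paths mirroring the required-files table above:

```python
from pathlib import Path

# The three mandatory file inputs; input_data/ is checked separately below.
REQUIRED = [
    "_docs/00_problem/problem.md",
    "_docs/00_problem/acceptance_criteria.md",
    "_docs/00_problem/restrictions.md",
]

def data_gate(root: Path):
    """Return a list of gate violations; an empty list means the gate passes."""
    missing = []
    for rel in REQUIRED:
        p = root / rel
        if not p.is_file() or p.stat().st_size == 0:  # missing or empty → STOP
            missing.append(rel)
    data_dir = root / "_docs/00_problem/input_data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        missing.append("_docs/00_problem/input_data/ (needs at least one file)")
    return missing
```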
**Prereq 2: Finalize Solution Draft**
Only runs after the Data Gate passes:
1. Scan `_docs/01_solution/` for files matching `solution_draft*.md`
2. Identify the highest-numbered draft (e.g. `solution_draft06.md`)
3. **Rename** it to `_docs/01_solution/solution.md`
4. If `solution.md` already exists, ask the user whether to overwrite or keep existing
5. Verify `solution.md` is non-empty — **STOP if missing or empty**
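Draft selection (items 1–2) can be sketched as below; the numeric comparison means `solution_draft10.md` correctly beats `solution_draft06.md` even if zero-padding is inconsistent:

```python
import re

def latest_draft(filenames):
    """Return the highest-numbered solution_draft##.md, or None if no draft matches."""
    best, best_n = None, -1
    for name in filenames:
        m = re.fullmatch(r"solution_draft(\d+)\.md", name)
        if m and int(m.group(1)) > best_n:  # compare numerically, not lexically
            best, best_n = name, int(m.group(1))
    return best
```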
**Prereq 3: Workspace Setup**
1. Create PLANS_DIR if it does not exist
2. If PLANS_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
All artifacts are written directly under PLANS_DIR:
```
PLANS_DIR/
├── integration_tests/
│ ├── environment.md
│ ├── test_data.md
│ ├── functional_tests.md
│ ├── non_functional_tests.md
│ └── traceability_matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
├── deployment/
│ ├── containerization.md
│ ├── ci_cd_pipeline.md
│ ├── environment_strategy.md
│ ├── observability.md
│ └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (additional review rounds, numbered sequentially)
├── components/
│ ├── 01_[name]/
│ │ ├── description.md
│ │ └── tests.md
│ ├── 02_[name]/
│ │ ├── description.md
│ │ └── tests.md
│ └── ...
├── common-helpers/
│   ├── 01_helper_[name].md
│   ├── 02_helper_[name].md
│ └── ...
├── diagrams/
│ ├── components.drawio
│ └── flows/
│ ├── flow_[name].md (Mermaid)
│ └── ...
└── FINAL_report.md
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
| Step 4 | Risk assessment complete | `risk_mitigations.md` |
| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
| Step 6 | Epics created in Jira | Jira via MCP |
| Final | All steps complete | `FINAL_report.md` |
### Save Principles
1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
2. **Incremental updates**: same file can be updated multiple times; append or replace
3. **Preserve process**: keep all intermediate files even after integration into final report
4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)
### Resumability
If PLANS_DIR already contains artifacts:
1. List existing files and match them to the save timing table above
2. Identify the last completed step based on which artifacts exist
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
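A minimal sketch of the detection logic follows. The marker artifacts are a subset distilled from the save timing table; Steps 3 and 5 write per-component paths, which this sketch deliberately omits:

```python
# Marker artifacts per step, distilled from the save timing table.
STEP_MARKERS = [
    ("Step 1 (integration tests)", ["integration_tests/traceability_matrix.md"]),
    ("Step 2 (solution analysis)", ["architecture.md", "system-flows.md", "data_model.md"]),
    ("Step 4 (risk assessment)",   ["risk_mitigations.md"]),
    ("Final report",               ["FINAL_report.md"]),
]

def resume_point(existing):
    """Given the set of files present in PLANS_DIR, return the first incomplete step."""
    for step, markers in STEP_MARKERS:
        if not all(m in existing for m in markers):
            return step
    return "All steps complete"
```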
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 6). Update status as each step completes.
## Workflow
### Step 1: Integration Tests
**Role**: Professional Quality Assurance Engineer
**Goal**: Analyze input data completeness and produce detailed black-box integration test specifications
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
#### Phase 1a: Input Data Completeness Analysis
1. Read `_docs/01_solution/solution.md` (finalized in Prereq 2)
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md
4. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
5. Coverage threshold: the input data must cover at least 70% of the scenarios above
6. If coverage falls below the threshold, search the internet for supplementary data, review its quality with the user, and add it to `input_data/` only if the user agrees
7. Present coverage assessment to user
**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.
#### Phase 1b: Black-Box Test Scenario Specification
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `templates/integration-environment.md` as structure
2. Define test data management using `templates/integration-test-data.md` as structure
3. Write functional test scenarios (positive + negative) using `templates/integration-functional-tests.md` as structure
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `templates/integration-non-functional-tests.md` as structure
5. Build traceability matrix using `templates/integration-traceability-matrix.md` as structure
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
**Save action**: Write all files under `integration_tests/`:
- `environment.md`
- `test_data.md`
- `functional_tests.md`
- `non_functional_tests.md`
- `traceability_matrix.md`
**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.
---
### Step 2: Solution Analysis
**Role**: Professional software architect
**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
**Constraints**: No code, no component-level detail yet; focus on system-level view
#### Phase 2a: Architecture & Flows
1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
3. Research unknown or questionable topics via internet; ask user about ambiguities
4. Document architecture using `templates/architecture.md` as structure
5. Document system flows using `templates/system-flows.md` as structure
**Self-verification**:
- [ ] Architecture covers all capabilities mentioned in solution.md
- [ ] System flows cover all main user/system interactions
- [ ] No contradictions with problem.md or restrictions.md
- [ ] Technology choices are justified
- [ ] Integration test findings are reflected in architecture decisions
**Save action**: Write `architecture.md` and `system-flows.md`
**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
#### Phase 2b: Data Model
**Role**: Professional software architect
**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy
1. Extract core entities from architecture.md and solution.md
2. Define entity attributes, types, and constraints
3. Define relationships between entities (Mermaid ERD)
4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
5. Define seed data requirements per environment (dev, staging)
6. Define backward compatibility approach for schema changes (additive-only by default)
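A minimal Mermaid ERD illustrating the expected notation for item 3 — the entity names are placeholders, not part of any real model:

```mermaid
erDiagram
    USER ||--o{ ORDER : places
    ORDER ||--|{ ORDER_ITEM : contains
    PRODUCT ||--o{ ORDER_ITEM : "appears in"
```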
**Self-verification**:
- [ ] Every entity mentioned in architecture.md is defined
- [ ] Relationships are explicit with cardinality
- [ ] Migration strategy specifies reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
**Save action**: Write `data_model.md`
#### Phase 2c: Deployment Planning
**Role**: DevOps / Platform engineer
**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures
Use the `/deploy` skill's templates as structure for each artifact:
1. Read architecture.md and restrictions.md for infrastructure constraints
2. Research Docker best practices for the project's tech stack
3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
5. Define environment strategy: dev, staging, production with secrets management
6. Define observability: structured logging, metrics, tracing, alerting
7. Define deployment procedures: strategy, health checks, rollback, checklist
**Self-verification**:
- [ ] Every component has a Docker specification
- [ ] CI/CD pipeline covers lint, test, security, build, deploy
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
**Save action**: Write all 5 files under `deployment/`:
- `containerization.md`
- `ci_cd_pipeline.md`
- `environment_strategy.md`
- `observability.md`
- `deployment_procedures.md`
---
### Step 3: Component Decomposition
**Role**: Professional software architect
**Goal**: Decompose the architecture into components with detailed specs
**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
1. Identify components from the architecture; think about separation, reusability, and communication patterns
2. Use integration test scenarios from Step 1 to validate component boundaries
3. If additional components are needed (data preparation, shared helpers), create them
4. For each component, write a spec using `templates/component-spec.md` as structure
5. Generate diagrams:
- draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
- Mermaid flowchart per main control flow
6. Components may share common logic. When the same logic is reused by two or more components, extract it into the `common-helpers/` folder instead of duplicating it across components.
**Self-verification**:
- [ ] Each component has a single, clear responsibility
- [ ] No functionality is spread across multiple components
- [ ] All inter-component interfaces are defined (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies
- [ ] All components from architecture.md are accounted for
- [ ] Every integration test scenario can be traced through component interactions
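The circular-dependency check above lends itself to automation; here is a depth-first-search sketch over a hypothetical `component → dependencies` map:

```python
def find_cycle(deps):
    """Return one dependency cycle as [a, ..., a], or None if the graph is acyclic.
    `deps` maps each component name to the components it depends on."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {c: WHITE for c in deps}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:  # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and dep in deps:
                found = visit(dep)
                if found:
                    return found
        color[node] = BLACK
        stack.pop()
        return None

    for comp in list(deps):
        if color[comp] == WHITE:
            found = visit(comp)
            if found:
                return found
    return None
```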
**Save action**: Write:
- each component `components/[##]_[name]/description.md`
- common helper `common-helpers/[##]_helper_[name].md`
- diagrams `diagrams/`
**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
---
### Step 4: Architecture Review & Risk Assessment
**Role**: Professional software architect and analyst
**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
**Constraints**: This is a review step — fix problems found, do not add new features
#### 4a. Evaluator Pass (re-read ALL artifacts)
Review checklist:
- [ ] All components follow Single Responsibility Principle
- [ ] All components follow dumb code / smart data principle
- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
- [ ] No circular dependencies in the dependency graph
- [ ] No missing interactions between components
- [ ] No over-engineering — is there a simpler decomposition?
- [ ] Security considerations addressed in component design
- [ ] Performance bottlenecks identified
- [ ] API contracts are consistent across components
Fix any issues found before proceeding to risk identification.
#### 4b. Risk Identification
1. Identify technical and project risks
2. Assess probability and impact using `templates/risk-register.md`
3. Define mitigation strategies
4. Apply mitigations to architecture, flows, and component documents where applicable
**Self-verification**:
- [ ] Every High/Critical risk has a concrete mitigation strategy
- [ ] Mitigations are reflected in the relevant component or architecture docs
- [ ] No new risks introduced by the mitigations themselves
**Save action**: Write `risk_mitigations.md`
**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.
**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
---
### Step 5: Test Specifications
**Role**: Professional Quality Assurance Engineer
**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage
**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.
1. For each component, write tests using `templates/test-spec.md` as structure
2. Cover all 4 types: integration, performance, security, acceptance
3. Include test data management (setup, teardown, isolation)
4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
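The traceability check in item 4 can be run mechanically once the specs are parsed; the AC and test IDs below are hypothetical stand-ins for the parsed documents:

```python
def uncovered_criteria(criteria, tests):
    """Return acceptance-criteria IDs that no test spec traces back to.
    `tests` maps a test ID to the list of AC IDs it covers."""
    covered = {ac for traced in tests.values() for ac in traced}
    return [ac for ac in criteria if ac not in covered]
```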
**Self-verification**:
- [ ] Every acceptance criterion has at least one test covering it
- [ ] Test inputs are realistic and well-defined
- [ ] Expected results are specific and measurable
- [ ] No component is left without tests
**Save action**: Write each `components/[##]_[name]/tests.md`
---
### Step 6: Jira Epics
**Role**: Professional product manager
**Goal**: Create Jira epics from components, ordered by dependency
**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.
1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
4. Include effort estimation per epic (T-shirt size or story points range)
5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
6. Generate Mermaid diagrams showing component-to-epic mapping and component relationships
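The dependency ordering in item 3 is a topological sort; one possible sketch, with placeholder epic names (the Bootstrap epic has no prerequisites, so it naturally sorts first):

```python
def epic_order(prereqs):
    """Order epics so each appears after all of its prerequisites (Kahn's algorithm).
    `prereqs` maps an epic name to the epics that must be completed first."""
    remaining = {epic: set(p) for epic, p in prereqs.items()}
    order = []
    while remaining:
        ready = sorted(e for e, p in remaining.items() if not p)
        if not ready:  # no epic is unblocked: the graph has a cycle
            raise ValueError(f"circular dependency among: {sorted(remaining)}")
        for epic in ready:
            order.append(epic)
            del remaining[epic]
        for p in remaining.values():
            p.difference_update(ready)
    return order
```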
**CRITICAL — Epic description richness requirements**:
Each epic description in Jira MUST include ALL of the following sections with substantial content:
- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
- **Problem / Context**: what problem this component solves, why it exists, current pain points
- **Scope**: detailed in-scope and out-of-scope lists
- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
- **Interface specification**: full method signatures, input/output types, error types (from component description.md)
- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
- **Effort estimation**: T-shirt size and story points range
- **Child issues**: planned task breakdown with complexity points
- **Key constraints**: from restrictions.md that affect this component
- **Testing strategy**: summary of test types and coverage from tests.md
Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.
**Self-verification**:
- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to exactly one epic
- [ ] Dependency order is respected (no epic depends on a later one)
- [ ] Acceptance criteria are measurable
- [ ] Effort estimates are realistic
- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
- [ ] Epic descriptions are self-contained — readable without opening other files
Also **create an "Integration Tests" epic** — it parents the integration test tasks created by the `/decompose` skill and covers implementing the test scenarios defined in `integration_tests/`.
**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
---
## Quality Checklist (before FINAL_report.md)
Before writing the final report, verify ALL of the following:
### Integration Tests
- [ ] Every acceptance criterion is covered in traceability_matrix.md
- [ ] Every restriction is verified by at least one test
- [ ] Positive and negative scenarios are balanced
- [ ] Docker environment is self-contained
- [ ] Consumer app treats main system as black box
- [ ] CI/CD integration and reporting defined
### Architecture
- [ ] Covers all capabilities from solution.md
- [ ] Technology choices are justified
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions
### Data Model
- [ ] Every entity from architecture.md is defined
- [ ] Relationships have explicit cardinality
- [ ] Migration strategy with reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
### Deployment
- [ ] Containerization plan covers all components
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
### Components
- [ ] Every component follows SRP
- [ ] No circular dependencies
- [ ] All inter-component interfaces are defined and consistent
- [ ] No orphan components (unused by any flow)
- [ ] Every integration test scenario can be traced through component interactions
### Risks
- [ ] All High/Critical risks have mitigations
- [ ] Mitigations are reflected in component/architecture docs
- [ ] User has confirmed risk assessment is sufficient
### Tests
- [ ] Every acceptance criterion is covered by at least one test
- [ ] All 4 test types are represented per component (where applicable)
- [ ] Test data management is defined
### Epics
- [ ] "Bootstrap & Initial Structure" epic exists
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to an epic
- [ ] Dependency order is correct
- [ ] Acceptance criteria are measurable
**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
## Common Mistakes
- **Proceeding without input data**: all four data gate items (problem, acceptance_criteria, restrictions, input_data) must be present before any planning begins
- **Coding during planning**: this workflow produces documents, never code
- **Multi-responsibility components**: if a component does two things, split it
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Diagrams without data**: generate diagrams only after the underlying structure is documented
- **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
- **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
- **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register
- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3)
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Missing problem.md, acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — planning cannot proceed |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% | Search internet for supplementary data, ASK user to validate |
| Technology choice with multiple valid options | ASK user |
| Component naming | PROCEED, confirm at next BLOCKING gate |
| File structure within templates | PROCEED |
| Contradictions between input files | ASK user |
| Risk mitigation requires architecture change | ASK user |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Solution Planning (6-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ 1: Data Gate (BLOCKING) │
│ → verify AC, restrictions, input_data exist — STOP if not │
│ PREREQ 2: Finalize solution draft │
│ → rename highest solution_draft##.md to solution.md │
│ PREREQ 3: Workspace setup │
│ → create PLANS_DIR/ if needed │
│ │
│ 1. Integration Tests → integration_tests/ (5 files) │
│ [BLOCKING: user confirms test coverage] │
│ 2a. Architecture → architecture.md, system-flows.md │
│ [BLOCKING: user confirms architecture] │
│ 2b. Data Model → data_model.md │
│ 2c. Deployment → deployment/ (5 files) │
│ 3. Component Decompose → components/[##]_[name]/description │
│ [BLOCKING: user confirms decomposition] │
│ 4. Review & Risk → risk_mitigations.md │
│ [BLOCKING: user confirms risks, iterative] │
│ 5. Test Specifications → components/[##]_[name]/tests.md │
│ 6. Jira Epics → Jira via MCP │
│ ───────────────────────────────────────────────── │
│ Quality Checklist → FINAL_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: SRP · Dumb code/smart data · Save immediately │
│ Ask don't assume · Plan don't code │
└────────────────────────────────────────────────────────────────┘
```
# Architecture Document Template
Use this template for the architecture document. Save as `_docs/02_plans/architecture.md`.
---
```markdown
# [System Name] — Architecture
## 1. System Context
**Problem being solved**: [One paragraph summarizing the problem from problem.md]
**System boundaries**: [What is inside the system vs. external]
**External systems**:
| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| [name] | REST / Queue / DB / File | Inbound / Outbound / Both | [why] |
## 2. Technology Stack
| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | | | |
| Framework | | | |
| Database | | | |
| Cache | | | |
| Message Queue | | | |
| Hosting | | | |
| CI/CD | | | |
**Key constraints from restrictions.md**:
- [Constraint 1 and how it affects technology choices]
- [Constraint 2]
## 3. Deployment Model
**Environments**: Development, Staging, Production
**Infrastructure**:
- [Cloud provider / On-prem / Hybrid]
- [Container orchestration if applicable]
- [Scaling strategy: horizontal / vertical / auto]
**Environment-specific configuration**:
| Config | Development | Production |
|--------|-------------|------------|
| Database | [local/docker] | [managed service] |
| Secrets | [.env file] | [secret manager] |
| Logging | [console] | [centralized] |
## 4. Data Model Overview
> High-level data model covering the entire system. Detailed per-component models go in component specs.
**Core entities**:
| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| [entity] | [what it represents] | [component ##] |
**Key relationships**:
- [Entity A] → [Entity B]: [relationship description]
**Data flow summary**:
- [Source] → [Transform] → [Destination]: [what data and why]
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| [component] | [component] | Sync REST / Async Queue / Direct call | Request-Response / Event / Command | |
### External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| [system] | [REST/gRPC/etc] | [API key/OAuth/etc] | [limits] | [retry/circuit breaker/fallback] |
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Availability | [e.g., 99.9%] | [how measured] | High/Medium/Low |
| Latency (p95) | [e.g., <200ms] | [endpoint/operation] | |
| Throughput | [e.g., 1000 req/s] | [peak/sustained] | |
| Data retention | [e.g., 90 days] | [which data] | |
| Recovery (RPO/RTO) | [e.g., RPO 1hr, RTO 4hr] | | |
| Scalability | [e.g., 10x current load] | [timeline] | |
## 7. Security Architecture
**Authentication**: [mechanism — JWT / session / API key]
**Authorization**: [RBAC / ABAC / per-resource]
**Data protection**:
- At rest: [encryption method]
- In transit: [TLS version]
- Secrets management: [tool/approach]
**Audit logging**: [what is logged, where, retention]
## 8. Key Architectural Decisions
Record significant decisions that shaped the architecture.
### ADR-001: [Decision Title]
**Context**: [Why this decision was needed]
**Decision**: [What was decided]
**Alternatives considered**:
1. [Alternative 1] — rejected because [reason]
2. [Alternative 2] — rejected because [reason]
**Consequences**: [Trade-offs accepted]
### ADR-002: [Decision Title]
...
```
# Component Specification Template
Use this template for each component. Save as `components/[##]_[name]/description.md`.
---
```markdown
# [Component Name]
## 1. High-Level Overview
**Purpose**: [One sentence: what this component does and its role in the system]
**Architectural Pattern**: [e.g., Repository, Event-driven, Pipeline, Facade, etc.]
**Upstream dependencies**: [Components that this component calls or consumes from]
**Downstream consumers**: [Components that call or consume from this component]
## 2. Internal Interfaces
For each interface this component exposes internally:
### Interface: [InterfaceName]
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `method_name` | `InputDTO` | `OutputDTO` | Yes/No | `ErrorType1`, `ErrorType2` |
**Input DTOs**:
```
[DTO name]:
field_1: type (required/optional) — description
field_2: type (required/optional) — description
```
**Output DTOs**:
```
[DTO name]:
field_1: type — description
field_2: type — description
```
## 3. External API Specification
> Include this section only if the component exposes an external HTTP/gRPC API.
> Skip if the component is internal-only.
| Endpoint | Method | Auth | Rate Limit | Description |
|----------|--------|------|------------|-------------|
| `/api/v1/...` | GET/POST/PUT/DELETE | Required/Public | X req/min | Brief description |
**Request/Response schemas**: define per endpoint using OpenAPI-style notation.
**Example request/response**:
```json
// Request
{ }
// Response
{ }
```
## 4. Data Access Patterns
### Queries
| Query | Frequency | Hot Path | Index Needed |
|-------|-----------|----------|--------------|
| [describe query] | High/Medium/Low | Yes/No | Yes/No |
### Caching Strategy
| Data | Cache Type | TTL | Invalidation |
|------|-----------|-----|-------------|
| [data item] | In-memory / Redis / None | [duration] | [trigger] |
### Storage Estimates
| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|-----------------|---------------------|----------|------------|-------------|
| [table_name] | | | | /month |
### Data Management
**Seed data**: [Required seed data and how to load it]
**Rollback**: [Rollback procedure for this component's data changes]
## 5. Implementation Details
**Algorithmic Complexity**: [Big O for critical methods — only if non-trivial]
**State Management**: [Local state / Global state / Stateless — explain how state is handled]
**Key Dependencies**: [External libraries and their purpose]
| Library | Version | Purpose |
|---------|---------|---------|
| [name] | [version] | [why needed] |
**Error Handling Strategy**:
- [How errors are caught, propagated, and reported]
- [Retry policy if applicable]
- [Circuit breaker if applicable]
## 6. Extensions and Helpers
> List any shared utilities this component needs that should live in a `helpers/` folder.
| Helper | Purpose | Used By |
|--------|---------|---------|
| [helper_name] | [what it does] | [list of components] |
## 7. Caveats & Edge Cases
**Known limitations**:
- [Limitation 1]
**Potential race conditions**:
- [Race condition scenario, if any]
**Performance bottlenecks**:
- [Bottleneck description and mitigation approach]
## 8. Dependency Graph
**Must be implemented after**: [list of component numbers/names]
**Can be implemented in parallel with**: [list of component numbers/names]
**Blocks**: [list of components that depend on this one]
## 9. Logging Strategy
| Log Level | When | Example |
|-----------|------|---------|
| ERROR | Unrecoverable failures | `Failed to process order {id}: {error}` |
| WARN | Recoverable issues | `Retry attempt {n} for {operation}` |
| INFO | Key business events | `Order {id} created by user {uid}` |
| DEBUG | Development diagnostics | `Query returned {n} rows in {ms}ms` |
**Log format**: [structured JSON / plaintext — match system standard]
**Log storage**: [stdout / file / centralized logging service]
```
---
## Guidance Notes
- **Section 3 (External API)**: skip entirely for internal-only components. Include for any component that exposes HTTP endpoints, WebSocket connections, or gRPC services.
- **Section 4 (Storage Estimates)**: critical for components that manage persistent data. Skip for stateless components.
- **Section 5 (Algorithmic Complexity)**: only document if the algorithm is non-trivial (O(n^2) or worse, recursive, etc.). Simple CRUD operations don't need this.
- **Section 6 (Helpers)**: if the helper is used by only one component, keep it inside that component. Only extract to `helpers/` if shared by 2+ components.
- **Section 8 (Dependency Graph)**: this is essential for determining implementation order. Be precise about what "depends on" means — data dependency, API dependency, or shared infrastructure.
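The level table in Section 9 maps naturally onto a structured JSON logger. A minimal Python sketch, assuming stdout JSON output per the "Log format" and "Log storage" fields; the logger name and field set are illustrative, not prescribed by the template:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (structured log format)."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,       # e.g. INFO, DEBUG (Python renders WARN as WARNING)
            "logger": record.name,
            "message": record.getMessage(),  # args already interpolated
        })

def make_logger(name="component"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(sys.stdout)  # "Log storage": stdout
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace rather than stack handlers on re-creation
    return logger

log = make_logger()
log.info("Order %s created by user %s", "o-17", "u-42")  # INFO: key business event
```

Swapping `StreamHandler` for a file or centralized-logging handler changes only `make_logger`, which is the point of fixing the format in one place.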
# Jira Epic Template
Use this template for each Jira epic. Create epics via Jira MCP.
---
```markdown
## Epic: [Component Name] — [Outcome]
**Example**: Data Ingestion — Near-real-time pipeline
### Epic Summary
[1-2 sentences: what we are building + why it matters]
### Problem / Context
[Current state, pain points, constraints, business opportunities.
Link to architecture.md and relevant component spec.]
### Scope
**In Scope**:
- [Capability 1 — describe what, not how]
- [Capability 2]
- [Capability 3]
**Out of Scope**:
- [Explicit exclusion 1 — prevents scope creep]
- [Explicit exclusion 2]
### Assumptions
- [System design assumption]
- [Data structure assumption]
- [Infrastructure assumption]
### Dependencies
**Epic dependencies** (must be completed first):
- [Epic name / ID]
**External dependencies**:
- [Services, hardware, environments, certificates, data sources]
### Effort Estimation
**T-shirt size**: S / M / L / XL
**Story points range**: [min]-[max]
### Users / Consumers
| Type | Who | Key Use Cases |
|------|-----|--------------|
| Internal | [team/role] | [use case] |
| External | [user type] | [use case] |
| System | [service name] | [integration point] |
### Requirements
**Functional**:
- [API expectations, events, data handling]
- [Idempotency, retry behavior]
**Non-functional**:
- [Availability, latency, throughput targets]
- [Scalability, processing limits, data retention]
**Security / Compliance**:
- [Authentication, encryption, secrets management]
- [Logging, audit trail]
- [SOC2 / ISO / GDPR if applicable]
### Design & Architecture
- Architecture doc: `_docs/02_plans/architecture.md`
- Component spec: `_docs/02_plans/components/[##]_[name]/description.md`
- System flows: `_docs/02_plans/system-flows.md`
### Definition of Done
- [ ] All in-scope capabilities implemented
- [ ] Automated tests pass (unit + integration + e2e)
- [ ] Minimum coverage threshold met (75%)
- [ ] Runbooks written (if applicable)
- [ ] Documentation updated
### Acceptance Criteria
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | [criterion] | [how to verify] |
| 2 | [criterion] | [how to verify] |
### Risks & Mitigations
| # | Risk | Mitigation | Owner |
|---|------|------------|-------|
| 1 | [top risk] | [mitigation] | [owner] |
| 2 | | | |
| 3 | | | |
### Labels
- `component:[name]`
- `env:prod` / `env:stg`
- `type:platform` / `type:data` / `type:integration`
### Child Issues
| Type | Title | Points |
|------|-------|--------|
| Spike | [research/investigation task] | [1-3] |
| Task | [implementation task] | [1-5] |
| Task | [implementation task] | [1-5] |
| Enabler | [infrastructure/setup task] | [1-3] |
```
---
## Guidance Notes
- Be concise. Fewer words with the same meaning = better epic.
- Capabilities in scope are "what", not "how" — avoid describing implementation details.
- Dependency order matters: epics that must be done first should be listed earlier in the backlog.
- Every epic maps to exactly one component. If a component is too large for one epic, split the component first.
- Complexity points for child issues follow the project standard: 1, 2, 3, 5, 8. Do not create issues above 5 points — split them.
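The point scale in the last note can be enforced with a small check before issues are created. A hypothetical validator sketch; the 1/2/3/5/8 scale and the 5-point cap come from the note above, everything else is illustrative:

```python
ALLOWED_POINTS = {1, 2, 3, 5, 8}  # project complexity scale
MAX_CHILD_POINTS = 5              # child issues above 5 points must be split

def validate_child_issue(points):
    """Return a list of problems with a child issue's estimate (empty list = OK)."""
    problems = []
    if points not in ALLOWED_POINTS:
        problems.append(f"{points} is not on the 1/2/3/5/8 scale")
    if points > MAX_CHILD_POINTS:
        problems.append(f"{points} points exceeds the {MAX_CHILD_POINTS}-point cap; split the issue")
    return problems
```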
# Final Planning Report Template
Use this template after completing all 5 steps and the quality checklist. Save as `_docs/02_plans/FINAL_report.md`.
---
```markdown
# [System Name] — Planning Report
## Executive Summary
[2-3 sentences: what was planned, the core architectural approach, and the key outcome (number of components, epics, estimated effort)]
## Problem Statement
[Brief restatement from problem.md — transformed, not copy-pasted]
## Architecture Overview
[Key architectural decisions and technology stack summary. Reference `architecture.md` for full details.]
**Technology stack**: [language, framework, database, hosting — one line]
**Deployment**: [environment strategy — one line]
## Component Summary
| # | Component | Purpose | Dependencies | Epic |
|---|-----------|---------|-------------|------|
| 01 | [name] | [one-line purpose] | — | [Jira ID] |
| 02 | [name] | [one-line purpose] | 01 | [Jira ID] |
| ... | | | | |
**Implementation order** (based on dependency graph):
1. [Phase 1: components that can start immediately]
2. [Phase 2: components that depend on Phase 1]
3. [Phase 3: ...]
## System Flows
| Flow | Description | Key Components |
|------|-------------|---------------|
| [name] | [one-line summary] | [component list] |
[Reference `system-flows.md` for full diagrams and details.]
## Risk Summary
| Level | Count | Key Risks |
|-------|-------|-----------|
| Critical | [N] | [brief list] |
| High | [N] | [brief list] |
| Medium | [N] | — |
| Low | [N] | — |
**Iterations completed**: [N]
**All Critical/High risks mitigated**: Yes / No — [details if No]
[Reference `risk_mitigations.md` for full register.]
## Test Coverage
| Component | Integration | Performance | Security | Acceptance | AC Coverage |
|-----------|-------------|-------------|----------|------------|-------------|
| [name] | [N tests] | [N tests] | [N tests] | [N tests] | [X/Y ACs] |
| ... | | | | | |
**Overall acceptance criteria coverage**: [X / Y total ACs covered] ([percentage]%)
## Epic Roadmap
| Order | Epic | Component | Effort | Dependencies |
|-------|------|-----------|--------|-------------|
| 1 | [Jira ID]: [name] | [component] | [S/M/L/XL] | — |
| 2 | [Jira ID]: [name] | [component] | [S/M/L/XL] | Epic 1 |
| ... | | | | |
**Total estimated effort**: [sum or range]
## Key Decisions Made
| # | Decision | Rationale | Alternatives Rejected |
|---|----------|-----------|----------------------|
| 1 | [decision] | [why] | [what was rejected] |
| 2 | | | |
## Open Questions
| # | Question | Impact | Assigned To |
|---|----------|--------|-------------|
| 1 | [unresolved question] | [what it blocks or affects] | [who should answer] |
## Artifact Index
| File | Description |
|------|-------------|
| `architecture.md` | System architecture |
| `system-flows.md` | System flows and diagrams |
| `components/01_[name]/description.md` | Component spec |
| `components/01_[name]/tests.md` | Test spec |
| `risk_mitigations.md` | Risk register |
| `diagrams/components.drawio` | Component diagram |
| `diagrams/flows/flow_[name].md` | Flow diagrams |
```
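The "Implementation order" phases above can be derived mechanically from the Component Summary's Dependencies column. A sketch of Kahn-style layering, assuming dependencies are expressed as a mapping; the component names are invented for illustration:

```python
def phases(deps):
    """deps: {component: set of components it depends on} -> list of phases.

    Each phase contains only components whose dependencies all sit in earlier phases,
    so every phase can be implemented in parallel.
    """
    remaining = {c: set(d) for c, d in deps.items()}
    result = []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        result.append(ready)
        for c in ready:
            del remaining[c]
        for d in remaining.values():
            d.difference_update(ready)
    return result
```

A cycle raises instead of looping, which surfaces a planning error (two components that each "must be implemented after" the other) before any epic ordering is written down.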
# E2E Test Environment Template
Save as `PLANS_DIR/integration_tests/environment.md`.
---
```markdown
# E2E Test Environment
## Overview
**System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.]
**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
## Docker Environment
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| system-under-test | [main app image or build context] | The main system being tested | [ports] |
| test-db | [postgres/mysql/etc.] | Database for the main system | [ports] |
| e2e-consumer | [build context for consumer app] | Black-box test runner | — |
| [dependency] | [image] | [purpose — cache, queue, mock, etc.] | [ports] |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| e2e-net | all | Isolated test network |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| [name] | [service:path] | [test data, DB persistence, etc.] |
### docker-compose structure
```yaml
# Outline only — not runnable code
services:
system-under-test:
# main system
test-db:
# database
e2e-consumer:
# consumer test app
depends_on:
- system-under-test
```
## Consumer Application
**Tech stack**: [language, framework, test runner]
**Entry point**: [how it starts — e.g., pytest, jest, custom runner]
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| [API name] | [HTTP/gRPC/AMQP/etc.] | [URL or topic] | [method] |
### What the consumer does NOT have access to
- No direct database access to the main system
- No internal module imports
- No shared memory or file system with the main system
## CI/CD Integration
**When to run**: [e.g., on PR merge to dev, nightly, before production deploy]
**Pipeline stage**: [where in the CI pipeline this fits]
**Gate behavior**: [block merge / warning only / manual approval]
**Timeout**: [max total suite duration before considered failed]
## Reporting
**Format**: CSV
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
**Output path**: [where the CSV is written — e.g., ./e2e-results/report.csv]
```
---
## Guidance Notes
- The consumer app must treat the main system as a true black box — no internal imports, no direct DB queries against the main system's database.
- Docker environment should be self-contained — `docker compose up` must be sufficient to run the full suite.
- If the main system requires external services (payment gateways, third-party APIs), define mock/stub services in the Docker environment.
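The Reporting section's CSV contract (Test ID, Test Name, Execution Time, Result, Error Message) is simple enough to sketch with the standard library. Column names mirror the template; the default output path is illustrative and would come from the "Output path" field in practice:

```python
import csv

COLUMNS = ["Test ID", "Test Name", "Execution Time (ms)", "Result", "Error Message"]

def write_report(rows, path="report.csv"):
    """Write one CSV row per test result; rows are dicts keyed by COLUMNS."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)

write_report([
    {"Test ID": "FT-P-01", "Test Name": "Happy path order",
     "Execution Time (ms)": 843, "Result": "PASS", "Error Message": ""},
])
```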
# E2E Functional Tests Template
Save as `PLANS_DIR/integration_tests/functional_tests.md`.
---
```markdown
# E2E Functional Tests
## Positive Scenarios
### FT-P-01: [Scenario Name]
**Summary**: [One sentence: what end-to-end use case this validates]
**Traces to**: AC-[ID], AC-[ID]
**Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.]
**Preconditions**:
- [System state required before test]
**Input data**: [reference to specific data set or file from test_data.md]
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | [call / send / provide input] | [response / event / output] |
| 2 | [call / send / provide input] | [response / event / output] |
**Expected outcome**: [specific, measurable result]
**Max execution time**: [e.g., 10s]
---
### FT-P-02: [Scenario Name]
(repeat structure)
---
## Negative Scenarios
### FT-N-01: [Scenario Name]
**Summary**: [One sentence: what invalid/edge input this tests]
**Traces to**: AC-[ID] (negative case), RESTRICT-[ID]
**Category**: [which AC/restriction category]
**Preconditions**:
- [System state required before test]
**Input data**: [reference to specific invalid data or edge case]
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | [provide invalid input / trigger edge case] | [error response / graceful degradation / fallback behavior] |
**Expected outcome**: [system rejects gracefully / falls back to X / returns error Y]
**Max execution time**: [e.g., 5s]
---
### FT-N-02: [Scenario Name]
(repeat structure)
```
---
## Guidance Notes
- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
- Positive scenarios validate the system does what it should.
- Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept.
- Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth."
- Input data references should point to specific entries in test_data.md.
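The "within 50m of ground truth" example above is the kind of outcome a consumer app can assert directly. A sketch using the haversine great-circle distance; the 50 m threshold is the example from the note, not a project requirement:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (spherical Earth)."""
    r = 6_371_000  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_tolerance(measured, truth, max_m=50.0):
    """Specific, measurable pass condition instead of 'works correctly'."""
    return haversine_m(*measured, *truth) <= max_m
```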
# E2E Non-Functional Tests Template
Save as `PLANS_DIR/integration_tests/non_functional_tests.md`.
---
```markdown
# E2E Non-Functional Tests
## Performance Tests
### NFT-PERF-01: [Test Name]
**Summary**: [What performance characteristic this validates]
**Traces to**: AC-[ID]
**Metric**: [what is measured — latency, throughput, frame rate, etc.]
**Preconditions**:
- [System state, load profile, data volume]
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | [action] | [what to measure and how] |
**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
**Duration**: [how long the test runs]
---
## Resilience Tests
### NFT-RES-01: [Test Name]
**Summary**: [What failure/recovery scenario this validates]
**Traces to**: AC-[ID]
**Preconditions**:
- [System state before fault injection]
**Fault injection**:
- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | [inject fault] | [system behavior during fault] |
| 2 | [observe recovery] | [system behavior after recovery] |
**Pass criteria**: [recovery time, data integrity, continued operation]
---
## Security Tests
### NFT-SEC-01: [Test Name]
**Summary**: [What security property this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
**Pass criteria**: [specific security outcome]
---
## Resource Limit Tests
### NFT-RES-LIM-01: [Test Name]
**Summary**: [What resource constraint this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Preconditions**:
- [System running under specified constraints]
**Monitoring**:
- [What resources to monitor — memory, CPU, GPU, disk, temperature]
**Duration**: [how long to run]
**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
```
---
## Guidance Notes
- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
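The last note, sustained monitoring rather than short bursts, reduces to a sampling loop that runs for the full duration and fails on the first breach. A hypothetical sketch; the usage-reading callable and the timing hooks are injected so a real suite could plug in psutil or /proc readers:

```python
import time

def monitor_limit(read_usage, limit, duration_s, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Sample a resource for the full duration; fail on the first breach.

    read_usage: callable returning current usage (e.g., RSS in bytes).
    Returns (passed, samples) so the report can show the whole profile.
    """
    samples = []
    deadline = clock() + duration_s
    while clock() < deadline:
        usage = read_usage()
        samples.append(usage)
        if usage > limit:
            return False, samples  # a breach at any point fails the test
        sleep(interval_s)
    return True, samples
```

Injecting `clock` and `sleep` also makes the monitor itself unit-testable without real wall-clock time.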
# E2E Test Data Template
Save as `PLANS_DIR/integration_tests/test_data.md`.
---
```markdown
# E2E Test Data Management
## Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|----------|-------------|---------------|-----------|---------|
| [name] | [what it contains] | [test IDs] | [SQL script / API call / fixture file / volume mount] | [how removed after test] |
## Data Isolation Strategy
[e.g., each test run gets a fresh container restart, transactions are rolled back after each test, data is namespaced per test, or each test group uses a separate DB]
## Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|-----------------|----------------|-------------|-----------------|
| [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] |
## External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|-----------------|-----------|-------------|----------|
| [service name] | [mock type] | [Docker service / in-process stub / recorded responses] | [what it returns / simulates] |
## Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|-----------|-----------|-----------------|------------------------|
| [type] | [rules] | [invalid input examples] | [how system should respond] |
```
---
## Guidance Notes
- Every seed data set should be traceable to specific test scenarios.
- Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it.
- External mocks must be deterministic — same input always produces same output.
- Data isolation must guarantee no test can affect another test's outcome.
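Determinism in mocks, the same input always producing the same output, is easiest with canned responses and no clock or randomness. A minimal sketch; the service and responses are invented for illustration:

```python
class DeterministicStub:
    """Map each request key to a fixed, recorded response; no clock, no randomness."""
    def __init__(self, responses):
        self._responses = dict(responses)

    def call(self, request_key):
        try:
            return self._responses[request_key]
        except KeyError:
            # Unknown input is an explicit, repeatable failure, not a guess
            raise ValueError(f"no recorded response for {request_key!r}")

payment_stub = DeterministicStub({
    ("charge", "4242"): {"status": "approved"},
    ("charge", "0000"): {"status": "declined"},
})
```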
# E2E Traceability Matrix Template
Save as `PLANS_DIR/integration_tests/traceability_matrix.md`.
---
```markdown
# E2E Traceability Matrix
## Acceptance Criteria Coverage
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-01 | [criterion text] | FT-P-01, NFT-PERF-01 | Covered |
| AC-02 | [criterion text] | FT-P-02, FT-N-01 | Covered |
| AC-03 | [criterion text] | — | NOT COVERED — [reason and mitigation] |
## Restrictions Coverage
| Restriction ID | Restriction | Test IDs | Coverage |
|---------------|-------------|----------|----------|
| RESTRICT-01 | [restriction text] | FT-N-02, NFT-RES-LIM-01 | Covered |
| RESTRICT-02 | [restriction text] | — | NOT COVERED — [reason and mitigation] |
## Coverage Summary
| Category | Total Items | Covered | Not Covered | Coverage % |
|----------|-----------|---------|-------------|-----------|
| Acceptance Criteria | [N] | [N] | [N] | [%] |
| Restrictions | [N] | [N] | [N] | [%] |
| **Total** | [N] | [N] | [N] | [%] |
## Uncovered Items Analysis
| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
```
---
## Guidance Notes
- Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason.
- Every restriction must appear in the matrix.
- NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware").
- Coverage percentage should be at least 75% for acceptance criteria at the E2E level.
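The Coverage Summary percentages follow directly from the matrix rows. A sketch, assuming each acceptance criterion or restriction maps to the set of test IDs covering it; the IDs are illustrative:

```python
def coverage_summary(matrix):
    """matrix: {item_id: set of covering test IDs}; an empty set means NOT COVERED."""
    total = len(matrix)
    covered = sum(1 for tests in matrix.values() if tests)
    pct = 100.0 * covered / total if total else 0.0
    return {"total": total, "covered": covered,
            "not_covered": total - covered, "pct": round(pct, 1)}

acs = {
    "AC-01": {"FT-P-01", "NFT-PERF-01"},
    "AC-02": {"FT-P-02", "FT-N-01"},
    "AC-03": set(),  # NOT COVERED: needs a documented reason and mitigation
}
```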
# Risk Register Template
Use this template for risk assessment. Save as `_docs/02_plans/risk_mitigations.md`.
Subsequent iterations: `risk_mitigations_02.md`, `risk_mitigations_03.md`, etc.
---
```markdown
# Risk Assessment — [Topic] — Iteration [##]
## Risk Scoring Matrix
| | Low Impact | Medium Impact | High Impact |
|--|------------|---------------|-------------|
| **High Probability** | Medium | High | Critical |
| **Medium Probability** | Low | Medium | High |
| **Low Probability** | Low | Low | Medium |
## Acceptance Criteria by Risk Level
| Level | Action Required |
|-------|----------------|
| Low | Accepted, monitored quarterly |
| Medium | Mitigation plan required before implementation |
| High | Mitigation + contingency plan required, reviewed weekly |
| Critical | Must be resolved before proceeding to next planning step |
## Risk Register
| ID | Risk | Category | Probability | Impact | Score | Mitigation | Owner | Status |
|----|------|----------|-------------|--------|-------|------------|-------|--------|
| R01 | [risk description] | [category] | High/Med/Low | High/Med/Low | Critical/High/Med/Low | [mitigation strategy] | [owner] | Open/Mitigated/Accepted |
| R02 | | | | | | | | |
## Risk Categories
### Technical Risks
- Technology choices may not meet requirements
- Integration complexity underestimated
- Performance targets unachievable
- Security vulnerabilities in design
- Data model cannot support future requirements
### Schedule Risks
- Dependencies delayed
- Scope creep from ambiguous requirements
- Underestimated complexity
### Resource Risks
- Key person dependency
- Team lacks experience with chosen technology
- Infrastructure not available in time
### External Risks
- Third-party API changes or deprecation
- Vendor reliability or pricing changes
- Regulatory or compliance changes
- Data source availability
## Detailed Risk Analysis
### R01: [Risk Title]
**Description**: [Detailed description of the risk]
**Trigger conditions**: [What would cause this risk to materialize]
**Affected components**: [List of components impacted]
**Mitigation strategy**:
1. [Action 1]
2. [Action 2]
**Contingency plan**: [What to do if mitigation fails]
**Residual risk after mitigation**: [Low/Medium/High]
**Documents updated**: [List architecture/component docs that were updated to reflect this mitigation]
---
### R02: [Risk Title]
(repeat structure above)
## Architecture/Component Changes Applied
| Risk ID | Document Modified | Change Description |
|---------|------------------|--------------------|
| R01 | `architecture.md` §3 | [what changed] |
| R01 | `components/02_[name]/description.md` §5 | [what changed] |
## Summary
**Total risks identified**: [N]
**Critical**: [N] | **High**: [N] | **Medium**: [N] | **Low**: [N]
**Risks mitigated this iteration**: [N]
**Risks requiring user decision**: [list]
```
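The Risk Scoring Matrix at the top of the template is a pure lookup on probability and impact, which is worth encoding once so every register row scores consistently. A sketch; the level names match the matrix exactly:

```python
RISK_MATRIX = {
    # (probability, impact) -> score, exactly as in the template's matrix
    ("High", "Low"): "Medium",  ("High", "Medium"): "High",     ("High", "High"): "Critical",
    ("Medium", "Low"): "Low",   ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
    ("Low", "Low"): "Low",      ("Low", "Medium"): "Low",       ("Low", "High"): "Medium",
}

def risk_score(probability, impact):
    """Score one register row; raises KeyError on a level name outside the matrix."""
    return RISK_MATRIX[(probability, impact)]
```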
# System Flows Template
Use this template for the system flows document. Save as `_docs/02_plans/system-flows.md`.
Individual flow diagrams go in `_docs/02_plans/diagrams/flows/flow_[name].md`.
---
```markdown
# [System Name] — System Flows
## Flow Inventory
| # | Flow Name | Trigger | Primary Components | Criticality |
|---|-----------|---------|-------------------|-------------|
| F1 | [name] | [user action / scheduled / event] | [component list] | High/Medium/Low |
| F2 | [name] | | | |
| ... | | | | |
## Flow Dependencies
| Flow | Depends On | Shares Data With |
|------|-----------|-----------------|
| F1 | — | F2 (via [entity]) |
| F2 | F1 must complete first | F3 |
---
## Flow F1: [Flow Name]
### Description
[1-2 sentences: what this flow does, who triggers it, what the outcome is]
### Preconditions
- [Condition 1]
- [Condition 2]
### Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant ComponentA
participant ComponentB
participant Database
User->>ComponentA: [action]
ComponentA->>ComponentB: [call with params]
ComponentB->>Database: [query/write]
Database-->>ComponentB: [result]
ComponentB-->>ComponentA: [response]
ComponentA-->>User: [result]
```
### Flowchart
```mermaid
flowchart TD
Start([Trigger]) --> Step1[Step description]
Step1 --> Decision{Condition?}
Decision -->|Yes| Step2[Step description]
Decision -->|No| Step3[Step description]
Step2 --> EndNode([Result])
Step3 --> EndNode
```
### Data Flow
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | [source] | [destination] | [what data] | [DTO/event/etc] |
| 2 | | | | |
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| [error type] | [which step] | [how detected] | [what happens] |
### Performance Expectations
| Metric | Target | Notes |
|--------|--------|-------|
| End-to-end latency | [target] | [conditions] |
| Throughput | [target] | [peak/sustained] |
---
## Flow F2: [Flow Name]
(repeat structure above)
```
---
## Mermaid Diagram Conventions
Follow these conventions for consistency across all flow diagrams:
- **Participants**: use component names matching `components/[##]_[name]`
- **Node IDs**: camelCase, no spaces (e.g., `validateInput`, `saveOrder`)
- **Decision nodes**: use `{Question?}` format
- **Start/End**: use `([label])` stadium shape
- **External systems**: use `[[label]]` subroutine shape
- **Subgraphs**: group by component or bounded context
- **No styling**: do not add colors or CSS classes — let the renderer theme handle it
- **Edge labels**: wrap special characters in quotes (e.g., `-->|"O(n) check"|`)
# Test Specification Template
Use this template for each component's test spec. Save as `components/[##]_[name]/tests.md`.
---
```markdown
# Test Specification — [Component Name]
## Acceptance Criteria Traceability
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-01 | [criterion from acceptance_criteria.md] | IT-01, AT-01 | Covered |
| AC-02 | [criterion] | PT-01 | Covered |
| AC-03 | [criterion] | — | NOT COVERED — [reason] |
---
## Integration Tests
### IT-01: [Test Name]
**Summary**: [One sentence: what this test verifies]
**Traces to**: AC-01, AC-03
**Description**: [Detailed test scenario]
**Input data**:
```
[specific input data for this test]
```
**Expected result**:
```
[specific expected output or state]
```
**Max execution time**: [e.g., 5s]
**Dependencies**: [other components/services that must be running]
---
### IT-02: [Test Name]
(repeat structure)
---
## Performance Tests
### PT-01: [Test Name]
**Summary**: [One sentence: what performance aspect is tested]
**Traces to**: AC-02
**Load scenario**:
- Concurrent users: [N]
- Request rate: [N req/s]
- Duration: [N minutes]
- Ramp-up: [strategy]
**Expected results**:
| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| Latency (p50) | [target] | [max] |
| Latency (p95) | [target] | [max] |
| Latency (p99) | [target] | [max] |
| Throughput | [target req/s] | [min req/s] |
| Error rate | [target %] | [max %] |
**Resource limits**:
- CPU: [max %]
- Memory: [max MB/GB]
- Database connections: [max pool size]
---
### PT-02: [Test Name]
(repeat structure)
---
## Security Tests
### ST-01: [Test Name]
**Summary**: [One sentence: what security aspect is tested]
**Traces to**: AC-04
**Attack vector**: [e.g., SQL injection on search endpoint, privilege escalation via direct ID access]
**Test procedure**:
1. [Step 1]
2. [Step 2]
**Expected behavior**: [what the system should do — reject, sanitize, log, etc.]
**Pass criteria**: [specific measurable condition]
**Fail criteria**: [what constitutes a failure]
---
### ST-02: [Test Name]
(repeat structure)
---
## Acceptance Tests
### AT-01: [Test Name]
**Summary**: [One sentence: what user-facing behavior is verified]
**Traces to**: AC-01
**Preconditions**:
- [Precondition 1]
- [Precondition 2]
**Steps**:
| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | [user action] | [expected outcome] |
| 2 | [user action] | [expected outcome] |
| 3 | [user action] | [expected outcome] |
---
### AT-02: [Test Name]
(repeat structure)
---
## Test Data Management
**Required test data**:
| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| [name] | [what it contains] | [generated / fixture / copy of prod subset] | [approx size] |
**Setup procedure**:
1. [How to prepare the test environment]
2. [How to load test data]
**Teardown procedure**:
1. [How to clean up after tests]
2. [How to restore initial state]
**Data isolation strategy**: [How tests are isolated from each other — separate DB, transactions, namespacing]
```
---
## Guidance Notes
- Every test MUST trace back to at least one acceptance criterion (AC-XX). If a test doesn't trace to any, question whether it's needed.
- If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
- Performance test targets should come from the NFR section in `architecture.md`.
- Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
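The p50/p95/p99 rows of the PT-01 table can be checked from raw latency samples with nearest-rank percentiles. A sketch; the thresholds in the usage are placeholders, as in the template:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def check_latency(samples_ms, targets):
    """targets: {percentile: max_ms}; returns the set of percentiles that failed."""
    return {p for p, max_ms in targets.items() if percentile(samples_ms, p) > max_ms}
```

A failing set rather than a bare boolean lets the report name exactly which row of the Expected Results table was breached.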
---
name: problem
description: |
Interactive problem gathering skill that builds _docs/00_problem/ through structured interview.
Iteratively asks probing questions until the problem, restrictions, acceptance criteria, and input data
are fully understood. Produces all required files for downstream skills (research, plan, etc.).
Trigger phrases:
- "problem", "define problem", "problem gathering"
- "what am I building", "describe problem"
- "start project", "new project"
category: build
tags: [problem, gathering, interview, requirements, acceptance-criteria]
disable-model-invocation: true
---
# Problem Gathering
Build a complete problem definition through structured, interactive interview with the user. Produces all required files in `_docs/00_problem/` that downstream skills (research, plan, decompose, implement, deploy) depend on.
## Core Principles
- **Ask, don't assume**: never infer requirements the user hasn't stated
- **Exhaust before writing**: keep asking until all dimensions are covered; do not write files prematurely
- **Concrete over vague**: push for measurable values, specific constraints, real numbers
- **Save immediately**: once the user confirms, write all files at once
- **User is the authority**: the AI suggests, the user decides
## Context Resolution
Fixed paths:
- OUTPUT_DIR: `_docs/00_problem/`
- INPUT_DATA_DIR: `_docs/00_problem/input_data/`
## Prerequisite Checks
1. If OUTPUT_DIR already exists and contains files, present what exists and ask user: **resume and fill gaps, overwrite, or skip?**
2. If overwrite or fresh start, create OUTPUT_DIR and INPUT_DATA_DIR
## Completeness Criteria
The interview is complete when the AI can write ALL of these:
| File | Complete when |
|------|--------------|
| `problem.md` | Clear problem statement: what is being built, why, for whom, what it does |
| `restrictions.md` | All constraints identified: hardware, software, environment, operational, regulatory, budget, timeline |
| `acceptance_criteria.md` | Measurable success criteria with specific numeric targets grouped by category |
| `input_data/` | At least one reference data file or detailed data description document |
| `security_approach.md` | (optional) Security requirements identified, or explicitly marked as not applicable |
## Interview Protocol
### Phase 1: Open Discovery
Start with broad, open questions. Let the user describe the problem in their own words.
**Opening**: Ask the user to describe what they are building and what problem it solves. Do not interrupt or narrow down yet.
After the user responds, summarize what you understood and ask: "Did I get this right? What did I miss?"
### Phase 2: Structured Probing
Work through each dimension systematically. For each dimension, ask only what the user hasn't already covered. Skip dimensions that were fully answered in Phase 1.
**Dimension checklist:**
1. **Problem & Goals**
- What exactly does the system do?
- What problem does it solve? Why does it need to exist?
- Who are the users / operators / stakeholders?
- What is the expected usage pattern (frequency, load, environment)?
2. **Scope & Boundaries**
- What is explicitly IN scope?
- What is explicitly OUT of scope?
- Are there related systems this integrates with?
- What does the system NOT do (common misconceptions)?
3. **Hardware & Environment**
- What hardware does it run on? (CPU, GPU, memory, storage)
- What operating system / platform?
- What is the deployment environment? (cloud, edge, embedded, on-prem)
- Any physical constraints? (power, thermal, size, connectivity)
4. **Software & Tech Constraints**
- Required programming languages or frameworks?
- Required protocols or interfaces?
- Existing systems it must integrate with?
- Libraries or tools that must or must not be used?
5. **Acceptance Criteria**
- What does "done" look like?
- Performance targets: latency, throughput, accuracy, error rates?
- Quality bars: reliability, availability, recovery time?
- Push for specific numbers: "less than Xms", "above Y%", "within Z meters"
- Edge cases: what happens when things go wrong?
- Startup and shutdown behavior?
6. **Input Data**
- What data does the system consume?
- Formats, schemas, volumes, update frequency?
- Does the user have sample/reference data to provide?
- If no data exists yet, what would representative data look like?
7. **Security** (optional, probe gently)
- Authentication / authorization requirements?
- Data sensitivity (PII, classified, proprietary)?
- Communication security (encryption, TLS)?
- If the user says "not a concern", mark as N/A and move on
8. **Operational Constraints**
- Budget constraints?
- Timeline constraints?
- Team size / expertise constraints?
- Regulatory or compliance requirements?
- Geographic restrictions?
### Phase 3: Gap Analysis
After all dimensions are covered:
1. Internally assess completeness against the Completeness Criteria table
2. Present a completeness summary to the user:
```
Completeness Check:
- problem.md: READY / GAPS: [list missing aspects]
- restrictions.md: READY / GAPS: [list missing aspects]
- acceptance_criteria.md: READY / GAPS: [list missing aspects]
- input_data/: READY / GAPS: [list missing aspects]
- security_approach.md: READY / N/A / GAPS: [list missing aspects]
```
3. If gaps exist, ask targeted follow-up questions for each gap
4. Repeat until all required files show READY
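The mechanical half of this check can be sketched as a small Python helper (names are illustrative, not part of the skill contract; the semantic assessment — whether the content actually covers each dimension — still requires judgment):

```python
from pathlib import Path

REQUIRED = ["problem.md", "restrictions.md", "acceptance_criteria.md"]

def completeness_check(output_dir: str) -> dict:
    """Report READY / GAPS per required artifact in OUTPUT_DIR."""
    base = Path(output_dir)
    report = {}
    for name in REQUIRED:
        f = base / name
        ready = f.is_file() and f.stat().st_size > 0
        report[name] = "READY" if ready else "GAPS: missing or empty"
    # input_data/ counts as ready when it contains at least one file
    data_dir = base / "input_data"
    has_data = data_dir.is_dir() and any(data_dir.iterdir())
    report["input_data/"] = "READY" if has_data else "GAPS: no files"
    return report
```

A GAPS result only says a file is absent or empty; the targeted follow-up questions in step 3 are still driven by reading the content itself.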
### Phase 4: Draft & Confirm
1. Draft all files in the conversation (show the user what will be written)
2. Present each file's content for review
3. Ask: "Should I save these files? Any changes needed?"
4. Apply any requested changes
5. Save all files to OUTPUT_DIR
## Output File Formats
### problem.md
Free-form text. Clear, concise description of:
- What is being built
- What problem it solves
- How it works at a high level
- Key context the reader needs to understand the problem
No headers required. Paragraph format. Should be readable by someone unfamiliar with the project.
### restrictions.md
Categorized constraints with markdown headers and bullet points:
```markdown
# [Category Name]
- Constraint description with specific values where applicable
- Another constraint
```
Categories are derived from the interview (hardware, software, environment, operational, etc.). Each restriction should be specific and testable.
### acceptance_criteria.md
Categorized measurable criteria with markdown headers and bullet points:
```markdown
# [Category Name]
- Criterion with specific numeric target
- Another criterion with measurable threshold
```
Every criterion must have a measurable value. Vague criteria like "should be fast" are not acceptable — push for "less than 400ms end-to-end".
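A rough heuristic for flagging vague criteria can be sketched as follows (a sketch, assuming one bullet per criterion; the presence of a digit is used as a cheap proxy for "measurable value" and will miss criteria that are numeric in spirit but spelled out in words):

```python
import re

def flag_vague_criteria(lines):
    """Flag bullet criteria that carry no number at all."""
    has_number = re.compile(r"\d")
    flagged = []
    for line in lines:
        line = line.strip()
        if line.startswith("-") and not has_number.search(line):
            flagged.append(line)
    return flagged
```

Anything flagged should be pushed back to the user for a concrete threshold, as in the "less than 400ms end-to-end" example above.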
### input_data/
At least one file. Options:
- User provides actual data files (CSV, JSON, images, etc.) — save as-is
- User describes data parameters — save as `data_parameters.md`
- User provides URLs to data — save as `data_sources.md` with links and descriptions
### security_approach.md (optional)
If security requirements exist, document them. If the user says security is not a concern for this project, skip this file entirely.
## Progress Tracking
Create a TodoWrite with phases 1-4. Update as each phase completes.
## Escalation Rules
| Situation | Action |
|-----------|--------|
| User cannot provide acceptance criteria numbers | Suggest industry benchmarks, ASK user to confirm or adjust |
| User has no input data at all | ASK what representative data would look like, create a `data_parameters.md` describing expected data |
| User says "I don't know" to a critical dimension | Research the domain briefly, suggest reasonable defaults, ASK user to confirm |
| Conflicting requirements discovered | Present the conflict, ASK user which takes priority |
| User wants to skip a required file | Explain why downstream skills need it, ASK if they want a minimal placeholder |
## Common Mistakes
- **Writing files before the interview is complete**: gather everything first, then write
- **Accepting vague criteria**: "fast", "accurate", "reliable" are not acceptance criteria without numbers
- **Assuming technical choices**: do not suggest specific technologies unless the user constrains them
- **Over-engineering the problem statement**: problem.md should be concise, not a dissertation
- **Inventing restrictions**: only document what the user actually states as a constraint
- **Skipping input data**: downstream skills (especially research and plan) need concrete data context
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Problem Gathering (4-Phase Interview) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: Check if _docs/00_problem/ exists (resume/overwrite?) │
│ │
│ Phase 1: Open Discovery │
│ → "What are you building?" → summarize → confirm │
│ Phase 2: Structured Probing │
│ → 8 dimensions: problem, scope, hardware, software, │
│ acceptance criteria, input data, security, operations │
│ → skip what Phase 1 already covered │
│ Phase 3: Gap Analysis │
│ → assess completeness per file → fill gaps iteratively │
│ Phase 4: Draft & Confirm │
│ → show all files → user confirms → save to _docs/00_problem/ │
├────────────────────────────────────────────────────────────────┤
│ Principles: Ask don't assume · Concrete over vague │
│ Exhaust before writing · User is authority │
└────────────────────────────────────────────────────────────────┘
```
+471

@@ -0,0 +1,471 @@
---
name: refactor
description: |
Structured refactoring workflow (6-phase method) with three execution modes:
- Full Refactoring: all 6 phases — baseline, discovery, analysis, safety net, execution, hardening
- Targeted Refactoring: skip discovery if docs exist, focus on a specific component/area
- Quick Assessment: phases 0-2 only, outputs a refactoring plan without execution
Supports project mode (_docs/ structure) and standalone mode (@file.md).
Trigger phrases:
- "refactor", "refactoring", "improve code"
- "analyze coupling", "decoupling", "technical debt"
- "refactoring assessment", "code quality improvement"
category: evolve
tags: [refactoring, coupling, technical-debt, performance, hardening]
disable-model-invocation: true
---
# Structured Refactoring (6-Phase Method)
Transform existing codebases through a systematic refactoring workflow: capture baseline, document current state, research improvements, build safety net, execute changes, and harden.
## Core Principles
- **Preserve behavior first**: never refactor without a passing test suite
- **Measure before and after**: every change must be justified by metrics
- **Small incremental changes**: commit frequently, never break tests
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Project mode** (no explicit input file provided):
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- COMPONENTS_DIR: `_docs/02_components/`
- TESTS_DIR: `_docs/02_tests/`
- REFACTOR_DIR: `_docs/04_refactoring/`
- All existing guardrails apply.
**Standalone mode** (explicit input file provided, e.g. `/refactor @some_component.md`):
- INPUT_FILE: the provided file (treated as component/area description)
- REFACTOR_DIR: `_standalone/refactoring/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `acceptance_criteria.md` is optional — warn if absent
Announce the detected mode and resolved paths to the user before proceeding.
## Mode Detection
After context resolution, determine the execution mode:
1. **User explicitly says** "quick assessment" or "just assess" → **Quick Assessment**
2. **User explicitly says** "refactor [component/file/area]" with a specific target → **Targeted Refactoring**
3. **Default** → **Full Refactoring**
| Mode | Phases Executed | When to Use |
|------|----------------|-------------|
| **Full Refactoring** | 0 → 1 → 2 → 3 → 4 → 5 | Complete refactoring of a system or major area |
| **Targeted Refactoring** | 0 → (skip 1 if docs exist) → 2 → 3 → 4 → 5 | Refactor a specific component; docs already exist |
| **Quick Assessment** | 0 → 1 → 2 | Produce a refactoring roadmap without executing changes |
Inform the user which mode was detected and confirm before proceeding.
## Prerequisite Checks (BLOCKING)
**Project mode:**
1. PROBLEM_DIR exists with `problem.md` (or `problem_description.md`) — **STOP if missing**, ask user to create it
2. If `acceptance_criteria.md` is missing: **warn** and ask whether to proceed
3. Create REFACTOR_DIR if it does not exist
4. If REFACTOR_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
**Standalone mode:**
1. INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `acceptance_criteria.md` provided
3. Create REFACTOR_DIR if it does not exist
## Artifact Management
### Directory Structure
```
REFACTOR_DIR/
├── baseline_metrics.md (Phase 0)
├── discovery/
│ ├── components/
│ │ └── [##]_[name].md (Phase 1)
│ ├── solution.md (Phase 1)
│ └── system_flows.md (Phase 1)
├── analysis/
│ ├── research_findings.md (Phase 2)
│ └── refactoring_roadmap.md (Phase 2)
├── test_specs/
│ └── [##]_[test_name].md (Phase 3)
├── coupling_analysis.md (Phase 4)
├── execution_log.md (Phase 4)
├── hardening/
│ ├── technical_debt.md (Phase 5)
│ ├── performance.md (Phase 5)
│ └── security.md (Phase 5)
└── FINAL_report.md (after all phases)
```
### Save Timing
| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 0 | Baseline captured | `baseline_metrics.md` |
| Phase 1 | Each component documented | `discovery/components/[##]_[name].md` |
| Phase 1 | Solution synthesized | `discovery/solution.md`, `discovery/system_flows.md` |
| Phase 2 | Research complete | `analysis/research_findings.md` |
| Phase 2 | Roadmap produced | `analysis/refactoring_roadmap.md` |
| Phase 3 | Test specs written | `test_specs/[##]_[test_name].md` |
| Phase 4 | Coupling analyzed | `coupling_analysis.md` |
| Phase 4 | Execution complete | `execution_log.md` |
| Phase 5 | Each hardening track | `hardening/<track>.md` |
| Final | All phases done | `FINAL_report.md` |
### Resumability
If REFACTOR_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed phase based on which artifacts exist
3. Resume from the next incomplete phase
4. Inform the user which phases are being skipped
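A minimal sketch of the artifact-based resume check, assuming one representative closing artifact per phase (the mapping below is illustrative — align it with the save timing table when applying it):

```python
from pathlib import Path

# Ordered phases with one representative closing artifact each
PHASE_ARTIFACTS = [
    (0, "baseline_metrics.md"),
    (1, "discovery/solution.md"),
    (2, "analysis/refactoring_roadmap.md"),
    (3, "test_specs"),            # directory with at least one spec
    (4, "execution_log.md"),
    (5, "hardening"),             # directory with at least one track
]

def last_completed_phase(refactor_dir: str) -> int:
    """Return the highest completed phase, or -1 for a fresh start."""
    base = Path(refactor_dir)
    done = -1
    for phase, artifact in PHASE_ARTIFACTS:
        p = base / artifact
        exists = p.is_file() or (p.is_dir() and any(p.iterdir()))
        if not exists:
            break  # resume from the first phase whose artifact is missing
        done = phase
    return done
```

Resume from `last_completed_phase(...) + 1`, and tell the user which phases are being skipped.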
## Progress Tracking
At the start of execution, create a TodoWrite with all applicable phases. Update status as each phase completes.
## Workflow
### Phase 0: Context & Baseline
**Role**: Software engineer preparing for refactoring
**Goal**: Collect refactoring goals and capture baseline metrics
**Constraints**: Measurement only — no code changes
#### 0a. Collect Goals
If PROBLEM_DIR files do not yet exist, help the user create them:
1. `problem.md` — what the system currently does, what changes are needed, pain points
2. `acceptance_criteria.md` — success criteria for the refactoring
3. `security_approach.md` — security requirements (if applicable)
Store in PROBLEM_DIR.
#### 0b. Capture Baseline
1. Read problem description and acceptance criteria
2. Measure current system metrics using project-appropriate tools:
| Metric Category | What to Capture |
|----------------|-----------------|
| **Coverage** | Overall, unit, integration, critical paths |
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
| **Code Smells** | Total, critical, major |
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
| **Dependencies** | Total count, outdated, security vulnerabilities |
| **Build** | Build time, test execution time, deployment time |
3. Create functionality inventory: all features/endpoints with status and coverage
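As one concrete example of a reproducible measurement, here is a stdlib-only sketch that captures line and function counts for a Python codebase (an assumption for illustration — real baselines would use the coverage, complexity, and profiling tools appropriate to the project):

```python
import ast
from pathlib import Path

def python_loc_baseline(root: str) -> dict:
    """Count non-blank source lines and function defs across *.py files."""
    loc = funcs = files = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        loc += sum(1 for line in text.splitlines() if line.strip())
        try:
            tree = ast.parse(text)
        except SyntaxError:
            continue  # count lines, but skip unparseable files for funcs
        funcs += sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
                     for n in ast.walk(tree))
        files += 1
    return {"files": files, "loc": loc, "functions": funcs}
```

Whatever tools are used, record the exact commands alongside the numbers so the measurement stays reproducible for the after-comparison in Phase 4.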
**Self-verification**:
- [ ] All metric categories measured (or noted as N/A with reason)
- [ ] Functionality inventory is complete
- [ ] Measurements are reproducible
**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
---
### Phase 1: Discovery
**Role**: Principal software architect
**Goal**: Generate documentation from existing code and form solution description
**Constraints**: Document what exists, not what should be. No code changes.
**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
#### 1a. Document Components
For each component in the codebase:
1. Analyze project structure, directories, files
2. Go file by file, analyze each method
3. Analyze connections between components
Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
- Purpose and architectural patterns
- Mermaid diagrams for logic flows
- API reference table (name, description, input, output)
- Implementation details: algorithmic complexity, state management, dependencies
- Caveats, edge cases, known limitations
#### 1b. Synthesize Solution & Flows
1. Review all generated component documentation
2. Synthesize into a cohesive solution description
3. Create flow diagrams showing component interactions
Write:
- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
Also copy to project standard locations if in project mode:
- `SOLUTION_DIR/solution.md`
- `COMPONENTS_DIR/system_flows.md`
**Self-verification**:
- [ ] Every component in the codebase is documented
- [ ] Solution description covers all components
- [ ] Flow diagrams cover all major use cases
- [ ] Mermaid diagrams are syntactically correct
**Save action**: Write discovery artifacts
**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
---
### Phase 2: Analysis
**Role**: Researcher and software architect
**Goal**: Research improvements and produce a refactoring roadmap
**Constraints**: Analysis only — no code changes
#### 2a. Deep Research
1. Analyze current implementation patterns
2. Research modern approaches for similar systems
3. Identify what could be done differently
4. Suggest improvements based on state-of-the-art practices
Write `REFACTOR_DIR/analysis/research_findings.md`:
- Current state analysis: patterns used, strengths, weaknesses
- Alternative approaches per component: current vs alternative, pros/cons, migration effort
- Prioritized recommendations: quick wins + strategic improvements
#### 2b. Solution Assessment
1. Assess current implementation against acceptance criteria
2. Identify weak points in codebase, map to specific code areas
3. Perform gap analysis: acceptance criteria vs current state
4. Prioritize changes by impact and effort
Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
- Weak points assessment: location, description, impact, proposed solution
- Gap analysis: what's missing, what needs improvement
- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
**Self-verification**:
- [ ] All acceptance criteria are addressed in gap analysis
- [ ] Recommendations are grounded in actual code, not abstract
- [ ] Roadmap phases are prioritized by impact
- [ ] Quick wins are identified separately
**Save action**: Write analysis artifacts
**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
---
### Phase 3: Safety Net
**Role**: QA engineer and developer
**Goal**: Design and implement tests that capture current behavior before refactoring
**Constraints**: Tests must all pass on the current codebase before proceeding
#### 3a. Design Test Specs
Coverage requirements (must meet before refactoring):
- Minimum overall coverage: 75%
- Critical path coverage: 90%
- All public APIs must have integration tests
- All error handling paths must be tested
For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
- Integration tests: summary, current behavior, input data, expected result, max expected time
- Acceptance tests: summary, preconditions, steps with expected results
- Coverage analysis: current %, target %, uncovered critical paths
#### 3b. Implement Tests
1. Set up the test environment and infrastructure if they do not already exist
2. Implement each test from specs
3. Run tests, verify all pass on current codebase
4. Document any discovered issues
**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths)
- [ ] All tests pass on current codebase
- [ ] All public APIs have integration tests
- [ ] Test data fixtures are configured
**Save action**: Write test specs; implemented tests go into the project's test folder
**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
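The gate condition can be expressed as a simple check (a sketch; the thresholds mirror the coverage requirements in 3a, and the function name is illustrative):

```python
def safety_net_gate(overall: float, critical: float, failing_tests: int) -> list:
    """Return blocking reasons; an empty list means the gate is open."""
    blockers = []
    if failing_tests > 0:
        blockers.append(f"{failing_tests} failing test(s)")
    if overall < 75.0:
        blockers.append(f"overall coverage {overall:.0f}% < 75%")
    if critical < 90.0:
        blockers.append(f"critical-path coverage {critical:.0f}% < 90%")
    return blockers
```

Only proceed to Phase 4 when the returned list is empty; otherwise present the blockers to the user.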
---
### Phase 4: Execution
**Role**: Software architect and developer
**Goal**: Analyze coupling and execute decoupling changes
**Constraints**: Small incremental changes; tests must stay green after every change
#### 4a. Analyze Coupling
1. Analyze coupling between components/modules
2. Map dependencies (direct and transitive)
3. Identify circular dependencies
4. Form decoupling strategy
Write `REFACTOR_DIR/coupling_analysis.md`:
- Dependency graph (Mermaid)
- Coupling metrics per component
- Problem areas: components involved, coupling type, severity, impact
- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates
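Step 3 above (circular dependencies) comes down to a standard depth-first search over the dependency graph; a minimal sketch, assuming the graph has already been extracted as a module → dependencies mapping:

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited / on stack / done
    color = {n: WHITE for n in graph}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:   # back edge → cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = dfs(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for n in list(graph):
        if color[n] == WHITE:
            found = dfs(n)
            if found:
                return found
    return None
```

Run it once per suspected cluster, break the reported cycle (usually by introducing an interface), and re-run until it returns `None`.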
**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.
#### 4b. Execute Decoupling
For each change in the decoupling strategy:
1. Implement the change
2. Run integration tests
3. Fix any failures
4. Commit with descriptive message
Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.
Write `REFACTOR_DIR/execution_log.md`:
- Change description, files affected, test status per change
- Before/after metrics comparison against baseline
**Self-verification**:
- [ ] All tests still pass after execution
- [ ] No circular dependencies remain (or reduced per plan)
- [ ] Code smells addressed
- [ ] Metrics improved compared to baseline
**Save action**: Write execution artifacts
**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.
---
### Phase 5: Hardening (Optional, Parallel Tracks)
**Role**: Varies per track
**Goal**: Address technical debt, performance, and security
**Constraints**: Each track is optional; user picks which to run
Present the three tracks and let user choose which to execute:
#### Track A: Technical Debt
**Role**: Technical debt analyst
1. Identify and categorize debt items: design, code, test, documentation
2. Assess each: location, description, impact, effort, interest (cost of not fixing)
3. Prioritize: quick wins → strategic debt → tolerable debt
4. Create actionable plan with prevention measures
Write `REFACTOR_DIR/hardening/technical_debt.md`
#### Track B: Performance Optimization
**Role**: Performance engineer
1. Profile current performance, identify bottlenecks
2. For each bottleneck: location, symptom, root cause, impact
3. Propose optimizations with expected improvement and risk
4. Implement one at a time, benchmark after each change
5. Verify tests still pass
Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks
#### Track C: Security Review
**Role**: Security engineer
1. Review code against OWASP Top 10
2. Verify security requirements from `security_approach.md` are met
3. Check: authentication, authorization, input validation, output encoding, encryption, logging
Write `REFACTOR_DIR/hardening/security.md`:
- Vulnerability assessment: location, type, severity, exploit scenario, fix
- Security controls review
- Compliance check against `security_approach.md`
- Recommendations: critical fixes, improvements, hardening
**Self-verification** (per track):
- [ ] All findings are grounded in actual code
- [ ] Recommendations are actionable with effort estimates
- [ ] All tests still pass after any changes
**Save action**: Write hardening artifacts
---
## Final Report
After all executed phases complete, write `REFACTOR_DIR/FINAL_report.md`:
- Refactoring mode used and phases executed
- Baseline metrics vs final metrics comparison
- Changes made summary
- Remaining items (deferred to future)
- Lessons learned
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear refactoring scope | **ASK user** |
| Ambiguous acceptance criteria | **ASK user** |
| Tests failing before refactoring | **ASK user** — fix tests or fix code? |
| Coupling change risks breaking external contracts | **ASK user** |
| Performance optimization vs readability trade-off | **ASK user** |
| Missing baseline metrics (no test suite, no CI) | **WARN user**, suggest building safety net first |
| Security vulnerability found during refactoring | **WARN user** immediately, don't defer |
## Trigger Conditions
When the user wants to:
- Improve existing code structure or quality
- Reduce technical debt or coupling
- Prepare codebase for new features
- Assess code health before major changes
**Keywords**: "refactor", "refactoring", "improve code", "reduce coupling", "technical debt", "code quality", "decoupling"
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Structured Refactoring (6-Phase Method) │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
│ MODE: Full / Targeted / Quick Assessment │
│ │
│ 0. Context & Baseline → baseline_metrics.md │
│ [BLOCKING: user confirms baseline] │
│ 1. Discovery → discovery/ (components, solution) │
│ [BLOCKING: user confirms documentation] │
│ 2. Analysis → analysis/ (research, roadmap) │
│ [BLOCKING: user confirms roadmap] │
│ ── Quick Assessment stops here ── │
│ 3. Safety Net → test_specs/ + implemented tests │
│ [GATE: all tests must pass] │
│ 4. Execution → coupling_analysis, execution_log │
│ [BLOCKING: user confirms changes] │
│ 5. Hardening → hardening/ (debt, perf, security) │
│ [optional, user picks tracks] │
│ ───────────────────────────────────────────────── │
│ FINAL_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Preserve behavior · Measure before/after │
│ Small changes · Save immediately · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
+708
@@ -0,0 +1,708 @@
---
name: deep-research
description: |
Deep Research Methodology (8-Step Method) with two execution modes:
- Mode A (Initial Research): Assess acceptance criteria, then research problem and produce solution draft
- Mode B (Solution Assessment): Assess existing solution draft for weak points and produce revised draft
Supports project mode (_docs/ structure) and standalone mode (@file.md).
Auto-detects research mode based on existing solution_draft files.
Trigger phrases:
- "research", "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review solution draft"
- "comparative analysis", "concept comparison", "technical comparison"
category: build
tags: [research, analysis, solution-design, comparison, decision-support]
---
# Deep Research (8-Step Method)
Transform vague topics raised by users into high-quality, deliverable research reports through a systematic methodology. Operates in two modes: **Initial Research** (produce new solution draft) and **Solution Assessment** (assess and revise existing draft).
## Core Principles
- **Conclusions come from mechanism comparison, not "gut feelings"**
- **Pin down the facts first, then reason**
- **Prioritize authoritative sources: L1 > L2 > L3 > L4**
- **Intermediate results must be saved for traceability and reuse**
- **Ask, don't assume** — when any aspect of the problem, criteria, or restrictions is unclear, STOP and ask the user before proceeding
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Project mode** (no explicit input file provided):
- INPUT_DIR: `_docs/00_problem/`
- OUTPUT_DIR: `_docs/01_solution/`
- RESEARCH_DIR: `_docs/00_research/`
- All existing guardrails, mode detection, and draft numbering apply as-is.
**Standalone mode** (explicit input file provided, e.g. `/research @some_doc.md`):
- INPUT_FILE: the provided file (treated as problem description)
- OUTPUT_DIR: `_standalone/01_solution/`
- RESEARCH_DIR: `_standalone/00_research/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `restrictions.md` and `acceptance_criteria.md` are optional — warn if absent, proceed if user confirms
- Mode detection uses OUTPUT_DIR for `solution_draft*.md` scanning
- Draft numbering works the same, scoped to OUTPUT_DIR
- **Final step**: after all research is complete, move INPUT_FILE into `_standalone/`
Announce the detected mode and resolved paths to the user before proceeding.
## Project Integration
### Prerequisite Guardrails (BLOCKING)
Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**
**Project mode:**
1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist
**Standalone mode:**
1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
3. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist
### Mode Detection
After guardrails pass, determine the execution mode:
1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found** → **Mode A: Initial Research**
3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts
Inform the user which mode was detected and confirm before proceeding.
### Solution Draft Numbering
All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:
1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)
Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
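Both the mode detection and the numbering above reduce to one directory scan; a minimal sketch (the function name is illustrative):

```python
import re
from pathlib import Path

def next_draft(output_dir: str):
    """Detect the research mode and compute the next draft filename."""
    drafts = Path(output_dir).glob("solution_draft*.md")
    numbers = [int(m.group(1)) for p in drafts
               if (m := re.search(r"solution_draft(\d+)\.md$", p.name))]
    mode = "B" if numbers else "A"                 # any draft → assessment mode
    return mode, f"solution_draft{max(numbers, default=0) + 1:02d}.md"
```

Note the zero-padding is only 2 digits, matching the numbering rule above; drafts beyond 99 would sort lexically out of order.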
### Working Directory & Intermediate Artifact Management
#### Directory Structure
At the start of research, **must** create a working directory under RESEARCH_DIR:
```
RESEARCH_DIR/
├── 00_ac_assessment.md # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md # Step 0-1 output
├── 01_source_registry.md # Step 2 output: all consulted source links
├── 02_fact_cards.md # Step 3 output: extracted facts
├── 03_comparison_framework.md # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md # Step 7 output: use-case validation results
└── raw/ # Raw source archive (optional)
├── source_1.md
└── source_2.md
```
### Save Timing & Content
| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |
### Save Principles
1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: Same file can be updated multiple times; append or replace new content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from intermediate files
## Execution Flow
### Mode A: Initial Research
Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.
#### Phase 1: AC & Restrictions Assessment (BLOCKING)
**Role**: Professional software architect
A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.
**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)
**Task**:
1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
- Unclear problem boundaries → ask
- Ambiguous acceptance criteria values → ask
- Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
- Conflicting restrictions → ask which takes priority
3. Research on the internet:
- How realistic are the acceptance criteria for this specific domain?
- How critical is each criterion?
- What domain-specific acceptance criteria are we missing?
- Impact of each criterion value on the whole system quality
- Cost/budget implications of each criterion
- Timeline implications — how long would it take to meet each criterion
4. Research restrictions:
- Are the restrictions realistic?
- Should any be tightened or relaxed?
- Are there additional restrictions we should add?
5. Verify findings with authoritative sources (official docs, papers, benchmarks)
**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment.
**📁 Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:
```markdown
# Acceptance Criteria Assessment
## Acceptance Criteria
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
## Restrictions Assessment
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
## Key Findings
[Summary of critical findings]
## Sources
[Key references used]
```
**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings.
---
#### Phase 2: Problem Research & Solution Draft
**Role**: Professional researcher and software architect
Full 8-step research methodology. Produces the first solution draft.
**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts
**Task** (drives the 8-step engine):
1. Research existing/competitor solutions for similar problems
2. Research the problem thoroughly — all possible ways to solve it, split into components
3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches
4. Verify that suggested tools/libraries actually exist and work as described
5. Include security considerations in each component analysis
6. Provide rough cost estimates for proposed solutions
Formulate concisely: the fewer words the better, but do not omit any important details.
**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
---
#### Phase 3: Tech Stack Consolidation (OPTIONAL)
**Role**: Software architect evaluating technology choices
Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR
**Task**:
1. Extract technology options from the solution draft's component comparison tables
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
3. Produce a tech stack summary with selection rationale
4. Assess risks and learning requirements per technology choice
**📁 Save action**: Write `OUTPUT_DIR/tech_stack.md` with:
- Requirements analysis (functional, non-functional, constraints)
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
- Tech stack summary block
- Risk assessment and learning requirements tables
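Phase 3's scoring step can be sketched as a weighted sum. The criteria follow the list above; the weights, the 0-10 rating scale, and the function names are illustrative assumptions, not prescribed by this skill:

```python
# Illustrative weights per evaluation criterion; adjust to the project's priorities
CRITERIA_WEIGHTS = {
    "fitness": 0.25, "maturity": 0.20, "security": 0.20,
    "team_expertise": 0.15, "cost": 0.10, "scalability": 0.10,
}

def score_option(scores: dict[str, float]) -> float:
    """Weighted score for one technology option; each criterion rated 0-10."""
    return round(sum(CRITERIA_WEIGHTS[c] * scores.get(c, 0.0)
                     for c in CRITERIA_WEIGHTS), 2)

def rank_options(options: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank candidate technologies by weighted score, highest first."""
    return sorted(((name, score_option(s)) for name, s in options.items()),
                  key=lambda pair: pair[1], reverse=True)
```

The ranked output maps directly onto the technology evaluation tables in `tech_stack.md`; ties or near-ties are exactly the "multiple valid options" case that escalates to the user.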
---
#### Phase 4: Security Deep Dive (OPTIONAL)
**Role**: Security architect
Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context
**Task**:
1. Build threat model: asset inventory, threat actors, attack vectors
2. Define security requirements and proposed controls per component (with risk level)
3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach
**📁 Save action**: Write `OUTPUT_DIR/security_analysis.md` with:
- Threat model (assets, actors, vectors)
- Per-component security requirements and controls table
- Security controls summary
---
### Mode B: Solution Assessment
Triggered when `solution_draft*.md` files exist in OUTPUT_DIR.
**Role**: Professional software architect
Full 8-step research methodology applied to assessing and improving an existing solution draft.
**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR
**Task** (drives the 8-step engine):
1. Read the existing solution draft thoroughly
2. Research online to identify all potential weak points and problems
3. Identify security weak points and vulnerabilities
4. Identify performance bottlenecks
5. Address these problems and find ways to solve them
6. Based on findings, form a new solution draft in the same format
**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions above.
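The mode detection and incremented draft numbering described above can be sketched as a small helper. The `solution_draft##.md` naming and two-digit zero padding follow this skill's conventions; the function itself is an illustrative assumption:

```python
import re
from pathlib import Path

def detect_mode_and_next_draft(output_dir: str) -> tuple[str, str]:
    """Return (mode, next draft filename) based on existing drafts in OUTPUT_DIR."""
    numbers = []
    for draft in Path(output_dir).glob("solution_draft*.md"):
        m = re.search(r"solution_draft(\d+)\.md$", draft.name)
        if m:
            numbers.append(int(m.group(1)))
    if not numbers:
        # No drafts yet: Mode A (Initial Research) writes the first draft
        return "A", "solution_draft01.md"
    # Drafts exist: Mode B (Solution Assessment) increments the highest number
    return "B", f"solution_draft{max(numbers) + 1:02d}.md"
```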
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear problem boundaries | **ASK user** |
| Ambiguous acceptance criteria values | **ASK user** |
| Missing context files (`security_approach.md`, `input_data/`) | **ASK user** what they have |
| Conflicting restrictions | **ASK user** which takes priority |
| Technology choice with multiple valid options | **ASK user** |
| Contradictions between input files | **ASK user** |
| Missing acceptance criteria or restrictions files | **WARN user**, ask whether to proceed |
| File naming within research artifacts | PROCEED |
| Source tier classification | PROCEED |
## Trigger Conditions
When the user wants to:
- Deeply understand a concept/technology/phenomenon
- Compare similarities and differences between two or more things
- Gather information and evidence for a decision
- Assess or improve an existing solution draft
**Keywords**:
- "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review draft", "improve solution"
- "comparative analysis", "concept comparison", "technical comparison"
**Differentiation from other Skills**:
- Needs a **visual knowledge graph** → use `research-to-diagram`
- Needs **written output** (articles/tutorials) → use `wsy-writer`
- Needs **material organization** → use `material-to-markdown`
- Needs **research + solution draft** → use this Skill
## Research Engine (8-Step Method)
The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode.
### Step 0: Question Type Classification
First, classify the research question type and select the corresponding strategy:
| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |
**Mode-specific classification**:
| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |
### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)
Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.
**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`
Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.
**📁 Save action**: Append timeliness assessment to the end of `00_question_decomposition.md`
---
### Step 1: Question Decomposition & Boundary Definition
**Mode-specific sub-questions**:
**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"
**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"
**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)
**⚠️ Research Subject Boundary Definition (BLOCKING - must be explicit)**:
When decomposing questions, you must explicitly define the **boundaries of the research subject**:
| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |
**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.
**📁 Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
- Original question
- Active mode (A Phase 2 or B) and rationale
- Summary of relevant problem context from INPUT_DIR
- Classified question type and rationale
- **Research subject boundary definition** (population, geography, timeframe, level)
- List of decomposed sub-questions
4. Create a TodoWrite task list to track progress
### Step 2: Source Tiering & Authority Anchoring
Tier sources by authority and **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve only as supplementary and validation sources.
**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`
**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed
**📁 Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
### Step 3: Fact Extraction & Evidence Cards
Transform sources into **verifiable fact cards**:
```markdown
## Fact Cards
### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low
### Fact 2
...
```
**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate confidence level:
- ✅ High: Explicitly stated in official documentation
- ⚠️ Medium: Mentioned in official blog but not formally documented
- ❓ Low: Inference or from unofficial sources
**📁 Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md`:
```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```
**⚠️ Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"
### Step 4: Build Comparison/Analysis Framework
Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`
**📁 Save action**:
Write to `03_comparison_framework.md`:
```markdown
# Comparison Framework
## Selected Framework Type
[Concept Comparison / Decision Support / ...]
## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...
## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```
### Step 5: Reference Point Baseline Alignment
Ensure all compared parties have clear, consistent definitions:
**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?
### Step 6: Fact-to-Conclusion Reasoning Chain
Explicitly write out the "fact → comparison → conclusion" reasoning process:
```markdown
## Reasoning Process
### Regarding [Dimension Name]
1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```
**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated
**📁 Save action**:
Write to `04_reasoning_chain.md`:
```markdown
# Reasoning Chain
## Dimension 1: [Dimension Name]
### Fact Confirmation
According to [Fact #X], X's mechanism is...
### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])
### Conclusion
Therefore, the difference between X and Y on this dimension is...
### Confidence
✅/⚠️/❓ + rationale
---
## Dimension 2: [Dimension Name]
...
```
### Step 7: Use-Case Validation (Sanity Check)
Validate conclusions against a typical scenario:
**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?
**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Are there any important dimensions missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?
**📁 Save action**:
Write to `05_validation_log.md`:
```markdown
# Validation Log
## Validation Scenario
[Scenario description]
## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]
## Actual Validation Results
[actual situation]
## Counterexamples
[yes/no, describe if yes]
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]
## Conclusions Requiring Revision
[if any]
```
### Step 8: Deliverable Formatting
Make the output **readable, traceable, and actionable**.
**📁 Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`
Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`
## Solution Draft Output Templates
### Mode A: Initial Research Output
Use template: `templates/solution_draft_mode_a.md`
### Mode B: Solution Assessment Output
Use template: `templates/solution_draft_mode_b.md`
## Stakeholder Perspectives
Adjust content depth based on audience:
| Audience | Focus | Detail Level |
|----------|-------|--------------|
| **Decision-makers** | Conclusions, risks, recommendations | Concise, emphasize actionability |
| **Implementers** | Specific mechanisms, how-to | Detailed, emphasize how to do it |
| **Technical experts** | Details, boundary conditions, limitations | In-depth, emphasize accuracy |
## Output Files
Default intermediate artifacts location: `RESEARCH_DIR/`
**Required files** (automatically generated through the process):
| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |
**Optional files**:
- `raw/*.md` - Raw source archives (saved when content is lengthy)
## Methodology Quick Reference Card
```
┌──────────────────────────────────────────────────────────────────┐
│ Deep Research — Mode-Aware 8-Step Method │
├──────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
│ GUARDRAILS: Check INPUT_DIR/INPUT_FILE exists + required files │
│ MODE DETECT: solution_draft*.md in 01_solution? → A or B │
│ │
│ MODE A: Initial Research │
│ Phase 1: AC & Restrictions Assessment (BLOCKING) │
│ Phase 2: Full 8-step → solution_draft##.md │
│ Phase 3: Tech Stack Consolidation (OPTIONAL) → tech_stack.md │
│ Phase 4: Security Deep Dive (OPTIONAL) → security_analysis.md │
│ │
│ MODE B: Solution Assessment │
│ Read latest draft → Full 8-step → solution_draft##.md (N+1) │
│ Optional: Phase 3 / Phase 4 on revised draft │
│ │
│ 8-STEP ENGINE: │
│ 0. Classify question type → Select framework template │
│ 1. Decompose question → mode-specific sub-questions │
│ 2. Tier sources → L1 Official > L2 Blog > L3 Media > L4 │
│ 3. Extract facts → Each with source, confidence level │
│ 4. Build framework → Fixed dimensions, structured compare │
│ 5. Align references → Ensure unified definitions │
│ 6. Reasoning chain → Fact→Compare→Conclude, explicit │
│ 7. Use-case validation → Sanity check, prevent armchairing │
│ 8. Deliverable → solution_draft##.md (mode-specific format) │
├──────────────────────────────────────────────────────────────────┤
│ Key discipline: Ask don't assume · Facts before reasoning │
│ Conclusions from mechanism, not gut feelings │
└──────────────────────────────────────────────────────────────────┘
```
## Usage Examples
For detailed execution flow examples (Mode A initial, Mode B assessment, standalone, force override): Read `references/usage-examples.md`
## Source Verifiability Requirements
Every cited piece of external information must be directly verifiable by the user. All links must be publicly accessible (annotate `[login required]` if not), citations must include exact section/page/timestamp, and unverifiable information must be annotated `[limited source]`. Full checklist in `references/quality-checklists.md`.
## Quality Checklist
Before completing the solution draft, run through the checklists in `references/quality-checklists.md`. This covers:
- General quality (L1/L2 support, verifiability, actionability)
- Mode A specific (AC assessment, competitor analysis, component tables, tech stack)
- Mode B specific (findings table, self-contained draft, performance column)
- Timeliness check for high-sensitivity domains (version annotations, cross-validation, community mining)
- Target audience consistency (boundary definition, source matching, fact card audience)
## Final Reply Guidelines
When replying to the user after research is complete:
**✅ Should include**:
- Active mode used (A or B) and which optional phases were executed
- One-sentence core conclusion
- Key findings summary (3-5 points)
- Path to the solution draft: `OUTPUT_DIR/solution_draft##.md`
- Paths to optional artifacts if produced: `tech_stack.md`, `security_analysis.md`
- If there are significant uncertainties, annotate points requiring further verification
**❌ Must not include**:
- Process file listings (e.g., `00_question_decomposition.md`, `01_source_registry.md`, etc.)
- Detailed research step descriptions
- Working directory structure display
**Reason**: Process files are for retrospective review, not for the user. The user cares about conclusions, not the process.
# Comparison & Analysis Frameworks — Reference
## General Dimensions (select as needed)
1. Goal / What problem does it solve
2. Working mechanism / Process
3. Input / Output / Boundaries
4. Advantages / Disadvantages / Trade-offs
5. Applicable scenarios / Boundary conditions
6. Cost / Benefit / Risk
7. Historical evolution / Future trends
8. Security / Permissions / Controllability
## Concept Comparison Specific Dimensions
1. Definition & essence
2. Trigger / invocation method
3. Execution agent
4. Input/output & type constraints
5. Determinism & repeatability
6. Resource & context management
7. Composition & reuse patterns
8. Security boundaries & permission control
## Decision Support Specific Dimensions
1. Solution overview
2. Implementation cost
3. Maintenance cost
4. Risk assessment
5. Expected benefit
6. Applicable scenarios
7. Team capability requirements
8. Migration difficulty
# Novelty Sensitivity Assessment — Reference
## Novelty Sensitivity Classification
| Sensitivity Level | Typical Domains | Source Time Window | Description |
|-------------------|-----------------|-------------------|-------------|
| **Critical** | AI/LLMs, blockchain, cryptocurrency | 3-6 months | Technology iterates extremely fast; info from months ago may be completely outdated |
| **High** | Cloud services, frontend frameworks, API interfaces | 6-12 months | Frequent version updates; must confirm current version |
| **Medium** | Programming languages, databases, operating systems | 1-2 years | Relatively stable but still evolving |
| **Low** | Algorithm fundamentals, design patterns, theoretical concepts | No limit | Core principles change slowly |
## Critical Sensitivity Domain Special Rules
When the research topic involves the following domains, special rules must be enforced:
**Trigger word identification**:
- AI-related: LLM, GPT, Claude, Gemini, AI Agent, RAG, vector database, prompt engineering
- Cloud-native: Kubernetes new versions, Serverless, container runtimes
- Cutting-edge tech: Web3, quantum computing, AR/VR
**Mandatory rules**:
1. **Search with time constraints**:
- Use `time_range: "month"` or `time_range: "week"` to limit search results
- Prefer `start_date: "YYYY-MM-DD"` set to within the last 3 months
2. **Elevate official source priority**:
- Must first consult official documentation, official blogs, official Changelogs
- GitHub Release Notes, official X/Twitter announcements
- Academic papers (arXiv and other preprint platforms)
3. **Mandatory version number annotation**:
- Any technical description must annotate the current version number
- Example: "Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) supports..."
   - Do not use vague statements like "the latest version supports..."
4. **Outdated information handling**:
- Technical blogs/tutorials older than 6 months -> historical reference only, cannot serve as factual evidence
- Version inconsistency found -> must verify current version before using
- Obviously outdated descriptions (e.g., "will support in the future" but now already supported) -> discard directly
5. **Cross-validation**:
- Highly sensitive information must be confirmed from at least 2 independent sources
- Priority: Official docs > Official blogs > Authoritative tech media > Personal blogs
6. **Official download/release page direct verification (BLOCKING)**:
- Must directly visit official download pages to verify platform support (don't rely on search engine caches)
- Use `WebFetch` to directly extract download page content
- Search results about "coming soon" or "planned support" may be outdated; must verify in real time
- Platform support is frequently changing information; cannot infer from old sources
7. **Product-specific protocol/feature name search (BLOCKING)**:
- Beyond searching the product name, must additionally search protocol/standard names the product supports
- Common protocols/standards to search:
- AI tools: MCP, ACP (Agent Client Protocol), LSP, DAP
- Cloud services: OAuth, OIDC, SAML
- Data exchange: GraphQL, gRPC, REST
- Search format: `"<product_name> <protocol_name> support"` or `"<product_name> <protocol_name> integration"`
## Timeliness Assessment Output Template
```markdown
## Timeliness Sensitivity Assessment
- **Research Topic**: [topic]
- **Sensitivity Level**: Critical / High / Medium / Low
- **Rationale**: [why this level]
- **Source Time Window**: [X months/years]
- **Priority official sources to consult**:
1. [Official source 1]
2. [Official source 2]
- **Key version information to verify**:
- [Product/technology 1]: Current version ____
- [Product/technology 2]: Current version ____
```
# Quality Checklists — Reference
## General Quality
- [ ] All core conclusions have L1/L2 tier factual support
- [ ] No use of vague words like "possibly", "probably" without annotating uncertainty
- [ ] Comparison dimensions are complete with no key differences missed
- [ ] At least one real use case validates conclusions
- [ ] References are complete with accessible links
- [ ] Every citation can be directly verified by the user (source verifiability)
- [ ] Structure hierarchy is clear; executives can quickly locate information
## Mode A Specific
- [ ] Phase 1 completed: AC assessment was presented to and confirmed by user
- [ ] AC assessment consistent: Solution draft respects the (possibly adjusted) acceptance criteria and restrictions
- [ ] Competitor analysis included: Existing solutions were researched
- [ ] All components have comparison tables: Each component lists alternatives with tools, advantages, limitations, security, cost
- [ ] Tools/libraries verified: Suggested tools actually exist and work as described
- [ ] Testing strategy covers AC: Tests map to acceptance criteria
- [ ] Tech stack documented (if Phase 3 ran): `tech_stack.md` has evaluation tables, risk assessment, and learning requirements
- [ ] Security analysis documented (if Phase 4 ran): `security_analysis.md` has threat model and per-component controls
## Mode B Specific
- [ ] Findings table complete: All identified weak points documented with solutions
- [ ] Weak point categories covered: Functional, security, and performance assessed
- [ ] New draft is self-contained: Written as if from scratch, no "updated" markers
- [ ] Performance column included: Mode B comparison tables include performance characteristics
- [ ] Previous draft issues addressed: Every finding in the table is resolved in the new draft
## Timeliness Check (High-Sensitivity Domain BLOCKING)
When the research topic has Critical or High sensitivity level:
- [ ] Timeliness sensitivity assessment completed: `00_question_decomposition.md` contains a timeliness assessment section
- [ ] Source timeliness annotated: Every source has publication date, timeliness status, version info
- [ ] No outdated sources used as factual evidence (Critical: within 6 months; High: within 1 year)
- [ ] Version numbers explicitly annotated for all technical products/APIs/SDKs
- [ ] Official sources prioritized: Core conclusions have support from official documentation/blogs
- [ ] Cross-validation completed: Key technical information confirmed from at least 2 independent sources
- [ ] Download page directly verified: Platform support info comes from real-time extraction of official download pages
- [ ] Protocol/feature names searched: Searched for product-supported protocol names (MCP, ACP, etc.)
- [ ] GitHub Issues mined: Reviewed product's GitHub Issues popular discussions
- [ ] Community hotspots identified: Identified and recorded feature points users care most about
## Target Audience Consistency Check (BLOCKING)
- [ ] Research boundary clearly defined: `00_question_decomposition.md` has clear population/geography/timeframe/level boundaries
- [ ] Every source has target audience annotated in `01_source_registry.md`
- [ ] Mismatched sources properly handled (excluded, annotated, or marked reference-only)
- [ ] No audience confusion in fact cards: Every fact has target audience consistent with research boundary
- [ ] No audience confusion in the report: Policies/research/data cited have consistent target audiences
## Source Verifiability
- [ ] All cited links are publicly accessible (annotate `[login required]` if not)
- [ ] Citations include exact section/page/timestamp for long documents
- [ ] Cited facts have corresponding statements in the original text (no over-interpretation)
- [ ] Source publication/update dates annotated; technical docs include version numbers
- [ ] Unverifiable information annotated `[limited source]` and not sole support for core conclusions
# Source Tiering & Authority Anchoring — Reference
## Source Tiers
| Tier | Source Type | Purpose | Credibility |
|------|------------|---------|-------------|
| **L1** | Official docs, papers, specs, RFCs | Definitions, mechanisms, verifiable facts | High |
| **L2** | Official blogs, tech talks, white papers | Design intent, architectural thinking | High |
| **L3** | Authoritative media, expert commentary, tutorials | Supplementary intuition, case studies | Medium |
| **L4** | Community discussions, personal blogs, forums | Discover blind spots, validate understanding | Low |
## L4 Community Source Specifics (mandatory for product comparison research)
| Source Type | Access Method | Value |
|------------|---------------|-------|
| **GitHub Issues** | Visit `github.com/<org>/<repo>/issues` | Real user pain points, feature requests, bug reports |
| **GitHub Discussions** | Visit `github.com/<org>/<repo>/discussions` | Feature discussions, usage insights, community consensus |
| **Reddit** | Search `site:reddit.com "<product_name>"` | Authentic user reviews, comparison discussions |
| **Hacker News** | Search `site:news.ycombinator.com "<product_name>"` | In-depth technical community discussions |
| **Discord/Telegram** | Product's official community channels | Active user feedback (must annotate [limited source]) |
## Principles
- Conclusions must be traceable to L1/L2
- L3/L4 serve only as supplementary and validation sources
- L4 community discussions are used to discover "what users truly care about"
- Record all information sources
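The traceability principle lends itself to a mechanical check over the fact cards. A minimal sketch, assuming simple dict-based structures for conclusions and facts (the structures are assumptions, not prescribed by this reference):

```python
def untraceable_conclusions(conclusions, facts):
    """Return conclusions whose cited facts include no L1/L2 source.

    conclusions: list of {"text": str, "fact_ids": [int]}
    facts: dict mapping fact_id -> {"tier": "L1".."L4"}
    """
    flagged = []
    for conclusion in conclusions:
        tiers = {facts[fid]["tier"]
                 for fid in conclusion["fact_ids"] if fid in facts}
        # A conclusion resting only on L3/L4 evidence violates the principle
        if not tiers & {"L1", "L2"}:
            flagged.append(conclusion["text"])
    return flagged
```

Running such a check before Step 8 surfaces conclusions that need stronger sourcing while there is still time to search for primary sources.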
## Timeliness Filtering Rules (execute based on Step 0.5 sensitivity level)
| Sensitivity Level | Source Filtering Rule | Suggested Search Parameters |
|-------------------|----------------------|-----------------------------|
| Critical | Only accept sources within 6 months as factual evidence | `time_range: "month"` or `start_date` set to last 3 months |
| High | Prefer sources within 1 year; annotate if older than 1 year | `time_range: "year"` |
| Medium | Sources within 2 years used normally; older ones need validity check | Default search |
| Low | No time limit | Default search |
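The filtering rules above can be sketched as a small classifier. The day cutoffs approximate the table's windows, and the status labels are illustrative assumptions:

```python
from datetime import date

# Maximum source age per sensitivity level, approximating the table above
# (None means no time limit)
MAX_AGE_DAYS = {"Critical": 183, "High": 365, "Medium": 730, "Low": None}

def timeliness_status(level: str, published: date, today: date) -> str:
    """Classify a source's timeliness per its sensitivity level."""
    limit = MAX_AGE_DAYS[level]
    if limit is None or (today - published).days <= limit:
        return "ok"
    if level == "Critical":
        # Too old to serve as factual evidence at all
        return "outdated"
    # High: usable if annotated; Medium: needs a validity check
    return "needs-check"
```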
## High-Sensitivity Domain Search Strategy
```
1. Round 1: Targeted official source search
- Use include_domains to restrict to official domains
- Example: include_domains: ["anthropic.com", "openai.com", "docs.xxx.com"]
2. Round 2: Official download/release page direct verification (BLOCKING)
- Directly visit official download pages; don't rely on search caches
- Use tavily-extract or WebFetch to extract page content
- Verify: platform support, current version number, release date
3. Round 3: Product-specific protocol/feature search (BLOCKING)
- Search protocol names the product supports (MCP, ACP, LSP, etc.)
- Format: "<product_name> <protocol_name>" site:official_domain
4. Round 4: Time-limited broad search
- time_range: "month" or start_date set to recent
- Exclude obviously outdated sources
5. Round 5: Version verification
- Cross-validate version numbers from search results
- If inconsistency found, immediately consult official Changelog
6. Round 6: Community voice mining (BLOCKING - mandatory for product comparison research)
- Visit the product's GitHub Issues page, review popular/pinned issues
- Search Issues for key feature terms (e.g., "MCP", "plugin", "integration")
- Review discussion trends from the last 3-6 months
- Identify the feature points and differentiating characteristics users care most about
```
## Community Voice Mining Detailed Steps
```
GitHub Issues Mining Steps:
1. Visit github.com/<org>/<repo>/issues
2. Sort by "Most commented" to view popular discussions
3. Search keywords:
- Feature-related: feature request, enhancement, MCP, plugin, API
- Comparison-related: vs, compared to, alternative, migrate from
4. Review issue labels: enhancement, feature, discussion
5. Record frequently occurring feature demands and user pain points
Value Translation:
- Frequently discussed features -> likely differentiating highlights
- User complaints/requests -> likely product weaknesses
- Comparison discussions -> directly obtain user-perspective difference analysis
```
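The GitHub Issues mining steps above can be sketched with the public GitHub REST API. The repo name, keywords, and ranking heuristic below are illustrative assumptions, not part of the skill:

```javascript
// Step 2: build the "Most commented" listing URL (public GitHub REST API v3).
function issuesUrl(org, repo) {
  return `https://api.github.com/repos/${org}/${repo}/issues` +
         `?state=all&sort=comments&direction=desc&per_page=20`;
}

// Steps 3-5: given fetched issues, surface frequently demanded features
// by keyword match, ranked by discussion volume.
function topDemands(issues, keywords) {
  return issues
    .filter(i => keywords.some(k => i.title.toLowerCase().includes(k)))
    .sort((a, b) => b.comments - a.comments)
    .map(i => ({ title: i.title, comments: i.comments }));
}
```

Feed `topDemands` the parsed JSON from `issuesUrl` with keywords such as `["mcp", "plugin"]`; the most-commented matches are candidates for the "feature points users care most about" record.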
## Source Registry Entry Template
For each source consulted, immediately append to `01_source_registry.md`:
```markdown
## Source #[number]
- **Title**: [source title]
- **Link**: [URL]
- **Tier**: L1/L2/L3/L4
- **Publication Date**: [YYYY-MM-DD]
- **Timeliness Status**: Currently valid / Needs verification / Outdated (reference only)
- **Version Info**: [If involving a specific version, must annotate]
- **Target Audience**: [Explicitly annotate the group/geography/level this source targets]
- **Research Boundary Match**: Full match / Partial overlap / Reference only
- **Summary**: [1-2 sentence key content]
- **Related Sub-question**: [which sub-question this corresponds to]
```
## Target Audience Verification (BLOCKING)
Before including each source, verify that its target audience matches the research boundary:
| Source Type | Target audience to verify | Verification method |
|------------|---------------------------|---------------------|
| **Policy/Regulation** | Who is it for? (K-12/university/all) | Check document title, scope clauses |
| **Academic Research** | Who are the subjects? (vocational/undergraduate/graduate) | Check methodology/sample description sections |
| **Statistical Data** | Which population is measured? | Check data source description |
| **Case Reports** | What type of institution is involved? | Confirm institution type |
Handling mismatched sources:
- Target audience completely mismatched -> do not include
- Partially overlapping -> include but annotate applicable scope
- Usable as analogous reference -> include but explicitly annotate "reference only"
@@ -0,0 +1,56 @@
# Usage Examples — Reference
## Example 1: Initial Research (Mode A)
```
User: Research this problem and find the best solution
```
Execution flow:
1. Context resolution: no explicit file -> project mode (INPUT_DIR=`_docs/00_problem/`, OUTPUT_DIR=`_docs/01_solution/`)
2. Guardrails: verify INPUT_DIR exists with required files
3. Mode detection: no `solution_draft*.md` -> Mode A
4. Phase 1: Assess acceptance criteria and restrictions, ask user about unclear parts
5. BLOCKING: present AC assessment, wait for user confirmation
6. Phase 2: Full 8-step research — competitors, components, state-of-the-art solutions
7. Output: `OUTPUT_DIR/solution_draft01.md`
8. (Optional) Phase 3: Tech stack consolidation -> `tech_stack.md`
9. (Optional) Phase 4: Security deep dive -> `security_analysis.md`
## Example 2: Solution Assessment (Mode B)
```
User: Assess the current solution draft
```
Execution flow:
1. Context resolution: no explicit file -> project mode
2. Guardrails: verify INPUT_DIR exists
3. Mode detection: `solution_draft03.md` found in OUTPUT_DIR -> Mode B, read it as input
4. Full 8-step research — weak points, security, performance, solutions
5. Output: `OUTPUT_DIR/solution_draft04.md` with findings table + revised draft
## Example 3: Standalone Research
```
User: /research @my_problem.md
```
Execution flow:
1. Context resolution: explicit file -> standalone mode (INPUT_FILE=`my_problem.md`, OUTPUT_DIR=`_standalone/my_problem/01_solution/`)
2. Guardrails: verify INPUT_FILE exists and is non-empty, warn about missing restrictions/AC
3. Mode detection + full research flow as in Example 1, scoped to standalone paths
4. Output: `_standalone/my_problem/01_solution/solution_draft01.md`
5. Move `my_problem.md` into `_standalone/my_problem/`
## Example 4: Force Initial Research (Override)
```
User: Research from scratch, ignore existing drafts
```
Execution flow:
1. Context resolution: no explicit file -> project mode
2. Mode detection: drafts exist, but user explicitly requested initial research -> Mode A
3. Phase 1 + Phase 2 as in Example 1
4. Output: `OUTPUT_DIR/solution_draft##.md` (incremented from highest existing)
@@ -0,0 +1,37 @@
# Solution Draft
## Product Solution Description
[Short description of the proposed solution. Brief component interaction diagram.]
## Existing/Competitor Solutions Analysis
[Analysis of existing solutions for similar problems, if any.]
## Architecture
[Architecture solution that meets restrictions and acceptance criteria.]
### Component: [Component Name]
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
[Repeat per component]
## Testing Strategy
### Integration / Functional Tests
- [Test 1]
- [Test 2]
### Non-Functional Tests
- [Performance test 1]
- [Security test 1]
## References
[All cited source links]
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,40 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| [old] | [weak point] | [new] |
## Product Solution Description
[Short description. Brief component interaction diagram. Written as if from scratch — no "updated" markers.]
## Architecture
[Architecture solution that meets restrictions and acceptance criteria.]
### Component: [Component Name]
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
[Repeat per component]
## Testing Strategy
### Integration / Functional Tests
- [Test 1]
- [Test 2]
### Non-Functional Tests
- [Performance test 1]
- [Security test 1]
## References
[All cited source links]
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,174 @@
---
name: retrospective
description: |
Collect metrics from implementation batch reports and code review findings, analyze trends across cycles,
and produce improvement reports with actionable recommendations.
3-step workflow: collect metrics, analyze trends, produce report.
Outputs to _docs/05_metrics/.
Trigger phrases:
- "retrospective", "retro", "run retro"
- "metrics review", "feedback loop"
- "implementation metrics", "analyze trends"
category: evolve
tags: [retrospective, metrics, trends, improvement, feedback-loop]
disable-model-invocation: true
---
# Retrospective
Collect metrics from implementation artifacts, analyze trends across development cycles, and produce actionable improvement reports.
## Core Principles
- **Data-driven**: conclusions come from metrics, not impressions
- **Actionable**: every finding must have a concrete improvement suggestion
- **Cumulative**: each retrospective compares against previous ones to track progress
- **Save immediately**: write artifacts to disk after each step
- **Non-judgmental**: focus on process improvement, not blame
## Context Resolution
Fixed paths:
- IMPL_DIR: `_docs/03_implementation/`
- METRICS_DIR: `_docs/05_metrics/`
- TASKS_DIR: `_docs/02_tasks/`
Announce the resolved paths to the user before proceeding.
## Prerequisite Checks (BLOCKING)
1. `IMPL_DIR` exists and contains at least one `batch_*_report.md` — **STOP if missing** (nothing to analyze)
2. Create METRICS_DIR if it does not exist
3. Check for previous retrospective reports in METRICS_DIR to enable trend comparison
## Artifact Management
### Directory Structure
```
METRICS_DIR/
├── retro_[YYYY-MM-DD].md
├── retro_[YYYY-MM-DD].md
└── ...
```
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 3). Update status as each step completes.
## Workflow
### Step 1: Collect Metrics
**Role**: Data analyst
**Goal**: Parse all implementation artifacts and extract quantitative metrics
**Constraints**: Collection only — no interpretation yet
#### Sources
| Source | Metrics Extracted |
|--------|------------------|
| `batch_*_report.md` | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) |
| Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category |
| Task spec files in TASKS_DIR | Complexity points per task, dependency count |
| `FINAL_implementation_report.md` | Total tasks, total batches, overall duration |
| Git log (if available) | Commits per batch, files changed per batch |
#### Metrics to Compute
**Implementation Metrics**:
- Total tasks implemented
- Total batches executed
- Average tasks per batch
- Average complexity points per batch
- Total complexity points delivered
**Quality Metrics**:
- Code review pass rate (PASS / total reviews)
- Code review findings by severity: Critical, High, Medium, Low counts
- Code review findings by category: Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
- FAIL count (batches that required user intervention)
**Efficiency Metrics**:
- Blocked task count and reasons
- Tasks completed on first attempt vs requiring fixes
- Batch with most findings (identify problem areas)
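The quality metrics above reduce to simple aggregation over the parsed batch reports; a minimal sketch, where the record shape `{ verdict, findings: [{ severity }] }` is an assumption about the parser's output:

```javascript
// Aggregate review verdicts and finding counts parsed from batch reports.
function qualityMetrics(reviews) {
  const passRate =
    reviews.filter(r => r.verdict === 'PASS').length / reviews.length;
  const bySeverity = {};
  for (const r of reviews) {
    for (const f of r.findings) {
      bySeverity[f.severity] = (bySeverity[f.severity] || 0) + 1;
    }
  }
  return { passRate, bySeverity };
}
```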
**Self-verification**:
- [ ] All batch reports parsed
- [ ] All metric categories computed
- [ ] No batch reports missed
---
### Step 2: Analyze Trends
**Role**: Process improvement analyst
**Goal**: Identify patterns, recurring issues, and improvement opportunities
**Constraints**: Analysis must be grounded in the metrics from Step 1
1. If previous retrospective reports exist in METRICS_DIR, load the most recent one for comparison
2. Identify patterns:
- **Recurring findings**: which code review categories appear most frequently?
- **Problem components**: which components/files generate the most findings?
- **Complexity accuracy**: do high-complexity tasks actually produce more issues?
- **Blocker patterns**: what types of blockers occur and can they be prevented?
3. Compare against previous retrospective (if exists):
- Which metrics improved?
- Which metrics degraded?
- Were previous improvement actions effective?
4. Identify top 3 improvement actions ranked by impact
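The comparison in point 3 can be sketched as a metric-by-metric delta. The metric names and the `higherIsBetter` map are illustrative assumptions; whether a delta is an improvement depends on the metric's direction:

```javascript
// Compare current retrospective metrics against the previous one.
// Metrics present only in the current retro have no baseline and are skipped.
function compareRetros(previous, current, higherIsBetter) {
  const result = { improved: [], degraded: [], unchanged: [] };
  for (const [name, curr] of Object.entries(current)) {
    if (!(name in previous)) continue; // no baseline for this metric
    const delta = curr - previous[name];
    if (delta === 0) result.unchanged.push(name);
    else if ((delta > 0) === Boolean(higherIsBetter[name])) result.improved.push(name);
    else result.degraded.push(name);
  }
  return result;
}
```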
**Self-verification**:
- [ ] Patterns are grounded in specific metrics
- [ ] Comparison with previous retro included (if exists)
- [ ] Top 3 actions are concrete and actionable
---
### Step 3: Produce Report
**Role**: Technical writer
**Goal**: Write a structured retrospective report with metrics, trends, and recommendations
**Constraints**: Concise, data-driven, actionable
Write `METRICS_DIR/retro_[YYYY-MM-DD].md` using `templates/retrospective-report.md` as structure.
**Self-verification**:
- [ ] All metrics from Step 1 included
- [ ] Trend analysis from Step 2 included
- [ ] Top 3 improvement actions clearly stated
- [ ] Suggested rule/skill updates are specific
**Save action**: Write `retro_[YYYY-MM-DD].md`
Present the report summary to the user.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| No batch reports exist | **STOP** — nothing to analyze |
| Batch reports have inconsistent format | **WARN user**, extract what is available |
| No previous retrospective for comparison | PROCEED — report baseline metrics only |
| Metrics suggest systemic issue (>50% FAIL rate) | **WARN user** — suggest immediate process review |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Retrospective (3-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist in _docs/03_implementation/ │
│ │
│ 1. Collect Metrics → parse batch reports, compute metrics │
│ 2. Analyze Trends → patterns, comparison, improvement areas │
│ 3. Produce Report → _docs/05_metrics/retro_[date].md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Data-driven · Actionable · Cumulative │
│ Non-judgmental · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,93 @@
# Retrospective Report Template
Save as `_docs/05_metrics/retro_[YYYY-MM-DD].md`.
---
```markdown
# Retrospective — [YYYY-MM-DD]
## Implementation Summary
| Metric | Value |
|--------|-------|
| Total tasks | [count] |
| Total batches | [count] |
| Total complexity points | [sum] |
| Avg tasks per batch | [value] |
| Avg complexity per batch | [value] |
## Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | [count] | [%] |
| PASS_WITH_WARNINGS | [count] | [%] |
| FAIL | [count] | [%] |
### Findings by Severity
| Severity | Count |
|----------|-------|
| Critical | [count] |
| High | [count] |
| Medium | [count] |
| Low | [count] |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Bug | [count] | [most affected files] |
| Spec-Gap | [count] | [most affected files] |
| Security | [count] | [most affected files] |
| Performance | [count] | [most affected files] |
| Maintainability | [count] | [most affected files] |
| Style | [count] | [most affected files] |
## Efficiency
| Metric | Value |
|--------|-------|
| Blocked tasks | [count] |
| Tasks requiring fixes after review | [count] |
| Batch with most findings | Batch [N] — [reason] |
### Blocker Analysis
| Blocker Type | Count | Prevention |
|-------------|-------|-----------|
| [type] | [count] | [suggested prevention] |
## Trend Comparison
| Metric | Previous | Current | Change |
|--------|----------|---------|--------|
| Pass rate | [%] | [%] | [+/-] |
| Avg findings per batch | [value] | [value] | [+/-] |
| Blocked tasks | [count] | [count] | [+/-] |
*Previous retrospective: [date or "N/A — first retro"]*
## Top 3 Improvement Actions
1. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
2. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
3. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
## Suggested Rule/Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| [.cursor/rules/... or .cursor/skills/...] | [specific change] | [based on which metric] |
```
@@ -0,0 +1,130 @@
---
name: rollback
description: |
Revert implementation to a specific batch checkpoint using git revert, reset Jira ticket statuses,
verify rollback integrity with tests, and produce a rollback report.
Trigger phrases:
- "rollback", "revert", "revert batch"
- "undo implementation", "roll back to batch"
category: build
tags: [rollback, revert, recovery, implementation]
disable-model-invocation: true
---
# Implementation Rollback
Revert the codebase to a specific batch checkpoint, reset Jira statuses for reverted tasks, and verify integrity.
## Core Principles
- **Preserve history**: always use `git revert`, never force-push
- **Verify after revert**: run the full test suite after every rollback
- **Update tracking**: reset Jira ticket statuses for all reverted tasks
- **Atomic rollback**: if rollback fails midway, stop and report — do not leave the codebase in a partial state
- **Ask, don't assume**: if the target batch is ambiguous, present options and ask
## Context Resolution
- IMPL_DIR: `_docs/03_implementation/`
- Batch reports: `IMPL_DIR/batch_*_report.md`
## Prerequisite Checks (BLOCKING)
1. IMPL_DIR exists and contains at least one `batch_*_report.md` — **STOP if missing**
2. Git working tree is clean (no uncommitted changes) — **STOP if dirty**, ask user to commit or stash
## Input
- User specifies a target batch number or commit hash
- If not specified, present the list of available batch checkpoints and ask
## Workflow
### Step 1: Identify Checkpoints
1. Read all `batch_*_report.md` files from IMPL_DIR
2. Extract: batch number, date, tasks included, commit hash, code review verdict
3. Present batch list to user
**BLOCKING**: User must confirm which batch to roll back to.
### Step 2: Revert Commits
1. Determine which commits need to be reverted (all commits after the target batch)
2. For each commit in reverse chronological order:
- Run `git revert <commit-hash> --no-edit`
- If merge conflicts occur: present conflicts and ask user for resolution
3. If any revert fails and cannot be resolved, abort the in-progress revert with `git revert --abort`, report which commits were already reverted, and stop
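The revert ordering in step 2 can be sketched as building the command list up front; the oldest-first input ordering is an assumption, and actual execution would shell out to git per command:

```javascript
// Build the git commands for step 2: revert every commit made after the
// target batch, newest first, so each revert applies cleanly on top of
// the previous one. Input is assumed oldest-first (chronological order).
function revertPlan(commitsOldestFirst) {
  return [...commitsOldestFirst]
    .reverse() // newest commit is reverted first
    .map(hash => `git revert ${hash} --no-edit`);
}
```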
### Step 3: Verify Integrity
1. Run the full test suite
2. If tests fail: report failures to user, ask how to proceed (fix or abort)
3. If tests pass: continue
### Step 4: Update Jira
1. Identify all tasks from reverted batches
2. Reset each task's Jira ticket status to "To Do" via Jira MCP
### Step 5: Finalize
1. Commit with message: `[ROLLBACK] Reverted to batch [N]: [task list]`
2. Write rollback report to `IMPL_DIR/rollback_report.md`
## Output
Write `_docs/03_implementation/rollback_report.md`:
```markdown
# Rollback Report
**Date**: [YYYY-MM-DD]
**Target**: Batch [N] (commit [hash])
**Reverted Batches**: [list]
## Reverted Tasks
| Task | Batch | Status Before | Status After |
|------|-------|--------------|-------------|
| [JIRA-ID] | [batch #] | In Testing | To Do |
## Test Results
- [pass/fail count]
## Jira Updates
- [list of ticket transitions]
## Notes
- [any conflicts, manual steps, or issues encountered]
```
## Escalation Rules
| Situation | Action |
|-----------|--------|
| No batch reports exist | **STOP** — nothing to roll back |
| Uncommitted changes in working tree | **STOP** — ask user to commit or stash |
| Merge conflicts during revert | **ASK user** for resolution |
| Tests fail after rollback | **ASK user** — fix or abort |
| Rollback fails midway | Abort with `git revert --abort`, report to user |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Rollback (5-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist, clean working tree │
│ │
│ 1. Identify Checkpoints → present batch list │
│ [BLOCKING: user confirms target batch] │
│ 2. Revert Commits → git revert per commit │
│ 3. Verify Integrity → run full test suite │
│ 4. Update Jira → reset statuses to "To Do" │
│ 5. Finalize → commit + rollback_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Preserve history · Verify after revert │
│ Atomic rollback · Ask don't assume │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,300 @@
---
name: security-testing
description: "Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices."
category: specialized-testing
priority: critical
tokenEstimate: 1200
agents: [qe-security-scanner, qe-api-contract-validator, qe-quality-analyzer]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2025-12-02
dependencies: []
quick_reference_card: true
tags: [security, owasp, sast, dast, vulnerabilities, auth, injection]
trust_tier: 3
validation:
schema_path: schemas/output.json
validator_path: scripts/validate-config.json
eval_path: evals/security-testing.yaml
---
# Security Testing
<default_to_action>
When testing security or conducting audits:
1. TEST OWASP Top 10 vulnerabilities systematically
2. VALIDATE authentication and authorization on every endpoint
3. SCAN dependencies for known vulnerabilities (npm audit)
4. CHECK for injection attacks (SQL, XSS, command)
5. VERIFY secrets aren't exposed in code/logs
**Quick Security Checks:**
- Access control → Test horizontal/vertical privilege escalation
- Crypto → Verify password hashing, HTTPS, no sensitive data exposed
- Injection → Test SQL injection, XSS, command injection
- Auth → Test weak passwords, session fixation, MFA enforcement
- Config → Check error messages don't leak info
**Critical Success Factors:**
- Think like an attacker, build like a defender
- Security is built in, not added at the end
- Test continuously in CI/CD, not just before release
</default_to_action>
## Quick Reference Card
### When to Use
- Security audits and penetration testing
- Testing authentication/authorization
- Validating input sanitization
- Reviewing security configuration
### OWASP Top 10
Use the most recent **stable** version of the OWASP Top 10. At the start of each security audit, research the current version at https://owasp.org/www-project-top-ten/ and test against all listed categories. Do not rely on a hardcoded list — the OWASP Top 10 is updated periodically and the current version must be verified.
### Tools
| Type | Tool | Purpose |
|------|------|---------|
| SAST | SonarQube, Semgrep | Static code analysis |
| DAST | OWASP ZAP, Burp | Dynamic scanning |
| Deps | npm audit, Snyk | Dependency vulnerabilities |
| Secrets | git-secrets, TruffleHog | Secret scanning |
### Agent Coordination
- `qe-security-scanner`: Multi-layer SAST/DAST scanning
- `qe-api-contract-validator`: API security testing
- `qe-quality-analyzer`: Security code review
---
## Key Vulnerability Tests
### 1. Broken Access Control
```javascript
// Horizontal escalation - User A accessing User B's data
test('user cannot access another user\'s order', async () => {
const userAToken = await login('userA');
const userBOrder = await createOrder('userB');
const response = await api.get(`/orders/${userBOrder.id}`, {
headers: { Authorization: `Bearer ${userAToken}` }
});
expect(response.status).toBe(403);
});
// Vertical escalation - Regular user accessing admin
test('regular user cannot access admin', async () => {
const userToken = await login('regularUser');
expect((await api.get('/admin/users', {
headers: { Authorization: `Bearer ${userToken}` }
})).status).toBe(403);
});
```
### 2. Injection Attacks
```javascript
// SQL Injection
test('prevents SQL injection', async () => {
const malicious = "' OR '1'='1";
const response = await api.get(`/products?search=${malicious}`);
expect(response.body.length).toBeLessThan(100); // Not all products
});
// XSS
test('sanitizes HTML output', async () => {
const xss = '<script>alert("XSS")</script>';
await api.post('/comments', { text: xss });
const html = (await api.get('/comments')).body;
expect(html).toContain('&lt;script&gt;');
expect(html).not.toContain('<script>');
});
```
### 3. Cryptographic Failures
```javascript
test('passwords are hashed', async () => {
await db.users.create({ email: 'test@example.com', password: 'MyPassword123' });
const user = await db.users.findByEmail('test@example.com');
expect(user.password).not.toBe('MyPassword123');
expect(user.password).toMatch(/^\$2[aby]\$\d{2}\$/); // bcrypt
});
test('no sensitive data in API response', async () => {
const response = await api.get('/users/me');
expect(response.body).not.toHaveProperty('password');
expect(response.body).not.toHaveProperty('ssn');
});
```
### 4. Security Misconfiguration
```javascript
test('errors don\'t leak sensitive info', async () => {
const response = await api.post('/login', { email: 'nonexistent@test.com', password: 'wrong' });
expect(response.body.error).toBe('Invalid credentials'); // Generic message
});
test('sensitive endpoints not exposed', async () => {
const endpoints = ['/debug', '/.env', '/.git', '/admin'];
for (const ep of endpoints) {
expect((await fetch(`https://example.com${ep}`)).status).not.toBe(200);
}
});
```
### 5. Rate Limiting
```javascript
test('rate limiting prevents brute force', async () => {
const responses = [];
for (let i = 0; i < 20; i++) {
responses.push(await api.post('/login', { email: 'test@example.com', password: 'wrong' }));
}
expect(responses.filter(r => r.status === 429).length).toBeGreaterThan(0);
});
```
---
## Security Checklist
### Authentication
- [ ] Strong password requirements (12+ chars)
- [ ] Password hashing (bcrypt, scrypt, Argon2)
- [ ] MFA for sensitive operations
- [ ] Account lockout after failed attempts
- [ ] Session ID changes after login
- [ ] Session timeout
### Authorization
- [ ] Check authorization on every request
- [ ] Least privilege principle
- [ ] No horizontal escalation
- [ ] No vertical escalation
### Data Protection
- [ ] HTTPS everywhere
- [ ] Encrypted at rest
- [ ] Secrets not in code/logs
- [ ] PII compliance (GDPR)
### Input Validation
- [ ] Server-side validation
- [ ] Parameterized queries (no SQL injection)
- [ ] Output encoding (no XSS)
- [ ] Rate limiting
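The parameterized-query and output-encoding items can be sketched together. The escape function is a minimal illustration only, not a replacement for a vetted templating or sanitization library, and the `?` placeholder style assumes a mysql/pg-like driver:

```javascript
// Parameterized query: user input travels as a bound value, never as SQL text.
function findUser(db, userId) {
  return db.query('SELECT * FROM users WHERE id = ?', [userId]);
}

// Output encoding: escape the five HTML-significant characters before
// interpolating untrusted text into markup.
function escapeHtml(s) {
  return s.replace(/[&<>"']/g, c => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  }[c]));
}
```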
---
## CI/CD Integration
```yaml
# GitHub Actions (workflow fragment)
jobs:
  security-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Dependency audit
        run: npm audit --audit-level=high
      - name: SAST scan
        run: npm run sast
      - name: Secret scan
        uses: trufflesecurity/trufflehog@main
      - name: DAST scan
        if: github.ref == 'refs/heads/main'
        run: docker run owasp/zap2docker-stable zap-baseline.py -t https://staging.example.com
```
**Pre-commit hooks:**
```bash
#!/bin/sh
git-secrets --scan
npm run lint:security
```
---
## Agent-Assisted Security Testing
```typescript
// Comprehensive multi-layer scan
await Task("Security Scan", {
target: 'src/',
layers: { sast: true, dast: true, dependencies: true, secrets: true },
severity: ['critical', 'high', 'medium']
}, "qe-security-scanner");
// OWASP Top 10 testing
await Task("OWASP Scan", {
categories: ['broken-access-control', 'injection', 'cryptographic-failures'],
depth: 'comprehensive'
}, "qe-security-scanner");
// Validate fix
await Task("Validate Fix", {
vulnerability: 'CVE-2024-12345',
expectedResolution: 'upgrade package to v2.0.0',
retestAfterFix: true
}, "qe-security-scanner");
```
---
## Agent Coordination Hints
### Memory Namespace
```
aqe/security/
├── scans/* - Scan results
├── vulnerabilities/* - Found vulnerabilities
├── fixes/* - Remediation tracking
└── compliance/* - Compliance status
```
### Fleet Coordination
```typescript
const securityFleet = await FleetManager.coordinate({
strategy: 'security-testing',
agents: [
'qe-security-scanner',
'qe-api-contract-validator',
'qe-quality-analyzer',
'qe-deployment-readiness'
],
topology: 'parallel'
});
```
---
## Common Mistakes
### ❌ Security by Obscurity
Hiding admin at `/super-secret-admin` → **Use proper auth**
### ❌ Client-Side Validation Only
JavaScript validation can be bypassed → **Always validate server-side**
### ❌ Trusting User Input
Assuming input is safe → **Sanitize, validate, escape all input**
### ❌ Hardcoded Secrets
API keys in code → **Environment variables, secret management**
---
## Related Skills
- [agentic-quality-engineering](../agentic-quality-engineering/) - Security with agents
- [api-testing-patterns](../api-testing-patterns/) - API security testing
- [compliance-testing](../compliance-testing/) - GDPR, HIPAA, SOC2
---
## Remember
**Think like an attacker:** What would you try to break? Test that.
**Build like a defender:** Assume input is malicious until proven otherwise.
**Test continuously:** Security testing is ongoing, not one-time.
**With Agents:** Agents automate vulnerability scanning, track remediation, and validate fixes. Use agents to maintain security posture at scale.
@@ -0,0 +1,789 @@
# =============================================================================
# AQE Skill Evaluation Test Suite: Security Testing v1.0.0
# =============================================================================
#
# Comprehensive evaluation suite for the security-testing skill per ADR-056.
# Tests OWASP Top 10 2021 detection, severity classification, remediation
# quality, and cross-model consistency.
#
# Schema: .claude/skills/.validation/schemas/skill-eval.schema.json
# Validator: .claude/skills/security-testing/scripts/validate-config.json
#
# Coverage:
# - OWASP A01:2021 - Broken Access Control
# - OWASP A02:2021 - Cryptographic Failures
# - OWASP A03:2021 - Injection (SQL, XSS, Command)
# - OWASP A07:2021 - Identification and Authentication Failures
# - Negative tests (no false positives on secure code)
#
# =============================================================================
skill: security-testing
version: 1.0.0
description: >
Comprehensive evaluation suite for the security-testing skill.
Tests OWASP Top 10 2021 detection capabilities, CWE classification accuracy,
CVSS scoring, severity classification, and remediation quality.
Supports multi-model testing and integrates with ReasoningBank for
continuous improvement.
# =============================================================================
# Multi-Model Configuration
# =============================================================================
models_to_test:
- claude-3.5-sonnet # Primary model (high accuracy expected)
- claude-3-haiku # Fast model (minimum quality threshold)
- gpt-4o # Cross-vendor validation
# =============================================================================
# MCP Integration Configuration
# =============================================================================
mcp_integration:
enabled: true
namespace: skill-validation
# Query existing security patterns before running evals
query_patterns: true
# Track each test outcome for learning feedback loop
track_outcomes: true
# Store successful patterns after evals complete
store_patterns: true
# Share learning with fleet coordinator agents
share_learning: true
# Update quality gate with validation metrics
update_quality_gate: true
# Target agents for learning distribution
target_agents:
- qe-learning-coordinator
- qe-queen-coordinator
- qe-security-scanner
- qe-security-auditor
# =============================================================================
# ReasoningBank Learning Configuration
# =============================================================================
learning:
store_success_patterns: true
store_failure_patterns: true
pattern_ttl_days: 90
min_confidence_to_store: 0.7
cross_model_comparison: true
# =============================================================================
# Result Format Configuration
# =============================================================================
result_format:
json_output: true
markdown_report: true
include_raw_output: false
include_timing: true
include_token_usage: true
# =============================================================================
# Environment Setup
# =============================================================================
setup:
required_tools:
- jq # JSON parsing (required)
- npm # Dependency audit (optional but recommended)
environment_variables:
SECURITY_SCAN_DEPTH: "deep"
OWASP_ENABLED: "true"
SEVERITY_THRESHOLD: "medium"
fixtures:
- name: vulnerable_express_app
path: fixtures/vulnerable-express-app.js
content: |
const express = require('express');
const app = express();
// SQL Injection vulnerability
app.get('/user', (req, res) => {
const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
db.query(query);
});
// XSS vulnerability
app.get('/profile', (req, res) => {
res.send(`<h1>Hello ${req.query.name}</h1>`);
});
// Path Traversal vulnerability
app.get('/file', (req, res) => {
const path = './uploads/' + req.query.filename;
res.sendFile(path);
});
# =============================================================================
# TEST CASES
# =============================================================================
test_cases:
# ---------------------------------------------------------------------------
# CATEGORY: SQL Injection (OWASP A03:2021)
# ---------------------------------------------------------------------------
- id: tc001_sql_injection_string_concat
description: "Detect SQL injection via string concatenation in Node.js"
category: injection
priority: critical
input:
code: |
const express = require('express');
const mysql = require('mysql');
const app = express();
const db = mysql.createConnection({ /* connection config */ });
app.get('/api/users/:id', (req, res) => {
const userId = req.params.id;
const query = `SELECT * FROM users WHERE id = ${userId}`;
db.query(query, (err, results) => {
res.json(results);
});
});
context:
language: javascript
framework: express
environment: production
expected_output:
must_contain:
- "SQL injection"
- "parameterized"
must_not_contain:
- "no vulnerabilities"
- "secure"
must_match_regex:
- "CWE-89|CWE-564"
- "A03:2021"
severity_classification: critical
finding_count:
min: 1
max: 3
recommendation_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.7
grading_rubric:
completeness: 0.3
accuracy: 0.5
actionability: 0.2
timeout_ms: 30000
- id: tc002_sql_injection_parameterized_safe
description: "Verify parameterized queries are NOT flagged as vulnerable"
category: injection
priority: high
input:
code: |
app.get('/api/users/:id', (req, res) => {
const userId = parseInt(req.params.id, 10);
db.query('SELECT * FROM users WHERE id = ?', [userId], (err, results) => {
res.json(results);
});
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "parameterized"
- "secure"
must_not_contain:
- "SQL injection"
- "critical"
- "vulnerable"
severity_classification: info
finding_count:
max: 1
validation:
schema_check: true
keyword_match_threshold: 0.7
allow_partial: true
# ---------------------------------------------------------------------------
# CATEGORY: Cross-Site Scripting (OWASP A03:2021)
# ---------------------------------------------------------------------------
- id: tc003_xss_reflected_html_output
description: "Detect reflected XSS in unescaped HTML output"
category: injection
priority: critical
input:
code: |
app.get('/profile', (req, res) => {
const name = req.query.name;
res.send(`
<html>
<body>
<h1>Welcome, ${name}!</h1>
<p>Your profile has been loaded.</p>
</body>
</html>
`);
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "XSS"
- "cross-site scripting"
- "sanitize"
- "escape"
must_match_regex:
- "CWE-79"
severity_classification: high
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.75
- id: tc004_xss_dom_based_innerhtml
description: "Detect DOM-based XSS via innerHTML assignment"
category: injection
priority: high
input:
code: |
// Client-side JavaScript
const params = new URLSearchParams(window.location.search);
const message = params.get('msg');
document.getElementById('output').innerHTML = message;
context:
language: javascript
framework: vanilla
environment: production
expected_output:
must_contain:
- "DOM"
- "XSS"
- "innerHTML"
- "textContent"
must_match_regex:
- "CWE-79"
severity_classification: high
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Authentication Failures (OWASP A07:2021)
# ---------------------------------------------------------------------------
- id: tc005_hardcoded_credentials
description: "Detect hardcoded credentials and API keys"
category: authentication
priority: critical
input:
code: |
const ADMIN_PASSWORD = 'admin123';
const API_KEY = 'sk-1234567890abcdef';
const DATABASE_URL = 'postgres://admin:password123@localhost/db';
app.post('/login', (req, res) => {
if (req.body.password === ADMIN_PASSWORD) {
req.session.isAdmin = true;
res.send('Login successful');
}
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "hardcoded"
- "credentials"
- "secret"
- "environment variable"
must_match_regex:
- "CWE-798|CWE-259"
severity_classification: critical
finding_count:
min: 2
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.8
- id: tc006_weak_password_hashing
description: "Detect weak password hashing algorithms (MD5, SHA1)"
category: authentication
priority: high
input:
code: |
const crypto = require('crypto');
function hashPassword(password) {
return crypto.createHash('md5').update(password).digest('hex');
}
function verifyPassword(password, hash) {
return hashPassword(password) === hash;
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "MD5"
- "weak"
- "bcrypt"
- "argon2"
must_match_regex:
- "CWE-327|CWE-328|CWE-916"
severity_classification: high
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
# ---------------------------------------------------------------------------
# CATEGORY: Broken Access Control (OWASP A01:2021)
# ---------------------------------------------------------------------------
- id: tc007_idor_missing_authorization
description: "Detect IDOR vulnerability with missing authorization check"
category: authorization
priority: critical
input:
code: |
app.get('/api/users/:id/profile', (req, res) => {
// No authorization check - any user can access any profile
const userId = req.params.id;
db.query('SELECT * FROM profiles WHERE user_id = ?', [userId])
.then(profile => res.json(profile));
});
app.delete('/api/users/:id', (req, res) => {
// No check if requesting user owns this account
db.query('DELETE FROM users WHERE id = ?', [req.params.id]);
res.send('User deleted');
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "authorization"
- "access control"
- "IDOR"
- "ownership"
must_match_regex:
- "CWE-639|CWE-284|CWE-862"
- "A01:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Cryptographic Failures (OWASP A02:2021)
# ---------------------------------------------------------------------------
- id: tc008_weak_encryption_des
description: "Detect use of weak encryption algorithms (DES, RC4)"
category: cryptography
priority: high
input:
code: |
const crypto = require('crypto');
function encryptData(data, key) {
const cipher = crypto.createCipher('des', key);
return cipher.update(data, 'utf8', 'hex') + cipher.final('hex');
}
function decryptData(data, key) {
const decipher = crypto.createDecipher('des', key);
return decipher.update(data, 'hex', 'utf8') + decipher.final('utf8');
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "DES"
- "weak"
- "deprecated"
- "AES"
must_match_regex:
- "CWE-327|CWE-328"
- "A02:2021"
severity_classification: high
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc009_plaintext_password_storage
description: "Detect plaintext password storage"
category: cryptography
priority: critical
input:
code: |
class User {
constructor(email, password) {
this.email = email;
this.password = password; // Stored in plaintext!
}
save() {
db.query('INSERT INTO users (email, password) VALUES (?, ?)',
[this.email, this.password]);
}
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "plaintext"
- "password"
- "hash"
- "bcrypt"
must_match_regex:
- "CWE-256|CWE-312"
- "A02:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.8
# ---------------------------------------------------------------------------
# CATEGORY: Path Traversal (Related to A01:2021)
# ---------------------------------------------------------------------------
- id: tc010_path_traversal_file_access
description: "Detect path traversal vulnerability in file access"
category: injection
priority: critical
input:
code: |
const fs = require('fs');
app.get('/download', (req, res) => {
const filename = req.query.file;
const filepath = './uploads/' + filename;
res.sendFile(filepath);
});
app.get('/read/:name', (req, res) => {
const content = fs.readFileSync('./data/' + req.params.name);
res.send(content);
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "path traversal"
- "directory traversal"
- "../"
- "sanitize"
must_match_regex:
- "CWE-22|CWE-23"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Negative Tests (No False Positives)
# ---------------------------------------------------------------------------
- id: tc011_secure_code_no_false_positives
description: "Verify secure code is NOT flagged as vulnerable"
category: negative
priority: critical
input:
code: |
const express = require('express');
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');
const bcrypt = require('bcrypt');
const validator = require('validator');
const app = express();
app.use(helmet());
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));
app.post('/api/users', async (req, res) => {
const { email, password } = req.body;
// Input validation
if (!validator.isEmail(email)) {
return res.status(400).json({ error: 'Invalid email' });
}
// Secure password hashing
const hashedPassword = await bcrypt.hash(password, 12);
// Parameterized query
await db.query(
'INSERT INTO users (email, password) VALUES ($1, $2)',
[email, hashedPassword]
);
res.status(201).json({ message: 'User created' });
});
context:
language: javascript
framework: express
environment: production
expected_output:
must_contain:
- "secure"
- "best practice"
must_not_contain:
- "SQL injection"
- "XSS"
- "critical vulnerability"
- "high severity"
finding_count:
max: 2 # Allow informational findings only
validation:
schema_check: true
keyword_match_threshold: 0.6
allow_partial: true
- id: tc012_secure_auth_implementation
description: "Verify secure authentication is recognized as safe"
category: negative
priority: high
input:
code: |
const bcrypt = require('bcrypt');
const jwt = require('jsonwebtoken');
async function login(email, password) {
const user = await User.findByEmail(email);
if (!user) {
return { error: 'Invalid credentials' };
}
const match = await bcrypt.compare(password, user.passwordHash);
if (!match) {
return { error: 'Invalid credentials' };
}
const token = jwt.sign(
{ userId: user.id },
process.env.JWT_SECRET,
{ expiresIn: '1h' }
);
return { token };
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "bcrypt"
- "jwt"
- "secure"
must_not_contain:
- "vulnerable"
- "critical"
- "hardcoded"
severity_classification: info
validation:
schema_check: true
allow_partial: true
# ---------------------------------------------------------------------------
# CATEGORY: Python Security (Multi-language Support)
# ---------------------------------------------------------------------------
- id: tc013_python_sql_injection
description: "Detect SQL injection in Python Flask application"
category: injection
priority: critical
input:
code: |
from flask import Flask, request
import sqlite3
app = Flask(__name__)
@app.route('/user')
def get_user():
user_id = request.args.get('id')
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
return str(cursor.fetchone())
context:
language: python
framework: flask
expected_output:
must_contain:
- "SQL injection"
- "parameterized"
- "f-string"
must_match_regex:
- "CWE-89"
severity_classification: critical
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc014_python_ssti_jinja
description: "Detect Server-Side Template Injection in Jinja2"
category: injection
priority: critical
input:
code: |
from flask import Flask, request, render_template_string
app = Flask(__name__)
@app.route('/render')
def render():
template = request.args.get('template')
return render_template_string(template)
context:
language: python
framework: flask
expected_output:
must_contain:
- "SSTI"
- "template injection"
- "render_template_string"
- "Jinja2"
must_match_regex:
- "CWE-94|CWE-1336"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc015_python_pickle_deserialization
description: "Detect insecure deserialization with pickle"
category: injection
priority: critical
input:
code: |
import pickle
from flask import Flask, request
app = Flask(__name__)
@app.route('/load')
def load_data():
data = request.get_data()
obj = pickle.loads(data)
return str(obj)
context:
language: python
framework: flask
expected_output:
must_contain:
- "pickle"
- "deserialization"
- "untrusted"
- "RCE"
must_match_regex:
- "CWE-502"
- "A08:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# =============================================================================
# SUCCESS CRITERIA
# =============================================================================
success_criteria:
# Overall pass rate (90% of tests must pass)
pass_rate: 0.9
# Critical tests must ALL pass (100%)
critical_pass_rate: 1.0
# Average reasoning quality score
avg_reasoning_quality: 0.75
# Maximum suite execution time (5 minutes)
max_execution_time_ms: 300000
# Maximum variance between model results (15%)
cross_model_variance: 0.15
# =============================================================================
# METADATA
# =============================================================================
metadata:
author: "qe-security-auditor"
created: "2026-02-02"
last_updated: "2026-02-02"
coverage_target: >
OWASP Top 10 2021: A01 (Broken Access Control), A02 (Cryptographic Failures),
A03 (Injection - SQL, XSS, SSTI, Command), A07 (Authentication Failures),
A08 (Software Integrity - Deserialization). Covers JavaScript/Node.js
Express apps and Python Flask apps. 15 test cases with 90% pass rate
requirement and 100% critical pass rate.
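The `success_criteria` block above gates the whole suite: an overall pass rate, a stricter bar for critical tests, an average reasoning-quality floor, and a wall-clock budget. A minimal sketch of how a runner might apply those gates (the per-result field names `priority`, `passed`, `reasoning_quality`, and `duration_ms` are assumptions drawn from this config, not a fixed API):

```python
# Sketch: evaluate per-test results against the success_criteria above.
# Field names on each result dict are assumed from this config file.

def evaluate_suite(results, criteria):
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    critical = [r for r in results if r["priority"] == "critical"]
    critical_passed = sum(1 for r in critical if r["passed"])

    checks = {
        "pass_rate": passed / total >= criteria["pass_rate"],
        # All critical tests must pass (vacuously true if none exist).
        "critical_pass_rate": (
            not critical
            or critical_passed / len(critical) >= criteria["critical_pass_rate"]
        ),
        "avg_reasoning_quality": (
            sum(r["reasoning_quality"] for r in results) / total
            >= criteria["avg_reasoning_quality"]
        ),
        "max_execution_time_ms": (
            sum(r["duration_ms"] for r in results)
            <= criteria["max_execution_time_ms"]
        ),
    }
    return all(checks.values()), checks

criteria = {
    "pass_rate": 0.9,
    "critical_pass_rate": 1.0,
    "avg_reasoning_quality": 0.75,
    "max_execution_time_ms": 300000,
}
results = [
    {"id": "tc001", "priority": "critical", "passed": True,
     "reasoning_quality": 0.8, "duration_ms": 12000},
    {"id": "tc002", "priority": "high", "passed": True,
     "reasoning_quality": 0.75, "duration_ms": 9000},
]
ok, detail = evaluate_suite(results, criteria)
```

Note that a single failed critical test fails the suite regardless of the overall pass rate, which matches the 100% `critical_pass_rate` requirement.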
@@ -0,0 +1,879 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://agentic-qe.dev/schemas/security-testing-output.json",
"title": "AQE Security Testing Skill Output Schema",
"description": "Schema for security-testing skill output validation. Extends the base skill-output template with OWASP Top 10 categories, CWE identifiers, and CVSS scoring.",
"type": "object",
"required": ["skillName", "version", "timestamp", "status", "trustTier", "output"],
"properties": {
"skillName": {
"type": "string",
"const": "security-testing",
"description": "Must be 'security-testing'"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$",
"description": "Semantic version of the skill"
},
"timestamp": {
"type": "string",
"format": "date-time",
"description": "ISO 8601 timestamp of output generation"
},
"status": {
"type": "string",
"enum": ["success", "partial", "failed", "skipped"],
"description": "Overall execution status"
},
"trustTier": {
"type": "integer",
"const": 3,
"description": "Trust tier 3 indicates full validation with eval suite"
},
"output": {
"type": "object",
"required": ["summary", "findings", "owaspCategories"],
"properties": {
"summary": {
"type": "string",
"minLength": 50,
"maxLength": 2000,
"description": "Human-readable summary of security findings"
},
"score": {
"$ref": "#/$defs/securityScore",
"description": "Overall security score"
},
"findings": {
"type": "array",
"items": {
"$ref": "#/$defs/securityFinding"
},
"maxItems": 500,
"description": "List of security vulnerabilities discovered"
},
"recommendations": {
"type": "array",
"items": {
"$ref": "#/$defs/securityRecommendation"
},
"maxItems": 100,
"description": "Prioritized remediation recommendations with code examples"
},
"metrics": {
"$ref": "#/$defs/securityMetrics",
"description": "Security scan metrics and statistics"
},
"owaspCategories": {
"$ref": "#/$defs/owaspCategoryBreakdown",
"description": "OWASP Top 10 2021 category breakdown"
},
"artifacts": {
"type": "array",
"items": {
"$ref": "#/$defs/artifact"
},
"maxItems": 50,
"description": "Generated security reports and scan artifacts"
},
"timeline": {
"type": "array",
"items": {
"$ref": "#/$defs/timelineEvent"
},
"description": "Scan execution timeline"
},
"scanConfiguration": {
"$ref": "#/$defs/scanConfiguration",
"description": "Configuration used for the security scan"
}
}
},
"metadata": {
"$ref": "#/$defs/metadata"
},
"validation": {
"$ref": "#/$defs/validationResult"
},
"learning": {
"$ref": "#/$defs/learningData"
}
},
"$defs": {
"securityScore": {
"type": "object",
"required": ["value", "max"],
"properties": {
"value": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Security score (0=critical issues, 100=no issues)"
},
"max": {
"type": "number",
"const": 100,
"description": "Maximum score is always 100"
},
"grade": {
"type": "string",
"pattern": "^[A-F][+-]?$",
"description": "Letter grade: A (90-100), B (80-89), C (70-79), D (60-69), F (<60)"
},
"trend": {
"type": "string",
"enum": ["improving", "stable", "declining", "unknown"],
"description": "Trend compared to previous scans"
},
"riskLevel": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "minimal"],
"description": "Overall risk level assessment"
}
}
},
"securityFinding": {
"type": "object",
"required": ["id", "title", "severity", "owasp"],
"properties": {
"id": {
"type": "string",
"pattern": "^SEC-\\d{3,6}$",
"description": "Unique finding identifier (e.g., SEC-001)"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Finding title describing the vulnerability"
},
"description": {
"type": "string",
"maxLength": 2000,
"description": "Detailed description of the vulnerability"
},
"severity": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "info"],
"description": "Severity: critical (CVSS 9.0-10.0), high (7.0-8.9), medium (4.0-6.9), low (0.1-3.9), info (0)"
},
"owasp": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$",
"description": "OWASP Top 10 category (e.g., A01:2021, A03:2025)"
},
"owaspCategory": {
"type": "string",
"enum": [
"A01:2021-Broken-Access-Control",
"A02:2021-Cryptographic-Failures",
"A03:2021-Injection",
"A04:2021-Insecure-Design",
"A05:2021-Security-Misconfiguration",
"A06:2021-Vulnerable-Components",
"A07:2021-Identification-Authentication-Failures",
"A08:2021-Software-Data-Integrity-Failures",
"A09:2021-Security-Logging-Monitoring-Failures",
"A10:2021-Server-Side-Request-Forgery"
],
"description": "Full OWASP category name"
},
"cwe": {
"type": "string",
"pattern": "^CWE-\\d{1,4}$",
"description": "CWE identifier (e.g., CWE-79 for XSS, CWE-89 for SQLi)"
},
"cvss": {
"type": "object",
"properties": {
"score": {
"type": "number",
"minimum": 0,
"maximum": 10,
"description": "CVSS v3.1 base score"
},
"vector": {
"type": "string",
"pattern": "^CVSS:3\\.1/AV:[NALP]/AC:[LH]/PR:[NLH]/UI:[NR]/S:[UC]/C:[NLH]/I:[NLH]/A:[NLH]$",
"description": "CVSS v3.1 vector string"
},
"severity": {
"type": "string",
"enum": ["None", "Low", "Medium", "High", "Critical"],
"description": "CVSS severity rating"
}
}
},
"location": {
"$ref": "#/$defs/location",
"description": "Location of the vulnerability"
},
"evidence": {
"type": "string",
"maxLength": 5000,
"description": "Evidence: code snippet, request/response, or PoC"
},
"remediation": {
"type": "string",
"maxLength": 2000,
"description": "Specific fix instructions for this finding"
},
"references": {
"type": "array",
"items": {
"type": "object",
"required": ["title", "url"],
"properties": {
"title": { "type": "string" },
"url": { "type": "string", "format": "uri" }
}
},
"maxItems": 10,
"description": "External references (OWASP, CWE, CVE, etc.)"
},
"falsePositive": {
"type": "boolean",
"default": false,
"description": "Potential false positive flag"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence in finding accuracy (0.0-1.0)"
},
"exploitability": {
"type": "string",
"enum": ["trivial", "easy", "moderate", "difficult", "theoretical"],
"description": "How easily this vulnerability can be exploited"
},
"affectedVersions": {
"type": "array",
"items": { "type": "string" },
"description": "Affected package/library versions for dependency vulnerabilities"
},
"cve": {
"type": "string",
"pattern": "^CVE-\\d{4}-\\d{4,}$",
"description": "CVE identifier if applicable"
}
}
},
"securityRecommendation": {
"type": "object",
"required": ["id", "title", "priority", "owaspCategories"],
"properties": {
"id": {
"type": "string",
"pattern": "^REC-\\d{3,6}$",
"description": "Unique recommendation identifier"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Recommendation title"
},
"description": {
"type": "string",
"maxLength": 2000,
"description": "Detailed recommendation description"
},
"priority": {
"type": "string",
"enum": ["critical", "high", "medium", "low"],
"description": "Remediation priority"
},
"effort": {
"type": "string",
"enum": ["trivial", "low", "medium", "high", "major"],
"description": "Estimated effort: trivial(<1hr), low(1-4hr), medium(1-3d), high(1-2wk), major(>2wk)"
},
"impact": {
"type": "integer",
"minimum": 1,
"maximum": 10,
"description": "Security impact if implemented (1-10)"
},
"relatedFindings": {
"type": "array",
"items": {
"type": "string",
"pattern": "^SEC-\\d{3,6}$"
},
"description": "IDs of findings this addresses"
},
"owaspCategories": {
"type": "array",
"items": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$"
},
"description": "OWASP categories this recommendation addresses"
},
"codeExample": {
"type": "object",
"properties": {
"before": {
"type": "string",
"maxLength": 2000,
"description": "Vulnerable code example"
},
"after": {
"type": "string",
"maxLength": 2000,
"description": "Secure code example"
},
"language": {
"type": "string",
"description": "Programming language"
}
},
"description": "Before/after code examples for remediation"
},
"resources": {
"type": "array",
"items": {
"type": "object",
"required": ["title", "url"],
"properties": {
"title": { "type": "string" },
"url": { "type": "string", "format": "uri" }
}
},
"maxItems": 10,
"description": "External resources and documentation"
},
"automatable": {
"type": "boolean",
"description": "Can this fix be automated?"
},
"fixCommand": {
"type": "string",
"description": "CLI command to apply fix if automatable"
}
}
},
"owaspCategoryBreakdown": {
"type": "object",
"description": "OWASP Top 10 2021 category scores and findings",
"properties": {
"A01:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A01:2021 - Broken Access Control"
},
"A02:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A02:2021 - Cryptographic Failures"
},
"A03:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A03:2021 - Injection"
},
"A04:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A04:2021 - Insecure Design"
},
"A05:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A05:2021 - Security Misconfiguration"
},
"A06:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A06:2021 - Vulnerable and Outdated Components"
},
"A07:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A07:2021 - Identification and Authentication Failures"
},
"A08:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A08:2021 - Software and Data Integrity Failures"
},
"A09:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A09:2021 - Security Logging and Monitoring Failures"
},
"A10:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A10:2021 - Server-Side Request Forgery (SSRF)"
}
},
"additionalProperties": false
},
"owaspCategoryScore": {
"type": "object",
"required": ["tested", "score"],
"properties": {
"tested": {
"type": "boolean",
"description": "Whether this category was tested"
},
"score": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Category score (100 = no issues, 0 = critical)"
},
"grade": {
"type": "string",
"pattern": "^[A-F][+-]?$",
"description": "Letter grade for this category"
},
"findingCount": {
"type": "integer",
"minimum": 0,
"description": "Number of findings in this category"
},
"criticalCount": {
"type": "integer",
"minimum": 0,
"description": "Number of critical findings"
},
"highCount": {
"type": "integer",
"minimum": 0,
"description": "Number of high severity findings"
},
"status": {
"type": "string",
"enum": ["pass", "fail", "warn", "skip"],
"description": "Category status"
},
"description": {
"type": "string",
"description": "Category description and context"
},
"cwes": {
"type": "array",
"items": {
"type": "string",
"pattern": "^CWE-\\d{1,4}$"
},
"description": "CWEs found in this category"
}
}
},
"securityMetrics": {
"type": "object",
"properties": {
"totalFindings": {
"type": "integer",
"minimum": 0,
"description": "Total vulnerabilities found"
},
"criticalCount": {
"type": "integer",
"minimum": 0,
"description": "Critical severity findings"
},
"highCount": {
"type": "integer",
"minimum": 0,
"description": "High severity findings"
},
"mediumCount": {
"type": "integer",
"minimum": 0,
"description": "Medium severity findings"
},
"lowCount": {
"type": "integer",
"minimum": 0,
"description": "Low severity findings"
},
"infoCount": {
"type": "integer",
"minimum": 0,
"description": "Informational findings"
},
"filesScanned": {
"type": "integer",
"minimum": 0,
"description": "Number of files analyzed"
},
"linesOfCode": {
"type": "integer",
"minimum": 0,
"description": "Lines of code scanned"
},
"dependenciesChecked": {
"type": "integer",
"minimum": 0,
"description": "Number of dependencies checked"
},
"owaspCategoriesTested": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "OWASP Top 10 categories tested"
},
"owaspCategoriesPassed": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "OWASP Top 10 categories with no findings"
},
"uniqueCwes": {
"type": "integer",
"minimum": 0,
"description": "Unique CWE identifiers found"
},
"falsePositiveRate": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Estimated false positive rate"
},
"scanDurationMs": {
"type": "integer",
"minimum": 0,
"description": "Total scan duration in milliseconds"
},
"coverage": {
"type": "object",
"properties": {
"sast": {
"type": "boolean",
"description": "Static analysis performed"
},
"dast": {
"type": "boolean",
"description": "Dynamic analysis performed"
},
"dependencies": {
"type": "boolean",
"description": "Dependency scan performed"
},
"secrets": {
"type": "boolean",
"description": "Secret scanning performed"
},
"configuration": {
"type": "boolean",
"description": "Configuration review performed"
}
},
"description": "Scan coverage indicators"
}
}
},
"scanConfiguration": {
"type": "object",
"properties": {
"target": {
"type": "string",
"description": "Scan target (file path, URL, or package)"
},
"targetType": {
"type": "string",
"enum": ["source", "url", "package", "container", "infrastructure"],
"description": "Type of target being scanned"
},
"scanTypes": {
"type": "array",
"items": {
"type": "string",
"enum": ["sast", "dast", "dependency", "secret", "configuration", "container", "iac"]
},
"description": "Types of scans performed"
},
"severity": {
"type": "array",
"items": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "info"]
},
"description": "Severity levels included in scan"
},
"owaspCategories": {
"type": "array",
"items": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$"
},
"description": "OWASP categories tested"
},
"tools": {
"type": "array",
"items": { "type": "string" },
"description": "Security tools used"
},
"excludePatterns": {
"type": "array",
"items": { "type": "string" },
"description": "File patterns excluded from scan"
},
"rulesets": {
"type": "array",
"items": { "type": "string" },
"description": "Security rulesets applied"
}
}
},
"location": {
"type": "object",
"properties": {
"file": {
"type": "string",
"maxLength": 500,
"description": "File path relative to project root"
},
"line": {
"type": "integer",
"minimum": 1,
"description": "Line number"
},
"column": {
"type": "integer",
"minimum": 1,
"description": "Column number"
},
"endLine": {
"type": "integer",
"minimum": 1,
"description": "End line for multi-line findings"
},
"endColumn": {
"type": "integer",
"minimum": 1,
"description": "End column"
},
"url": {
"type": "string",
"format": "uri",
"description": "URL for web-based findings"
},
"endpoint": {
"type": "string",
"description": "API endpoint path"
},
"method": {
"type": "string",
"enum": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"],
"description": "HTTP method for API findings"
},
"parameter": {
"type": "string",
"description": "Vulnerable parameter name"
},
"component": {
"type": "string",
"description": "Affected component or module"
}
}
},
"artifact": {
"type": "object",
"required": ["type", "path"],
"properties": {
"type": {
"type": "string",
"enum": ["report", "sarif", "data", "log", "evidence"],
"description": "Artifact type"
},
"path": {
"type": "string",
"maxLength": 500,
"description": "Path to artifact"
},
"format": {
"type": "string",
"enum": ["json", "sarif", "html", "md", "txt", "xml", "csv"],
"description": "Artifact format"
},
"description": {
"type": "string",
"maxLength": 500,
"description": "Artifact description"
},
"sizeBytes": {
"type": "integer",
"minimum": 0,
"description": "File size in bytes"
},
"checksum": {
"type": "string",
"pattern": "^sha256:[a-f0-9]{64}$",
"description": "SHA-256 checksum"
}
}
},
"timelineEvent": {
"type": "object",
"required": ["timestamp", "event"],
"properties": {
"timestamp": {
"type": "string",
"format": "date-time",
"description": "Event timestamp"
},
"event": {
"type": "string",
"maxLength": 200,
"description": "Event description"
},
"type": {
"type": "string",
"enum": ["start", "checkpoint", "warning", "error", "complete"],
"description": "Event type"
},
"durationMs": {
"type": "integer",
"minimum": 0,
"description": "Duration since previous event"
},
"phase": {
"type": "string",
"enum": ["initialization", "sast", "dast", "dependency", "secret", "reporting"],
"description": "Scan phase"
}
}
},
"metadata": {
"type": "object",
"properties": {
"executionTimeMs": {
"type": "integer",
"minimum": 0,
"maximum": 3600000,
"description": "Execution time in milliseconds"
},
"toolsUsed": {
"type": "array",
"items": {
"type": "string",
"enum": ["semgrep", "npm-audit", "trivy", "owasp-zap", "bandit", "gosec", "eslint-security", "snyk", "gitleaks", "trufflehog", "bearer"]
},
"uniqueItems": true,
"description": "Security tools used"
},
"agentId": {
"type": "string",
"pattern": "^qe-[a-z][a-z0-9-]*$",
"description": "Agent ID (e.g., qe-security-scanner)"
},
"modelUsed": {
"type": "string",
"description": "LLM model used for analysis"
},
"inputHash": {
"type": "string",
"pattern": "^[a-f0-9]{64}$",
"description": "SHA-256 hash of input"
},
"targetUrl": {
"type": "string",
"format": "uri",
"description": "Target URL if applicable"
},
"targetPath": {
"type": "string",
"description": "Target path if applicable"
},
"environment": {
"type": "string",
"enum": ["development", "staging", "production", "ci"],
"description": "Execution environment"
},
"retryCount": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "Number of retries"
}
}
},
"validationResult": {
"type": "object",
"properties": {
"schemaValid": {
"type": "boolean",
"description": "Passes JSON schema validation"
},
"contentValid": {
"type": "boolean",
"description": "Passes content validation"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence score"
},
"warnings": {
"type": "array",
"items": {
"type": "string",
"maxLength": 500
},
"maxItems": 20,
"description": "Validation warnings"
},
"errors": {
"type": "array",
"items": {
"type": "string",
"maxLength": 500
},
"maxItems": 20,
"description": "Validation errors"
},
"validatorVersion": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Validator version"
}
}
},
"learningData": {
"type": "object",
"properties": {
"patternsDetected": {
"type": "array",
"items": {
"type": "string",
"maxLength": 200
},
"maxItems": 20,
"description": "Security patterns detected (e.g., sql-injection-string-concat)"
},
"reward": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Reward signal for learning (0.0-1.0)"
},
"feedbackLoop": {
"type": "object",
"properties": {
"previousRunId": {
"type": "string",
"format": "uuid",
"description": "Previous run ID for comparison"
},
"improvement": {
"type": "number",
"minimum": -1,
"maximum": 1,
"description": "Improvement over previous run"
}
}
},
"newVulnerabilityPatterns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"pattern": { "type": "string" },
"cwe": { "type": "string" },
"confidence": { "type": "number" }
}
},
"description": "New vulnerability patterns learned"
}
}
}
}
}
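The `timelineEvent` definition above can be exercised directly with the Python `jsonschema` library. This is a minimal sketch that extracts the definition as a standalone schema (the enclosing document's `$defs`/`definitions` wrapper is outside this hunk, so it is reproduced inline here):

```python
import jsonschema

# The "timelineEvent" definition, copied from the schema above.
timeline_event_schema = {
    "type": "object",
    "required": ["timestamp", "event"],
    "properties": {
        "timestamp": {"type": "string", "format": "date-time"},
        "event": {"type": "string", "maxLength": 200},
        "type": {
            "type": "string",
            "enum": ["start", "checkpoint", "warning", "error", "complete"],
        },
        "durationMs": {"type": "integer", "minimum": 0},
        "phase": {
            "type": "string",
            "enum": ["initialization", "sast", "dast",
                     "dependency", "secret", "reporting"],
        },
    },
}

good = {
    "timestamp": "2026-03-20T21:28:16Z",
    "event": "SAST phase started",
    "type": "start",
    "phase": "sast",
    "durationMs": 0,
}
bad = {"event": "missing timestamp", "type": "oops"}

# Raises no exception for a conforming event.
jsonschema.validate(good, timeline_event_schema)

# Collect all violations for a non-conforming event:
# a missing required "timestamp" and a "type" outside the enum.
validator = jsonschema.Draft7Validator(timeline_event_schema)
errors = sorted(validator.iter_errors(bad), key=lambda e: list(e.path))
for err in errors:
    print(err.message)
```

Note that `jsonschema` does not enforce `"format": "date-time"` unless a format checker is passed explicitly, so the `timestamp` string is only type-checked here.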
@@ -0,0 +1,45 @@
{
"skillName": "security-testing",
"skillVersion": "1.0.0",
"requiredTools": [
"jq"
],
"optionalTools": [
"npm",
"semgrep",
"trivy",
"ajv",
"jsonschema",
"python3"
],
"schemaPath": "schemas/output.json",
"requiredFields": [
"skillName",
"status",
"output",
"output.summary",
"output.findings",
"output.owaspCategories"
],
"requiredNonEmptyFields": [
"output.summary"
],
"mustContainTerms": [
"OWASP",
"security",
"vulnerability"
],
"mustNotContainTerms": [
"TODO",
"placeholder",
"FIXME"
],
"enumValidations": {
".status": [
"success",
"partial",
"failed",
"skipped"
]
}
}
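A validator consuming this config might apply `requiredFields`, `mustContainTerms`, and `enumValidations` roughly as follows. This is a hypothetical sketch, not the skill's actual harness (which, per `requiredTools`, is jq-based); the dotted-path walk, the `check` helper, and the trimmed config literal are all assumptions for illustration:

```python
import json

# Trimmed copy of the validation config above (illustrative subset).
config = {
    "requiredFields": ["skillName", "status", "output", "output.summary"],
    "requiredNonEmptyFields": ["output.summary"],
    "mustContainTerms": ["OWASP", "security", "vulnerability"],
    "mustNotContainTerms": ["TODO", "placeholder", "FIXME"],
    "enumValidations": {".status": ["success", "partial", "failed", "skipped"]},
}

def get_path(doc, dotted):
    """Walk a dotted path like 'output.summary'; return (found, value)."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return False, None
        cur = cur[part]
    return True, cur

def check(doc):
    """Return a list of human-readable problems; empty list means valid."""
    problems = []
    for path in config["requiredFields"]:
        found, _ = get_path(doc, path)
        if not found:
            problems.append(f"missing required field: {path}")
    for path in config["requiredNonEmptyFields"]:
        found, val = get_path(doc, path)
        if found and val in (None, "", [], {}):
            problems.append(f"field must be non-empty: {path}")
    text = json.dumps(doc)
    for term in config["mustContainTerms"]:
        if term not in text:
            problems.append(f"missing required term: {term}")
    for term in config["mustNotContainTerms"]:
        if term in text:
            problems.append(f"forbidden term present: {term}")
    for path, allowed in config["enumValidations"].items():
        found, val = get_path(doc, path.lstrip("."))
        if found and val not in allowed:
            problems.append(f"{path}: {val!r} not in {allowed}")
    return problems

doc = {
    "skillName": "security-testing",
    "status": "success",
    "output": {"summary": "OWASP Top 10 security scan: 2 vulnerability findings."},
}
print(check(doc))  # → []
```

The same dotted-path convention explains why `.status` in `enumValidations` carries a leading dot: it mirrors a jq filter expression, while the entries in `requiredFields` are plain paths.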