Update decomposition skill documentation and templates; remove obsolete feature specification and initial structure templates. Enhance task decomposition descriptions and adjust directory paths for project documentation.

2026-06-23 03:01:12 +00:00 · 2026-03-25 06:40:30 +02:00
parent 2c2fad0219
commit bc69fd4b0a
80 changed files with 8339 additions and 3526 deletions
@@ -0,0 +1,515 @@
+---
+name: document
+description: |
+  Bottom-up codebase documentation skill. Analyzes existing code from modules up through components
+  to architecture, then retrospectively derives problem/restrictions/acceptance criteria.
+  Produces the same _docs/ artifacts as the problem, research, and plan skills, but from code
+  analysis instead of user interview.
+  Trigger phrases:
+  - "document", "document codebase", "document this project"
+  - "documentation", "generate documentation", "create documentation"
+  - "reverse-engineer docs", "code to docs"
+  - "analyze and document"
+category: build
+tags: [documentation, code-analysis, reverse-engineering, architecture, bottom-up]
+disable-model-invocation: true
+---
+
+# Bottom-Up Codebase Documentation
+
+Analyze an existing codebase from the bottom up (individual modules first, then components, then system-level architecture) and produce the same `_docs/` artifacts that the `problem`, `research`, and `plan` skills generate, without requiring a user interview.
+
+## Core Principles
+
+- **Bottom-up always**: module docs -> component specs -> architecture/flows -> solution -> problem extraction. Every higher level is synthesized from the level below.
+- **Dependencies first**: process modules in topological order (leaves first). When documenting module X, all of X's dependencies already have docs.
+- **Incremental context**: each module's doc uses the already-written docs of its dependencies as context, instead of carrying an ever-growing chain of prior analysis.
+- **Verify against code**: cross-reference every entity in generated docs against the actual codebase. Catch hallucinations.
+- **Save immediately**: write each artifact as soon as its step completes. Enable resume from any checkpoint.
+- **Ask, don't assume**: when code intent is ambiguous, ASK the user before proceeding.
+
+## Context Resolution
+
+Fixed paths:
+
+- DOCUMENT_DIR: `_docs/02_document/`
+- SOLUTION_DIR: `_docs/01_solution/`
+- PROBLEM_DIR: `_docs/00_problem/`
+
+Optional input:
+
+- FOCUS_DIR: a specific directory subtree provided by the user (e.g., `/document @src/api/`). When set, only this subtree and its transitive dependencies are analyzed.
+
+Announce resolved paths (and FOCUS_DIR if set) to user before proceeding.
+
+## Mode Detection
+
+Determine the execution mode before any other logic:
+
+| Mode | Trigger | Scope |
+|------|---------|-------|
+| **Full** | No FOCUS_DIR provided, no existing state | Entire codebase |
+| **Focus Area** | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies |
+| **Resume** | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint |
+
+Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas.
+
+## Prerequisite Checks
+
+1. If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?**
+2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
+3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh**
+4. If FOCUS_DIR is set, verify the directory exists and contains source files — **STOP if missing**
+
+## Progress Tracking
+
+Create a TodoWrite with all steps (0 through 7). Update status as each step completes.
+
+## Workflow
+
+### Step 0: Codebase Discovery
+
+**Role**: Code analyst
+**Goal**: Build a complete map of the codebase (or targeted subtree) before analyzing any code.
+
+**Focus Area scoping**: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.
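
The scoping rule above can be sketched as a small graph walk. This is an illustration only, not part of the skill: the function name `scope_modules`, the shape of `graph` (module path mapped to the set of internal modules it imports), and the prefix match against FOCUS_DIR are all assumptions made for the example.

```python
def scope_modules(graph, focus_prefix):
    """Split modules into in-scope and dependency-only sets.

    graph: dict mapping a module path to the set of internal modules it imports
           (hypothetical shape, chosen for this sketch).
    focus_prefix: the FOCUS_DIR path, matched as a simple string prefix.
    """
    in_scope = {m for m in graph if m.startswith(focus_prefix)}
    dependency_only = set()
    stack = list(in_scope)
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            # Follow imports transitively, but only record modules outside FOCUS_DIR.
            if dep not in in_scope and dep not in dependency_only:
                dependency_only.add(dep)
                stack.append(dep)
    return in_scope, dependency_only


example_graph = {
    "src/api/routes": {"src/api/schemas", "src/services/auth"},
    "src/api/schemas": set(),
    "src/services/auth": {"src/models/user"},
    "src/models/user": set(),
    "src/cli/main": {"src/services/auth"},
}
in_scope, dependency_only = scope_modules(example_graph, "src/api/")
```

Modules that land in neither set (here `src/cli/main`) are the ones the scan would skip.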
+
+Scan and catalog:
+
+1. Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts)
+2. Language detection from file extensions and config files
+3. Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod`
+4. Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`)
+5. Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts
+6. Test structure: test directories, test frameworks, test runner configs
+7. Existing documentation: README, `docs/`, wiki references, inline doc coverage
+8. **Dependency graph**: build a module-level dependency graph by analyzing imports/references. Identify:
+   - Leaf modules (no internal dependencies)
+   - Entry points (no internal dependents)
+   - Cycles (mark for grouped analysis)
+   - Topological processing order
+   - If FOCUS_DIR: mark which modules are in-scope vs dependency-only
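
As a rough sketch of the dependency-graph outputs listed above, the following uses Python's standard `graphlib` module. The function name `analyze_graph` and the input shape are hypothetical. Note that this sketch only reports one detected cycle; grouping every cycle for combined analysis would need a strongly-connected-components pass, which is left out here.

```python
from graphlib import CycleError, TopologicalSorter


def analyze_graph(graph):
    """graph: dict mapping each module to the set of internal modules it imports."""
    modules = set(graph) | {d for deps in graph.values() for d in deps}
    deps = {m: set(graph.get(m, ())) for m in modules}
    imported = {d for ds in deps.values() for d in ds}

    result = {
        # Leaf modules: no internal dependencies.
        "leaves": sorted(m for m in modules if not deps[m]),
        # Entry points: no internal dependents.
        "entry_points": sorted(m for m in modules if m not in imported),
        "order": None,
        "cycle": None,
    }
    try:
        # static_order() yields dependencies before their dependents (leaves first).
        result["order"] = list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # args[1] holds the nodes of one detected cycle (first and last are the same).
        result["cycle"] = sorted(set(err.args[1]))
    return result


report = analyze_graph({
    "api": {"services"},
    "services": {"models", "utils"},
    "models": {"utils"},
    "utils": set(),
})
```

When `cycle` is set, `order` stays `None`, which signals that the cycle members need grouped handling before a processing order can be produced.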
+
+**Save**: `DOCUMENT_DIR/00_discovery.md` containing:
+- Directory tree (concise, relevant directories only)
+- Tech stack summary table (language, framework, database, infra)
+- Dependency graph (textual list + Mermaid diagram)
+- Topological processing order
+- Entry points and leaf modules
+
+**Save**: `DOCUMENT_DIR/state.json` with initial state:
+```json
+{
+  "current_step": "module-analysis",
+  "completed_steps": ["discovery"],
+  "focus_dir": null,
+  "modules_total": 0,
+  "modules_documented": [],
+  "modules_remaining": [],
+  "module_batch": 0,
+  "components_written": [],
+  "last_updated": ""
+}
+```
+
+Set `focus_dir` to the FOCUS_DIR path if in Focus Area mode, or `null` for Full mode.
+
+---
+
+### Step 1: Module-Level Documentation
+
+**Role**: Code analyst
+**Goal**: Document every identified module individually, processing in topological order (leaves first).
+
+**Batched processing**: process modules in batches of ~5 (sorted by topological order). Save each module doc as soon as it is written; after each batch, confirm that `state.json` reflects every module in the batch and present a progress summary. Between batches, evaluate whether to suggest a session break.
+
+For each module in topological order:
+
+1. **Read**: read the module's source code. Assess complexity and what context is needed.
+2. **Gather context**: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
+3. **Write module doc** with these sections:
+   - **Purpose**: one-sentence responsibility
+   - **Public interface**: exported functions/classes/methods with signatures, input/output types
+   - **Internal logic**: key algorithms, patterns, non-obvious behavior
+   - **Dependencies**: what it imports internally and why
+   - **Consumers**: what uses this module (from the dependency graph)
+   - **Data models**: entities/types defined in this module
+   - **Configuration**: env vars, config keys consumed
+   - **External integrations**: HTTP calls, DB queries, queue operations, file I/O
+   - **Security**: auth checks, encryption, input validation, secrets access
+   - **Tests**: what tests exist for this module, what they cover
+4. **Verify**: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.
+
+**Cycle handling**: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.
+
+**Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.
+
+**Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module.
+**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5.
+
+**Session break heuristic**: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:
+
+```
+══════════════════════════════════════
+ SESSION BREAK SUGGESTED
+══════════════════════════════════════
+ Modules documented: [X] of [Y]
+ Batches completed this session: [N]
+══════════════════════════════════════
+ A) Continue in this conversation
+ B) Save and continue in a fresh conversation (recommended)
+══════════════════════════════════════
+ Recommendation: B — fresh context improves
+ analysis quality for remaining modules
+══════════════════════════════════════
+```
+
+Re-entry is seamless: `state.json` tracks exactly which modules are done.
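
The state update and the session break heuristic described in this step could look roughly like the sketch below. The helper names `mark_documented` and `should_suggest_break` are hypothetical; the field names follow the `state.json` example, and the thresholds (more than 10 modules remaining, 2 or more batches this session) are taken from the heuristic above.

```python
from datetime import datetime, timezone


def mark_documented(state, module):
    """Move one module from modules_remaining to modules_documented."""
    state["modules_remaining"].remove(module)
    state["modules_documented"].append(module)
    state["last_updated"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return state


def should_suggest_break(state, batches_this_session):
    """Suggest a break when >10 modules remain and 2+ batches are done this session."""
    return len(state["modules_remaining"]) > 10 and batches_this_session >= 2


state = {
    "modules_documented": [],
    "modules_remaining": [f"module_{i}" for i in range(12)],
    "last_updated": "",
}
mark_documented(state, "module_0")
```

Persisting the result is then a matter of writing `state` back to `DOCUMENT_DIR/state.json` after each call.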
+
+---
+
+### Step 2: Component Assembly
+
+**Role**: Software architect
+**Goal**: Group related modules into logical components and produce component specs.
+
+1. Analyze module docs from Step 1 to identify natural groupings:
+   - By directory structure (most common)
+   - By shared data models or common purpose
+   - By dependency clusters (tightly coupled modules)
+2. For each identified component, synthesize its module docs into a single component specification using `templates/component-spec.md` as structure:
+   - High-level overview: purpose, pattern, upstream/downstream
+   - Internal interfaces: method signatures, DTOs (from actual module code)
+   - External API specification (if the component exposes HTTP/gRPC endpoints)
+   - Data access patterns: queries, caching, storage estimates
+   - Implementation details: algorithmic complexity, state management, key libraries
+   - Extensions and helpers: shared utilities needed
+   - Caveats and edge cases: limitations, race conditions, bottlenecks
+   - Dependency graph: implementation order relative to other components
+   - Logging strategy
+3. Identify common helpers shared across multiple components -> document in `common-helpers/`
+4. Generate component relationship diagram (Mermaid)
+
+**Self-verification**:
+- [ ] Every module from Step 1 is covered by exactly one component
+- [ ] No component has overlapping responsibility with another
+- [ ] Inter-component interfaces are explicit (who calls whom, with what)
+- [ ] Component dependency graph has no circular dependencies
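
Two of the self-verification items above (coverage by exactly one component, and an acyclic component graph) are mechanical enough to sketch in code. This is an assumed illustration: `check_components` and its three inputs are hypothetical names, not something the skill defines.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter


def check_components(modules, components, component_deps):
    """
    modules: iterable of all module names from Step 1.
    components: dict mapping component name -> list of member modules.
    component_deps: dict mapping component name -> set of components it depends on.
    """
    counts = Counter(m for members in components.values() for m in members)
    try:
        # prepare() raises CycleError if the component graph has a cycle.
        TopologicalSorter(component_deps).prepare()
        acyclic = True
    except CycleError:
        acyclic = False
    return {
        "uncovered": sorted(set(modules) - set(counts)),
        "duplicated": sorted(m for m, n in counts.items() if n > 1),
        "acyclic": acyclic,
    }


check = check_components(
    modules=["utils/helpers", "models/user", "services/auth", "api/endpoints"],
    components={
        "01_core": ["utils/helpers", "models/user"],
        "02_auth": ["services/auth", "models/user"],
    },
    component_deps={"01_core": set(), "02_auth": {"01_core"}},
)
```

In this example `api/endpoints` is uncovered and `models/user` is claimed by two components, so the component breakdown would need another pass before it is presented to the user.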
+
+**Save**:
+- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component
+- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper
+- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram)
+
+**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.
+
+---
+
+### Step 3: System-Level Synthesis
+
+**Role**: Software architect
+**Goal**: From component docs, synthesize system-level documents.
+
+All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.
+
+#### 3a. Architecture
+
+Using `templates/architecture.md` as structure:
+
+- System context and boundaries from entry points and external integrations
+- Tech stack table from discovery (Step 0) + component specs
+- Deployment model from Dockerfiles, CI configs, environment strategies
+- Data model overview from per-component data access sections
+- Integration points from inter-component interfaces
+- NFRs from test thresholds, config limits, health checks
+- Security architecture from per-module security observations
+- Key ADRs inferred from technology choices and patterns
+
+**Save**: `DOCUMENT_DIR/architecture.md`
+
+#### 3b. System Flows
+
+Using `templates/system-flows.md` as structure:
+
+- Trace main flows through the component interaction graph
+- Entry point -> component chain -> output for each major flow
+- Mermaid sequence diagrams and flowcharts
+- Error scenarios from exception handling patterns
+- Data flow tables per flow
+
+**Save**: `DOCUMENT_DIR/system-flows.md` and `DOCUMENT_DIR/diagrams/flows/flow_[name].md`
+
+#### 3c. Data Model
+
+- Consolidate all data models from module docs
+- Entity-relationship diagram (Mermaid ERD)
+- Migration strategy (if ORM/migration tooling detected)
+- Seed data observations
+- Backward compatibility approach (if versioning found)
+
+**Save**: `DOCUMENT_DIR/data_model.md`
+
+#### 3d. Deployment (if Dockerfile/CI configs exist)
+
+- Containerization summary
+- CI/CD pipeline structure
+- Environment strategy (dev, staging, production)
+- Observability (logging patterns, metrics, health checks found in code)
+
+**Save**: `DOCUMENT_DIR/deployment/` (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)
+
+---
+
+### Step 4: Verification Pass
+
+**Role**: Quality verifier
+**Goal**: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.
+
+For each document generated in Steps 1-3:
+
+1. **Entity verification**: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
+2. **Interface accuracy**: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
+3. **Flow correctness**: for each system flow diagram, trace the actual code path and verify the sequence matches.
+4. **Completeness check**: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
+5. **Consistency check**: do component docs agree with architecture doc? Do flow diagrams match component interfaces?
+
+Apply corrections inline to the documents that need them.
+
+**Save**: `DOCUMENT_DIR/04_verification_log.md` with:
+- Total entities verified vs flagged
+- Corrections applied (which document, what changed)
+- Remaining gaps or uncertainties
+- Completeness score (modules covered / total modules)
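
A deliberately crude sketch of entity verification and the completeness score follows. It assumes that code entities appear in the docs as backticked identifiers and treats an entity as present when each dotted part occurs as a whole word somewhere in the source text. Both functions (`flag_missing_entities`, `completeness_score`) are hypothetical, and a real check would likely need language-aware parsing.

```python
import re


def flag_missing_entities(doc_text, source_text):
    """Return backticked identifiers from the doc that cannot be found in the source."""
    entities = set(re.findall(r"`([A-Za-z_][\w.]*)`", doc_text))
    missing = []
    for entity in sorted(entities):
        parts = [p for p in entity.split(".") if p]
        # Every dotted part (e.g. class name and method name) must appear as a whole word.
        if not all(re.search(rf"\b{re.escape(p)}\b", source_text) for p in parts):
            missing.append(entity)
    return missing


def completeness_score(covered_modules, all_modules):
    """Modules covered / total modules, as recorded in the verification log."""
    total = set(all_modules)
    return len(set(covered_modules) & total) / len(total)


doc = "Calls `AuthService.login` and caches results in `TokenCache`."
source = "class AuthService:\n    def login(self, user):\n        return True\n"
flagged = flag_missing_entities(doc, source)
score = completeness_score(["utils/helpers", "models/user"],
                           ["utils/helpers", "models/user", "services/auth", "api/endpoints"])
```

Here `TokenCache` is flagged because nothing by that name exists in the sample source, which is the kind of hallucination this step is meant to catch.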
+
+**BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.
+
+**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the extraction and reporting steps (5–7). These steps produce different artifact types and benefit from fresh context:
+
+```
+══════════════════════════════════════
+ VERIFICATION COMPLETE — session break?
+══════════════════════════════════════
+ Steps 0–4 (analysis + verification) are done.
+ Steps 5–7 (solution + problem extraction + report)
+ can run in a fresh conversation.
+══════════════════════════════════════
+ A) Continue in this conversation
+ B) Save and continue in a new conversation (recommended)
+══════════════════════════════════════
+```
+
+If **Focus Area mode**: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run `/document` again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.
+
+---
+
+### Step 5: Solution Extraction (Retrospective)
+
+**Role**: Software architect
+**Goal**: From all verified technical documentation, retrospectively create `solution.md`, the same artifact the `research` skill produces. This makes downstream skills (`plan`, `deploy`, `decompose`) compatible with the documented codebase.
+
+Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):
+
+1. **Product Solution Description**: what the system is, brief component interaction diagram (Mermaid)
+2. **Architecture**: the architecture that is implemented, with per-component solution tables:
+
+| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
+|----------|-------|-----------|-------------|-------------|----------|------|-----|
+| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
+
+3. **Testing Strategy**: summarize integration/functional tests and non-functional tests found in the codebase
+4. **References**: links to key config files, Dockerfiles, CI configs that evidence the solution choices
+
+**Save**: `SOLUTION_DIR/solution.md` (`_docs/01_solution/solution.md`)
+
+---
+
+### Step 6: Problem Extraction (Retrospective)
+
+**Role**: Business analyst
+**Goal**: From all verified technical docs, retrospectively derive the high-level problem definition — producing the same documents the `problem` skill creates through interview.
+
+This is the inverse of the normal workflow: instead of problem -> solution -> code, we go code -> technical docs -> problem understanding.
+
+#### 6a. `problem.md`
+
+- Synthesize from architecture overview + component purposes + system flows
+- What is this system? What problem does it solve? Who are the users? How does it work at a high level?
+- Cross-reference with README if one exists
+- Free-form text, concise, readable by someone unfamiliar with the project
+
+#### 6b. `restrictions.md`
+
+- Extract from: tech stack choices, Dockerfile specs (OS, base images), CI configs (platform constraints), dependency versions, environment configs
+- Categorize with headers: Hardware, Software, Environment, Operational
+- Each restriction should be specific and testable
+
+#### 6c. `acceptance_criteria.md`
+
+- Derive from: test assertions (expected values, thresholds), performance configs (timeouts, rate limits, batch sizes), health check endpoints, validation rules in code
+- Categorize with headers by domain
+- Every criterion must have a measurable value — if only implied, note the source
+
+#### 6d. `input_data/`
+
+- Document data schemas found (DB schemas, API request/response types, config file formats)
+- Create `data_parameters.md` describing what data the system consumes, formats, volumes, update patterns
+
+#### 6e. `security_approach.md` (only if security code found)
+
+- Authentication mechanisms, authorization patterns, encryption, secrets handling, CORS, rate limiting, input sanitization — all from code observations
+- If no security-relevant code found, skip this file
+
+**Save**: all files to `PROBLEM_DIR/` (`_docs/00_problem/`)
+
+**BLOCKING**: Present all problem documents to user. These are the most abstracted and therefore most prone to interpretation error. Do NOT proceed until user confirms or requests corrections.
+
+---
+
+### Step 7: Final Report
+
+**Role**: Technical writer
+**Goal**: Produce `FINAL_report.md` integrating all generated documentation.
+
+Using `templates/final-report.md` as structure:
+
+- Executive summary from architecture + problem docs
+- Problem statement (transformed from problem.md, not copy-pasted)
+- Architecture overview with tech stack one-liner
+- Component summary table (number, name, purpose, dependencies)
+- System flows summary table
+- Risk observations from verification log (Step 4)
+- Open questions (uncertainties flagged during analysis)
+- Artifact index listing all generated documents with paths
+
+**Save**: `DOCUMENT_DIR/FINAL_report.md`
+
+**State**: update `state.json` with `current_step: "complete"`.
+
+---
+
+## Artifact Management
+
+### Directory Structure
+
+```
+_docs/
+├── 00_problem/                          # Step 6 (retrospective)
+│   ├── problem.md
+│   ├── restrictions.md
+│   ├── acceptance_criteria.md
+│   ├── input_data/
+│   │   └── data_parameters.md
+│   └── security_approach.md
+├── 01_solution/                         # Step 5 (retrospective)
+│   └── solution.md
+└── 02_document/                         # DOCUMENT_DIR
+    ├── 00_discovery.md                  # Step 0
+    ├── modules/                         # Step 1
+    │   ├── [module_name].md
+    │   └── ...
+    ├── components/                      # Step 2
+    │   ├── 01_[name]/description.md
+    │   ├── 02_[name]/description.md
+    │   └── ...
+    ├── common-helpers/                  # Step 2
+    ├── architecture.md                  # Step 3
+    ├── system-flows.md                  # Step 3
+    ├── data_model.md                    # Step 3
+    ├── deployment/                      # Step 3
+    ├── diagrams/                        # Steps 2-3
+    │   ├── components.md
+    │   └── flows/
+    ├── 04_verification_log.md           # Step 4
+    ├── FINAL_report.md                  # Step 7
+    └── state.json                       # Resumability
+```
+
+### Resumability
+
+Maintain `DOCUMENT_DIR/state.json`:
+
+```json
+{
+  "current_step": "module-analysis",
+  "completed_steps": ["discovery"],
+  "focus_dir": null,
+  "modules_total": 12,
+  "modules_documented": ["utils/helpers", "models/user"],
+  "modules_remaining": ["services/auth", "api/endpoints"],
+  "module_batch": 1,
+  "components_written": [],
+  "last_updated": "2026-03-21T14:00:00Z"
+}
+```
+
+Update after each module/component completes. If interrupted, resume from next undocumented module.
+
+When resuming:
+1. Read `state.json`
+2. Cross-check against actual files in DOCUMENT_DIR (trust files over state if they disagree)
+3. Continue from the next incomplete item
+4. Inform user which steps are being skipped
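
The "trust files over state" rule in the resume steps can be sketched as a reconciliation pass. This is a minimal sketch under assumptions: the function name `reconcile` is hypothetical, and it assumes a module named `utils/helpers` is stored at `modules/utils/helpers.md`, which the skill does not spell out.

```python
import tempfile
from pathlib import Path


def reconcile(state, document_dir):
    """Rebuild the documented/remaining lists from the module docs actually on disk."""
    modules_dir = Path(document_dir) / "modules"
    on_disk = {
        p.relative_to(modules_dir).with_suffix("").as_posix()
        for p in modules_dir.rglob("*.md")
    }
    # Keep the original processing order while re-sorting modules into the two lists.
    known = list(state["modules_documented"]) + list(state["modules_remaining"])
    state["modules_documented"] = [m for m in known if m in on_disk]
    state["modules_remaining"] = [m for m in known if m not in on_disk]
    return state


with tempfile.TemporaryDirectory() as tmp:
    for name in ("utils/helpers", "services/auth"):
        doc_path = Path(tmp) / "modules" / f"{name}.md"
        doc_path.parent.mkdir(parents=True, exist_ok=True)
        doc_path.write_text("# module doc\n")
    resumed = reconcile(
        {
            "modules_documented": ["utils/helpers", "models/user"],
            "modules_remaining": ["services/auth"],
        },
        tmp,
    )
```

In the example, `models/user` is listed as documented but has no file, so it moves back to the remaining list, while `services/auth` has a file and is treated as done.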
+
+### Save Principles
+
+1. **Save immediately**: write each module doc as soon as analysis completes
+2. **Incremental context**: each subsequent module uses already-written docs as context
+3. **Preserve intermediates**: keep all module docs even after synthesis into component docs
+4. **Enable recovery**: state file tracks exact progress for resume
+
+## Escalation Rules
+
+| Situation | Action |
+|-----------|--------|
+| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
+| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
+| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
+| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
+| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
+| Contradictions between code and README | Flag in verification log, ASK user |
+| Binary files or non-code assets | Skip, note in discovery |
+| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` |
+| Code intent is ambiguous | ASK user, do not guess |
+
+## Common Mistakes
+
+- **Top-down guessing**: never infer architecture before documenting modules. Build up, don't assume down.
+- **Hallucinating entities**: always verify that referenced classes/functions/endpoints actually exist in code.
+- **Skipping modules**: every source module must appear in exactly one module doc and one component.
+- **Monolithic analysis**: don't try to analyze the entire codebase in one pass. Module by module, in order.
+- **Inventing restrictions**: only document constraints actually evidenced in code, configs, or Dockerfiles.
+- **Vague acceptance criteria**: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
+- **Writing code**: this skill produces documents, never implementation code.
+
+## Methodology Quick Reference
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│          Bottom-Up Codebase Documentation (8-Step)               │
+├──────────────────────────────────────────────────────────────────┤
+│ MODE: Full / Focus Area (@dir) / Resume (state.json)             │
+│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
+│ PREREQ: Check state.json for resume                              │
+│                                                                  │
+│ 0. Discovery          → dependency graph, tech stack, topo order │
+│    (Focus Area: scoped to FOCUS_DIR + transitive deps)           │
+│ 1. Module Docs        → per-module analysis (leaves first)       │
+│    (batched ~5 modules; session break between batches)           │
+│ 2. Component Assembly → group modules, write component specs     │
+│    [BLOCKING: user confirms components]                          │
+│ 3. System Synthesis   → architecture, flows, data model, deploy  │
+│ 4. Verification       → compare all docs vs code, fix errors     │
+│    [BLOCKING: user reviews corrections]                          │
+│    [SESSION BREAK suggested before Steps 5–7]                    │
+│    ── Focus Area mode stops here ──                              │
+│ 5. Solution Extraction → retrospective solution.md               │
+│ 6. Problem Extraction → retrospective problem, restrictions, AC  │
+│    [BLOCKING: user confirms problem docs]                        │
+│ 7. Final Report       → FINAL_report.md                          │
+├──────────────────────────────────────────────────────────────────┤
+│ Principles: Bottom-up always · Dependencies first                │
+│             Incremental context · Verify against code            │
+│             Save immediately · Resume from checkpoint            │
+│             Batch modules · Session breaks for large codebases   │
+└──────────────────────────────────────────────────────────────────┘
+```