# Document Skill — Full / Focus Area / Resume Workflow
Covers three related modes that share the same 8-step pipeline:
- **Full**: entire codebase, no prior state
- **Focus Area**: scoped to a directory subtree + transitive dependencies
- **Resume**: continue from `state.json` checkpoint
## Prerequisite Checks
1. If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?**
2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh**
4. If FOCUS_DIR is set, verify the directory exists and contains source files — **STOP if missing**
## Progress Tracking
Create a TodoWrite with all steps (0 through 7). Update status as each step completes.
## Steps
### Step 0: Codebase Discovery
**Role**: Code analyst
**Goal**: Build a complete map of the codebase (or targeted subtree) before analyzing any code.
**Focus Area scoping**: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.
Scan and catalog:
1. Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts)
2. Language detection from file extensions and config files
3. Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod`
4. Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`)
5. Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts
6. Test structure: test directories, test frameworks, test runner configs
7. Existing documentation: README, `docs/`, wiki references, inline doc coverage
8. **Dependency graph**: build a module-level dependency graph by analyzing imports/references. Identify:
- Leaf modules (no internal dependencies)
- Entry points (no internal dependents)
- Cycles (mark for grouped analysis)
- Topological processing order
- If FOCUS_DIR: mark which modules are in-scope vs dependency-only
**Save**: `DOCUMENT_DIR/00_discovery.md` containing:
- Directory tree (concise, relevant directories only)
- Tech stack summary table (language, framework, database, infra)
- Dependency graph (textual list + Mermaid diagram)
- Topological processing order
- Entry points and leaf modules
**Save**: `DOCUMENT_DIR/state.json` with initial state (see `references/artifacts.md` for format).
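The graph work in item 8 can be sketched with Python's stdlib `graphlib`; the module names and edges below are illustrative assumptions, not taken from any real codebase:

```python
import graphlib

# Illustrative module-level graph: module -> set of internal dependencies.
# In practice this is built by parsing import statements per source file.
graph = {
    "app":      {"services", "models"},
    "services": {"models", "utils"},
    "models":   {"utils"},
    "utils":    set(),  # leaf module: no internal dependencies
}

sorter = graphlib.TopologicalSorter(graph)
try:
    # Dependencies come first, giving the bottom-up processing order.
    order = list(sorter.static_order())
except graphlib.CycleError as err:
    # Cycle members are grouped and analyzed together as one doc.
    print("cycle detected:", err.args[1])
    order = []

leaves = [m for m, deps in graph.items() if not deps]
all_deps = {d for deps in graph.values() for d in deps}
entry_points = [m for m in graph if m not in all_deps]  # no internal dependents

print(order)         # ['utils', 'models', 'services', 'app']
print(leaves)        # ['utils']
print(entry_points)  # ['app']
```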
---
### Step 1: Module-Level Documentation
**Role**: Code analyst
**Goal**: Document every identified module individually, processing in topological order (leaves first).
**Batched processing**: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update `state.json`, present a progress summary. Between batches, evaluate whether to suggest a session break.
For each module in topological order:
1. **Read**: read the module's source code. Assess complexity and what context is needed.
2. **Gather context**: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
3. **Write module doc** with these sections:
- **Purpose**: one-sentence responsibility
- **Public interface**: exported functions/classes/methods with signatures, input/output types
- **Internal logic**: key algorithms, patterns, non-obvious behavior
- **Dependencies**: what it imports internally and why
- **Consumers**: what uses this module (from the dependency graph)
- **Data models**: entities/types defined in this module
- **Configuration**: env vars, config keys consumed
- **External integrations**: HTTP calls, DB queries, queue operations, file I/O
- **Security**: auth checks, encryption, input validation, secrets access
- **Tests**: what tests exist for this module, what they cover
4. **Verify**: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.
**Cycle handling**: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.
**Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.
**Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module.
**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5.
**Session break heuristic**: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:
```
══════════════════════════════════════
SESSION BREAK SUGGESTED
══════════════════════════════════════
Modules documented: [X] of [Y]
Batches completed this session: [N]
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
Recommendation: B — fresh context improves
analysis quality for remaining modules
══════════════════════════════════════
```
Re-entry is seamless: `state.json` tracks exactly which modules are done.
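A checkpoint update after one completed module can be sketched as follows; the real `state.json` schema lives in `references/artifacts.md`, so the fields here are illustrative assumptions:

```python
import json
import tempfile
from pathlib import Path

def checkpoint(state_path: Path, module: str) -> dict:
    """Move one module from modules_remaining to modules_documented
    and persist immediately, so a fresh session can resume exactly here."""
    state = json.loads(state_path.read_text())
    state["modules_remaining"].remove(module)
    state["modules_documented"].append(module)
    state_path.write_text(json.dumps(state, indent=2))
    return state

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "state.json"
    # Illustrative initial state; see references/artifacts.md for the format.
    path.write_text(json.dumps({
        "current_step": "1_module_docs",
        "module_batch": 1,
        "modules_documented": [],
        "modules_remaining": ["utils", "models", "services", "app"],
    }))
    state = checkpoint(path, "utils")
    print(state["modules_documented"])  # ['utils']
    print(state["modules_remaining"])   # ['models', 'services', 'app']
```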
---
### Step 2: Component Assembly
**Role**: Software architect
**Goal**: Group related modules into logical components and produce component specs.
1. Analyze module docs from Step 1 to identify natural groupings:
- By directory structure (most common)
- By shared data models or common purpose
- By dependency clusters (tightly coupled modules)
2. For each identified component, synthesize its module docs into a single component specification using `.cursor/skills/plan/templates/component-spec.md` as structure:
- High-level overview: purpose, pattern, upstream/downstream
- Internal interfaces: method signatures, DTOs (from actual module code)
- External API specification (if the component exposes HTTP/gRPC endpoints)
- Data access patterns: queries, caching, storage estimates
- Implementation details: algorithmic complexity, state management, key libraries
- Extensions and helpers: shared utilities needed
- Caveats and edge cases: limitations, race conditions, bottlenecks
- Dependency graph: implementation order relative to other components
- Logging strategy
3. Identify common helpers shared across multiple components → document in `common-helpers/`
4. Generate component relationship diagram (Mermaid)
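The most common grouping (item 1, by directory structure) can be sketched as a one-pass bucket by top-level directory; the module paths are illustrative assumptions:

```python
from collections import defaultdict

# Illustrative module paths carried over from Step 1 docs.
modules = [
    "billing/invoice", "billing/tax",
    "auth/session", "auth/tokens",
    "shared/logging",
]

components: dict[str, list[str]] = defaultdict(list)
for mod in modules:
    # Group by top-level directory; dependency-cluster grouping would
    # refine this using the Step 0 graph.
    components[mod.split("/")[0]].append(mod)

print(dict(components))
# {'billing': ['billing/invoice', 'billing/tax'],
#  'auth': ['auth/session', 'auth/tokens'],
#  'shared': ['shared/logging']}
```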
**Self-verification**:
- [ ] Every module from Step 1 is covered by exactly one component
- [ ] No component has overlapping responsibility with another
- [ ] Inter-component interfaces are explicit (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies
**Save**:
- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component
- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper
- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram)
**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.
---
### Step 2.5: Module Layout Derivation
**Role**: Software architect
**Goal**: Produce `_docs/02_document/module-layout.md` — the authoritative file-ownership map read by `/implement` Step 4, `/code-review` Phase 7, and `/refactor` discovery. Required for any downstream skill that assigns file ownership or checks architectural layering.
This step derives the layout from the **existing** codebase rather than from a plan. The decompose skill's Step 1.5 is the greenfield counterpart; both use the same template and produce the same output shape, so downstream consumers don't branch on origin.
1. For each component identified in Step 2, resolve its owning directory from module docs (Step 1) and from directory groupings used in Step 2.
2. For each component, compute:
- **Public API**: exported symbols. Language-specific: Python — `__init__.py` re-exports + non-underscore root-level symbols; TypeScript — `index.ts` / barrel exports; C# — `public` types in the namespace root; Rust — `pub` items in `lib.rs` / `mod.rs`; Go — exported (capitalized) identifiers in the package root.
- **Internal**: everything else under the component's directory.
- **Owns**: the component's directory glob.
- **Imports from**: other components whose Public API this one references (parse imports; reuse tooling from Step 0's dependency graph).
- **Consumed by**: reverse of Imports from across all components.
3. Identify `shared/*` directories already present in the code (or infer candidates: modules imported by ≥2 components and owning no domain logic). Create a Shared / Cross-Cutting entry per concern.
4. Infer the Allowed Dependencies layering table by topologically sorting the import graph built in step 2. Components that import only from `shared/*` go to Layer 1; each successive layer imports only from lower layers.
5. Write `_docs/02_document/module-layout.md` using `.cursor/skills/decompose/templates/module-layout.md`. At the top of the file add `**Status**: derived-from-code` and a `## Verification Needed` block listing any inference that was not clean (detected cycles, ambiguous ownership, components not cleanly assignable to a layer).
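For Python components, the Public API computation (item 2 above) can be sketched by parsing the package's `__init__.py`; the sample source and the "billing" component are illustrative assumptions:

```python
import ast

def public_api(init_source: str) -> list[str]:
    """Public symbols of a Python package root: an explicit __all__ wins;
    otherwise collect non-underscore top-level names and re-exports."""
    tree = ast.parse(init_source)
    names: list[str] = []
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets:
                return list(ast.literal_eval(node.value))
            names += [n for n in targets if not n.startswith("_")]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            if not node.name.startswith("_"):
                names.append(node.name)
        elif isinstance(node, ast.ImportFrom):  # re-exports
            names += [(a.asname or a.name) for a in node.names
                      if not (a.asname or a.name).startswith("_")]
    return names

# Illustrative __init__.py for a hypothetical "billing" component.
sample = """
from .invoice import Invoice, render_pdf
from ._internal import _helper
VERSION = "1.0"
def _private(): ...
"""
print(public_api(sample))  # ['Invoice', 'render_pdf', 'VERSION']
```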
**Self-verification**:
- [ ] Every component from Step 2 has a Per-Component Mapping entry
- [ ] Every Public API list is grounded in an actual exported symbol (no guesses)
- [ ] No component's `Imports from` points at a component in a higher layer
- [ ] Shared directories detected in code are listed under Shared / Cross-Cutting
- [ ] Cycles from Step 0 that span components are surfaced in `## Verification Needed`
**Save**: `_docs/02_document/module-layout.md`
**BLOCKING**: Present the layering table and the `## Verification Needed` block to the user. Do NOT proceed until the user confirms (or patches) the derived layout. Downstream skills assume this file is accurate.
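The layer assignment in item 4 can be sanity-checked with a small sketch: a component's layer is one more than the deepest layer it imports from. Component names here are illustrative; this recursion assumes cross-component cycles were already surfaced in `## Verification Needed`:

```python
from functools import cache

# Illustrative import graph: component -> components it imports from.
# shared/* is excluded; it sits below Layer 1 by definition.
imports = {
    "api":     {"billing", "auth"},
    "billing": {"auth"},
    "auth":    set(),  # imports only from shared/* -> Layer 1
}

@cache
def layer(name: str) -> int:
    # Would recurse forever on a cycle, hence the precondition above.
    deps = imports[name]
    return 1 if not deps else 1 + max(layer(d) for d in deps)

layers = {c: layer(c) for c in imports}
print(layers)  # {'api': 3, 'billing': 2, 'auth': 1}
```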
---
### Step 3: System-Level Synthesis
**Role**: Software architect
**Goal**: From component docs, synthesize system-level documents.
All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.
#### 3a. Architecture
Using `.cursor/skills/plan/templates/architecture.md` as structure:
- System context and boundaries from entry points and external integrations
- Tech stack table from discovery (Step 0) + component specs
- Deployment model from Dockerfiles, CI configs, environment strategies
- Data model overview from per-component data access sections
- Integration points from inter-component interfaces
- NFRs from test thresholds, config limits, health checks
- Security architecture from per-module security observations
- Key ADRs inferred from technology choices and patterns
**Save**: `DOCUMENT_DIR/architecture.md`
#### 3b. System Flows
Using `.cursor/skills/plan/templates/system-flows.md` as structure:
- Trace main flows through the component interaction graph
- Entry point → component chain → output for each major flow
- Mermaid sequence diagrams and flowcharts
- Error scenarios from exception handling patterns
- Data flow tables per flow
**Save**: `DOCUMENT_DIR/system-flows.md` and `DOCUMENT_DIR/diagrams/flows/flow_[name].md`
#### 3c. Data Model
- Consolidate all data models from module docs
- Entity-relationship diagram (Mermaid ERD)
- Migration strategy (if ORM/migration tooling detected)
- Seed data observations
- Backward compatibility approach (if versioning found)
**Save**: `DOCUMENT_DIR/data_model.md`
#### 3d. Deployment (if Dockerfile/CI configs exist)
- Containerization summary
- CI/CD pipeline structure
- Environment strategy (dev, staging, production)
- Observability (logging patterns, metrics, health checks found in code)
**Save**: `DOCUMENT_DIR/deployment/` (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)
---
### Step 4: Verification Pass
**Role**: Quality verifier
**Goal**: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.
For each document generated in Steps 1-3:
1. **Entity verification**: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
2. **Interface accuracy**: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
3. **Flow correctness**: for each system flow diagram, trace the actual code path and verify the sequence matches.
4. **Completeness check**: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
5. **Consistency check**: do component docs agree with architecture doc? Do flow diagrams match component interfaces?
Apply corrections inline to the documents that need them.
**Save**: `DOCUMENT_DIR/04_verification_log.md` with:
- Total entities verified vs flagged
- Corrections applied (which document, what changed)
- Remaining gaps or uncertainties
- Completeness score (modules covered / total modules)
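The entity check (item 1) can be sketched as pulling backtick-quoted identifiers out of a doc and testing each against the source tree. This is a deliberately crude substring check; a real pass would use language-aware search. The doc text and file contents are illustrative assumptions:

```python
import re
import tempfile
from pathlib import Path

def flag_missing_entities(doc_text: str, source_root: Path) -> list[str]:
    """Backtick-quoted identifiers mentioned in a doc but absent from code."""
    entities = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_.]*)`", doc_text))
    corpus = "\n".join(p.read_text(errors="ignore")
                       for p in source_root.rglob("*.py"))
    # Compare on the final name segment so `orders.create_order`
    # still matches a plain def in the source.
    return sorted(e for e in entities if e.split(".")[-1] not in corpus)

# Illustrative doc and codebase: one real function, one hallucinated class.
doc = "Calls `create_order` and the hypothetical `OrderMailer` class."
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "orders.py").write_text("def create_order(): ...\n")
    missing = flag_missing_entities(doc, Path(d))

print(missing)  # ['OrderMailer']
```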
**BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.
**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5-7). These steps produce different artifact types and benefit from fresh context:
```
══════════════════════════════════════
VERIFICATION COMPLETE — session break?
══════════════════════════════════════
Steps 0-4 (analysis + verification) are done.
Steps 5-7 (solution + problem extraction + report)
can run in a fresh conversation.
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a new conversation (recommended)
══════════════════════════════════════
```
If **Focus Area mode**: Steps 5-7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run `/document` again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.
---
### Step 5: Solution Extraction (Retrospective)
**Role**: Software architect
**Goal**: From all verified technical documentation, retrospectively create `solution.md` — the same artifact the research skill produces.
Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):
1. **Product Solution Description**: what the system is, brief component interaction diagram (Mermaid)
2. **Architecture**: the architecture that is implemented, with per-component solution tables:
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
3. **Testing Strategy**: summarize integration/functional tests and non-functional tests found in the codebase
4. **References**: links to key config files, Dockerfiles, CI configs that evidence the solution choices
**Save**: `SOLUTION_DIR/solution.md` (`_docs/01_solution/solution.md`)
---
### Step 6: Problem Extraction (Retrospective)
**Role**: Business analyst
**Goal**: From all verified technical docs, retrospectively derive the high-level problem definition.
#### 6a. `problem.md`
- Synthesize from architecture overview + component purposes + system flows
- What is this system? What problem does it solve? Who are the users? How does it work at a high level?
- Cross-reference with README if one exists
#### 6b. `restrictions.md`
- Extract from: tech stack choices, Dockerfile specs, CI configs, dependency versions, environment configs
- Categorize: Hardware, Software, Environment, Operational
#### 6c. `acceptance_criteria.md`
- Derive from: test assertions, performance configs, health check endpoints, validation rules
- Every criterion must have a measurable value
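Deriving measurable criteria from test assertions can be sketched as scanning test sources for numeric comparisons; the sample test lines and the criterion phrasing are illustrative assumptions:

```python
import re

# Illustrative test sources; a real pass walks the whole test directory.
test_src = """
assert response.elapsed.total_seconds() < 0.2
assert len(batch) <= 500
"""

# Numeric comparison assertions -> candidate measurable criteria.
found = re.findall(r"assert (.+?) ([<>]=?) ([0-9][0-9_.]*)", test_src)
criteria = [f"{expr} must be {op} {value}" for expr, op, value in found]
print(criteria)
# ['response.elapsed.total_seconds() must be < 0.2',
#  'len(batch) must be <= 500']
```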
#### 6d. `input_data/`
- Document data schemas (DB schemas, API request/response types, config file formats)
- Create `data_parameters.md` describing what data the system consumes
#### 6e. `security_approach.md` (only if security code found)
- Authentication, authorization, encryption, secrets handling, CORS, rate limiting, input sanitization
**Save**: all files to `PROBLEM_DIR/` (`_docs/00_problem/`)
**BLOCKING**: Present all problem documents to user. Do NOT proceed until user confirms or requests corrections.
---
### Step 7: Final Report
**Role**: Technical writer
**Goal**: Produce `FINAL_report.md` integrating all generated documentation.
Using `.cursor/skills/plan/templates/final-report.md` as structure:
- Executive summary from architecture + problem docs
- Problem statement (transformed from problem.md, not copy-pasted)
- Architecture overview with tech stack one-liner
- Component summary table (number, name, purpose, dependencies)
- System flows summary table
- Risk observations from verification log (Step 4)
- Open questions (uncertainties flagged during analysis)
- Artifact index listing all generated documents with paths
**Save**: `DOCUMENT_DIR/FINAL_report.md`
**State**: update `state.json` with `current_step: "complete"`.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` |
| Code intent is ambiguous | ASK user, do not guess |
## Common Mistakes
- **Top-down guessing**: never infer architecture before documenting modules. Build up, don't assume down.
- **Hallucinating entities**: always verify that referenced classes/functions/endpoints actually exist in code.
- **Skipping modules**: every source module must appear in exactly one module doc and one component.
- **Monolithic analysis**: don't try to analyze the entire codebase in one pass. Module by module, in order.
- **Inventing restrictions**: only document constraints actually evidenced in code, configs, or Dockerfiles.
- **Vague acceptance criteria**: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
- **Writing code**: this skill produces documents, never implementation code.
## Quick Reference
```
┌──────────────────────────────────────────────────────────────────┐
│ Bottom-Up Codebase Documentation (8-Step) │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json) │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │
│ PREREQ: Check state.json for resume │
│ │
│ 0. Discovery → dependency graph, tech stack, topo order │
│ (Focus Area: scoped to FOCUS_DIR + transitive deps) │
│ 1. Module Docs → per-module analysis (leaves first) │
│ (batched ~5 modules; session break between batches) │
│ 2. Component Assembly → group modules, write component specs │
│ [BLOCKING: user confirms components] │
│ 2.5 Module Layout → derive module-layout.md from code │
│ [BLOCKING: user confirms layout] │
│ 3. System Synthesis → architecture, flows, data model, deploy │
│ 4. Verification → compare all docs vs code, fix errors │
│ [BLOCKING: user reviews corrections] │
│ [SESSION BREAK suggested before Steps 5-7] │
│ ── Focus Area mode stops here ── │
│ 5. Solution Extraction → retrospective solution.md │
│ 6. Problem Extraction → retrospective problem, restrictions, AC │
│ [BLOCKING: user confirms problem docs] │
│ 7. Final Report → FINAL_report.md │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first │
│ Incremental context · Verify against code │
│ Save immediately · Resume from checkpoint │
│ Batch modules · Session breaks for large codebases │
└──────────────────────────────────────────────────────────────────┘
```