File index:
- SKILL.md: hub with frontmatter, principles, mode detection, file index (71 lines)
- workflows/full.md: Full/Focus/Resume modes — Steps 0-7 (376 lines)
- workflows/task.md: Task mode — incremental doc updates (90 lines)
- references/artifacts.md: directory structure, state.json, resumability (70 lines)
Document Skill — Full / Focus Area / Resume Workflow
Covers three related modes that share the same 8-step pipeline:
- Full: entire codebase, no prior state
- Focus Area: scoped to a directory subtree + transitive dependencies
- Resume: continue from a `state.json` checkpoint
Prerequisite Checks
- If `_docs/` already exists and contains files AND mode is Full, ASK user: overwrite, merge, or write to `_docs_generated/` instead?
- Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
- If DOCUMENT_DIR contains a `state.json`, offer to resume from the last checkpoint or start fresh
- If FOCUS_DIR is set, verify the directory exists and contains source files — STOP if missing
Progress Tracking
Create a TodoWrite with all steps (0 through 7). Update status as each step completes.
Steps
Step 0: Codebase Discovery
Role: Code analyst
Goal: Build a complete map of the codebase (or targeted subtree) before analyzing any code.
Focus Area scoping: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.
Scan and catalog:
- Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts)
- Language detection from file extensions and config files
- Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod`
- Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`)
- Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts
- Test structure: test directories, test frameworks, test runner configs
- Existing documentation: README, `docs/`, wiki references, inline doc coverage
- Dependency graph: build a module-level dependency graph by analyzing imports/references. Identify:
- Leaf modules (no internal dependencies)
- Entry points (no internal dependents)
- Cycles (mark for grouped analysis)
- Topological processing order
- If FOCUS_DIR: mark which modules are in-scope vs dependency-only
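The graph properties above fall out of the standard library's topological sorter. A sketch with hypothetical module names and edges; `graphlib.CycleError` is what would mark a cycle for grouped analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical module-level import graph: module -> internal dependencies
deps = {
    "api/routes": {"services/users", "services/orders"},
    "services/users": {"db/models"},
    "services/orders": {"db/models"},
    "db/models": set(),  # leaf module
}

# Leaves: no internal dependencies. Entry points: no internal dependents.
leaves = [m for m, d in deps.items() if not d]
dependents = {m: set() for m in deps}
for mod, ds in deps.items():
    for d in ds:
        dependents[d].add(mod)
entry_points = [m for m, cons in dependents.items() if not cons]

# static_order() yields dependencies before dependents (leaves-first order);
# it raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(deps).static_order())
```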
Save: DOCUMENT_DIR/00_discovery.md containing:
- Directory tree (concise, relevant directories only)
- Tech stack summary table (language, framework, database, infra)
- Dependency graph (textual list + Mermaid diagram)
- Topological processing order
- Entry points and leaf modules
Save: DOCUMENT_DIR/state.json with initial state (see references/artifacts.md for format).
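For orientation, an initial state might look like the sketch below. Only `current_step`, `module_batch`, `modules_documented`, and `modules_remaining` are named elsewhere in this workflow; the other field names and all values are illustrative, and references/artifacts.md is the authoritative format.

```json
{
  "mode": "full",
  "current_step": "0_discovery",
  "module_batch": 0,
  "modules_documented": [],
  "modules_remaining": ["db/models", "services/users", "services/orders", "api/routes"]
}
```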
Step 1: Module-Level Documentation
Role: Code analyst
Goal: Document every identified module individually, processing in topological order (leaves first).
Batched processing: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update state.json, present a progress summary. Between batches, evaluate whether to suggest a session break.
For each module in topological order:
- Read: read the module's source code. Assess complexity and what context is needed.
- Gather context: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
- Write module doc with these sections:
- Purpose: one-sentence responsibility
- Public interface: exported functions/classes/methods with signatures, input/output types
- Internal logic: key algorithms, patterns, non-obvious behavior
- Dependencies: what it imports internally and why
- Consumers: what uses this module (from the dependency graph)
- Data models: entities/types defined in this module
- Configuration: env vars, config keys consumed
- External integrations: HTTP calls, DB queries, queue operations, file I/O
- Security: auth checks, encryption, input validation, secrets access
- Tests: what tests exist for this module, what they cover
- Verify: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.
Cycle handling: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.
Large modules: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.
Save: DOCUMENT_DIR/modules/[module_name].md for each module.
State: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5.
Session break heuristic: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:
══════════════════════════════════════
SESSION BREAK SUGGESTED
══════════════════════════════════════
Modules documented: [X] of [Y]
Batches completed this session: [N]
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
Recommendation: B — fresh context improves
analysis quality for remaining modules
══════════════════════════════════════
Re-entry is seamless: state.json tracks exactly which modules are done.
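The batching loop and break heuristic can be sketched as follows. State handling is simplified for illustration (real state lives in `state.json`, and a real session break would stop the loop rather than just record a suggestion); the function name is hypothetical.

```python
BATCH_SIZE = 5  # ~5 modules per batch, sorted by topological order

def run_batches(modules_remaining, documented, batches_done_this_session=0):
    suggestions = []
    while modules_remaining:
        batch = modules_remaining[:BATCH_SIZE]
        modules_remaining = modules_remaining[BATCH_SIZE:]
        for module in batch:
            documented.append(module)  # stand-in for: analyze module, write its doc
        batches_done_this_session += 1  # persist all docs + state.json here
        # Heuristic: >10 modules remain AND 2+ batches done this session
        if len(modules_remaining) > 10 and batches_done_this_session >= 2:
            suggestions.append(len(modules_remaining))  # suggest a session break
    return documented, suggestions
```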
Step 2: Component Assembly
Role: Software architect
Goal: Group related modules into logical components and produce component specs.
- Analyze module docs from Step 1 to identify natural groupings:
- By directory structure (most common)
- By shared data models or common purpose
- By dependency clusters (tightly coupled modules)
- For each identified component, synthesize its module docs into a single component specification using `.cursor/skills/plan/templates/component-spec.md` as structure:
- High-level overview: purpose, pattern, upstream/downstream
- Internal interfaces: method signatures, DTOs (from actual module code)
- External API specification (if the component exposes HTTP/gRPC endpoints)
- Data access patterns: queries, caching, storage estimates
- Implementation details: algorithmic complexity, state management, key libraries
- Extensions and helpers: shared utilities needed
- Caveats and edge cases: limitations, race conditions, bottlenecks
- Dependency graph: implementation order relative to other components
- Logging strategy
- Identify common helpers shared across multiple components → document in `common-helpers/`
- Generate component relationship diagram (Mermaid)
Self-verification:
- Every module from Step 1 is covered by exactly one component
- No component has overlapping responsibility with another
- Inter-component interfaces are explicit (who calls whom, with what)
- Component dependency graph has no circular dependencies
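The first self-verification check ("exactly one component per module") is a counting exercise. A minimal sketch with hypothetical component and module names:

```python
from collections import Counter

# Hypothetical component breakdown from Step 2 and module list from Step 1
components = {
    "01_persistence": ["db/models"],
    "02_domain": ["services/users", "services/orders"],
    "03_api": ["api/routes"],
}
all_modules = {"db/models", "services/users", "services/orders", "api/routes"}

counts = Counter(m for mods in components.values() for m in mods)
missing = all_modules - set(counts)                    # modules with no component
duplicated = [m for m, n in counts.items() if n > 1]   # modules in 2+ components
assert not missing and not duplicated, (missing, duplicated)
```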
Save:
- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component
- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper
- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram)
BLOCKING: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.
Step 3: System-Level Synthesis
Role: Software architect
Goal: From component docs, synthesize system-level documents.
All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.
3a. Architecture
Using .cursor/skills/plan/templates/architecture.md as structure:
- System context and boundaries from entry points and external integrations
- Tech stack table from discovery (Step 0) + component specs
- Deployment model from Dockerfiles, CI configs, environment strategies
- Data model overview from per-component data access sections
- Integration points from inter-component interfaces
- NFRs from test thresholds, config limits, health checks
- Security architecture from per-module security observations
- Key ADRs inferred from technology choices and patterns
Save: DOCUMENT_DIR/architecture.md
3b. System Flows
Using .cursor/skills/plan/templates/system-flows.md as structure:
- Trace main flows through the component interaction graph
- Entry point → component chain → output for each major flow
- Mermaid sequence diagrams and flowcharts
- Error scenarios from exception handling patterns
- Data flow tables per flow
Save: DOCUMENT_DIR/system-flows.md and DOCUMENT_DIR/diagrams/flows/flow_[name].md
3c. Data Model
- Consolidate all data models from module docs
- Entity-relationship diagram (Mermaid ERD)
- Migration strategy (if ORM/migration tooling detected)
- Seed data observations
- Backward compatibility approach (if versioning found)
Save: DOCUMENT_DIR/data_model.md
3d. Deployment (if Dockerfile/CI configs exist)
- Containerization summary
- CI/CD pipeline structure
- Environment strategy (dev, staging, production)
- Observability (logging patterns, metrics, health checks found in code)
Save: DOCUMENT_DIR/deployment/ (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)
Step 4: Verification Pass
Role: Quality verifier
Goal: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.
For each document generated in Steps 1-3:
- Entity verification: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
- Interface accuracy: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
- Flow correctness: for each system flow diagram, trace the actual code path and verify the sequence matches.
- Completeness check: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
- Consistency check: do component docs agree with architecture doc? Do flow diagrams match component interfaces?
Apply corrections inline to the documents that need them.
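The entity-verification check can be sketched as below. The heuristics are illustrative, not prescriptive: it assumes entities appear in backticks in the docs, searches by plain substring match, and scans only `.py` files; the function name is hypothetical.

```python
import re
from pathlib import Path

def flag_unknown_entities(doc_text: str, source_root: str) -> set[str]:
    """Return entities mentioned in a doc that never appear in the source tree."""
    # Pull identifier-like names out of backticked spans in the document
    entities = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_.]*)`", doc_text))
    source = "\n".join(
        p.read_text(errors="ignore")
        for p in Path(source_root).rglob("*.py")  # extend per detected languages
    )
    return {e for e in entities if e not in source}
```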
Save: DOCUMENT_DIR/04_verification_log.md with:
- Total entities verified vs flagged
- Corrections applied (which document, what changed)
- Remaining gaps or uncertainties
- Completeness score (modules covered / total modules)
BLOCKING: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.
Session boundary: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context:
══════════════════════════════════════
VERIFICATION COMPLETE — session break?
══════════════════════════════════════
Steps 0–4 (analysis + verification) are done.
Steps 5–7 (solution + problem extraction + report)
can run in a fresh conversation.
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a new conversation (recommended)
══════════════════════════════════════
If Focus Area mode: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run /document again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.
Step 5: Solution Extraction (Retrospective)
Role: Software architect
Goal: From all verified technical documentation, retrospectively create solution.md — the same artifact the research skill produces.
Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):
- Product Solution Description: what the system is, brief component interaction diagram (Mermaid)
- Architecture: the architecture that is implemented, with per-component solution tables:
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|---|---|---|---|---|---|---|---|
| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
- Testing Strategy: summarize integration/functional tests and non-functional tests found in the codebase
- References: links to key config files, Dockerfiles, CI configs that evidence the solution choices
Save: SOLUTION_DIR/solution.md (_docs/01_solution/solution.md)
Step 6: Problem Extraction (Retrospective)
Role: Business analyst
Goal: From all verified technical docs, retrospectively derive the high-level problem definition.
6a. problem.md
- Synthesize from architecture overview + component purposes + system flows
- What is this system? What problem does it solve? Who are the users? How does it work at a high level?
- Cross-reference with README if one exists
6b. restrictions.md
- Extract from: tech stack choices, Dockerfile specs, CI configs, dependency versions, environment configs
- Categorize: Hardware, Software, Environment, Operational
6c. acceptance_criteria.md
- Derive from: test assertions, performance configs, health check endpoints, validation rules
- Every criterion must have a measurable value
6d. input_data/
- Document data schemas (DB schemas, API request/response types, config file formats)
- Create `data_parameters.md` describing what data the system consumes
6e. security_approach.md (only if security code found)
- Authentication, authorization, encryption, secrets handling, CORS, rate limiting, input sanitization
Save: all files to PROBLEM_DIR/ (_docs/00_problem/)
BLOCKING: Present all problem documents to user. Do NOT proceed until user confirms or requests corrections.
Step 7: Final Report
Role: Technical writer
Goal: Produce FINAL_report.md integrating all generated documentation.
Using .cursor/skills/plan/templates/final-report.md as structure:
- Executive summary from architecture + problem docs
- Problem statement (transformed from problem.md, not copy-pasted)
- Architecture overview with tech stack one-liner
- Component summary table (number, name, purpose, dependencies)
- System flows summary table
- Risk observations from verification log (Step 4)
- Open questions (uncertainties flagged during analysis)
- Artifact index listing all generated documents with paths
Save: DOCUMENT_DIR/FINAL_report.md
State: update state.json with current_step: "complete".
Escalation Rules
| Situation | Action |
|---|---|
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` |
| Code intent is ambiguous | ASK user, do not guess |
Common Mistakes
- Top-down guessing: never infer architecture before documenting modules. Build up, don't assume down.
- Hallucinating entities: always verify that referenced classes/functions/endpoints actually exist in code.
- Skipping modules: every source module must appear in exactly one module doc and one component.
- Monolithic analysis: don't try to analyze the entire codebase in one pass. Module by module, in order.
- Inventing restrictions: only document constraints actually evidenced in code, configs, or Dockerfiles.
- Vague acceptance criteria: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
- Writing code: this skill produces documents, never implementation code.
Quick Reference
┌──────────────────────────────────────────────────────────────────┐
│ Bottom-Up Codebase Documentation (8-Step) │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json) │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │
│ PREREQ: Check state.json for resume │
│ │
│ 0. Discovery → dependency graph, tech stack, topo order │
│ (Focus Area: scoped to FOCUS_DIR + transitive deps) │
│ 1. Module Docs → per-module analysis (leaves first) │
│ (batched ~5 modules; session break between batches) │
│ 2. Component Assembly → group modules, write component specs │
│ [BLOCKING: user confirms components] │
│ 3. System Synthesis → architecture, flows, data model, deploy │
│ 4. Verification → compare all docs vs code, fix errors │
│ [BLOCKING: user reviews corrections] │
│ [SESSION BREAK suggested before Steps 5–7] │
│ ── Focus Area mode stops here ── │
│ 5. Solution Extraction → retrospective solution.md │
│ 6. Problem Extraction → retrospective problem, restrictions, AC │
│ [BLOCKING: user confirms problem docs] │
│ 7. Final Report → FINAL_report.md │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first │
│ Incremental context · Verify against code │
│ Save immediately · Resume from checkpoint │
│ Batch modules · Session breaks for large codebases │
└──────────────────────────────────────────────────────────────────┘