---
name: document
description: >
  Bottom-up codebase documentation skill. Analyzes existing code from modules
  up through components to architecture, then retrospectively derives
  problem/restrictions/acceptance criteria. Produces the same _docs/ artifacts
  as the problem, research, and plan skills, but from code analysis instead of
  user interview. Trigger phrases: "document", "document codebase", "document
  this project", "documentation", "generate documentation", "create
  documentation", "reverse-engineer docs", "code to docs", "analyze and
  document".
category: build
tags:
disable-model-invocation: true
---
Bottom-Up Codebase Documentation
Analyze an existing codebase from the bottom up — individual modules first, then components, then system-level architecture — and produce the same _docs/ artifacts that the problem and plan skills generate, without requiring user interview.
Core Principles
- Bottom-up always: module docs -> component specs -> architecture/flows -> solution -> problem extraction. Every higher level is synthesized from the level below.
- Dependencies first: process modules in topological order (leaves first). When documenting module X, all of X's dependencies already have docs.
- Incremental context: each module's doc uses already-written dependency docs as context — no ever-growing chain.
- Verify against code: cross-reference every entity in generated docs against actual codebase. Catch hallucinations.
- Save immediately: write each artifact as soon as its step completes. Enable resume from any checkpoint.
- Ask, don't assume: when code intent is ambiguous, ASK the user before proceeding.
Context Resolution
Fixed paths:
- DOCUMENT_DIR: `_docs/02_document/`
- SOLUTION_DIR: `_docs/01_solution/`
- PROBLEM_DIR: `_docs/00_problem/`
Optional input:
- FOCUS_DIR: a specific directory subtree provided by the user (e.g., `/document @src/api/`). When set, only this subtree and its transitive dependencies are analyzed.
Announce resolved paths (and FOCUS_DIR if set) to user before proceeding.
Mode Detection
Determine the execution mode before any other logic:
| Mode | Trigger | Scope |
|---|---|---|
| Full | No input file, no existing state | Entire codebase |
| Focus Area | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies |
| Resume | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint |
| Task | User provides a task spec file (e.g., `@_docs/02_tasks/done/AZ-173_*.md`) AND `_docs/02_document/` has existing docs | Targeted update of docs affected by the task |
Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas.
Task mode is a lightweight, incremental update — it reads the task spec to identify which files changed, then updates only the affected module docs, component docs, system flows, and data parameters. It does NOT redo discovery, verification, or problem extraction. See Task Mode Workflow below.
Prerequisite Checks
- If `_docs/` already exists and contains files AND mode is Full, ASK user: overwrite, merge, or write to `_docs_generated/` instead?
- Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
- If DOCUMENT_DIR contains a `state.json`, offer to resume from last checkpoint or start fresh
- If FOCUS_DIR is set, verify the directory exists and contains source files — STOP if missing
Progress Tracking
Create a TodoWrite with all steps (0 through 7). Update status as each step completes.
Workflow
Step 0: Codebase Discovery
Role: Code analyst
Goal: Build a complete map of the codebase (or targeted subtree) before analyzing any code.
Focus Area scoping: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.
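The scoping rule above is a transitive closure over the import graph. A minimal sketch in Python (the `deps` mapping and module names are hypothetical):

```python
from collections import deque

def scope_modules(deps: dict[str, set[str]], focus: set[str]) -> set[str]:
    """Focus modules plus everything they transitively import.

    `deps` maps each module to the internal modules it imports.
    """
    in_scope = set(focus)
    queue = deque(focus)
    while queue:
        module = queue.popleft()
        for dep in deps.get(module, ()):
            if dep not in in_scope:  # visit each dependency once
                in_scope.add(dep)
                queue.append(dep)
    return in_scope

# Hypothetical import graph; cli/tool sits outside the focus subtree.
deps = {
    "api/routes": {"services/auth", "models/user"},
    "services/auth": {"models/user"},
    "models/user": set(),
    "cli/tool": {"models/user"},
}
print(scope_modules(deps, {"api/routes"}))
```

Modules reachable only from outside the focus subtree (here `cli/tool`) are excluded, matching the "neither inside FOCUS_DIR nor dependencies of it" rule.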
Scan and catalog:
- Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts)
- Language detection from file extensions and config files
- Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod`
- Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`)
- Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts
- Test structure: test directories, test frameworks, test runner configs
- Existing documentation: README, `docs/`, wiki references, inline doc coverage
- Dependency graph: build a module-level dependency graph by analyzing imports/references. Identify:
  - Leaf modules (no internal dependencies)
  - Entry points (no internal dependents)
  - Cycles (mark for grouped analysis)
  - Topological processing order
  - If FOCUS_DIR: mark which modules are in-scope vs dependency-only
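The leaf-first order can be derived mechanically from the import graph, for example with Python's standard `graphlib`. A sketch with hypothetical module names:

```python
from graphlib import CycleError, TopologicalSorter

# Map each module to the internal modules it imports (its dependencies).
graph = {
    "api/endpoints": {"services/auth"},
    "services/auth": {"utils/helpers", "models/user"},
    "models/user": {"utils/helpers"},
    "utils/helpers": set(),
}

try:
    # static_order() yields dependencies before dependents: leaves first.
    order = list(TopologicalSorter(graph).static_order())
    print(order)
except CycleError as exc:
    # exc.args[1] lists the modules on the cycle; analyze them as one group.
    print("cycle detected:", exc.args[1])
```

`CycleError` doubles as the cycle detector called for above: its payload names the modules that must be documented together.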
Save: DOCUMENT_DIR/00_discovery.md containing:
- Directory tree (concise, relevant directories only)
- Tech stack summary table (language, framework, database, infra)
- Dependency graph (textual list + Mermaid diagram)
- Topological processing order
- Entry points and leaf modules
Save: DOCUMENT_DIR/state.json with initial state:
```json
{
  "current_step": "module-analysis",
  "completed_steps": ["discovery"],
  "focus_dir": null,
  "modules_total": 0,
  "modules_documented": [],
  "modules_remaining": [],
  "module_batch": 0,
  "components_written": [],
  "last_updated": ""
}
```
Set focus_dir to the FOCUS_DIR path if in Focus Area mode, or null for Full mode.
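A minimal sketch of writing this state atomically, so an interrupted run can never leave a half-written state file (the `save_state` helper name is illustrative):

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

def save_state(document_dir: Path, state: dict) -> None:
    """Write state.json atomically so an interrupted run cannot corrupt it."""
    state["last_updated"] = datetime.now(timezone.utc).isoformat()
    document_dir.mkdir(parents=True, exist_ok=True)
    tmp = document_dir / "state.json.tmp"
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, document_dir / "state.json")  # atomic rename

save_state(Path("_docs/02_document"), {
    "current_step": "module-analysis",
    "completed_steps": ["discovery"],
    "focus_dir": None,
    "modules_total": 0,
    "modules_documented": [],
    "modules_remaining": [],
    "module_batch": 0,
    "components_written": [],
    "last_updated": "",
})
```

The write-then-rename pattern matters here because state.json is updated after every module; a crash mid-write would otherwise break resumability.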
Step 1: Module-Level Documentation
Role: Code analyst
Goal: Document every identified module individually, processing in topological order (leaves first).
Batched processing: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update state.json, present a progress summary. Between batches, evaluate whether to suggest a session break.
For each module in topological order:
- Read: read the module's source code. Assess complexity and what context is needed.
- Gather context: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
- Write module doc with these sections:
- Purpose: one-sentence responsibility
- Public interface: exported functions/classes/methods with signatures, input/output types
- Internal logic: key algorithms, patterns, non-obvious behavior
- Dependencies: what it imports internally and why
- Consumers: what uses this module (from the dependency graph)
- Data models: entities/types defined in this module
- Configuration: env vars, config keys consumed
- External integrations: HTTP calls, DB queries, queue operations, file I/O
- Security: auth checks, encryption, input validation, secrets access
- Tests: what tests exist for this module, what they cover
- Verify: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.
Cycle handling: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.
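Finding those groups is a strongly-connected-components computation. A compact sketch using Kosaraju's algorithm, with hypothetical module names:

```python
def strongly_connected_components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Kosaraju's algorithm: modules on a shared import cycle land in one group."""
    visited: set[str] = set()

    def dfs(node, edges, out):
        stack = [(node, iter(edges.get(node, ())))]
        visited.add(node)
        while stack:
            current, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(edges.get(nxt, ()))))
                    break
            else:
                stack.pop()
                out.append(current)  # post-order: recorded at finish time

    finish_order: list[str] = []
    for node in graph:
        if node not in visited:
            dfs(node, graph, finish_order)

    reversed_edges: dict[str, set[str]] = {n: set() for n in graph}
    for node, deps in graph.items():
        for dep in deps:
            reversed_edges.setdefault(dep, set()).add(node)

    visited.clear()
    components = []
    for node in reversed(finish_order):
        if node not in visited:
            members: list[str] = []
            dfs(node, reversed_edges, members)
            components.append(set(members))
    return components

# services/a and services/b import each other, so they form one group.
graph = {
    "services/a": {"services/b"},
    "services/b": {"services/a"},
    "utils/x": set(),
}
groups = strongly_connected_components(graph)
print(groups)
```

Each returned group of size two or more is a cycle and gets a single combined doc; singleton groups are ordinary modules.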
Large modules: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.
Save: DOCUMENT_DIR/modules/[module_name].md for each module.
State: update state.json after each module completes (move from modules_remaining to modules_documented). Increment module_batch after each batch of ~5.
Session break heuristic: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:
```
══════════════════════════════════════
SESSION BREAK SUGGESTED
══════════════════════════════════════
Modules documented: [X] of [Y]
Batches completed this session: [N]
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
Recommendation: B — fresh context improves
analysis quality for remaining modules
══════════════════════════════════════
```
Re-entry is seamless: state.json tracks exactly which modules are done.
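The heuristic reduces to a single predicate; a trivial sketch:

```python
def suggest_session_break(modules_remaining: int, batches_this_session: int) -> bool:
    """Break when plenty of work remains AND this session has
    already consumed meaningful context (2+ completed batches)."""
    return modules_remaining > 10 and batches_this_session >= 2

print(suggest_session_break(modules_remaining=15, batches_this_session=2))  # True
print(suggest_session_break(modules_remaining=8, batches_this_session=3))   # False
```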
Step 2: Component Assembly
Role: Software architect
Goal: Group related modules into logical components and produce component specs.
- Analyze module docs from Step 1 to identify natural groupings:
- By directory structure (most common)
- By shared data models or common purpose
- By dependency clusters (tightly coupled modules)
- For each identified component, synthesize its module docs into a single component specification using `.cursor/skills/plan/templates/component-spec.md` as structure:
  - High-level overview: purpose, pattern, upstream/downstream
  - Internal interfaces: method signatures, DTOs (from actual module code)
  - External API specification (if the component exposes HTTP/gRPC endpoints)
  - Data access patterns: queries, caching, storage estimates
  - Implementation details: algorithmic complexity, state management, key libraries
  - Extensions and helpers: shared utilities needed
  - Caveats and edge cases: limitations, race conditions, bottlenecks
  - Dependency graph: implementation order relative to other components
  - Logging strategy
- Identify common helpers shared across multiple components -> document in `common-helpers/`
- Generate component relationship diagram (Mermaid)
Self-verification:
- Every module from Step 1 is covered by exactly one component
- No component has overlapping responsibility with another
- Inter-component interfaces are explicit (who calls whom, with what)
- Component dependency graph has no circular dependencies
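The first two checks amount to verifying that the components partition the module set. A sketch with hypothetical component and module names:

```python
def check_partition(all_modules: set[str],
                    components: dict[str, set[str]]) -> list[str]:
    """Report modules assigned to zero or more than one component."""
    problems: list[str] = []
    owner: dict[str, str] = {}
    for component, modules in components.items():
        for module in modules:
            if module in owner:
                problems.append(
                    f"{module} appears in both {owner[module]} and {component}")
            owner[module] = component
    for module in sorted(all_modules - owner.keys()):
        problems.append(f"{module} is not covered by any component")
    return problems

# Hypothetical assignment with two deliberate violations.
issues = check_partition(
    {"services/auth", "models/user", "api/endpoints", "utils/helpers"},
    {"01_auth": {"services/auth", "models/user"},
     "02_api": {"api/endpoints", "models/user"}},
)
print(issues)
```

An empty result means every module from Step 1 is covered exactly once.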
Save:
- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component
- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper
- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram)
BLOCKING: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.
Step 3: System-Level Synthesis
Role: Software architect
Goal: From component docs, synthesize system-level documents.
All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.
3a. Architecture
Using .cursor/skills/plan/templates/architecture.md as structure:
- System context and boundaries from entry points and external integrations
- Tech stack table from discovery (Step 0) + component specs
- Deployment model from Dockerfiles, CI configs, environment strategies
- Data model overview from per-component data access sections
- Integration points from inter-component interfaces
- NFRs from test thresholds, config limits, health checks
- Security architecture from per-module security observations
- Key ADRs inferred from technology choices and patterns
Save: DOCUMENT_DIR/architecture.md
3b. System Flows
Using .cursor/skills/plan/templates/system-flows.md as structure:
- Trace main flows through the component interaction graph
- Entry point -> component chain -> output for each major flow
- Mermaid sequence diagrams and flowcharts
- Error scenarios from exception handling patterns
- Data flow tables per flow
Save: DOCUMENT_DIR/system-flows.md and DOCUMENT_DIR/diagrams/flows/flow_[name].md
3c. Data Model
- Consolidate all data models from module docs
- Entity-relationship diagram (Mermaid ERD)
- Migration strategy (if ORM/migration tooling detected)
- Seed data observations
- Backward compatibility approach (if versioning found)
Save: DOCUMENT_DIR/data_model.md
3d. Deployment (if Dockerfile/CI configs exist)
- Containerization summary
- CI/CD pipeline structure
- Environment strategy (dev, staging, production)
- Observability (logging patterns, metrics, health checks found in code)
Save: DOCUMENT_DIR/deployment/ (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)
Step 4: Verification Pass
Role: Quality verifier
Goal: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.
For each document generated in Steps 1-3:
- Entity verification: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
- Interface accuracy: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
- Flow correctness: for each system flow diagram, trace the actual code path and verify the sequence matches.
- Completeness check: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
- Consistency check: do component docs agree with architecture doc? Do flow diagrams match component interfaces?
Apply corrections inline to the documents that need them.
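One crude but effective form of entity verification is to extract backticked identifiers from a doc and substring-search the source tree. A sketch (the regex, the `*.py` glob, and the matching rule are simplifying assumptions):

```python
import re
import tempfile
from pathlib import Path

def flag_unknown_entities(doc_text: str, source_root: Path,
                          pattern: str = "*.py") -> set[str]:
    """Backticked identifiers in a doc that never appear in the source tree.

    Deliberately crude: a substring search over source files, comparing on
    the last dotted segment so `auth.login` matches `def login`. Enough to
    surface hallucinated names for human review.
    """
    entities = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_.]*)`", doc_text))
    corpus = "\n".join(p.read_text(errors="ignore")
                       for p in source_root.rglob(pattern))
    return {e for e in entities if e.split(".")[-1] not in corpus}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "auth.py").write_text("def login(user): ...\n")
    doc = "The `login` endpoint calls `verify_token` internally."
    flagged = flag_unknown_entities(doc, root)
print(flagged)  # only verify_token has no counterpart in the code
```

Flagged names go to the verification log for manual confirmation; a substring hit is weak evidence of correctness, but a miss is strong evidence of a hallucination.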
Save: DOCUMENT_DIR/04_verification_log.md with:
- Total entities verified vs flagged
- Corrections applied (which document, what changed)
- Remaining gaps or uncertainties
- Completeness score (modules covered / total modules)
BLOCKING: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.
Session boundary: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context:
```
══════════════════════════════════════
VERIFICATION COMPLETE — session break?
══════════════════════════════════════
Steps 0–4 (analysis + verification) are done.
Steps 5–7 (solution + problem extraction + report)
can run in a fresh conversation.
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a new conversation (recommended)
══════════════════════════════════════
```
If Focus Area mode: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run /document again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.
Step 5: Solution Extraction (Retrospective)
Role: Software architect
Goal: From all verified technical documentation, retrospectively create solution.md — the same artifact the research skill produces. This makes downstream skills (plan, deploy, decompose) compatible with the documented codebase.
Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):
- Product Solution Description: what the system is, brief component interaction diagram (Mermaid)
- Architecture: the architecture that is implemented, with per-component solution tables:
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|---|---|---|---|---|---|---|---|
| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
- Testing Strategy: summarize integration/functional tests and non-functional tests found in the codebase
- References: links to key config files, Dockerfiles, CI configs that evidence the solution choices
Save: SOLUTION_DIR/solution.md (_docs/01_solution/solution.md)
Step 6: Problem Extraction (Retrospective)
Role: Business analyst
Goal: From all verified technical docs, retrospectively derive the high-level problem definition — producing the same documents the problem skill creates through interview.
This is the inverse of normal workflow: instead of problem -> solution -> code, we go code -> technical docs -> problem understanding.
6a. problem.md
- Synthesize from architecture overview + component purposes + system flows
- What is this system? What problem does it solve? Who are the users? How does it work at a high level?
- Cross-reference with README if one exists
- Free-form text, concise, readable by someone unfamiliar with the project
6b. restrictions.md
- Extract from: tech stack choices, Dockerfile specs (OS, base images), CI configs (platform constraints), dependency versions, environment configs
- Categorize with headers: Hardware, Software, Environment, Operational
- Each restriction should be specific and testable
6c. acceptance_criteria.md
- Derive from: test assertions (expected values, thresholds), performance configs (timeouts, rate limits, batch sizes), health check endpoints, validation rules in code
- Categorize with headers by domain
- Every criterion must have a measurable value — if only implied, note the source
6d. input_data/
- Document data schemas found (DB schemas, API request/response types, config file formats)
- Create `data_parameters.md` describing what data the system consumes, formats, volumes, update patterns
6e. security_approach.md (only if security code found)
- Authentication mechanisms, authorization patterns, encryption, secrets handling, CORS, rate limiting, input sanitization — all from code observations
- If no security-relevant code found, skip this file
Save: all files to PROBLEM_DIR/ (_docs/00_problem/)
BLOCKING: Present all problem documents to user. These are the most abstracted and therefore most prone to interpretation error. Do NOT proceed until user confirms or requests corrections.
Step 7: Final Report
Role: Technical writer
Goal: Produce FINAL_report.md integrating all generated documentation.
Using .cursor/skills/plan/templates/final-report.md as structure:
- Executive summary from architecture + problem docs
- Problem statement (transformed from problem.md, not copy-pasted)
- Architecture overview with tech stack one-liner
- Component summary table (number, name, purpose, dependencies)
- System flows summary table
- Risk observations from verification log (Step 4)
- Open questions (uncertainties flagged during analysis)
- Artifact index listing all generated documents with paths
Save: DOCUMENT_DIR/FINAL_report.md
State: update state.json with current_step: "complete".
Task Mode Workflow
Triggered when a task spec file is provided as input AND _docs/02_document/ already contains module/component docs.
Accepts: one or more task spec files (from _docs/02_tasks/todo/ or _docs/02_tasks/done/).
Task Step 0: Scope Analysis
- Read each task spec — extract the "Files Modified" or "Scope / Included" section to identify which source files were changed
- Map changed source files to existing module docs in `DOCUMENT_DIR/modules/`
- Map affected modules to their parent components in `DOCUMENT_DIR/components/`
- Identify which higher-level docs might be affected (system-flows, data_model, data_parameters)
Output: a list of docs to update, organized by level:
- Module docs (direct matches)
- Component docs (parents of affected modules)
- System-level docs (only if the task changed API endpoints, data models, or external integrations)
- Problem-level docs (only if the task changed input parameters, acceptance criteria, or restrictions)
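The file-to-doc mapping in Task Step 0 can be a pure path transformation. A sketch that assumes a hypothetical naming convention in which path separators are flattened to underscores:

```python
from pathlib import Path

def module_doc_for(source_file: str) -> str:
    """Map a changed source file to its module doc.

    Assumes a hypothetical convention: the doc is named after the
    source path with separators flattened to underscores.
    """
    stem = Path(source_file).with_suffix("").as_posix().replace("/", "_")
    return f"_docs/02_document/modules/{stem}.md"

changed = ["services/auth.py", "api/endpoints.py"]
print([module_doc_for(f) for f in changed])
```

Whatever convention the actual `modules/` directory uses, keeping the mapping deterministic makes scope analysis a lookup rather than a search.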
Task Step 1: Module Doc Updates
For each affected module:
- Read the current source file
- Read the existing module doc
- Diff the module doc against current code — identify:
- New functions/methods/classes not in the doc
- Removed functions/methods/classes still in the doc
- Changed signatures or behavior
- New/removed dependencies
- New/removed external integrations
- Update the module doc in-place, preserving the existing structure and style
- If a module is entirely new (no existing doc), create a new module doc following the standard template
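For Python sources, the doc-versus-code diff can start from the `ast` module's view of top-level definitions. A sketch with hypothetical names:

```python
import ast

def definitions_in(source: str) -> set[str]:
    """Names of top-level functions and classes in a Python module."""
    tree = ast.parse(source)
    return {node.name for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                 ast.ClassDef))}

source = "def login(u): ...\nclass Session: ...\ndef logout(u): ...\n"
documented = {"login", "logout", "refresh_token"}  # names from the old doc

current = definitions_in(source)
print("new, undocumented:", current - documented)  # Session
print("stale in doc:", documented - current)       # refresh_token
```

The two set differences are exactly the first two bullets above: additions missing from the doc, and removals still lingering in it.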
Task Step 2: Component Doc Updates
For each affected component:
- Read all module docs belonging to this component (including freshly updated ones)
- Read the existing component doc
- Update internal interfaces, dependency graphs, implementation details, and caveats sections
- Do NOT change the component's purpose, pattern, or high-level overview unless the task fundamentally changed it
Task Step 3: System-Level Doc Updates (conditional)
Only if the task changed API endpoints, system flows, data models, or external integrations:
- Update `system-flows.md` — modify affected flow diagrams and data flow tables
- Update `data_model.md` — if entities changed
- Update `architecture.md` — only if new external integrations or architectural patterns were added
Task Step 4: Problem-Level Doc Updates (conditional)
Only if the task changed API input parameters, configuration, or acceptance criteria:
- Update `_docs/00_problem/input_data/data_parameters.md`
- Update `_docs/00_problem/acceptance_criteria.md` — if new testable criteria emerged
Task Step 5: Summary
Present a summary of all docs updated:
```
══════════════════════════════════════
DOCUMENTATION UPDATE COMPLETE
══════════════════════════════════════
Task(s): [task IDs]
Module docs updated: [count]
Component docs updated: [count]
System-level docs updated: [list or "none"]
Problem-level docs updated: [list or "none"]
══════════════════════════════════════
```
Task Mode Principles
- Minimal changes: only update what the task actually changed. Do not rewrite unaffected sections.
- Preserve style: match the existing doc's structure, tone, and level of detail.
- Verify against code: for every entity added or changed in a doc, confirm it exists in the current source.
- New modules: if the task introduced an entirely new source file, create a new module doc from the standard template.
- Dead references: if the task removed code, remove the corresponding doc entries. Do not keep stale references.
Artifact Management
Directory Structure
```
_docs/
├── 00_problem/                  # Step 6 (retrospective)
│   ├── problem.md
│   ├── restrictions.md
│   ├── acceptance_criteria.md
│   ├── input_data/
│   │   └── data_parameters.md
│   └── security_approach.md
├── 01_solution/                 # Step 5 (retrospective)
│   └── solution.md
└── 02_document/                 # DOCUMENT_DIR
    ├── 00_discovery.md          # Step 0
    ├── modules/                 # Step 1
    │   ├── [module_name].md
    │   └── ...
    ├── components/              # Step 2
    │   ├── 01_[name]/description.md
    │   ├── 02_[name]/description.md
    │   └── ...
    ├── common-helpers/          # Step 2
    ├── architecture.md          # Step 3
    ├── system-flows.md          # Step 3
    ├── data_model.md            # Step 3
    ├── deployment/              # Step 3
    ├── diagrams/                # Steps 2-3
    │   ├── components.md
    │   └── flows/
    ├── 04_verification_log.md   # Step 4
    ├── FINAL_report.md          # Step 7
    └── state.json               # Resumability
```
Resumability
Maintain DOCUMENT_DIR/state.json:
```json
{
  "current_step": "module-analysis",
  "completed_steps": ["discovery"],
  "focus_dir": null,
  "modules_total": 12,
  "modules_documented": ["utils/helpers", "models/user"],
  "modules_remaining": ["services/auth", "api/endpoints"],
  "module_batch": 1,
  "components_written": [],
  "last_updated": "2026-03-21T14:00:00Z"
}
```
Update after each module/component completes. If interrupted, resume from next undocumented module.
When resuming:
- Read `state.json`
- Cross-check against actual files in DOCUMENT_DIR (trust files over state if they disagree)
- Continue from the next incomplete item
- Inform user which steps are being skipped
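The "trust files over state" rule can be sketched as a reconciliation pass (flat module names and the `reconcile_state` helper are illustrative; key names follow the state.json example above):

```python
import json
import tempfile
from pathlib import Path

def reconcile_state(document_dir: Path) -> dict:
    """Load state.json, then let files on disk override its claims:
    any module doc that exists counts as documented."""
    state = json.loads((document_dir / "state.json").read_text())
    on_disk = {p.stem for p in (document_dir / "modules").glob("*.md")}
    state["modules_documented"] = sorted(set(state["modules_documented"]) | on_disk)
    state["modules_remaining"] = [m for m in state["modules_remaining"]
                                  if m not in on_disk]
    return state

with tempfile.TemporaryDirectory() as tmp:
    doc_dir = Path(tmp)
    (doc_dir / "modules").mkdir()
    # The helpers doc exists on disk even though state.json missed it.
    (doc_dir / "modules" / "helpers.md").write_text("# helpers\n")
    (doc_dir / "state.json").write_text(json.dumps({
        "modules_documented": [],
        "modules_remaining": ["helpers", "auth"],
    }))
    state = reconcile_state(doc_dir)
print(state)
```

Reconciling in this direction means a crash between writing a module doc and updating state.json costs nothing: the doc on disk wins.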
Save Principles
- Save immediately: write each module doc as soon as analysis completes
- Incremental context: each subsequent module uses already-written docs as context
- Preserve intermediates: keep all module docs even after synthesis into component docs
- Enable recovery: state file tracks exact progress for resume
Escalation Rules
| Situation | Action |
|---|---|
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` |
| Code intent is ambiguous | ASK user, do not guess |
Common Mistakes
- Top-down guessing: never infer architecture before documenting modules. Build up, don't assume down.
- Hallucinating entities: always verify that referenced classes/functions/endpoints actually exist in code.
- Skipping modules: every source module must appear in exactly one module doc and one component.
- Monolithic analysis: don't try to analyze the entire codebase in one pass. Module by module, in order.
- Inventing restrictions: only document constraints actually evidenced in code, configs, or Dockerfiles.
- Vague acceptance criteria: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
- Writing code: this skill produces documents, never implementation code.
Methodology Quick Reference
```
┌──────────────────────────────────────────────────────────────────┐
│ Bottom-Up Codebase Documentation (8-Step)                        │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json)             │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
│ PREREQ: Check state.json for resume                              │
│                                                                  │
│ 0. Discovery → dependency graph, tech stack, topo order          │
│    (Focus Area: scoped to FOCUS_DIR + transitive deps)           │
│ 1. Module Docs → per-module analysis (leaves first)              │
│    (batched ~5 modules; session break between batches)           │
│ 2. Component Assembly → group modules, write component specs     │
│    [BLOCKING: user confirms components]                          │
│ 3. System Synthesis → architecture, flows, data model, deploy    │
│ 4. Verification → compare all docs vs code, fix errors           │
│    [BLOCKING: user reviews corrections]                          │
│    [SESSION BREAK suggested before Steps 5–7]                    │
│    ── Focus Area mode stops here ──                              │
│ 5. Solution Extraction → retrospective solution.md               │
│ 6. Problem Extraction → retrospective problem, restrictions, AC  │
│    [BLOCKING: user confirms problem docs]                        │
│ 7. Final Report → FINAL_report.md                                │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first                │
│ Incremental context · Verify against code                        │
│ Save immediately · Resume from checkpoint                        │
│ Batch modules · Session breaks for large codebases               │
└──────────────────────────────────────────────────────────────────┘
```