Document Skill — Full / Focus Area / Resume Workflow

Covers three related modes that share the same 8-step pipeline:

  • Full: entire codebase, no prior state
  • Focus Area: scoped to a directory subtree + transitive dependencies
  • Resume: continue from state.json checkpoint

Prerequisite Checks

  1. If _docs/ already exists and contains files AND mode is Full, ASK user: overwrite, merge, or write to _docs_generated/ instead?
  2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
  3. If DOCUMENT_DIR contains a state.json, offer to resume from last checkpoint or start fresh
  4. If FOCUS_DIR is set, verify the directory exists and contains source files — STOP if missing

Progress Tracking

Create a TodoWrite with all steps (0 through 7, including 2.5). Update status as each step completes.

Steps

Step 0: Codebase Discovery

Role: Code analyst. Goal: Build a complete map of the codebase (or targeted subtree) before analyzing any code.

Focus Area scoping: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.

Scan and catalog:

  1. Directory tree (ignore node_modules, .git, __pycache__, bin/, obj/, build artifacts)
  2. Language detection from file extensions and config files
  3. Package manifests: package.json, requirements.txt, pyproject.toml, *.csproj, Cargo.toml, go.mod
  4. Config files: Dockerfile, docker-compose.yml, .env.example, CI/CD configs (.github/workflows/, .gitlab-ci.yml, azure-pipelines.yml)
  5. Entry points: main.*, app.*, index.*, Program.*, startup scripts
  6. Test structure: test directories, test frameworks, test runner configs
  7. Existing documentation: README, docs/, wiki references, inline doc coverage
  8. Dependency graph: build a module-level dependency graph by analyzing imports/references. Identify:
    • Leaf modules (no internal dependencies)
    • Entry points (no internal dependents)
    • Cycles (mark for grouped analysis)
    • Topological processing order
    • If FOCUS_DIR: mark which modules are in-scope vs dependency-only
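
A minimal sketch of the scan above (items 1 and 3), assuming a Python helper; the ignore and manifest sets are illustrative and would be extended per project:

```python
import os

# Directories to prune during the walk (item 1); extend per project.
IGNORE_DIRS = {"node_modules", ".git", "__pycache__", "bin", "obj", "build", "dist"}
# Package manifests to catalog (item 3).
MANIFEST_NAMES = {"package.json", "requirements.txt", "pyproject.toml",
                  "Cargo.toml", "go.mod"}

def scan(root: str) -> dict:
    """Return a minimal catalog: source files by extension plus manifests found."""
    catalog = {"files_by_ext": {}, "manifests": []}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORE_DIRS]
        for name in filenames:
            ext = os.path.splitext(name)[1]
            catalog["files_by_ext"].setdefault(ext, []).append(
                os.path.join(dirpath, name))
            if name in MANIFEST_NAMES:
                catalog["manifests"].append(os.path.join(dirpath, name))
    return catalog
```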

Save: DOCUMENT_DIR/00_discovery.md containing:

  • Directory tree (concise, relevant directories only)
  • Tech stack summary table (language, framework, database, infra)
  • Dependency graph (textual list + Mermaid diagram)
  • Topological processing order
  • Entry points and leaf modules

Save: DOCUMENT_DIR/state.json with initial state (see references/artifacts.md for format).
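
Item 8's graph analysis maps directly onto the standard library's graphlib, assuming the module-to-imports map has already been collected; a CycleError is the signal to mark modules for grouped analysis:

```python
from graphlib import TopologicalSorter, CycleError

def processing_order(imports: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Given module -> internal modules it imports, return (topo order, cycle).

    Dependencies come first in the order, matching the bottom-up pipeline;
    an empty cycle list means the graph is a DAG.
    """
    ts = TopologicalSorter(imports)
    try:
        return list(ts.static_order()), []
    except CycleError as e:
        # e.args[1] holds the detected cycle path; mark it for grouped analysis.
        return [], list(e.args[1])

def leaves_and_entry_points(imports: dict[str, set[str]]):
    """Leaves have no internal dependencies; entry points have no internal dependents."""
    modules = set(imports) | {d for deps in imports.values() for d in deps}
    imported = {d for deps in imports.values() for d in deps}
    leaves = {m for m in modules if not imports.get(m)}
    entry_points = modules - imported
    return leaves, entry_points
```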


Step 1: Module-Level Documentation

Role: Code analyst. Goal: Document every identified module individually, processing in topological order (leaves first).

Batched processing: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update state.json, present a progress summary. Between batches, evaluate whether to suggest a session break.

For each module in topological order:

  1. Read: read the module's source code. Assess complexity and what context is needed.
  2. Gather context: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
  3. Write module doc with these sections:
    • Purpose: one-sentence responsibility
    • Public interface: exported functions/classes/methods with signatures, input/output types
    • Internal logic: key algorithms, patterns, non-obvious behavior
    • Dependencies: what it imports internally and why
    • Consumers: what uses this module (from the dependency graph)
    • Data models: entities/types defined in this module
    • Configuration: env vars, config keys consumed
    • External integrations: HTTP calls, DB queries, queue operations, file I/O
    • Security: auth checks, encryption, input validation, secrets access
    • Tests: what tests exist for this module, what they cover
  4. Verify: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.

Cycle handling: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.

Large modules: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.

Save: DOCUMENT_DIR/modules/[module_name].md for each module.

State: update state.json after each module completes (move the module from modules_remaining to modules_documented). Increment module_batch after each batch of ~5.
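
The checkpoint update can be sketched as below. The field names follow this workflow's text (modules_remaining, modules_documented, module_batch); references/artifacts.md remains the authoritative schema:

```python
import json
from pathlib import Path

BATCH_SIZE = 5  # "~5" per the batching rule above

def mark_module_done(state_path: Path, module: str) -> dict:
    """Move one module from modules_remaining to modules_documented,
    bumping module_batch whenever a full batch completes."""
    state = json.loads(state_path.read_text())
    if module in state["modules_remaining"]:
        state["modules_remaining"].remove(module)
        state["modules_documented"].append(module)
    if len(state["modules_documented"]) % BATCH_SIZE == 0:
        state["module_batch"] += 1
    state_path.write_text(json.dumps(state, indent=2))  # save immediately
    return state
```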

Session break heuristic: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:

══════════════════════════════════════
 SESSION BREAK SUGGESTED
══════════════════════════════════════
 Modules documented: [X] of [Y]
 Batches completed this session: [N]
══════════════════════════════════════
 A) Continue in this conversation
 B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
 Recommendation: B — fresh context improves
 analysis quality for remaining modules
══════════════════════════════════════

Re-entry is seamless: state.json tracks exactly which modules are done.


Step 2: Component Assembly

Role: Software architect. Goal: Group related modules into logical components and produce component specs.

  1. Analyze module docs from Step 1 to identify natural groupings:
    • By directory structure (most common)
    • By shared data models or common purpose
    • By dependency clusters (tightly coupled modules)
  2. For each identified component, synthesize its module docs into a single component specification using .cursor/skills/plan/templates/component-spec.md as structure:
    • High-level overview: purpose, pattern, upstream/downstream
    • Internal interfaces: method signatures, DTOs (from actual module code)
    • External API specification (if the component exposes HTTP/gRPC endpoints)
    • Data access patterns: queries, caching, storage estimates
    • Implementation details: algorithmic complexity, state management, key libraries
    • Extensions and helpers: shared utilities needed
    • Caveats and edge cases: limitations, race conditions, bottlenecks
    • Dependency graph: implementation order relative to other components
    • Logging strategy
  3. Identify common helpers shared across multiple components → document in common-helpers/
  4. Generate component relationship diagram (Mermaid)

Self-verification:

  • Every module from Step 1 is covered by exactly one component
  • No component has overlapping responsibility with another
  • Inter-component interfaces are explicit (who calls whom, with what)
  • Component dependency graph has no circular dependencies
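
The first self-verification check reduces to a set comparison; a sketch, assuming module and component names are plain strings:

```python
def check_partition(all_modules: set, components: dict) -> list:
    """Verify every module appears in exactly one component.
    Returns human-readable violations; an empty list means the check passes."""
    problems = []
    seen = {}  # module -> component that claimed it
    for comp, modules in components.items():
        for m in modules:
            if m in seen:
                problems.append(f"module {m!r} appears in {seen[m]!r} and {comp!r}")
            seen[m] = comp
    for m in all_modules - seen.keys():
        problems.append(f"module {m!r} not covered by any component")
    return problems
```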

Save:

  • DOCUMENT_DIR/components/[##]_[name]/description.md per component
  • DOCUMENT_DIR/common-helpers/[##]_helper_[name].md per shared helper
  • DOCUMENT_DIR/diagrams/components.md (Mermaid component diagram)

BLOCKING: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.


Step 2.5: Module Layout Derivation

Role: Software architect. Goal: Produce _docs/02_document/module-layout.md — the authoritative file-ownership map read by /implement Step 4, /code-review Phase 7, and /refactor discovery. Required for any downstream skill that assigns file ownership or checks architectural layering.

This step derives the layout from the existing codebase rather than from a plan. The decompose skill's Step 1.5 is its greenfield counterpart; both use the same template and produce the same output shape, so downstream consumers don't branch on origin.

  1. For each component identified in Step 2, resolve its owning directory from module docs (Step 1) and from directory groupings used in Step 2.
  2. For each component, compute:
    • Public API: exported symbols. Language-specific: Python — __init__.py re-exports + non-underscore root-level symbols; TypeScript — index.ts / barrel exports; C# — public types in the namespace root; Rust — pub items in lib.rs / mod.rs; Go — exported (capitalized) identifiers in the package root.
    • Internal: everything else under the component's directory.
    • Owns: the component's directory glob.
    • Imports from: other components whose Public API this one references (parse imports; reuse tooling from Step 0's dependency graph).
    • Consumed by: reverse of Imports from across all components.
  3. Identify shared/* directories already present in the code (or infer candidates: modules imported by ≥2 components and owning no domain logic). Create a Shared / Cross-Cutting entry per concern.
  4. Infer the Allowed Dependencies layering table by topologically sorting the import graph built in item 2 above. Components that import only from shared/* go to Layer 1; each successive layer imports only from lower layers.
  5. Write _docs/02_document/module-layout.md using .cursor/skills/decompose/templates/module-layout.md. At the top of the file add **Status**: derived-from-code and a ## Verification Needed block listing any inference that was not clean (detected cycles, ambiguous ownership, components not cleanly assignable to a layer).
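
For the Python case above, a sketch of Public API extraction from a package's __init__.py using the stdlib ast module; giving __all__ precedence over the name-based fallback is an assumption of this sketch, not a rule from the workflow:

```python
import ast

def python_public_api(init_source: str) -> set:
    """Public API of a Python package root: an explicit __all__ wins;
    otherwise non-underscore top-level definitions plus re-exported
    import names are taken as exported."""
    tree = ast.parse(init_source)
    exported = set()
    for node in tree.body:
        # Explicit __all__ = [...] takes precedence over inference.
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    return set(ast.literal_eval(node.value))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if not node.name.startswith("_"):
                exported.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                name = alias.asname or alias.name.split(".")[0]
                if not name.startswith("_"):
                    exported.add(name)
    return exported
```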

Self-verification:

  • Every component from Step 2 has a Per-Component Mapping entry
  • Every Public API list is grounded in an actual exported symbol (no guesses)
  • No component's Imports from points at a component in a higher layer
  • Shared directories detected in code are listed under Shared / Cross-Cutting
  • Cycles from Step 0 that span components are surfaced in ## Verification Needed
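
The layer inference (item 4) and the no-upward-imports check can be sketched as longest-path layer assignment over the component import graph, assuming cycles have already been routed to ## Verification Needed:

```python
def assign_layers(imports: dict) -> dict:
    """Layer 1 = components with no internal component imports (only shared/*);
    each component sits one layer above its highest dependency.
    Assumes the component graph is acyclic."""
    layers = {}
    def layer(c):
        if c not in layers:
            deps = imports.get(c, set())
            layers[c] = 1 + max((layer(d) for d in deps), default=0)
        return layers[c]
    for c in imports:
        layer(c)
    return layers

def upward_imports(imports: dict, layers: dict) -> list:
    """Flag any 'Imports from' edge pointing at a same-or-higher layer.
    Useful for validating a hand-patched layering table."""
    return [(c, d) for c, deps in imports.items() for d in deps
            if layers[d] >= layers[c]]
```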

Save: _docs/02_document/module-layout.md

BLOCKING: Present the layering table and the ## Verification Needed block to the user. Do NOT proceed until the user confirms (or patches) the derived layout. Downstream skills assume this file is accurate.


Step 3: System-Level Synthesis

Role: Software architect. Goal: From component docs, synthesize system-level documents.

All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.

3a. Architecture

Using .cursor/skills/plan/templates/architecture.md as structure:

  • System context and boundaries from entry points and external integrations
  • Tech stack table from discovery (Step 0) + component specs
  • Deployment model from Dockerfiles, CI configs, environment strategies
  • Data model overview from per-component data access sections
  • Integration points from inter-component interfaces
  • NFRs from test thresholds, config limits, health checks
  • Security architecture from per-module security observations
  • Key ADRs inferred from technology choices and patterns

Save: DOCUMENT_DIR/architecture.md

3b. System Flows

Using .cursor/skills/plan/templates/system-flows.md as structure:

  • Trace main flows through the component interaction graph
  • Entry point → component chain → output for each major flow
  • Mermaid sequence diagrams and flowcharts
  • Error scenarios from exception handling patterns
  • Data flow tables per flow

Save: DOCUMENT_DIR/system-flows.md and DOCUMENT_DIR/diagrams/flows/flow_[name].md

3c. Data Model

  • Consolidate all data models from module docs
  • Entity-relationship diagram (Mermaid ERD)
  • Migration strategy (if ORM/migration tooling detected)
  • Seed data observations
  • Backward compatibility approach (if versioning found)

Save: DOCUMENT_DIR/data_model.md

3d. Deployment (if Dockerfile/CI configs exist)

  • Containerization summary
  • CI/CD pipeline structure
  • Environment strategy (dev, staging, production)
  • Observability (logging patterns, metrics, health checks found in code)

Save: DOCUMENT_DIR/deployment/ (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)


Step 4: Verification Pass

Role: Quality verifier. Goal: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.

For each document generated in Steps 1-3:

  1. Entity verification: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
  2. Interface accuracy: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
  3. Flow correctness: for each system flow diagram, trace the actual code path and verify the sequence matches.
  4. Completeness check: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
  5. Consistency check: do component docs agree with architecture doc? Do flow diagrams match component interfaces?

Apply corrections inline to the documents that need them.
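
Check 1 can be partially automated. This sketch assumes docs quote code entities in backticks and that a symbol index has been built from the codebase; both are conventions of the sketch, not of the workflow:

```python
import re

# Backtick-quoted identifiers (optionally dotted) in a generated doc.
IDENT = re.compile(r"`([A-Za-z_][A-Za-z0-9_.]*)`")

def flag_missing_entities(doc_text: str, known_symbols: set) -> set:
    """Return doc-mentioned identifiers with no match in the codebase's
    symbol index: candidates for hallucination fixes in the verification log."""
    mentioned = set(IDENT.findall(doc_text))
    return {m for m in mentioned
            if m not in known_symbols
            and m.split(".")[-1] not in known_symbols}
```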

Save: DOCUMENT_DIR/04_verification_log.md with:

  • Total entities verified vs flagged
  • Corrections applied (which document, what changed)
  • Remaining gaps or uncertainties
  • Completeness score (modules covered / total modules)

BLOCKING: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.

Session boundary: after verification is confirmed, suggest a session break before proceeding to the synthesis steps (5-7). These steps produce different artifact types and benefit from fresh context:

══════════════════════════════════════
 VERIFICATION COMPLETE — session break?
══════════════════════════════════════
 Steps 0-4 (analysis + verification) are done.
 Steps 5-7 (solution + problem extraction + report)
 can run in a fresh conversation.
══════════════════════════════════════
 A) Continue in this conversation
 B) Save and continue in a new conversation (recommended)
══════════════════════════════════════

If Focus Area mode: Steps 5-7 are skipped (they require full codebase coverage). Present a summary of the modules and components documented for this area. The user can run /document again for another area, or run it without FOCUS_DIR once all areas are covered to produce the full synthesis.


Step 5: Solution Extraction (Retrospective)

Role: Software architect. Goal: From all verified technical documentation, retrospectively create solution.md — the same artifact the research skill produces.

Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):

  1. Product Solution Description: what the system is, brief component interaction diagram (Mermaid)
  2. Architecture: the architecture that is implemented, with per-component solution tables:
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
  3. Testing Strategy: summarize integration/functional tests and non-functional tests found in the codebase
  4. References: links to key config files, Dockerfiles, CI configs that evidence the solution choices

Save: SOLUTION_DIR/solution.md (_docs/01_solution/solution.md)


Step 6: Problem Extraction (Retrospective)

Role: Business analyst. Goal: From all verified technical docs, retrospectively derive the high-level problem definition.

6a. problem.md

  • Synthesize from architecture overview + component purposes + system flows
  • What is this system? What problem does it solve? Who are the users? How does it work at a high level?
  • Cross-reference with README if one exists

6b. restrictions.md

  • Extract from: tech stack choices, Dockerfile specs, CI configs, dependency versions, environment configs
  • Categorize: Hardware, Software, Environment, Operational

6c. acceptance_criteria.md

  • Derive from: test assertions, performance configs, health check endpoints, validation rules
  • Every criterion must have a measurable value

6d. input_data/

  • Document data schemas (DB schemas, API request/response types, config file formats)
  • Create data_parameters.md describing what data the system consumes

6e. security_approach.md (only if security code found)

  • Authentication, authorization, encryption, secrets handling, CORS, rate limiting, input sanitization

Save: all files to PROBLEM_DIR/ (_docs/00_problem/)

BLOCKING: Present all problem documents to user. Do NOT proceed until user confirms or requests corrections.


Step 7: Final Report

Role: Technical writer. Goal: Produce FINAL_report.md integrating all generated documentation.

Using .cursor/skills/plan/templates/final-report.md as structure:

  • Executive summary from architecture + problem docs
  • Problem statement (transformed from problem.md, not copy-pasted)
  • Architecture overview with tech stack one-liner
  • Component summary table (number, name, purpose, dependencies)
  • System flows summary table
  • Risk observations from verification log (Step 4)
  • Open questions (uncertainties flagged during analysis)
  • Artifact index listing all generated documents with paths

Save: DOCUMENT_DIR/FINAL_report.md

State: update state.json with current_step: "complete".


Escalation Rules

| Situation | Action |
| --- | --- |
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| _docs/ already exists | ASK user: overwrite, merge, or use _docs_generated/ |
| Code intent is ambiguous | ASK user, do not guess |

Common Mistakes

  • Top-down guessing: never infer architecture before documenting modules. Build up, don't assume down.
  • Hallucinating entities: always verify that referenced classes/functions/endpoints actually exist in code.
  • Skipping modules: every source module must appear in exactly one module doc and one component.
  • Monolithic analysis: don't try to analyze the entire codebase in one pass. Module by module, in order.
  • Inventing restrictions: only document constraints actually evidenced in code, configs, or Dockerfiles.
  • Vague acceptance criteria: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
  • Writing code: this skill produces documents, never implementation code.

Quick Reference

┌──────────────────────────────────────────────────────────────────┐
│          Bottom-Up Codebase Documentation (8-Step)               │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json)             │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
│ PREREQ: Check state.json for resume                              │
│                                                                  │
│ 0. Discovery          → dependency graph, tech stack, topo order │
│    (Focus Area: scoped to FOCUS_DIR + transitive deps)           │
│ 1. Module Docs        → per-module analysis (leaves first)       │
│    (batched ~5 modules; session break between batches)           │
│ 2. Component Assembly → group modules, write component specs     │
│    [BLOCKING: user confirms components]                          │
│ 2.5 Module Layout    → derive module-layout.md from code         │
│    [BLOCKING: user confirms layout]                              │
│ 3. System Synthesis   → architecture, flows, data model, deploy  │
│ 4. Verification       → compare all docs vs code, fix errors     │
│    [BLOCKING: user reviews corrections]                          │
│    [SESSION BREAK suggested before Steps 5-7]                    │
│    ── Focus Area mode stops here ──                              │
│ 5. Solution Extraction → retrospective solution.md               │
│ 6. Problem Extraction → retrospective problem, restrictions, AC  │
│    [BLOCKING: user confirms problem docs]                        │
│ 7. Final Report       → FINAL_report.md                          │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first                │
│             Incremental context · Verify against code            │
│             Save immediately · Resume from checkpoint            │
│             Batch modules · Session breaks for large codebases   │
└──────────────────────────────────────────────────────────────────┘