ui/.cursor/skills/document/workflows/full.md
Oleksandr Bezdieniezhnykh f5a46e9790 Sync .cursor from detections
2026-04-12 05:05:12 +03:00
Document Skill — Full / Focus Area / Resume Workflow

Covers three related modes that share the same 8-step pipeline:

  • Full: entire codebase, no prior state
  • Focus Area: scoped to a directory subtree + transitive dependencies
  • Resume: continue from state.json checkpoint

Prerequisite Checks

  1. If _docs/ already exists and contains files AND mode is Full, ASK user: overwrite, merge, or write to _docs_generated/ instead?
  2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
  3. If DOCUMENT_DIR contains a state.json, offer to resume from last checkpoint or start fresh
  4. If FOCUS_DIR is set, verify the directory exists and contains source files — STOP if missing

Progress Tracking

Create a TodoWrite with all steps (0 through 7). Update status as each step completes.

Steps

Step 0: Codebase Discovery

Role: Code analyst. Goal: Build a complete map of the codebase (or targeted subtree) before analyzing any code.

Focus Area scoping: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.

Scan and catalog:

  1. Directory tree (ignore node_modules, .git, __pycache__, bin/, obj/, build artifacts)
  2. Language detection from file extensions and config files
  3. Package manifests: package.json, requirements.txt, pyproject.toml, *.csproj, Cargo.toml, go.mod
  4. Config files: Dockerfile, docker-compose.yml, .env.example, CI/CD configs (.github/workflows/, .gitlab-ci.yml, azure-pipelines.yml)
  5. Entry points: main.*, app.*, index.*, Program.*, startup scripts
  6. Test structure: test directories, test frameworks, test runner configs
  7. Existing documentation: README, docs/, wiki references, inline doc coverage
  8. Dependency graph: build a module-level dependency graph by analyzing imports/references. Identify:
    • Leaf modules (no internal dependencies)
    • Entry points (no internal dependents)
    • Cycles (mark for grouped analysis)
    • Topological processing order
    • If FOCUS_DIR: mark which modules are in-scope vs dependency-only
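The leaf/entry/cycle/topological items above can be derived with the standard library's `graphlib`; the module names and dependency edges below are illustrative:

```python
from graphlib import CycleError, TopologicalSorter

# deps[m] = internal modules that m imports (illustrative edges)
deps: dict[str, set[str]] = {
    "app":      {"services", "models"},
    "services": {"models", "utils"},
    "models":   {"utils"},
    "utils":    set(),
}

leaves = {m for m, d in deps.items() if not d}          # no internal dependencies
imported = set().union(*deps.values())
entry_points = {m for m in deps if m not in imported}   # no internal dependents

try:
    order = list(TopologicalSorter(deps).static_order())  # dependencies first
except CycleError:
    order = []  # cycle detected: group the cycled modules and analyze together
```

With these edges the order comes out `utils, models, services, app`, i.e. leaves first, matching the bottom-up processing rule.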

Save: DOCUMENT_DIR/00_discovery.md containing:

  • Directory tree (concise, relevant directories only)
  • Tech stack summary table (language, framework, database, infra)
  • Dependency graph (textual list + Mermaid diagram)
  • Topological processing order
  • Entry points and leaf modules

Save: DOCUMENT_DIR/state.json with initial state (see references/artifacts.md for format).
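The exact state.json schema is defined in references/artifacts.md; a sketch consistent with the fields this workflow mentions (current_step, modules_remaining, modules_documented, module_batch); any other field names here are assumptions:

```python
import json
from pathlib import Path

# Only the four field names referenced by this workflow are grounded;
# "mode" is an assumed extra.
initial_state = {
    "current_step": 0,
    "mode": "full",  # assumed: "full" | "focus_area" | "resume"
    "modules_remaining": ["utils", "models", "services", "app"],  # topological order
    "modules_documented": [],
    "module_batch": 0,
}
Path("state.json").write_text(json.dumps(initial_state, indent=2))
```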


Step 1: Module-Level Documentation

Role: Code analyst. Goal: Document every identified module individually, processing in topological order (leaves first).

Batched processing: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update state.json, present a progress summary. Between batches, evaluate whether to suggest a session break.

For each module in topological order:

  1. Read: read the module's source code. Assess complexity and what context is needed.
  2. Gather context: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
  3. Write module doc with these sections:
    • Purpose: one-sentence responsibility
    • Public interface: exported functions/classes/methods with signatures, input/output types
    • Internal logic: key algorithms, patterns, non-obvious behavior
    • Dependencies: what it imports internally and why
    • Consumers: what uses this module (from the dependency graph)
    • Data models: entities/types defined in this module
    • Configuration: env vars, config keys consumed
    • External integrations: HTTP calls, DB queries, queue operations, file I/O
    • Security: auth checks, encryption, input validation, secrets access
    • Tests: what tests exist for this module, what they cover
  4. Verify: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.

Cycle handling: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.

Large modules: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.

Save: DOCUMENT_DIR/modules/[module_name].md for each module. State: update state.json after each module completes (move from modules_remaining to modules_documented). Increment module_batch after each batch of ~5.

Session break heuristic: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:

══════════════════════════════════════
 SESSION BREAK SUGGESTED
══════════════════════════════════════
 Modules documented: [X] of [Y]
 Batches completed this session: [N]
══════════════════════════════════════
 A) Continue in this conversation
 B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
 Recommendation: B — fresh context improves
 analysis quality for remaining modules
══════════════════════════════════════

Re-entry is seamless: state.json tracks exactly which modules are done.
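The batching and break heuristic above can be sketched as follows; `document_module` and `save_state` are placeholders for the per-module work and checkpointing described in this step:

```python
BATCH_SIZE = 5  # "batches of ~5"

def document_module(name: str) -> None:
    """Placeholder: read, gather context, write doc, verify (items 1-4 above)."""

def save_state(state: dict) -> None:
    """Placeholder: persist state.json after every module (checkpoint)."""

def session_break_suggested(modules_remaining: int, batches_this_session: int) -> bool:
    """Heuristic above: more than 10 modules remain AND 2+ batches done."""
    return modules_remaining > 10 and batches_this_session >= 2

def run_batches(topo_order: list[str], state: dict) -> None:
    batches_this_session = 0
    remaining = [m for m in topo_order if m not in state["modules_documented"]]
    while remaining:
        batch, remaining = remaining[:BATCH_SIZE], remaining[BATCH_SIZE:]
        for module in batch:
            document_module(module)
            state["modules_documented"].append(module)
            state["modules_remaining"].remove(module)
            save_state(state)  # checkpoint after every module
        state["module_batch"] += 1
        batches_this_session += 1
        if session_break_suggested(len(remaining), batches_this_session):
            return  # present the SESSION BREAK banner; state.json enables re-entry
```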


Step 2: Component Assembly

Role: Software architect. Goal: Group related modules into logical components and produce component specs.

  1. Analyze module docs from Step 1 to identify natural groupings:
    • By directory structure (most common)
    • By shared data models or common purpose
    • By dependency clusters (tightly coupled modules)
  2. For each identified component, synthesize its module docs into a single component specification using .cursor/skills/plan/templates/component-spec.md as structure:
    • High-level overview: purpose, pattern, upstream/downstream
    • Internal interfaces: method signatures, DTOs (from actual module code)
    • External API specification (if the component exposes HTTP/gRPC endpoints)
    • Data access patterns: queries, caching, storage estimates
    • Implementation details: algorithmic complexity, state management, key libraries
    • Extensions and helpers: shared utilities needed
    • Caveats and edge cases: limitations, race conditions, bottlenecks
    • Dependency graph: implementation order relative to other components
    • Logging strategy
  3. Identify common helpers shared across multiple components → document in common-helpers/
  4. Generate component relationship diagram (Mermaid)

Self-verification:

  • Every module from Step 1 is covered by exactly one component
  • No component has overlapping responsibility with another
  • Inter-component interfaces are explicit (who calls whom, with what)
  • Component dependency graph has no circular dependencies
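The first two self-verification checks reduce to a partition test over the Step 1 module list; a sketch (component and module names are illustrative):

```python
def verify_component_partition(all_modules: set[str],
                               components: dict[str, set[str]]) -> list[str]:
    """Flag modules covered by zero components or by more than one."""
    problems: list[str] = []
    seen: dict[str, str] = {}  # module -> first component that claimed it
    for comp, mods in components.items():
        for m in sorted(mods):
            if m in seen:
                problems.append(f"{m} appears in both {seen[m]} and {comp}")
            else:
                seen[m] = comp
    for m in sorted(all_modules - seen.keys()):
        problems.append(f"{m} is not covered by any component")
    return problems
```

An empty return list means every module from Step 1 landed in exactly one component.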

Save:

  • DOCUMENT_DIR/components/[##]_[name]/description.md per component
  • DOCUMENT_DIR/common-helpers/[##]_helper_[name].md per shared helper
  • DOCUMENT_DIR/diagrams/components.md (Mermaid component diagram)

BLOCKING: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.


Step 3: System-Level Synthesis

Role: Software architect. Goal: From component docs, synthesize system-level documents.

All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.

3a. Architecture

Using .cursor/skills/plan/templates/architecture.md as structure:

  • System context and boundaries from entry points and external integrations
  • Tech stack table from discovery (Step 0) + component specs
  • Deployment model from Dockerfiles, CI configs, environment strategies
  • Data model overview from per-component data access sections
  • Integration points from inter-component interfaces
  • NFRs from test thresholds, config limits, health checks
  • Security architecture from per-module security observations
  • Key ADRs inferred from technology choices and patterns

Save: DOCUMENT_DIR/architecture.md

3b. System Flows

Using .cursor/skills/plan/templates/system-flows.md as structure:

  • Trace main flows through the component interaction graph
  • Entry point → component chain → output for each major flow
  • Mermaid sequence diagrams and flowcharts
  • Error scenarios from exception handling patterns
  • Data flow tables per flow

Save: DOCUMENT_DIR/system-flows.md and DOCUMENT_DIR/diagrams/flows/flow_[name].md

3c. Data Model

  • Consolidate all data models from module docs
  • Entity-relationship diagram (Mermaid ERD)
  • Migration strategy (if ORM/migration tooling detected)
  • Seed data observations
  • Backward compatibility approach (if versioning found)

Save: DOCUMENT_DIR/data_model.md

3d. Deployment (if Dockerfile/CI configs exist)

  • Containerization summary
  • CI/CD pipeline structure
  • Environment strategy (dev, staging, production)
  • Observability (logging patterns, metrics, health checks found in code)

Save: DOCUMENT_DIR/deployment/ (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)


Step 4: Verification Pass

Role: Quality verifier. Goal: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.

For each document generated in Steps 1-3:

  1. Entity verification: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
  2. Interface accuracy: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
  3. Flow correctness: for each system flow diagram, trace the actual code path and verify the sequence matches.
  4. Completeness check: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
  5. Consistency check: do component docs agree with architecture doc? Do flow diagrams match component interfaces?
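Check 1 can be approximated with a cross-reference scan; the backtick convention for entity mentions and the `*.py` glob are assumptions to adapt per codebase:

```python
import re
from pathlib import Path

# Assumes docs mention code entities in backticks, e.g. `UserService.create`.
IDENT = re.compile(r"`([A-Za-z_][A-Za-z0-9_.]*)`")

def flag_missing_entities(doc_text: str, source_root: Path) -> set[str]:
    """Return doc-mentioned identifiers that never appear in the source tree."""
    mentioned = set(IDENT.findall(doc_text))
    source = "\n".join(p.read_text(errors="ignore")
                       for p in source_root.rglob("*.py"))  # extend per language
    return {name for name in mentioned
            if name.split(".")[-1] not in source}  # crude, but catches hallucinations
```

A plain substring match over-approximates existence, so anything flagged here still needs a manual look before being logged as a hallucination.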

Apply corrections inline to the documents that need them.

Save: DOCUMENT_DIR/04_verification_log.md with:

  • Total entities verified vs flagged
  • Corrections applied (which document, what changed)
  • Remaining gaps or uncertainties
  • Completeness score (modules covered / total modules)

BLOCKING: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.

Session boundary: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5-7). These steps produce different artifact types and benefit from fresh context:

══════════════════════════════════════
 VERIFICATION COMPLETE — session break?
══════════════════════════════════════
 Steps 0-4 (analysis + verification) are done.
 Steps 5-7 (solution + problem extraction + report)
 can run in a fresh conversation.
══════════════════════════════════════
 A) Continue in this conversation
 B) Save and continue in a new conversation (recommended)
══════════════════════════════════════

If Focus Area mode: Steps 5-7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run /document again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.


Step 5: Solution Extraction (Retrospective)

Role: Software architect. Goal: From all verified technical documentation, retrospectively create solution.md — the same artifact the research skill produces.

Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):

  1. Product Solution Description: what the system is, brief component interaction diagram (Mermaid)
  2. Architecture: the architecture that is implemented, with per-component solution tables:
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
  3. Testing Strategy: summarize integration/functional tests and non-functional tests found in the codebase
  4. References: links to key config files, Dockerfiles, CI configs that evidence the solution choices

Save: SOLUTION_DIR/solution.md (_docs/01_solution/solution.md)


Step 6: Problem Extraction (Retrospective)

Role: Business analyst. Goal: From all verified technical docs, retrospectively derive the high-level problem definition.

6a. problem.md

  • Synthesize from architecture overview + component purposes + system flows
  • What is this system? What problem does it solve? Who are the users? How does it work at a high level?
  • Cross-reference with README if one exists

6b. restrictions.md

  • Extract from: tech stack choices, Dockerfile specs, CI configs, dependency versions, environment configs
  • Categorize: Hardware, Software, Environment, Operational

6c. acceptance_criteria.md

  • Derive from: test assertions, performance configs, health check endpoints, validation rules
  • Every criterion must have a measurable value
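A sketch of mining measurable values out of config and test files so each criterion gets a number; the key/unit patterns are illustrative assumptions:

```python
import re

# Matches lines like "timeout_ms: 500" or "max_memory = 512 MB".
# The recognized units are an illustrative subset.
THRESHOLD = re.compile(
    r"(?P<key>[A-Za-z_][A-Za-z0-9_.-]*)\s*[:=]\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>ms|s|MB|%|rps)?"
)

def extract_thresholds(config_text: str) -> list[tuple[str, float, str]]:
    """Turn 'timeout: 500 ms' style lines into (key, value, unit) AC candidates."""
    return [(m["key"], float(m["value"]), m["unit"] or "")
            for m in THRESHOLD.finditer(config_text)]
```

Each extracted tuple is a candidate criterion ("timeout <= 500 ms"), not a finished one; it still needs the surrounding context to state what is being measured.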

6d. input_data/

  • Document data schemas (DB schemas, API request/response types, config file formats)
  • Create data_parameters.md describing what data the system consumes

6e. security_approach.md (only if security code found)

  • Authentication, authorization, encryption, secrets handling, CORS, rate limiting, input sanitization

Save: all files to PROBLEM_DIR/ (_docs/00_problem/)

BLOCKING: Present all problem documents to user. Do NOT proceed until user confirms or requests corrections.


Step 7: Final Report

Role: Technical writer. Goal: Produce FINAL_report.md integrating all generated documentation.

Using .cursor/skills/plan/templates/final-report.md as structure:

  • Executive summary from architecture + problem docs
  • Problem statement (transformed from problem.md, not copy-pasted)
  • Architecture overview with tech stack one-liner
  • Component summary table (number, name, purpose, dependencies)
  • System flows summary table
  • Risk observations from verification log (Step 4)
  • Open questions (uncertainties flagged during analysis)
  • Artifact index listing all generated documents with paths

Save: DOCUMENT_DIR/FINAL_report.md

State: update state.json with current_step: "complete".


Escalation Rules

| Situation | Action |
| --- | --- |
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| _docs/ already exists | ASK user: overwrite, merge, or use _docs_generated/ |
| Code intent is ambiguous | ASK user, do not guess |

Common Mistakes

  • Top-down guessing: never infer architecture before documenting modules. Build up, don't assume down.
  • Hallucinating entities: always verify that referenced classes/functions/endpoints actually exist in code.
  • Skipping modules: every source module must appear in exactly one module doc and one component.
  • Monolithic analysis: don't try to analyze the entire codebase in one pass. Module by module, in order.
  • Inventing restrictions: only document constraints actually evidenced in code, configs, or Dockerfiles.
  • Vague acceptance criteria: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
  • Writing code: this skill produces documents, never implementation code.

Quick Reference

┌──────────────────────────────────────────────────────────────────┐
│          Bottom-Up Codebase Documentation (8-Step)               │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json)             │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
│ PREREQ: Check state.json for resume                              │
│                                                                  │
│ 0. Discovery          → dependency graph, tech stack, topo order │
│    (Focus Area: scoped to FOCUS_DIR + transitive deps)           │
│ 1. Module Docs        → per-module analysis (leaves first)       │
│    (batched ~5 modules; session break between batches)           │
│ 2. Component Assembly → group modules, write component specs     │
│    [BLOCKING: user confirms components]                          │
│ 3. System Synthesis   → architecture, flows, data model, deploy  │
│ 4. Verification       → compare all docs vs code, fix errors     │
│    [BLOCKING: user reviews corrections]                          │
│    [SESSION BREAK suggested before Steps 5-7]                    │
│    ── Focus Area mode stops here ──                              │
│ 5. Solution Extraction → retrospective solution.md               │
│ 6. Problem Extraction → retrospective problem, restrictions, AC  │
│    [BLOCKING: user confirms problem docs]                        │
│ 7. Final Report       → FINAL_report.md                          │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first                │
│             Incremental context · Verify against code            │
│             Save immediately · Resume from checkpoint            │
│             Batch modules · Session breaks for large codebases   │
└──────────────────────────────────────────────────────────────────┘