File index:
- SKILL.md: hub with frontmatter, principles, mode detection, file index (71 lines)
- workflows/full.md: Full/Focus/Resume modes — Steps 0-7 (376 lines)
- workflows/task.md: Task mode — incremental doc updates (90 lines)
- references/artifacts.md: directory structure, state.json, resumability (70 lines)
Document Skill — Full / Focus Area / Resume Workflow
Covers three related modes that share the same 8-step pipeline:
- Full: entire codebase, no prior state
- Focus Area: scoped to a directory subtree + transitive dependencies
- Resume: continue from a `state.json` checkpoint
Prerequisite Checks
- If `_docs/` already exists and contains files AND mode is Full, ASK user: overwrite, merge, or write to `_docs_generated/` instead?
- Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
- If DOCUMENT_DIR contains a `state.json`, offer to resume from the last checkpoint or start fresh
- If FOCUS_DIR is set, verify the directory exists and contains source files — STOP if missing
Progress Tracking
Create a TodoWrite with all steps (0 through 7). Update status as each step completes.
Steps
Step 0: Codebase Discovery
Role: Code analyst
Goal: Build a complete map of the codebase (or targeted subtree) before analyzing any code.
Focus Area scoping: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.
Scan and catalog:
- Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts)
- Language detection from file extensions and config files
- Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod`
- Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`)
- Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts
- Test structure: test directories, test frameworks, test runner configs
- Existing documentation: README, `docs/`, wiki references, inline doc coverage
- Dependency graph: build a module-level dependency graph by analyzing imports/references. Identify:
- Leaf modules (no internal dependencies)
- Entry points (no internal dependents)
- Cycles (mark for grouped analysis)
- Topological processing order
- If FOCUS_DIR: mark which modules are in-scope vs dependency-only
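The graph properties above fall out of the standard library's topological sorter. A sketch with hypothetical module names and edges; `graphlib.CycleError` is what would mark a cycle for grouped analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical module-level import graph: module -> internal dependencies
deps = {
    "api/routes": {"services/users", "services/orders"},
    "services/users": {"db/models"},
    "services/orders": {"db/models"},
    "db/models": set(),  # leaf module
}

# Leaves: no internal dependencies. Entry points: no internal dependents.
leaves = [m for m, d in deps.items() if not d]
dependents = {m: set() for m in deps}
for mod, ds in deps.items():
    for d in ds:
        dependents[d].add(mod)
entry_points = [m for m, cons in dependents.items() if not cons]

# static_order() yields dependencies before dependents (leaves-first order);
# it raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(deps).static_order())
```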
Save: DOCUMENT_DIR/00_discovery.md containing:
- Directory tree (concise, relevant directories only)
- Tech stack summary table (language, framework, database, infra)
- Dependency graph (textual list + Mermaid diagram)
- Topological processing order
- Entry points and leaf modules
Save: DOCUMENT_DIR/state.json with initial state (see references/artifacts.md for format).
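For orientation, an initial state might look like the sketch below. Only `current_step`, `module_batch`, `modules_documented`, and `modules_remaining` are named elsewhere in this workflow; the other field names and all values are illustrative, and references/artifacts.md is the authoritative format.

```json
{
  "mode": "full",
  "current_step": "0_discovery",
  "module_batch": 0,
  "modules_documented": [],
  "modules_remaining": ["db/models", "services/users", "services/orders", "api/routes"]
}
```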
Step 1: Module-Level Documentation
Role: Code analyst
Goal: Document every identified module individually, processing in topological order (leaves first).
Batched processing: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update state.json, present a progress summary. Between batches, evaluate whether to suggest a session break.
For each module in topological order:
- Read: read the module's source code. Assess complexity and what context is needed.
- Gather context: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
- Write module doc with these sections:
- Purpose: one-sentence responsibility
- Public interface: exported functions/classes/methods with signatures, input/output types
- Internal logic: key algorithms, patterns, non-obvious behavior
- Dependencies: what it imports internally and why
- Consumers: what uses this module (from the dependency graph)
- Data models: entities/types defined in this module
- Configuration: env vars, config keys consumed
- External integrations: HTTP calls, DB queries, queue operations, file I/O
- Security: auth checks, encryption, input validation, secrets access
- Tests: what tests exist for this module, what they cover
- Verify: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.
Cycle handling: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.
Large modules: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.
Save: DOCUMENT_DIR/modules/[module_name].md for each module.
State: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`). Increment `module_batch` after each batch of ~5.
Session break heuristic: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:
══════════════════════════════════════
SESSION BREAK SUGGESTED
══════════════════════════════════════
Modules documented: [X] of [Y]
Batches completed this session: [N]
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
Recommendation: B — fresh context improves
analysis quality for remaining modules
══════════════════════════════════════
Re-entry is seamless: state.json tracks exactly which modules are done.
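The batching loop and break heuristic can be sketched as follows. State handling is simplified for illustration (real state lives in `state.json`, and a real session break would stop the loop rather than just record a suggestion); the function name is hypothetical.

```python
BATCH_SIZE = 5  # ~5 modules per batch, sorted by topological order

def run_batches(modules_remaining, documented, batches_done_this_session=0):
    suggestions = []
    while modules_remaining:
        batch = modules_remaining[:BATCH_SIZE]
        modules_remaining = modules_remaining[BATCH_SIZE:]
        for module in batch:
            documented.append(module)  # stand-in for: analyze module, write its doc
        batches_done_this_session += 1  # persist all docs + state.json here
        # Heuristic: >10 modules remain AND 2+ batches done this session
        if len(modules_remaining) > 10 and batches_done_this_session >= 2:
            suggestions.append(len(modules_remaining))  # suggest a session break
    return documented, suggestions
```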
Step 2: Component Assembly
Role: Software architect
Goal: Group related modules into logical components and produce component specs.
- Analyze module docs from Step 1 to identify natural groupings:
- By directory structure (most common)
- By shared data models or common purpose
- By dependency clusters (tightly coupled modules)
- For each identified component, synthesize its module docs into a single component specification using `.cursor/skills/plan/templates/component-spec.md` as structure:
- High-level overview: purpose, pattern, upstream/downstream
- Internal interfaces: method signatures, DTOs (from actual module code)
- External API specification (if the component exposes HTTP/gRPC endpoints)
- Data access patterns: queries, caching, storage estimates
- Implementation details: algorithmic complexity, state management, key libraries
- Extensions and helpers: shared utilities needed
- Caveats and edge cases: limitations, race conditions, bottlenecks
- Dependency graph: implementation order relative to other components
- Logging strategy
- Identify common helpers shared across multiple components → document in `common-helpers/`
- Generate component relationship diagram (Mermaid)
Self-verification:
- Every module from Step 1 is covered by exactly one component
- No component has overlapping responsibility with another
- Inter-component interfaces are explicit (who calls whom, with what)
- Component dependency graph has no circular dependencies
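The first self-verification check ("exactly one component per module") is a counting exercise. A minimal sketch with hypothetical component and module names:

```python
from collections import Counter

# Hypothetical component breakdown from Step 2 and module list from Step 1
components = {
    "01_persistence": ["db/models"],
    "02_domain": ["services/users", "services/orders"],
    "03_api": ["api/routes"],
}
all_modules = {"db/models", "services/users", "services/orders", "api/routes"}

counts = Counter(m for mods in components.values() for m in mods)
missing = all_modules - set(counts)                    # modules with no component
duplicated = [m for m, n in counts.items() if n > 1]   # modules in 2+ components
assert not missing and not duplicated, (missing, duplicated)
```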
Save:
- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component
- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper
- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram)
BLOCKING: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.
Step 3: System-Level Synthesis
Role: Software architect
Goal: From component docs, synthesize system-level documents.
All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.
3a. Architecture
Using .cursor/skills/plan/templates/architecture.md as structure:
- System context and boundaries from entry points and external integrations
- Tech stack table from discovery (Step 0) + component specs
- Deployment model from Dockerfiles, CI configs, environment strategies
- Data model overview from per-component data access sections
- Integration points from inter-component interfaces
- NFRs from test thresholds, config limits, health checks
- Security architecture from per-module security observations
- Key ADRs inferred from technology choices and patterns
Save: DOCUMENT_DIR/architecture.md
3b. System Flows
Using .cursor/skills/plan/templates/system-flows.md as structure:
- Trace main flows through the component interaction graph
- Entry point → component chain → output for each major flow
- Mermaid sequence diagrams and flowcharts
- Error scenarios from exception handling patterns
- Data flow tables per flow
Save: DOCUMENT_DIR/system-flows.md and DOCUMENT_DIR/diagrams/flows/flow_[name].md
3c. Data Model
- Consolidate all data models from module docs
- Entity-relationship diagram (Mermaid ERD)
- Migration strategy (if ORM/migration tooling detected)
- Seed data observations
- Backward compatibility approach (if versioning found)
Save: DOCUMENT_DIR/data_model.md
3d. Deployment (if Dockerfile/CI configs exist)
- Containerization summary
- CI/CD pipeline structure
- Environment strategy (dev, staging, production)
- Observability (logging patterns, metrics, health checks found in code)
Save: DOCUMENT_DIR/deployment/ (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)
Step 4: Verification Pass
Role: Quality verifier
Goal: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.
For each document generated in Steps 1-3:
- Entity verification: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
- Interface accuracy: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
- Flow correctness: for each system flow diagram, trace the actual code path and verify the sequence matches.
- Completeness check: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
- Consistency check: do component docs agree with architecture doc? Do flow diagrams match component interfaces?
Apply corrections inline to the documents that need them.
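The entity-verification check can be sketched as below. The heuristics are illustrative, not prescriptive: it assumes entities appear in backticks in the docs, searches by plain substring match, and scans only `.py` files; the function name is hypothetical.

```python
import re
from pathlib import Path

def flag_unknown_entities(doc_text: str, source_root: str) -> set[str]:
    """Return entities mentioned in a doc that never appear in the source tree."""
    # Pull identifier-like names out of backticked spans in the document
    entities = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_.]*)`", doc_text))
    source = "\n".join(
        p.read_text(errors="ignore")
        for p in Path(source_root).rglob("*.py")  # extend per detected languages
    )
    return {e for e in entities if e not in source}
```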
Save: DOCUMENT_DIR/04_verification_log.md with:
- Total entities verified vs flagged
- Corrections applied (which document, what changed)
- Remaining gaps or uncertainties
- Completeness score (modules covered / total modules)
BLOCKING: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.
Session boundary: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context:
══════════════════════════════════════
VERIFICATION COMPLETE — session break?
══════════════════════════════════════
Steps 0–4 (analysis + verification) are done.
Steps 5–7 (solution + problem extraction + report)
can run in a fresh conversation.
══════════════════════════════════════
A) Continue in this conversation
B) Save and continue in a new conversation (recommended)
══════════════════════════════════════
If Focus Area mode: Steps 5–7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run /document again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.
Step 5: Solution Extraction (Retrospective)
Role: Software architect
Goal: From all verified technical documentation, retrospectively create solution.md — the same artifact the research skill produces.
Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):
- Product Solution Description: what the system is, brief component interaction diagram (Mermaid)
- Architecture: the architecture that is implemented, with per-component solution tables:
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|---|---|---|---|---|---|---|---|
| [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |
- Testing Strategy: summarize integration/functional tests and non-functional tests found in the codebase
- References: links to key config files, Dockerfiles, CI configs that evidence the solution choices
Save: SOLUTION_DIR/solution.md (_docs/01_solution/solution.md)
Step 6: Problem Extraction (Retrospective)
Role: Business analyst
Goal: From all verified technical docs, retrospectively derive the high-level problem definition.
6a. problem.md
- Synthesize from architecture overview + component purposes + system flows
- What is this system? What problem does it solve? Who are the users? How does it work at a high level?
- Cross-reference with README if one exists
6b. restrictions.md
- Extract from: tech stack choices, Dockerfile specs, CI configs, dependency versions, environment configs
- Categorize: Hardware, Software, Environment, Operational
6c. acceptance_criteria.md
- Derive from: test assertions, performance configs, health check endpoints, validation rules
- Every criterion must have a measurable value
6d. input_data/
- Document data schemas (DB schemas, API request/response types, config file formats)
- Create `data_parameters.md` describing what data the system consumes
6e. security_approach.md (only if security code found)
- Authentication, authorization, encryption, secrets handling, CORS, rate limiting, input sanitization
Save: all files to PROBLEM_DIR/ (_docs/00_problem/)
BLOCKING: Present all problem documents to user. Do NOT proceed until user confirms or requests corrections.
Step 7: Final Report
Role: Technical writer
Goal: Produce FINAL_report.md integrating all generated documentation.
Using .cursor/skills/plan/templates/final-report.md as structure:
- Executive summary from architecture + problem docs
- Problem statement (transformed from problem.md, not copy-pasted)
- Architecture overview with tech stack one-liner
- Component summary table (number, name, purpose, dependencies)
- System flows summary table
- Risk observations from verification log (Step 4)
- Open questions (uncertainties flagged during analysis)
- Artifact index listing all generated documents with paths
Save: DOCUMENT_DIR/FINAL_report.md
State: update state.json with current_step: "complete".
Escalation Rules
| Situation | Action |
|---|---|
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` |
| Code intent is ambiguous | ASK user, do not guess |
Common Mistakes
- Top-down guessing: never infer architecture before documenting modules. Build up, don't assume down.
- Hallucinating entities: always verify that referenced classes/functions/endpoints actually exist in code.
- Skipping modules: every source module must appear in exactly one module doc and one component.
- Monolithic analysis: don't try to analyze the entire codebase in one pass. Module by module, in order.
- Inventing restrictions: only document constraints actually evidenced in code, configs, or Dockerfiles.
- Vague acceptance criteria: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
- Writing code: this skill produces documents, never implementation code.
Quick Reference
┌──────────────────────────────────────────────────────────────────┐
│ Bottom-Up Codebase Documentation (8-Step) │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json) │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?) │
│ PREREQ: Check state.json for resume │
│ │
│ 0. Discovery → dependency graph, tech stack, topo order │
│ (Focus Area: scoped to FOCUS_DIR + transitive deps) │
│ 1. Module Docs → per-module analysis (leaves first) │
│ (batched ~5 modules; session break between batches) │
│ 2. Component Assembly → group modules, write component specs │
│ [BLOCKING: user confirms components] │
│ 3. System Synthesis → architecture, flows, data model, deploy │
│ 4. Verification → compare all docs vs code, fix errors │
│ [BLOCKING: user reviews corrections] │
│ [SESSION BREAK suggested before Steps 5–7] │
│ ── Focus Area mode stops here ── │
│ 5. Solution Extraction → retrospective solution.md │
│ 6. Problem Extraction → retrospective problem, restrictions, AC │
│ [BLOCKING: user confirms problem docs] │
│ 7. Final Report → FINAL_report.md │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first │
│ Incremental context · Verify against code │
│ Save immediately · Resume from checkpoint │
│ Batch modules · Session breaks for large codebases │
└──────────────────────────────────────────────────────────────────┘