detections/.cursor/skills/document/SKILL.md
Oleksandr Bezdieniezhnykh 1fe9425aa8 [AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow
- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
2026-03-31 17:25:58 +03:00


---
name: document
description: >
  Bottom-up codebase documentation skill. Analyzes existing code from modules
  up through components to architecture, then retrospectively derives
  problem/restrictions/acceptance criteria. Produces the same _docs/ artifacts
  as the problem, research, and plan skills, but from code analysis instead of
  user interview. Trigger phrases: "document", "document codebase", "document
  this project", "documentation", "generate documentation", "create
  documentation", "reverse-engineer docs", "code to docs", "analyze and
  document".
category: build
tags:
  - documentation
  - code-analysis
  - reverse-engineering
  - architecture
  - bottom-up
disable-model-invocation: true
---

# Bottom-Up Codebase Documentation

Analyze an existing codebase from the bottom up — individual modules first, then components, then system-level architecture — and produce the same _docs/ artifacts that the problem and plan skills generate, without requiring user interview.

## Core Principles

  • Bottom-up always: module docs -> component specs -> architecture/flows -> solution -> problem extraction. Every higher level is synthesized from the level below.
  • Dependencies first: process modules in topological order (leaves first). When documenting module X, all of X's dependencies already have docs.
  • Incremental context: each module's doc uses already-written dependency docs as context — no ever-growing chain.
  • Verify against code: cross-reference every entity in generated docs against actual codebase. Catch hallucinations.
  • Save immediately: write each artifact as soon as its step completes. Enable resume from any checkpoint.
  • Ask, don't assume: when code intent is ambiguous, ASK the user before proceeding.

## Context Resolution

Fixed paths:

  • DOCUMENT_DIR: _docs/02_document/
  • SOLUTION_DIR: _docs/01_solution/
  • PROBLEM_DIR: _docs/00_problem/

Optional input:

  • FOCUS_DIR: a specific directory subtree provided by the user (e.g., /document @src/api/). When set, only this subtree and its transitive dependencies are analyzed.

Announce resolved paths (and FOCUS_DIR if set) to user before proceeding.

## Mode Detection

Determine the execution mode before any other logic:

| Mode | Trigger | Scope |
|---|---|---|
| Full | No input file, no existing state | Entire codebase |
| Focus Area | User provides a directory path (e.g., @src/api/) | Only the specified subtree + transitive dependencies |
| Resume | state.json exists in DOCUMENT_DIR | Continue from last checkpoint |
| Task | User provides a task spec file (e.g., @_docs/02_tasks/done/AZ-173_*.md) AND _docs/02_document/ has existing docs | Targeted update of docs affected by the task |

Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas.

Task mode is a lightweight, incremental update — it reads the task spec to identify which files changed, then updates only the affected module docs, component docs, system flows, and data parameters. It does NOT redo discovery, verification, or problem extraction. See Task Mode Workflow below.

## Prerequisite Checks

  1. If _docs/ already exists and contains files AND mode is Full, ASK user: overwrite, merge, or write to _docs_generated/ instead?
  2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
  3. If DOCUMENT_DIR contains a state.json, offer to resume from last checkpoint or start fresh
  4. If FOCUS_DIR is set, verify the directory exists and contains source files — STOP if missing

## Progress Tracking

Create a TodoWrite with all steps (0 through 7). Update status as each step completes.

## Workflow

### Step 0: Codebase Discovery

**Role:** Code analyst. **Goal:** Build a complete map of the codebase (or targeted subtree) before analyzing any code.

Focus Area scoping: if FOCUS_DIR is set, limit the scan to that directory subtree. Still identify transitive dependencies outside FOCUS_DIR (modules that FOCUS_DIR imports) and include them in the processing order, but skip modules that are neither inside FOCUS_DIR nor dependencies of it.

Scan and catalog:

  1. Directory tree (ignore node_modules, .git, __pycache__, bin/, obj/, build artifacts)
  2. Language detection from file extensions and config files
  3. Package manifests: package.json, requirements.txt, pyproject.toml, *.csproj, Cargo.toml, go.mod
  4. Config files: Dockerfile, docker-compose.yml, .env.example, CI/CD configs (.github/workflows/, .gitlab-ci.yml, azure-pipelines.yml)
  5. Entry points: main.*, app.*, index.*, Program.*, startup scripts
  6. Test structure: test directories, test frameworks, test runner configs
  7. Existing documentation: README, docs/, wiki references, inline doc coverage
  8. Dependency graph: build a module-level dependency graph by analyzing imports/references. Identify:
    • Leaf modules (no internal dependencies)
    • Entry points (no internal dependents)
    • Cycles (mark for grouped analysis)
    • Topological processing order
    • If FOCUS_DIR: mark which modules are in-scope vs dependency-only
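The topological bookkeeping in item 8 can be sketched with Python's standard `graphlib`; the module names below are purely illustrative:

```python
from graphlib import TopologicalSorter

# Hypothetical import graph: module -> internal modules it imports.
graph = {
    "api/endpoints": {"services/auth", "models/user"},
    "services/auth": {"models/user", "utils/helpers"},
    "models/user": {"utils/helpers"},
    "utils/helpers": set(),
}

# static_order() yields dependencies before dependents: leaves first.
order = list(TopologicalSorter(graph).static_order())

# Leaf modules have no internal dependencies.
leaves = [m for m, deps in graph.items() if not deps]
# Entry points are imported by nothing else in the graph.
entry_points = [m for m in graph
                if all(m not in deps for deps in graph.values())]
```

Conveniently, `static_order()` raises `CycleError` on a cyclic graph, which is exactly the signal to fall back to the grouped cycle handling described in Step 1.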

Save: DOCUMENT_DIR/00_discovery.md containing:

  • Directory tree (concise, relevant directories only)
  • Tech stack summary table (language, framework, database, infra)
  • Dependency graph (textual list + Mermaid diagram)
  • Topological processing order
  • Entry points and leaf modules

Save: DOCUMENT_DIR/state.json with initial state:

```json
{
  "current_step": "module-analysis",
  "completed_steps": ["discovery"],
  "focus_dir": null,
  "modules_total": 0,
  "modules_documented": [],
  "modules_remaining": [],
  "module_batch": 0,
  "components_written": [],
  "last_updated": ""
}
```

Set focus_dir to the FOCUS_DIR path if in Focus Area mode, or null for Full mode.
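A minimal sketch of writing that checkpoint (field names follow the state shape above; the `save_state` helper name is an assumption):

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def save_state(document_dir, state):
    # Stamp and persist the checkpoint in one place so every
    # step writes state.json the same way.
    state["last_updated"] = datetime.now(timezone.utc).isoformat()
    path = Path(document_dir) / "state.json"
    path.write_text(json.dumps(state, indent=2))
    return path

with tempfile.TemporaryDirectory() as d:
    path = save_state(d, {
        "current_step": "module-analysis",
        "completed_steps": ["discovery"],
        "focus_dir": None,
        "modules_total": 0,
        "modules_documented": [],
        "modules_remaining": [],
        "module_batch": 0,
        "components_written": [],
    })
    loaded = json.loads(path.read_text())
```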


### Step 1: Module-Level Documentation

**Role:** Code analyst. **Goal:** Document every identified module individually, processing in topological order (leaves first).

Batched processing: process modules in batches of ~5 (sorted by topological order). After each batch: save all module docs, update state.json, present a progress summary. Between batches, evaluate whether to suggest a session break.

For each module in topological order:

  1. Read: read the module's source code. Assess complexity and what context is needed.
  2. Gather context: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
  3. Write module doc with these sections:
    • Purpose: one-sentence responsibility
    • Public interface: exported functions/classes/methods with signatures, input/output types
    • Internal logic: key algorithms, patterns, non-obvious behavior
    • Dependencies: what it imports internally and why
    • Consumers: what uses this module (from the dependency graph)
    • Data models: entities/types defined in this module
    • Configuration: env vars, config keys consumed
    • External integrations: HTTP calls, DB queries, queue operations, file I/O
    • Security: auth checks, encryption, input validation, secrets access
    • Tests: what tests exist for this module, what they cover
  4. Verify: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.

Cycle handling: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.

Large modules: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.

Save: DOCUMENT_DIR/modules/[module_name].md for each module.

State: update state.json after each module completes (move from modules_remaining to modules_documented). Increment module_batch after each batch of ~5.

Session break heuristic: after each batch, if more than 10 modules remain AND 2+ batches have already completed in this session, suggest a session break:

```
══════════════════════════════════════
 SESSION BREAK SUGGESTED
══════════════════════════════════════
 Modules documented: [X] of [Y]
 Batches completed this session: [N]
══════════════════════════════════════
 A) Continue in this conversation
 B) Save and continue in a fresh conversation (recommended)
══════════════════════════════════════
 Recommendation: B — fresh context improves
 analysis quality for remaining modules
══════════════════════════════════════
```

Re-entry is seamless: state.json tracks exactly which modules are done.
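The break heuristic reduces to a small predicate; sketched here with assumed parameter names:

```python
def suggest_session_break(modules_remaining: int, batches_this_session: int) -> bool:
    # Suggest a break only when plenty of work remains AND this
    # session has already spent context on two or more batches.
    return modules_remaining > 10 and batches_this_session >= 2
```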


### Step 2: Component Assembly

**Role:** Software architect. **Goal:** Group related modules into logical components and produce component specs.

  1. Analyze module docs from Step 1 to identify natural groupings:
    • By directory structure (most common)
    • By shared data models or common purpose
    • By dependency clusters (tightly coupled modules)
  2. For each identified component, synthesize its module docs into a single component specification using .cursor/skills/plan/templates/component-spec.md as structure:
    • High-level overview: purpose, pattern, upstream/downstream
    • Internal interfaces: method signatures, DTOs (from actual module code)
    • External API specification (if the component exposes HTTP/gRPC endpoints)
    • Data access patterns: queries, caching, storage estimates
    • Implementation details: algorithmic complexity, state management, key libraries
    • Extensions and helpers: shared utilities needed
    • Caveats and edge cases: limitations, race conditions, bottlenecks
    • Dependency graph: implementation order relative to other components
    • Logging strategy
  3. Identify common helpers shared across multiple components -> document in common-helpers/
  4. Generate component relationship diagram (Mermaid)

Self-verification:

  • Every module from Step 1 is covered by exactly one component
  • No component has overlapping responsibility with another
  • Inter-component interfaces are explicit (who calls whom, with what)
  • Component dependency graph has no circular dependencies
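The first of these checks (every module covered by exactly one component) can be sketched as follows; module and component names are illustrative:

```python
def check_coverage(all_modules, components):
    # components: component name -> set of module names it contains.
    seen = {}
    duplicated = set()
    for comp, modules in components.items():
        for m in modules:
            if m in seen:
                duplicated.add(m)   # covered by more than one component
            seen[m] = comp
    missing = set(all_modules) - set(seen)  # covered by none
    return duplicated, missing

dup, miss = check_coverage(
    ["utils/helpers", "models/user", "services/auth", "api/endpoints"],
    {"01_core": {"utils/helpers", "models/user"},
     "02_auth": {"services/auth", "models/user"}},
)
```

A non-empty `dup` or `miss` set means the component breakdown must be revised before presenting it to the user.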

Save:

  • DOCUMENT_DIR/components/[##]_[name]/description.md per component
  • DOCUMENT_DIR/common-helpers/[##]_helper_[name].md per shared helper
  • DOCUMENT_DIR/diagrams/components.md (Mermaid component diagram)

BLOCKING: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.


### Step 3: System-Level Synthesis

**Role:** Software architect. **Goal:** From component docs, synthesize system-level documents.

All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.

#### 3a. Architecture

Using .cursor/skills/plan/templates/architecture.md as structure:

  • System context and boundaries from entry points and external integrations
  • Tech stack table from discovery (Step 0) + component specs
  • Deployment model from Dockerfiles, CI configs, environment strategies
  • Data model overview from per-component data access sections
  • Integration points from inter-component interfaces
  • NFRs from test thresholds, config limits, health checks
  • Security architecture from per-module security observations
  • Key ADRs inferred from technology choices and patterns

Save: DOCUMENT_DIR/architecture.md

#### 3b. System Flows

Using .cursor/skills/plan/templates/system-flows.md as structure:

  • Trace main flows through the component interaction graph
  • Entry point -> component chain -> output for each major flow
  • Mermaid sequence diagrams and flowcharts
  • Error scenarios from exception handling patterns
  • Data flow tables per flow

Save: DOCUMENT_DIR/system-flows.md and DOCUMENT_DIR/diagrams/flows/flow_[name].md

#### 3c. Data Model

  • Consolidate all data models from module docs
  • Entity-relationship diagram (Mermaid ERD)
  • Migration strategy (if ORM/migration tooling detected)
  • Seed data observations
  • Backward compatibility approach (if versioning found)

Save: DOCUMENT_DIR/data_model.md

#### 3d. Deployment (if Dockerfile/CI configs exist)

  • Containerization summary
  • CI/CD pipeline structure
  • Environment strategy (dev, staging, production)
  • Observability (logging patterns, metrics, health checks found in code)

Save: DOCUMENT_DIR/deployment/ (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)


### Step 4: Verification Pass

**Role:** Quality verifier. **Goal:** Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.

For each document generated in Steps 1-3:

  1. Entity verification: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
  2. Interface accuracy: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
  3. Flow correctness: for each system flow diagram, trace the actual code path and verify the sequence matches.
  4. Completeness check: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
  5. Consistency check: do component docs agree with architecture doc? Do flow diagrams match component interfaces?

Apply corrections inline to the documents that need them.
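The entity cross-check in item 1 might look like the following sketch; the doc fragment, source snippet, and backtick-quoting convention are all illustrative assumptions (a real run would read the module docs and search the actual source tree):

```python
import re

# Made-up doc fragment and source snippet.
doc_text = "`UserStore.save` calls `hash_password` and `verify_token`."
source_code = "def hash_password(pw): ...\nclass UserStore:\n    def save(self): ..."

# Treat backtick-quoted identifiers as entities to verify; match on the
# last dotted segment so `UserStore.save` is checked as `save`.
entities = re.findall(r"`([A-Za-z_][\w.]*)`", doc_text)
flagged = [e for e in entities if e.split(".")[-1] not in source_code]
```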

Save: DOCUMENT_DIR/04_verification_log.md with:

  • Total entities verified vs flagged
  • Corrections applied (which document, what changed)
  • Remaining gaps or uncertainties
  • Completeness score (modules covered / total modules)

BLOCKING: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.

Session boundary: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5-7). These steps produce different artifact types and benefit from fresh context:

```
══════════════════════════════════════
 VERIFICATION COMPLETE — session break?
══════════════════════════════════════
 Steps 0-4 (analysis + verification) are done.
 Steps 5-7 (solution + problem extraction + report)
 can run in a fresh conversation.
══════════════════════════════════════
 A) Continue in this conversation
 B) Save and continue in a new conversation (recommended)
══════════════════════════════════════
```

If Focus Area mode: Steps 5-7 are skipped (they require full codebase coverage). Present a summary of modules and components documented for this area. The user can run /document again for another area, or run without FOCUS_DIR once all areas are covered to produce the full synthesis.


### Step 5: Solution Extraction (Retrospective)

**Role:** Software architect. **Goal:** From all verified technical documentation, retrospectively create solution.md — the same artifact the research skill produces. This makes downstream skills (plan, deploy, decompose) compatible with the documented codebase.

Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):

  1. Product Solution Description: what the system is, brief component interaction diagram (Mermaid)
  2. Architecture: the architecture that is implemented, with per-component solution tables:

     | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
     |---|---|---|---|---|---|---|---|
     | [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |

  3. Testing Strategy: summarize integration/functional tests and non-functional tests found in the codebase
  4. References: links to key config files, Dockerfiles, CI configs that evidence the solution choices

Save: SOLUTION_DIR/solution.md (_docs/01_solution/solution.md)


### Step 6: Problem Extraction (Retrospective)

**Role:** Business analyst. **Goal:** From all verified technical docs, retrospectively derive the high-level problem definition — producing the same documents the problem skill creates through interview.

This is the inverse of normal workflow: instead of problem -> solution -> code, we go code -> technical docs -> problem understanding.

#### 6a. problem.md

  • Synthesize from architecture overview + component purposes + system flows
  • What is this system? What problem does it solve? Who are the users? How does it work at a high level?
  • Cross-reference with README if one exists
  • Free-form text, concise, readable by someone unfamiliar with the project

#### 6b. restrictions.md

  • Extract from: tech stack choices, Dockerfile specs (OS, base images), CI configs (platform constraints), dependency versions, environment configs
  • Categorize with headers: Hardware, Software, Environment, Operational
  • Each restriction should be specific and testable

#### 6c. acceptance_criteria.md

  • Derive from: test assertions (expected values, thresholds), performance configs (timeouts, rate limits, batch sizes), health check endpoints, validation rules in code
  • Categorize with headers by domain
  • Every criterion must have a measurable value — if only implied, note the source

#### 6d. input_data/

  • Document data schemas found (DB schemas, API request/response types, config file formats)
  • Create data_parameters.md describing what data the system consumes, formats, volumes, update patterns

#### 6e. security_approach.md (only if security code found)

  • Authentication mechanisms, authorization patterns, encryption, secrets handling, CORS, rate limiting, input sanitization — all from code observations
  • If no security-relevant code found, skip this file

Save: all files to PROBLEM_DIR/ (_docs/00_problem/)

BLOCKING: Present all problem documents to user. These are the most abstracted and therefore most prone to interpretation error. Do NOT proceed until user confirms or requests corrections.


### Step 7: Final Report

**Role:** Technical writer. **Goal:** Produce FINAL_report.md integrating all generated documentation.

Using .cursor/skills/plan/templates/final-report.md as structure:

  • Executive summary from architecture + problem docs
  • Problem statement (transformed from problem.md, not copy-pasted)
  • Architecture overview with tech stack one-liner
  • Component summary table (number, name, purpose, dependencies)
  • System flows summary table
  • Risk observations from verification log (Step 4)
  • Open questions (uncertainties flagged during analysis)
  • Artifact index listing all generated documents with paths

Save: DOCUMENT_DIR/FINAL_report.md

State: update state.json with current_step: "complete".


## Task Mode Workflow

Triggered when a task spec file is provided as input AND _docs/02_document/ already contains module/component docs.

Accepts: one or more task spec files (from _docs/02_tasks/todo/ or _docs/02_tasks/done/).

### Task Step 0: Scope Analysis

  1. Read each task spec — extract the "Files Modified" or "Scope / Included" section to identify which source files were changed
  2. Map changed source files to existing module docs in DOCUMENT_DIR/modules/
  3. Map affected modules to their parent components in DOCUMENT_DIR/components/
  4. Identify which higher-level docs might be affected (system-flows, data_model, data_parameters)

Output: a list of docs to update, organized by level:

  • Module docs (direct matches)
  • Component docs (parents of affected modules)
  • System-level docs (only if the task changed API endpoints, data models, or external integrations)
  • Problem-level docs (only if the task changed input parameters, acceptance criteria, or restrictions)
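One possible sketch of the file-to-doc mapping in Task Step 0. It assumes module docs are named after source paths with `/` flattened to `_` — an assumption for illustration; match whatever naming Step 1 actually used:

```python
from pathlib import PurePosixPath

def affected_module_docs(changed_files, existing_docs):
    # Split changed files into docs to update vs. modules needing new docs.
    affected, new_modules = [], []
    for f in changed_files:
        stem = PurePosixPath(f).with_suffix("").as_posix().replace("/", "_")
        doc = f"{stem}.md"
        (affected if doc in existing_docs else new_modules).append(doc)
    return affected, new_modules

affected, new = affected_module_docs(
    ["src/media_hash.py", "src/loader_http_client.py"],
    {"src_loader_http_client.md", "src_main.md"},
)
```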

### Task Step 1: Module Doc Updates

For each affected module:

  1. Read the current source file
  2. Read the existing module doc
  3. Diff the module doc against current code — identify:
    • New functions/methods/classes not in the doc
    • Removed functions/methods/classes still in the doc
    • Changed signatures or behavior
    • New/removed dependencies
    • New/removed external integrations
  4. Update the module doc in-place, preserving the existing structure and style
  5. If a module is entirely new (no existing doc), create a new module doc following the standard template
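For Python modules, the doc-vs-code diff in item 3 can be sketched with the standard `ast` module; the sample source, the out-of-date doc text, and the backtick convention for doc mentions are made up for illustration:

```python
import ast
import re

# Made-up module source and its (out-of-date) doc text.
source = "def fetch(url): ...\ndef retry(fn): ...\nclass Cache: ..."
doc_text = "Public interface: `fetch`, `parse`, and the `Cache` class."

# Names actually defined at any level of the module.
tree = ast.parse(source)
defined = {node.name for node in ast.walk(tree)
           if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                ast.ClassDef))}
# Names the doc claims exist.
mentioned = set(re.findall(r"`(\w+)`", doc_text))

undocumented = defined - mentioned  # new code the doc misses
stale = mentioned - defined         # doc entries with no code behind them
```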

### Task Step 2: Component Doc Updates

For each affected component:

  1. Read all module docs belonging to this component (including freshly updated ones)
  2. Read the existing component doc
  3. Update internal interfaces, dependency graphs, implementation details, and caveats sections
  4. Do NOT change the component's purpose, pattern, or high-level overview unless the task fundamentally changed it

### Task Step 3: System-Level Doc Updates (conditional)

Only if the task changed API endpoints, system flows, data models, or external integrations:

  1. Update system-flows.md — modify affected flow diagrams and data flow tables
  2. Update data_model.md — if entities changed
  3. Update architecture.md — only if new external integrations or architectural patterns were added

### Task Step 4: Problem-Level Doc Updates (conditional)

Only if the task changed API input parameters, configuration, or acceptance criteria:

  1. Update _docs/00_problem/input_data/data_parameters.md
  2. Update _docs/00_problem/acceptance_criteria.md — if new testable criteria emerged

### Task Step 5: Summary

Present a summary of all docs updated:

```
══════════════════════════════════════
 DOCUMENTATION UPDATE COMPLETE
══════════════════════════════════════
 Task(s): [task IDs]
 Module docs updated: [count]
 Component docs updated: [count]
 System-level docs updated: [list or "none"]
 Problem-level docs updated: [list or "none"]
══════════════════════════════════════
```

### Task Mode Principles

  • Minimal changes: only update what the task actually changed. Do not rewrite unaffected sections.
  • Preserve style: match the existing doc's structure, tone, and level of detail.
  • Verify against code: for every entity added or changed in a doc, confirm it exists in the current source.
  • New modules: if the task introduced an entirely new source file, create a new module doc from the standard template.
  • Dead references: if the task removed code, remove the corresponding doc entries. Do not keep stale references.

## Artifact Management

### Directory Structure

```
_docs/
├── 00_problem/                          # Step 6 (retrospective)
│   ├── problem.md
│   ├── restrictions.md
│   ├── acceptance_criteria.md
│   ├── input_data/
│   │   └── data_parameters.md
│   └── security_approach.md
├── 01_solution/                         # Step 5 (retrospective)
│   └── solution.md
└── 02_document/                         # DOCUMENT_DIR
    ├── 00_discovery.md                  # Step 0
    ├── modules/                         # Step 1
    │   ├── [module_name].md
    │   └── ...
    ├── components/                      # Step 2
    │   ├── 01_[name]/description.md
    │   ├── 02_[name]/description.md
    │   └── ...
    ├── common-helpers/                  # Step 2
    ├── architecture.md                  # Step 3
    ├── system-flows.md                  # Step 3
    ├── data_model.md                    # Step 3
    ├── deployment/                      # Step 3
    ├── diagrams/                        # Steps 2-3
    │   ├── components.md
    │   └── flows/
    ├── 04_verification_log.md           # Step 4
    ├── FINAL_report.md                  # Step 7
    └── state.json                       # Resumability
```

### Resumability

Maintain DOCUMENT_DIR/state.json:

```json
{
  "current_step": "module-analysis",
  "completed_steps": ["discovery"],
  "focus_dir": null,
  "modules_total": 12,
  "modules_documented": ["utils/helpers", "models/user"],
  "modules_remaining": ["services/auth", "api/endpoints"],
  "module_batch": 1,
  "components_written": [],
  "last_updated": "2026-03-21T14:00:00Z"
}
```

Update after each module/component completes. If interrupted, resume from next undocumented module.

When resuming:

  1. Read state.json
  2. Cross-check against actual files in DOCUMENT_DIR (trust files over state if they disagree)
  3. Continue from the next incomplete item
  4. Inform user which steps are being skipped
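A sketch of the trust-files-over-state reconciliation in step 2 (doc naming and state fields are assumptions consistent with the examples above):

```python
import tempfile
from pathlib import Path

def reconcile(document_dir, state):
    # Files on disk win over state.json when the two disagree.
    on_disk = {p.stem for p in (Path(document_dir) / "modules").glob("*.md")}
    documented = sorted(set(state["modules_documented"]) | on_disk)
    remaining = [m for m in state["modules_remaining"] if m not in on_disk]
    return documented, remaining

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "modules").mkdir()
    # A doc that finished writing but never made it into state.json:
    (Path(d) / "modules" / "services_auth.md").write_text("...")
    documented, remaining = reconcile(d, {
        "modules_documented": ["utils_helpers"],
        "modules_remaining": ["services_auth", "api_endpoints"],
    })
```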

### Save Principles

  1. Save immediately: write each module doc as soon as analysis completes
  2. Incremental context: each subsequent module uses already-written docs as context
  3. Preserve intermediates: keep all module docs even after synthesis into component docs
  4. Enable recovery: state file tracks exact progress for resume

## Escalation Rules

| Situation | Action |
|---|---|
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| _docs/ already exists | ASK user: overwrite, merge, or use _docs_generated/ |
| Code intent is ambiguous | ASK user, do not guess |

## Common Mistakes

  • Top-down guessing: never infer architecture before documenting modules. Build up, don't assume down.
  • Hallucinating entities: always verify that referenced classes/functions/endpoints actually exist in code.
  • Skipping modules: every source module must appear in exactly one module doc and one component.
  • Monolithic analysis: don't try to analyze the entire codebase in one pass. Module by module, in order.
  • Inventing restrictions: only document constraints actually evidenced in code, configs, or Dockerfiles.
  • Vague acceptance criteria: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
  • Writing code: this skill produces documents, never implementation code.

## Methodology Quick Reference

```
┌──────────────────────────────────────────────────────────────────┐
│          Bottom-Up Codebase Documentation (8-Step)               │
├──────────────────────────────────────────────────────────────────┤
│ MODE: Full / Focus Area (@dir) / Resume (state.json)             │
│ PREREQ: Check _docs/ exists (overwrite/merge/new?)               │
│ PREREQ: Check state.json for resume                              │
│                                                                  │
│ 0. Discovery          → dependency graph, tech stack, topo order │
│    (Focus Area: scoped to FOCUS_DIR + transitive deps)           │
│ 1. Module Docs        → per-module analysis (leaves first)       │
│    (batched ~5 modules; session break between batches)           │
│ 2. Component Assembly → group modules, write component specs     │
│    [BLOCKING: user confirms components]                          │
│ 3. System Synthesis   → architecture, flows, data model, deploy  │
│ 4. Verification       → compare all docs vs code, fix errors     │
│    [BLOCKING: user reviews corrections]                          │
│    [SESSION BREAK suggested before Steps 5-7]                    │
│    ── Focus Area mode stops here ──                              │
│ 5. Solution Extraction → retrospective solution.md               │
│ 6. Problem Extraction → retrospective problem, restrictions, AC  │
│    [BLOCKING: user confirms problem docs]                        │
│ 7. Final Report       → FINAL_report.md                          │
├──────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first                │
│             Incremental context · Verify against code            │
│             Save immediately · Resume from checkpoint            │
│             Batch modules · Session breaks for large codebases   │
└──────────────────────────────────────────────────────────────────┘
```