mirror of
https://github.com/azaion/detections.git
synced 2026-04-23 09:36:32 +00:00
---
name: document
description: |
  Bottom-up codebase documentation skill. Analyzes existing code from modules up through components
  to architecture, then retrospectively derives problem/restrictions/acceptance criteria.
  Produces the same _docs/ artifacts as the problem, research, and plan skills, but from code
  analysis instead of user interview.

  Trigger phrases:
  - "document", "document codebase", "document this project"
  - "documentation", "generate documentation", "create documentation"
  - "reverse-engineer docs", "code to docs"
  - "analyze and document"
category: build
tags: [documentation, code-analysis, reverse-engineering, architecture, bottom-up]
disable-model-invocation: true
---

# Bottom-Up Codebase Documentation

Analyze an existing codebase from the bottom up — individual modules first, then components, then system-level architecture — and produce the same `_docs/` artifacts that the `problem` and `plan` skills generate, without requiring user interview.

## Core Principles

- **Bottom-up always**: module docs -> component specs -> architecture/flows -> solution -> problem extraction. Every higher level is synthesized from the level below.
- **Dependencies first**: process modules in topological order (leaves first). When documenting module X, all of X's dependencies already have docs.
- **Incremental context**: each module's doc uses already-written dependency docs as context — no ever-growing chain.
- **Verify against code**: cross-reference every entity in generated docs against the actual codebase. Catch hallucinations.
- **Save immediately**: write each artifact as soon as its step completes. Enable resume from any checkpoint.
- **Ask, don't assume**: when code intent is ambiguous, ASK the user before proceeding.

## Context Resolution

Fixed paths:

- DOCUMENT_DIR: `_docs/02_document/`
- SOLUTION_DIR: `_docs/01_solution/`
- PROBLEM_DIR: `_docs/00_problem/`

Announce resolved paths to the user before proceeding.

## Prerequisite Checks

1. If `_docs/` already exists and contains files, ASK the user: **overwrite, merge, or write to `_docs_generated/` instead?**
2. Create DOCUMENT_DIR, SOLUTION_DIR, and PROBLEM_DIR if they don't exist
3. If DOCUMENT_DIR contains a `state.json`, offer to **resume from last checkpoint or start fresh**

## Progress Tracking

Create a TodoWrite with all steps (0 through 7). Update status as each step completes.

## Workflow

### Step 0: Codebase Discovery

**Role**: Code analyst
**Goal**: Build a complete map of the codebase before analyzing any code.

Scan and catalog:

1. Directory tree (ignore `node_modules`, `.git`, `__pycache__`, `bin/`, `obj/`, build artifacts)
2. Language detection from file extensions and config files
3. Package manifests: `package.json`, `requirements.txt`, `pyproject.toml`, `*.csproj`, `Cargo.toml`, `go.mod`
4. Config files: `Dockerfile`, `docker-compose.yml`, `.env.example`, CI/CD configs (`.github/workflows/`, `.gitlab-ci.yml`, `azure-pipelines.yml`)
5. Entry points: `main.*`, `app.*`, `index.*`, `Program.*`, startup scripts
6. Test structure: test directories, test frameworks, test runner configs
7. Existing documentation: README, `docs/`, wiki references, inline doc coverage
8. **Dependency graph**: build a module-level dependency graph by analyzing imports/references. Identify:
   - Leaf modules (no internal dependencies)
   - Entry points (no internal dependents)
   - Cycles (mark for grouped analysis)
   - Topological processing order

**Save**: `DOCUMENT_DIR/00_discovery.md` containing:

- Directory tree (concise, relevant directories only)
- Tech stack summary table (language, framework, database, infra)
- Dependency graph (textual list + Mermaid diagram)
- Topological processing order
- Entry points and leaf modules

**Save**: `DOCUMENT_DIR/state.json` with initial state:

```json
{
  "current_step": "module-analysis",
  "completed_steps": ["discovery"],
  "modules_total": 0,
  "modules_documented": [],
  "modules_remaining": [],
  "components_written": [],
  "last_updated": ""
}
```
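
The dependency-graph step (item 8 above) can be sketched as a small Kahn's-algorithm pass. This is a minimal illustration, assuming imports have already been collected into a `deps` mapping (the name and shape are illustrative, not prescribed by the skill):

```python
from collections import deque

def analyze_graph(deps: dict[str, set[str]]) -> dict:
    """deps maps each module to the set of internal modules it imports."""
    modules = set(deps) | {d for ds in deps.values() for d in ds}
    deps = {m: set(deps.get(m, ())) for m in modules}
    dependents: dict[str, set[str]] = {m: set() for m in modules}
    for m, ds in deps.items():
        for d in ds:
            dependents[d].add(m)

    leaves = sorted(m for m in modules if not deps[m])               # no internal deps
    entry_points = sorted(m for m in modules if not dependents[m])   # nothing imports them

    # Kahn's algorithm: emit a module only after all of its dependencies
    # are emitted, so the processing order is leaves-first.
    remaining = {m: len(deps[m]) for m in modules}
    queue = deque(sorted(m for m in modules if remaining[m] == 0))
    order = []
    while queue:
        m = queue.popleft()
        order.append(m)
        for dep in sorted(dependents[m]):
            remaining[dep] -= 1
            if remaining[dep] == 0:
                queue.append(dep)

    done = set(order)
    in_cycle = sorted(m for m in modules if m not in done)  # leftovers are cyclic
    return {"leaves": leaves, "entry_points": entry_points,
            "order": order, "cycles": in_cycle}
```

Modules left out of `order` are exactly the ones involved in cycles, which feeds the "Cycles (mark for grouped analysis)" item directly.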

---

### Step 1: Module-Level Documentation

**Role**: Code analyst
**Goal**: Document every identified module individually, processing in topological order (leaves first).

For each module in topological order:

1. **Read**: read the module's source code. Assess complexity and what context is needed.
2. **Gather context**: collect already-written docs of this module's dependencies (available because of bottom-up order). Note external library usage.
3. **Write module doc** with these sections:
   - **Purpose**: one-sentence responsibility
   - **Public interface**: exported functions/classes/methods with signatures, input/output types
   - **Internal logic**: key algorithms, patterns, non-obvious behavior
   - **Dependencies**: what it imports internally and why
   - **Consumers**: what uses this module (from the dependency graph)
   - **Data models**: entities/types defined in this module
   - **Configuration**: env vars, config keys consumed
   - **External integrations**: HTTP calls, DB queries, queue operations, file I/O
   - **Security**: auth checks, encryption, input validation, secrets access
   - **Tests**: what tests exist for this module, what they cover
4. **Verify**: cross-check that every entity referenced in the doc exists in the codebase. Flag uncertainties.

**Cycle handling**: modules in a dependency cycle are analyzed together as a group, producing a single combined doc.
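
Finding which modules to group can be done with a strongly-connected-components pass (Tarjan's algorithm). A sketch, assuming a `deps` mapping from each module to the set of internal modules it imports:

```python
def cycle_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Return groups of modules that form dependency cycles (Tarjan's SCC).

    Single modules without self-loops are not cycles, so they are filtered out.
    """
    index, low, on_stack, stack = {}, {}, set(), []
    groups, counter = [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            if len(scc) > 1 or v in deps.get(v, ()):
                groups.append(sorted(scc))

    for v in sorted(deps):
        if v not in index:
            strongconnect(v)
    return groups
```

Each returned group becomes one combined module doc.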

**Large modules**: if a module exceeds comfortable analysis size, split into logical sub-sections and analyze each part, then combine.

**Save**: `DOCUMENT_DIR/modules/[module_name].md` for each module.
**State**: update `state.json` after each module completes (move from `modules_remaining` to `modules_documented`).
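
The per-module state update might look like the following minimal sketch. The function name and the atomic temp-file rename are illustrative choices, not requirements of the skill:

```python
import json
import os
from datetime import datetime, timezone

def mark_module_documented(state_path: str, module: str) -> None:
    """Move a module from modules_remaining to modules_documented and persist."""
    with open(state_path) as f:
        state = json.load(f)
    if module in state["modules_remaining"]:
        state["modules_remaining"].remove(module)
    if module not in state["modules_documented"]:
        state["modules_documented"].append(module)
    state["last_updated"] = datetime.now(timezone.utc).isoformat()
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f, indent=2)   # write to a temp file first
    os.replace(tmp, state_path)         # atomic rename: no torn state on crash
```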

---

### Step 2: Component Assembly

**Role**: Software architect
**Goal**: Group related modules into logical components and produce component specs.

1. Analyze module docs from Step 1 to identify natural groupings:
   - By directory structure (most common)
   - By shared data models or common purpose
   - By dependency clusters (tightly coupled modules)
2. For each identified component, synthesize its module docs into a single component specification using `templates/component-spec.md` as structure:
   - High-level overview: purpose, pattern, upstream/downstream
   - Internal interfaces: method signatures, DTOs (from actual module code)
   - External API specification (if the component exposes HTTP/gRPC endpoints)
   - Data access patterns: queries, caching, storage estimates
   - Implementation details: algorithmic complexity, state management, key libraries
   - Extensions and helpers: shared utilities needed
   - Caveats and edge cases: limitations, race conditions, bottlenecks
   - Dependency graph: implementation order relative to other components
   - Logging strategy
3. Identify common helpers shared across multiple components -> document in `common-helpers/`
4. Generate component relationship diagram (Mermaid)
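
The directory-structure grouping heuristic from item 1 can be sketched as simple path-prefix bucketing. This is a deliberately naive illustration; real grouping would also weigh shared data models and coupling:

```python
from collections import defaultdict

def group_by_directory(modules: list[str], depth: int = 1) -> dict[str, list[str]]:
    """Group module paths by their first `depth` path segments.

    'services/auth' with depth=1 lands in the 'services' component.
    Top-level modules fall into a catch-all 'root' component.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for mod in sorted(modules):
        parts = mod.split("/")
        key = "/".join(parts[:depth]) if len(parts) > depth else "root"
        groups[key].append(mod)
    return dict(groups)
```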

**Self-verification**:

- [ ] Every module from Step 1 is covered by exactly one component
- [ ] No component has overlapping responsibility with another
- [ ] Inter-component interfaces are explicit (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies

**Save**:

- `DOCUMENT_DIR/components/[##]_[name]/description.md` per component
- `DOCUMENT_DIR/common-helpers/[##]_helper_[name].md` per shared helper
- `DOCUMENT_DIR/diagrams/components.md` (Mermaid component diagram)

**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms the component breakdown is correct.

---

### Step 3: System-Level Synthesis

**Role**: Software architect
**Goal**: From component docs, synthesize system-level documents.

All documents here are derived from component docs (Step 2) + module docs (Step 1). No new code reading should be needed. If it is, that indicates a gap in Steps 1-2 — go back and fill it.

#### 3a. Architecture

Using `templates/architecture.md` as structure:

- System context and boundaries from entry points and external integrations
- Tech stack table from discovery (Step 0) + component specs
- Deployment model from Dockerfiles, CI configs, environment strategies
- Data model overview from per-component data access sections
- Integration points from inter-component interfaces
- NFRs from test thresholds, config limits, health checks
- Security architecture from per-module security observations
- Key ADRs inferred from technology choices and patterns

**Save**: `DOCUMENT_DIR/architecture.md`

#### 3b. System Flows

Using `templates/system-flows.md` as structure:

- Trace main flows through the component interaction graph
- Entry point -> component chain -> output for each major flow
- Mermaid sequence diagrams and flowcharts
- Error scenarios from exception handling patterns
- Data flow tables per flow

**Save**: `DOCUMENT_DIR/system-flows.md` and `DOCUMENT_DIR/diagrams/flows/flow_[name].md`

#### 3c. Data Model

- Consolidate all data models from module docs
- Entity-relationship diagram (Mermaid ERD)
- Migration strategy (if ORM/migration tooling detected)
- Seed data observations
- Backward compatibility approach (if versioning found)

**Save**: `DOCUMENT_DIR/data_model.md`

#### 3d. Deployment (if Dockerfile/CI configs exist)

- Containerization summary
- CI/CD pipeline structure
- Environment strategy (dev, staging, production)
- Observability (logging patterns, metrics, health checks found in code)

**Save**: `DOCUMENT_DIR/deployment/` (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md — only files for which sufficient code evidence exists)

---

### Step 4: Verification Pass

**Role**: Quality verifier
**Goal**: Compare every generated document against actual code. Fix hallucinations, fill gaps, correct inaccuracies.

For each document generated in Steps 1-3:

1. **Entity verification**: extract all code entities (class names, function names, module names, endpoints) mentioned in the doc. Cross-reference each against the actual codebase. Flag any that don't exist.
2. **Interface accuracy**: for every method signature, DTO, or API endpoint in component specs, verify it matches actual code.
3. **Flow correctness**: for each system flow diagram, trace the actual code path and verify the sequence matches.
4. **Completeness check**: are there modules or components discovered in Step 0 that aren't covered by any document? Flag gaps.
5. **Consistency check**: do component docs agree with the architecture doc? Do flow diagrams match component interfaces?
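
For a Python codebase, the entity-verification step could be sketched with the standard `ast` module. This is illustrative only: other languages need their own parsers, and the doc-side extraction here is a naive backtick scan:

```python
import ast
import re
from pathlib import Path

def defined_names(src_root: str) -> set[str]:
    """Collect every function/class name defined under src_root."""
    names: set[str] = set()
    for path in Path(src_root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names.add(node.name)
    return names

def flag_unknown_entities(doc_text: str, known: set[str]) -> list[str]:
    """Return backticked identifiers in a doc that match nothing in the code."""
    mentioned = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", doc_text))
    return sorted(m for m in mentioned if m not in known)
```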

Apply corrections inline to the documents that need them.

**Save**: `DOCUMENT_DIR/04_verification_log.md` with:

- Total entities verified vs flagged
- Corrections applied (which document, what changed)
- Remaining gaps or uncertainties
- Completeness score (modules covered / total modules)

**BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.

---

### Step 5: Solution Extraction (Retrospective)

**Role**: Software architect
**Goal**: From all verified technical documentation, retrospectively create `solution.md` — the same artifact the research skill produces. This makes downstream skills (`plan`, `deploy`, `decompose`) compatible with the documented codebase.

Synthesize from architecture (Step 3) + component specs (Step 2) + system flows (Step 3) + verification findings (Step 4):

1. **Product Solution Description**: what the system is, brief component interaction diagram (Mermaid)
2. **Architecture**: the architecture that is implemented, with per-component solution tables:

   | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
   |----------|-------|------------|-------------|--------------|----------|------|-----|
   | [actual implementation] | [libs/platforms used] | [observed strengths] | [observed limitations] | [requirements met] | [security approach] | [cost indicators] | [fitness assessment] |

3. **Testing Strategy**: summarize integration/functional tests and non-functional tests found in the codebase
4. **References**: links to key config files, Dockerfiles, CI configs that evidence the solution choices

**Save**: `SOLUTION_DIR/solution.md` (`_docs/01_solution/solution.md`)

---

### Step 6: Problem Extraction (Retrospective)

**Role**: Business analyst
**Goal**: From all verified technical docs, retrospectively derive the high-level problem definition — producing the same documents the `problem` skill creates through interview.

This is the inverse of the normal workflow: instead of problem -> solution -> code, we go code -> technical docs -> problem understanding.

#### 6a. `problem.md`

- Synthesize from architecture overview + component purposes + system flows
- What is this system? What problem does it solve? Who are the users? How does it work at a high level?
- Cross-reference with README if one exists
- Free-form text, concise, readable by someone unfamiliar with the project

#### 6b. `restrictions.md`

- Extract from: tech stack choices, Dockerfile specs (OS, base images), CI configs (platform constraints), dependency versions, environment configs
- Categorize with headers: Hardware, Software, Environment, Operational
- Each restriction should be specific and testable
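
Extracting restriction candidates from a Dockerfile could start as simply as the line scan below. It is a sketch, not the skill's prescribed method; a real pass would also read CI configs and lockfiles:

```python
import re

def dockerfile_restrictions(dockerfile_text: str) -> list[str]:
    """Derive candidate Software/Environment restrictions from a Dockerfile."""
    restrictions = []
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if m := re.match(r"FROM\s+(\S+)", line, re.IGNORECASE):
            restrictions.append(f"Must run on base image {m.group(1)}")
        elif m := re.match(r"EXPOSE\s+(\d+)", line, re.IGNORECASE):
            restrictions.append(f"Service listens on port {m.group(1)}")
        elif m := re.match(r"ENV\s+(\w+)", line, re.IGNORECASE):
            restrictions.append(f"Requires environment variable {m.group(1)}")
    return restrictions
```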

#### 6c. `acceptance_criteria.md`

- Derive from: test assertions (expected values, thresholds), performance configs (timeouts, rate limits, batch sizes), health check endpoints, validation rules in code
- Categorize with headers by domain
- Every criterion must have a measurable value — if only implied, note the source

#### 6d. `input_data/`

- Document data schemas found (DB schemas, API request/response types, config file formats)
- Create `data_parameters.md` describing what data the system consumes, formats, volumes, update patterns

#### 6e. `security_approach.md` (only if security code found)

- Authentication mechanisms, authorization patterns, encryption, secrets handling, CORS, rate limiting, input sanitization — all from code observations
- If no security-relevant code found, skip this file

**Save**: all files to `PROBLEM_DIR/` (`_docs/00_problem/`)

**BLOCKING**: Present all problem documents to user. These are the most abstracted and therefore most prone to interpretation error. Do NOT proceed until user confirms or requests corrections.

---

### Step 7: Final Report

**Role**: Technical writer
**Goal**: Produce `FINAL_report.md` integrating all generated documentation.

Using `templates/final-report.md` as structure:

- Executive summary from architecture + problem docs
- Problem statement (transformed from problem.md, not copy-pasted)
- Architecture overview with tech stack one-liner
- Component summary table (number, name, purpose, dependencies)
- System flows summary table
- Risk observations from verification log (Step 4)
- Open questions (uncertainties flagged during analysis)
- Artifact index listing all generated documents with paths

**Save**: `DOCUMENT_DIR/FINAL_report.md`

**State**: update `state.json` with `current_step: "complete"`.

---

## Artifact Management

### Directory Structure

```
_docs/
├── 00_problem/                  # Step 6 (retrospective)
│   ├── problem.md
│   ├── restrictions.md
│   ├── acceptance_criteria.md
│   ├── input_data/
│   │   └── data_parameters.md
│   └── security_approach.md
├── 01_solution/                 # Step 5 (retrospective)
│   └── solution.md
└── 02_document/                 # DOCUMENT_DIR
    ├── 00_discovery.md          # Step 0
    ├── modules/                 # Step 1
    │   ├── [module_name].md
    │   └── ...
    ├── components/              # Step 2
    │   ├── 01_[name]/description.md
    │   ├── 02_[name]/description.md
    │   └── ...
    ├── common-helpers/          # Step 2
    ├── architecture.md          # Step 3
    ├── system-flows.md          # Step 3
    ├── data_model.md            # Step 3
    ├── deployment/              # Step 3
    ├── diagrams/                # Steps 2-3
    │   ├── components.md
    │   └── flows/
    ├── 04_verification_log.md   # Step 4
    ├── FINAL_report.md          # Step 7
    └── state.json               # Resumability
```

### Resumability

Maintain `DOCUMENT_DIR/state.json`:

```json
{
  "current_step": "module-analysis",
  "completed_steps": ["discovery"],
  "modules_total": 12,
  "modules_documented": ["utils/helpers", "models/user"],
  "modules_remaining": ["services/auth", "api/endpoints"],
  "components_written": [],
  "last_updated": "2026-03-21T14:00:00Z"
}
```

Update after each module/component completes. If interrupted, resume from the next undocumented module.

When resuming:

1. Read `state.json`
2. Cross-check against actual files in DOCUMENT_DIR (trust files over state if they disagree)
3. Continue from the next incomplete item
4. Inform user which steps are being skipped
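
The resume logic, including the "trust files over state" rule in step 2, could be sketched as follows. The function name is illustrative, and flat module names are assumed:

```python
import json
from pathlib import Path

def plan_resume(document_dir: str) -> dict:
    """Reconcile state.json with what is actually on disk.

    Files win over state: a module doc that exists on disk counts as
    documented even if state.json still lists the module as remaining
    (e.g. a crash landed between the doc save and the state write).
    """
    doc_dir = Path(document_dir)
    state = json.loads((doc_dir / "state.json").read_text())
    on_disk = {p.stem for p in (doc_dir / "modules").glob("*.md")}

    claimed = set(state["modules_documented"]) | set(state["modules_remaining"])
    return {"documented": sorted(claimed & on_disk),
            "remaining": sorted(claimed - on_disk),
            "skip_discovery": "discovery" in state["completed_steps"]}
```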

### Save Principles

1. **Save immediately**: write each module doc as soon as analysis completes
2. **Incremental context**: each subsequent module uses already-written docs as context
3. **Preserve intermediates**: keep all module docs even after synthesis into component docs
4. **Enable recovery**: state file tracks exact progress for resume

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Minified/obfuscated code detected | WARN user, skip module, note in verification log |
| Module too large for context window | Split into sub-sections, analyze parts separately, combine |
| Cycle in dependency graph | Group cycled modules, analyze together as one doc |
| Generated code (protobuf, swagger-gen) | Note as generated, document the source spec instead |
| No tests found in codebase | Note gap in acceptance_criteria.md, derive AC from validation rules and config limits only |
| Contradictions between code and README | Flag in verification log, ASK user |
| Binary files or non-code assets | Skip, note in discovery |
| `_docs/` already exists | ASK user: overwrite, merge, or use `_docs_generated/` |
| Code intent is ambiguous | ASK user, do not guess |

## Common Mistakes

- **Top-down guessing**: never infer architecture before documenting modules. Build up, don't assume down.
- **Hallucinating entities**: always verify that referenced classes/functions/endpoints actually exist in code.
- **Skipping modules**: every source module must appear in exactly one module doc and one component.
- **Monolithic analysis**: don't try to analyze the entire codebase in one pass. Module by module, in order.
- **Inventing restrictions**: only document constraints actually evidenced in code, configs, or Dockerfiles.
- **Vague acceptance criteria**: "should be fast" is not a criterion. Extract actual numeric thresholds from code.
- **Writing code**: this skill produces documents, never implementation code.

## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────────┐
│ Bottom-Up Codebase Documentation (8-Step)                          │
├────────────────────────────────────────────────────────────────────┤
│ PREREQ: Check _docs/ exists (overwrite/merge/new?)                 │
│ PREREQ: Check state.json for resume                                │
│                                                                    │
│ 0. Discovery            → dependency graph, tech stack, topo order │
│ 1. Module Docs          → per-module analysis (leaves first)       │
│ 2. Component Assembly   → group modules, write component specs     │
│      [BLOCKING: user confirms components]                          │
│ 3. System Synthesis     → architecture, flows, data model, deploy  │
│ 4. Verification         → compare all docs vs code, fix errors     │
│      [BLOCKING: user reviews corrections]                          │
│ 5. Solution Extraction  → retrospective solution.md                │
│ 6. Problem Extraction   → retrospective problem, restrictions, AC  │
│      [BLOCKING: user confirms problem docs]                        │
│ 7. Final Report         → FINAL_report.md                          │
├────────────────────────────────────────────────────────────────────┤
│ Principles: Bottom-up always · Dependencies first                  │
│             Incremental context · Verify against code              │
│             Save immediately · Resume from checkpoint              │
└────────────────────────────────────────────────────────────────────┘
```