| name | description | category | tags | disable-model-invocation |
|---|---|---|---|---|
| retrospective | Collect metrics from implementation batch reports and code review findings, analyze trends across cycles, and produce improvement reports with actionable recommendations. 4-step workflow: collect metrics, analyze trends, produce report, update lessons log. Outputs to _docs/06_metrics/. Trigger phrases: "retrospective", "retro", "run retro"; "metrics review", "feedback loop"; "implementation metrics", "analyze trends". | evolve | | true |
Retrospective
Collect metrics from implementation artifacts, analyze trends across development cycles, and produce actionable improvement reports.
Core Principles
- Data-driven: conclusions come from metrics, not impressions
- Actionable: every finding must have a concrete improvement suggestion
- Cumulative: each retrospective compares against previous ones to track progress
- Save immediately: write artifacts to disk after each step
- Non-judgmental: focus on process improvement, not blame
Context Resolution
Fixed paths:
- IMPL_DIR: _docs/03_implementation/
- METRICS_DIR: _docs/06_metrics/
- TASKS_DIR: _docs/02_tasks/ (scan all subfolders: todo/, backlog/, done/)
Announce the resolved paths to the user before proceeding.
Prerequisite Checks (BLOCKING)
- IMPL_DIR exists and contains at least one batch_*_report.md; STOP if missing (nothing to analyze)
- Create METRICS_DIR if it does not exist
- Check for previous retrospective reports in METRICS_DIR to enable trend comparison
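The three checks above could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `check_prerequisites` and the choice to signal STOP with an exception are assumptions.

```python
from pathlib import Path


def check_prerequisites(impl_dir: Path, metrics_dir: Path) -> list[Path]:
    """Run the blocking checks and return previous retro reports, oldest first.

    Hypothetical helper: raising FileNotFoundError stands in for STOP.
    """
    # Check 1: at least one batch report must exist, otherwise nothing to analyze.
    reports = sorted(impl_dir.glob("batch_*_report.md")) if impl_dir.is_dir() else []
    if not reports:
        raise FileNotFoundError(f"STOP: no batch_*_report.md found in {impl_dir}")

    # Check 2: create the metrics directory if it does not exist yet.
    metrics_dir.mkdir(parents=True, exist_ok=True)

    # Check 3: collect earlier retros; ISO dates in the file names sort chronologically.
    return sorted(metrics_dir.glob("retro_*.md"))
```

An empty return value means there is no earlier retrospective, so the run would report baseline metrics only.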
Artifact Management
Directory Structure
METRICS_DIR/
├── retro_[YYYY-MM-DD].md
├── retro_[YYYY-MM-DD].md
└── ...
Invocation Modes
- cycle-end mode (default): invoked automatically at end of cycle by the autodev orchestrator, as greenfield Step 11 Retrospective (after Step 10 Deploy) and existing-code Step 17 Retrospective (after Step 16 Deploy). Runs Steps 1–4. Output: retro_<YYYY-MM-DD>.md + LESSONS.md update.
- incident mode: invoked automatically after the failure retry protocol reaches retry_count: 3 and the user has made a recovery choice. Runs Steps 1 (scoped to the failing skill's artifacts only), 2 (focused on the failure), 3 (shorter report), 4 (append 1–3 lessons in the process or tooling category). Output: _docs/06_metrics/incident_<YYYY-MM-DD>_<skill>.md + LESSONS.md update. Pass the invocation context with mode: incident, failing_skill: <skill-name>, and failure_summary: <string>.
- on-demand mode: user-triggered (trigger phrases above). Runs Steps 1–4 over the entire artifact set.
Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 4). Update status as each step completes.
Workflow
Step 1: Collect Metrics
Role: Data analyst
Goal: Parse all implementation artifacts and extract quantitative metrics
Constraints: Collection only; no interpretation yet
Sources
| Source | Metrics Extracted |
|---|---|
| batch_*_report.md | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) |
| Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category |
| Task spec files in TASKS_DIR | Complexity points per task, dependency count |
| implementation_report_*.md | Total tasks, total batches, overall duration |
| Git log (if available) | Commits per batch, files changed per batch |
| cumulative_review_batches_*.md (## Baseline Delta section) | Architecture findings: carried over / resolved / newly introduced counts |
| _docs/02_document/module-layout.md + source import graph | Component count, cross-component edges, cycles, avg imports/module |
| _docs/02_document/contracts/**/*.md | Contract count, contracts per public-API symbol |
Metrics to Compute
Implementation Metrics:
- Total tasks implemented
- Total batches executed
- Average tasks per batch
- Average complexity points per batch
- Total complexity points delivered
Quality Metrics:
- Code review pass rate (PASS / total reviews)
- Code review findings by severity: Critical, High, Medium, Low counts
- Code review findings by category: Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
- FAIL count (batches that required user intervention)
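As an illustration of the implementation and quality metrics above, the sketch below aggregates per-batch records. The record shape (`tasks`, `verdict`) and the function name `summarize_batches` are assumptions made for this example. The pass rate follows the literal formula "PASS / total reviews", so PASS_WITH_WARNINGS is assumed not to count toward the numerator.

```python
from collections import Counter


def summarize_batches(batches: list[dict]) -> dict:
    """Compute a few implementation and quality metrics.

    Each batch is a hypothetical record: {"tasks": int, "verdict": str},
    where verdict is PASS, FAIL, or PASS_WITH_WARNINGS.
    """
    total_batches = len(batches)
    total_tasks = sum(b["tasks"] for b in batches)
    verdicts = Counter(b["verdict"] for b in batches)
    return {
        "total_tasks": total_tasks,
        "total_batches": total_batches,
        "avg_tasks_per_batch": total_tasks / total_batches if total_batches else 0.0,
        # Literal reading of "PASS / total reviews": warnings are not counted as PASS.
        "pass_rate": verdicts["PASS"] / total_batches if total_batches else 0.0,
        "fail_count": verdicts["FAIL"],
    }
```

Findings by severity and category could be tallied the same way with a `Counter` keyed on each finding's label.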
Structural Metrics (skip only if module-layout.md is absent):
- Component count and change vs previous cycle
- Cross-component import edges and change vs previous cycle
- Cycles in the component import graph (should stay 0; any new cycle is a regression)
- Average imports per module
- New Architecture violations this cycle (from ## Baseline Delta → Newly introduced)
- Resolved Architecture violations this cycle (from ## Baseline Delta → Resolved)
- Net Architecture delta = new − resolved (negative is good)
- Percentage of public-API symbols covered by a contract file (contract count / public-API symbol count)
- shared/* entries used by ≥2 components (healthy) vs by ≤1 component (dead cross-cutting)
Persist the structural snapshot to METRICS_DIR/structure_[YYYY-MM-DD].md so future retros can compute deltas without re-deriving from source.
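Three of the structural metrics are simple enough to sketch directly: cycle detection in the component import graph, the net Architecture delta, and contract coverage. The function names and the adjacency-dict representation of the graph are assumptions for illustration; how the import graph is actually derived from source is not specified here.

```python
def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Detect a cycle in a directed component import graph with a three-color DFS."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY  # node is on the current DFS path
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is an ancestor on the path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)


def net_architecture_delta(newly_introduced: int, resolved: int) -> int:
    """Net delta = new - resolved; a negative value is good."""
    return newly_introduced - resolved


def contract_coverage_pct(contract_count: int, public_symbol_count: int) -> float:
    """Percentage of public-API symbols covered by a contract file."""
    if public_symbol_count == 0:
        return 0.0
    return 100.0 * contract_count / public_symbol_count
```

For example, `has_cycle({"api": {"core"}, "core": {"db"}, "db": {"api"}})` reports a cycle, which the metric list treats as a regression.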
Efficiency Metrics:
- Blocked task count and reasons
- Tasks completed on first attempt vs requiring fixes
- Batch with most findings (identify problem areas)
Auto-lesson triggers (feed Step 4 LESSONS.md generation):
- Net Architecture delta > 0 this cycle → architecture lesson
- Any structural metric regressed by >20% vs previous snapshot → architecture or dependencies lesson depending on the metric
- Contract coverage % decreased → architecture lesson
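The trigger rules above might be evaluated roughly like this. The snapshot keys, the metric-to-category mapping, and the assumption that a higher value means a regression for these metrics are all hypothetical; the text only says the category depends on the metric.

```python
# Hypothetical mapping from structural metric to lesson category.
# For every metric listed here, a higher value is assumed to be worse.
METRIC_CATEGORY = {
    "cross_component_edges": "architecture",
    "cycles": "architecture",
    "avg_imports_per_module": "dependencies",
}


def auto_lesson_triggers(current: dict, previous: dict, net_arch_delta: int) -> list[tuple[str, str]]:
    """Return (category, reason) pairs for each auto-lesson rule that fires."""
    triggers = []
    if net_arch_delta > 0:
        triggers.append(("architecture", "net Architecture delta > 0"))

    for metric, category in METRIC_CATEGORY.items():
        old, new = previous.get(metric), current.get(metric)
        # A relative change is undefined when the old value is missing or zero.
        if old and new is not None and (new - old) / old > 0.20:
            triggers.append((category, f"{metric} regressed by >20%"))

    old_cov = previous.get("contract_coverage_pct")
    new_cov = current.get("contract_coverage_pct")
    if old_cov is not None and new_cov is not None and new_cov < old_cov:
        triggers.append(("architecture", "contract coverage decreased"))
    return triggers
```

A metric that moves from zero (for example, the first import cycle ever introduced) has no defined percentage change, so this sketch leaves that case to the cycle check in the structural metrics rather than the 20% rule.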
Self-verification:
- All batch reports parsed
- All metric categories computed
- No batch reports missed
- Structural snapshot written (or explicitly skipped with reason "module-layout.md absent")
- If a previous structure_*.md exists, deltas are computed against the most recent one
Step 2: Analyze Trends
Role: Process improvement analyst
Goal: Identify patterns, recurring issues, and improvement opportunities
Constraints: Analysis must be grounded in the metrics from Step 1
- If previous retrospective reports exist in METRICS_DIR, load the most recent one for comparison
- Identify patterns:
- Recurring findings: which code review categories appear most frequently?
- Problem components: which components/files generate the most findings?
- Complexity accuracy: do high-complexity tasks actually produce more issues?
- Blocker patterns: what types of blockers occur and can they be prevented?
- Compare against previous retrospective (if exists):
- Which metrics improved?
- Which metrics degraded?
- Were previous improvement actions effective?
- Identify top 3 improvement actions ranked by impact
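The comparison against the previous retrospective can be reduced to classifying each shared metric as improved, degraded, or unchanged. The sketch below assumes both retros expose their metrics as flat dicts and that the direction of "better" is known per metric; the metric names and the `HIGHER_IS_BETTER` table are illustrative, not part of the skill.

```python
# Hypothetical direction table: True means a higher value is an improvement.
HIGHER_IS_BETTER = {
    "pass_rate": True,
    "fail_count": False,
    "blocked_tasks": False,
}


def compare_metrics(current: dict, previous: dict) -> dict[str, list[str]]:
    """Classify metrics present in both retros as improved, degraded, or unchanged."""
    result: dict[str, list[str]] = {"improved": [], "degraded": [], "unchanged": []}
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        if name not in current or name not in previous:
            continue  # no baseline for this metric, nothing to compare
        diff = current[name] - previous[name]
        if diff == 0:
            result["unchanged"].append(name)
        elif (diff > 0) == higher_is_better:
            result["improved"].append(name)
        else:
            result["degraded"].append(name)
    return result
```

The degraded list is a natural starting point when ranking the top 3 improvement actions by impact.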
Self-verification:
- Patterns are grounded in specific metrics
- Comparison with previous retro included (if exists)
- Top 3 actions are concrete and actionable
Step 3: Produce Report
Role: Technical writer
Goal: Write a structured retrospective report with metrics, trends, and recommendations
Constraints: Concise, data-driven, actionable
Write METRICS_DIR/retro_[YYYY-MM-DD].md using templates/retrospective-report.md as structure.
Self-verification:
- All metrics from Step 1 included
- Trend analysis from Step 2 included
- Top 3 improvement actions clearly stated
- Suggested rule/skill updates are specific
Save action: Write retro_[YYYY-MM-DD].md (in cycle-end / on-demand mode) or incident_[YYYY-MM-DD]_[skill].md (in incident mode).
Present the report summary to the user.
Step 4: Update Lessons Log
Role: Process improvement analyst
Goal: Keep a short, frequently-consulted log of actionable lessons that downstream skills read before they plan or estimate.
- Extract the top 3 concrete lessons from the current retrospective (or 1–3 lessons in incident mode, scoped to the failing skill). Each lesson must:
  - Be specific enough to change future behavior (not a platitude).
  - Be single-sentence.
  - Be tied to one of the categories: estimation, architecture, testing, dependencies, tooling, process.
- Append one bullet per lesson to _docs/LESSONS.md using this format:

      - [YYYY-MM-DD] [category] one-line lesson statement. Source: _docs/06_metrics/retro_YYYY-MM-DD.md

- After appending, trim _docs/LESSONS.md to keep only the last 15 entries (ring buffer). Oldest entries drop off the top. Preserve the file's header section if present.
- If _docs/LESSONS.md does not exist, create it with this skeleton before appending:

      # Lessons Log

      A ring buffer of the last 15 actionable lessons extracted from retrospectives and incidents.

      Downstream skills consume this file:
      - `.cursor/skills/new-task/SKILL.md` (Step 2 Complexity Assessment)
      - `.cursor/skills/plan/steps/06_work-item-epics.md` (epic sizing)
      - `.cursor/skills/decompose/SKILL.md` (Step 2 task complexity)
      - `.cursor/skills/autodev/SKILL.md` (Execution Loop step 0: surface top 3 lessons)

      Categories: estimation · architecture · testing · dependencies · tooling · process
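The append-and-trim behaviour described in the list above could be sketched as follows. The helper names are hypothetical, entries are recognised by their `- [YYYY-MM-DD] [category]` prefix, and the skeleton is abbreviated to its title line here; a real run would write the full skeleton given above.

```python
import re
from pathlib import Path

CATEGORIES = {"estimation", "architecture", "testing", "dependencies", "tooling", "process"}
# Lesson entries start with "- [date] [category] "; header bullets do not match this.
ENTRY = re.compile(r"^- \[\d{4}-\d{2}-\d{2}\] \[\w+\] ")
SKELETON_TITLE = "# Lessons Log"  # abbreviated stand-in for the full skeleton


def format_lesson(date: str, category: str, statement: str, source: str) -> str:
    """Build one log bullet in the documented format."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return f"- [{date}] [{category}] {statement.rstrip('.')}. Source: {source}"


def append_lessons(path: Path, new_entries: list[str], limit: int = 15) -> list[str]:
    """Append entries, keep only the newest `limit`, and preserve header lines."""
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else [SKELETON_TITLE]
    header = [ln for ln in lines if not ENTRY.match(ln)]
    entries = [ln for ln in lines if ENTRY.match(ln)] + list(new_entries)
    while header and not header[-1].strip():
        header.pop()  # drop trailing blank lines so spacing stays stable
    kept = entries[-limit:]  # oldest entries drop off the top
    body = header + [""] + kept if header else kept
    path.write_text("\n".join(body) + "\n", encoding="utf-8")
    return kept
```

Slicing with `entries[-limit:]` is what gives the ring-buffer effect: the file never holds more than 15 lessons, and the oldest are discarded first.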
Self-verification:
- 1–3 lessons extracted (3 in cycle-end / on-demand mode, 1–3 in incident mode)
- Each lesson is single-sentence, specific, and tagged with a valid category
- Each lesson includes a Source link back to its retro or incident file
- _docs/LESSONS.md trimmed to at most 15 entries
- Skeleton header preserved if file was just created
Save action: Write (or update) _docs/LESSONS.md.
Escalation Rules
| Situation | Action |
|---|---|
| No batch reports exist | STOP — nothing to analyze |
| Batch reports have inconsistent format | WARN user, extract what is available |
| No previous retrospective for comparison | PROCEED — report baseline metrics only |
| Metrics suggest systemic issue (>50% FAIL rate) | WARN user — suggest immediate process review |
Methodology Quick Reference
┌────────────────────────────────────────────────────────────────┐
│                 Retrospective (4-Step Method)                  │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist in _docs/03_implementation/        │
│                                                                │
│ 1. Collect Metrics → parse batch reports, compute metrics      │
│ 2. Analyze Trends  → patterns, comparison, improvement areas   │
│ 3. Produce Report  → _docs/06_metrics/retro_[date].md          │
│ 4. Update Lessons  → append top-3 to _docs/LESSONS.md (≤15)    │
├────────────────────────────────────────────────────────────────┤
│ Principles: Data-driven · Actionable · Cumulative              │
│             Non-judgmental · Save immediately                  │
└────────────────────────────────────────────────────────────────┘