chore: WIP pre-implement

Bundled hygiene commit before cycle-3 /implement (AZ-776, AZ-777). Mixes
two concerns by user choice (autodev option B):

- Cycle-3 autodev artifacts not yet committed by Step 9 (new-task):
  task specs for AZ-776 / AZ-777 under _docs/02_tasks/todo/ and the
  updated _docs/02_tasks/_dependencies_table.md.
- Accumulated skill / rule tooling maintenance under .cursor/ (skills:
  autodev, code-review, decompose, deploy, implement, new-task, plan,
  refactor, retrospective, test-spec; rules: coderule, cursor-meta,
  meta-rule, testing; new release skill scaffolding).
- Autodev bootstrap state: _docs/_autodev_state.md (step 10 in_progress)
  and _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
  (replay timestamp refreshed; gtsam 4.2 still numpy<2-only).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-21 13:14:11 +03:00
parent 9bc170ffe0
commit 6044a33197
36 changed files with 1559 additions and 252 deletions
+136 -72
@@ -11,10 +11,20 @@ If you want to run a specific skill directly (without the orchestrator), use the
``` ```
/problem — interactive problem gathering → _docs/00_problem/ /problem — interactive problem gathering → _docs/00_problem/
/research — solution drafts → _docs/01_solution/ /research — solution drafts → _docs/01_solution/
/plan — architecture, components, tests → _docs/02_document/ /plan — architecture, ADRs, components, tests, epics → _docs/02_document/
/decompose — atomic task specs → _docs/02_tasks/todo/ /test-spec — blackbox/perf/resilience/security test specs → _docs/02_document/tests/
/implement — batched parallel implementation → _docs/03_implementation/ /decompose — atomic task specs (multi-mode) → _docs/02_tasks/todo/
/deploy — containerization, CI/CD, observability → _docs/04_deploy/ /implement — sequential dependency-aware batches with code review and completeness gates → _docs/03_implementation/
/test-run — runs the test suite (functional / perf modes) with gating
/code-review — multi-phase review used by /implement
/refactor — 8-phase structured refactoring (incl. testability sub-mode) → _docs/04_refactoring/
/security — OWASP-driven audit → _docs/05_security/
/deploy — containerization, CI/CD, environments, observability, procedures, scripts → _docs/04_deploy/
/release — execute deploy artifacts in prod, smoke-test, watch, decide rollback → _docs/04_release/
/document — bottom-up reverse-engineering of an existing codebase → _docs/02_document/
/new-task — interactive feature planning for an existing codebase → _docs/02_tasks/todo/
/ui-design — HTML+CSS mockups + design system → _docs/02_document/ui_mockups/
/retrospective — metrics + lessons log → _docs/06_metrics/ + _docs/LESSONS.md
``` ```
## How It Works ## How It Works
@@ -41,148 +51,201 @@ The state file tracks completed steps, key decisions, blockers, and session cont
Skills auto-chain without pausing between them. The only pauses are: Skills auto-chain without pausing between them. The only pauses are:
- **BLOCKING gates** inside each skill (user must confirm before proceeding) - **BLOCKING gates** inside each skill (user must confirm before proceeding)
- **Session boundary** after decompose (suggests new conversation before implement) - **Session boundaries** declared in each flow's auto-chain rules (e.g., after `decompose`, after `decompose tests`) — suggested new-conversation breakpoints to keep context fresh
A typical project runs in 2-4 conversations: There are three flows, resolved on every invocation (see `skills/autodev/SKILL.md` § Flow Resolution):
- Session 1: Problem → Research → Research decision
- Session 2: Plan → Decompose
- Session 3: Implement (may span multiple sessions)
- Session 4: Deploy
Re-entry is seamless: type `/autodev` in a new conversation and the orchestrator reads the state file to pick up exactly where you left off. | Flow | When | Steps |
|------|------|-------|
| **greenfield** | empty workspace, no source yet | 17 steps: Problem → Research → Plan → UI Design → Test Spec → Decompose → Implement → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (opt) → Performance Test (opt) → Deploy → Release → Retrospective |
| **existing-code** | source files present | one-time baseline (Document → Architecture Baseline Scan → Test Spec → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → optional Refactor) then a feature-cycle loop (New Task → Implement → Run Tests → Test-Spec Sync → Update Docs → Security Audit (opt) → Performance Test (opt) → Deploy → Release → Retrospective → loops back to New Task) |
| **meta-repo** | `.gitmodules`, workspace manifest, or multi-component aggregator | uses `monorepo-*` skills + `_docs/_repo-config.yaml` instead of per-component BUILD-SHIP folders |
A typical greenfield project spans several conversations because of session boundaries. Re-entry is seamless: type `/autodev` in a new conversation and the orchestrator reads `_docs/_autodev_state.md` to pick up exactly where you left off.
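A rough sketch of how the flow table could be evaluated is shown here. The marker list, the source-file suffixes, the ignored directories, and the precedence order (meta-repo checked first) are all assumptions made for illustration; the authoritative rules are in `skills/autodev/SKILL.md` § Flow Resolution.

```python
from pathlib import Path

# Hypothetical markers and suffixes, chosen only for this illustration.
META_REPO_MARKERS = (".gitmodules",)
SOURCE_SUFFIXES = {".py", ".cs", ".rs", ".ts", ".go"}
IGNORED_DIRS = {".git", ".cursor", "_docs"}


def resolve_flow(workspace: Path) -> str:
    """Return 'meta-repo', 'existing-code', or 'greenfield' for a workspace."""
    # Assumed precedence: a meta-repo marker wins over the presence of source files.
    if any((workspace / marker).exists() for marker in META_REPO_MARKERS):
        return "meta-repo"
    for path in workspace.rglob("*"):
        # Skip tooling and documentation folders when looking for source files.
        if IGNORED_DIRS.intersection(path.relative_to(workspace).parts):
            continue
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            return "existing-code"
    return "greenfield"
```

Because the check is a pure function of the workspace contents, it can be re-run on every invocation without consulting stored state, which matches the "resolved on every invocation" wording above.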
## Skill Descriptions ## Skill Descriptions
### autodev (meta-orchestrator) ### autodev (meta-orchestrator)
Auto-chaining engine that sequences the full BUILD → SHIP workflow. Persists state to `_docs/_autodev_state.md`, tracks key decisions and session context, and flows through problem → research → plan → decompose → implement → deploy without manual skill invocation. Maximizes work per conversation with seamless cross-session re-entry. Auto-chaining engine that sequences the full BUILD → SHIP → EVOLVE workflow. Persists state to `_docs/_autodev_state.md`, surfaces top-3 lessons from `_docs/LESSONS.md` at every invocation, replays any `_docs/_process_leftovers/` entries, tracks key decisions and session context, and flows through the active flow's steps without manual skill invocation. Maximizes work per conversation with seamless cross-session re-entry.
### problem ### problem
Interactive interview that builds `_docs/00_problem/`. Asks probing questions across 8 dimensions (problem, scope, hardware, software, acceptance criteria, input data, security, operations) until all required files can be written with concrete, measurable content. Interactive 4-phase interview that builds `_docs/00_problem/`. Asks probing questions across 8 dimensions (problem & goals, scope, hardware & environment, software & tech, acceptance criteria, input data, security, operational) until all required files can be written with concrete, measurable, quantifiable content. Acceptance criteria must include numeric targets; input data must include `expected_results/` mappings.
### research ### research
8-step deep research methodology. Mode A produces initial solution drafts. Mode B assesses and revises existing drafts. Includes AC assessment, source tiering, fact extraction, comparison frameworks, and validation. Run multiple rounds until the solution is solid. 8-step deep research methodology. Mode A produces initial solution drafts. Mode B assesses and revises existing drafts. Classifies output as **Technical-component selection** (full per-mode API verification gates apply) or **Non-technical investigation** (gates relaxed). Source tiering, fact extraction, comparison frameworks, validation, exact-fit component selection. Run multiple rounds until the solution is solid.
### plan ### plan
6-step planning workflow. Produces integration test specs, architecture, system flows, data model, deployment plan, component specs with interfaces, risk assessment, test specifications, and work item epics. Heavy interaction at BLOCKING gates. 6-step planning workflow with one half-step (4.5: Architecture Decision Records). Produces blackbox test specs (delegated to test-spec), glossary, architecture vision, architecture document, data model, deployment plan, component specs with interfaces, risk assessment, ADRs, test specifications, and work item epics. Heavy interaction at BLOCKING gates (glossary+vision, architecture, components, mitigations, ADRs).
### test-spec
4-phase test specification workflow. Phase 1 analyzes input data + expected-results completeness. Phase 2 emits 8 test artifacts (environment, test-data, blackbox, performance, resilience, security, resource-limit, traceability matrix). Phase 3 is the hard gate that requires every test to have quantifiable expected results. Phase 4 emits runner scripts. Cycle-update mode for incremental refresh.
### decompose ### decompose
4-step task decomposition. Produces a bootstrap structure plan, atomic task specs per component, integration test tasks, and a cross-task dependency table. Each task gets a work item ticket and is capped at 8 complexity points. Multi-mode task decomposition with 6 internal step files. Implementation mode runs Step 1 (Bootstrap), 1.5 (Module Layout), 1.7 (System-Pipeline owner tasks), 2 (per-component tasks), 4 (Cross-Verification). Tests-only mode runs Step 1t (Test Infrastructure), 3 (Blackbox tasks), 4. Single-component mode runs Step 2 only. Each task is tracker-prefixed and capped at 5 complexity points. The 1.7 step exists specifically to prevent the GPS-passthrough class of failure (see `meta-rule.mdc`).
### implement ### implement
Orchestrator that reads task specs, computes dependency-aware execution batches, launches up to 4 parallel implementer subagents, runs code review after each batch, and commits per batch. Does not write code itself. Orchestrator that reads task specs, computes dependency-aware execution batches via topological sort, **implements tasks sequentially within each batch** (no subagents, no parallel execution — see `.cursor/rules/no-subagents.mdc`), runs code review after each batch, runs cumulative code review every K batches, and commits per batch. Has a Product Implementation Completeness Gate (Step 15) that compares promises in task specs / architecture against actual production code, plus a System-Pipeline Audit (Step 15.b) that walks architecture-named pipelines and verifies a real production caller wires each adjacent component pair. Either gate's FAIL stops the cycle until remediation tasks are created.
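The batching idea can be sketched with Python's standard `graphlib` module. This is an illustration under assumptions, not the skill's actual code: the task IDs are hypothetical, and the input is assumed to be a mapping from each task to the set of tasks it depends on (the kind of information `_dependencies_table.md` would carry).

```python
from graphlib import TopologicalSorter


def compute_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    batches: list[list[str]] = []
    while sorter.is_active():
        # Every task returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Mark the whole batch done so the next layer becomes ready.
        sorter.done(*ready)
    return batches


# Hypothetical task IDs: TASK-B and TASK-C both depend on TASK-A.
example = {"TASK-B": {"TASK-A"}, "TASK-C": {"TASK-A"}, "TASK-D": {"TASK-B", "TASK-C"}}
batches = compute_batches(example)
```

Within a batch the tasks are independent of each other, so working through them one at a time in sorted order (as the sequential rule requires) does not violate any dependency.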
### deploy
7-step deployment planning. Status check, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts. Produces documents for steps 1-6 and executable scripts in step 7.
### code-review ### code-review
Multi-phase code review against task specs. Produces structured findings with verdict: PASS, FAIL, or PASS_WITH_WARNINGS. 7-phase code review against task specs (Phase 7 is Architecture Compliance against `module-layout.md` and `architecture.md`). Produces structured findings with verdict: PASS, PASS_WITH_WARNINGS, or FAIL. Three modes: full (per batch), baseline (one-time architecture scan of an existing codebase), cumulative (mid-implementation across batches with `## Baseline Delta`).
### test-run
Runs the test suite. Functional mode (default): detects pytest/dotnet/cargo/npm or `scripts/run-tests.sh`, applies a System-Under-Test Reality Gate to refuse passes where internal product modules were stubbed, classifies failures and skips, gates on outcome. Perf mode: detects `scripts/run-performance-tests.sh` or k6/locust/artillery/wrk, captures latency/throughput/error metrics, compares against thresholds.
### refactor ### refactor
6-phase structured refactoring: baseline, discovery, analysis, safety net, execution, hardening. 8-phase structured refactoring: baseline → discovery → analysis → safety net → execution → test sync → verification → documentation. Two input modes (Automatic / Guided). Testability sub-mode skips Phase 3 by design and emits a `testability_changes_summary.md` for user review. Each run lives in its own `RUN_DIR` under `_docs/04_refactoring/NN-<run-name>/`.
### security ### security
OWASP-based security testing and audit. 5-phase OWASP-based audit: dependency scan → static analysis → OWASP Top 10 review → infrastructure review → consolidated security report. Severity-ranked, evidence-based, actionable. Complementary to `code-review` Phase 4 (lightweight security quick-scan).
### deploy
7-step deployment planning. Produces documents for steps 1-6 (status & env, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures) and executable scripts in step 7 (`deploy.sh`, `pull-images.sh`, `start-services.sh`, `stop-services.sh`, `health-check.sh`).
### release
Executes the deployment plan produced by `/deploy` against a target environment. 6 phases: pre-release gate (AC + risk + rollback readiness), strategy select (all-at-once / blue-green / canary / manual), execute (run scripts, monitor exit codes), smoke test (delegate to test-run prod-smoke), watch window (read observability for the configured duration), commit-or-rollback. Outputs `_docs/04_release/release_<version>.md`. Produces a definitive Released / Rolled-Back / Aborted verdict; failure of any phase auto-triggers rollback unless the user opts to investigate.
### retrospective ### retrospective
Collects metrics from implementation batch reports, analyzes trends, produces improvement reports. 4-step workflow: collect metrics → analyze trends → produce report → update lessons log (`_docs/LESSONS.md`, ring buffer of last 15 entries consumed by `new-task`, `plan`, `decompose`, and `autodev`). Cycle-end (default) and incident modes; incident mode is auto-invoked after a 3-strike failure.
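The "ring buffer of last 15 entries" behaviour can be illustrated with a bounded `collections.deque`. This is a minimal sketch: lessons are modelled as plain strings, and how the real skill parses and writes `_docs/LESSONS.md` is not shown in this diff, so the function name and data shape are assumptions.

```python
from collections import deque

MAX_LESSONS = 15  # buffer size stated in the retrospective description


def append_lesson(lessons: list[str], new_lesson: str) -> list[str]:
    """Append a lesson, dropping the oldest entries beyond MAX_LESSONS."""
    buffer = deque(lessons, maxlen=MAX_LESSONS)
    buffer.append(new_lesson)  # evicts from the left when the buffer is full
    return list(buffer)


# Hypothetical usage: after 20 appends only the 15 most recent remain.
log: list[str] = []
for index in range(20):
    log = append_lesson(log, f"lesson {index}")
```

A bounded buffer keeps the file small enough to be read on every invocation, at the cost of forgetting older lessons.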
### document ### document
Bottom-up codebase documentation. Analyzes existing code from modules through components to architecture, then retrospectively derives problem/restrictions/acceptance criteria. Alternative entry point for existing codebases — produces the same `_docs/` artifacts as problem + plan, but from code analysis instead of user interview. Bottom-up codebase documentation. Analyzes existing code from modules through components to architecture, then retrospectively derives problem/restrictions/acceptance criteria. Alternative entry point for existing codebases — produces the same `_docs/` artifacts as problem + plan, but from code analysis instead of user interview. Two workflow files: `workflows/full.md` (full / focus-area / resume) and `workflows/task.md` (incremental update for a single task).
### new-task
Existing-code feature planning loop. Walks the user through Step 1 (description) → Step 2 (complexity assessment, consults `LESSONS.md`) → Step 3 (research if needed) → Step 4 (codebase analysis incl. test-coverage gap) → Step 4.5 (contract & layout check) → Step 5 (validate assumptions) → Step 6 (write task spec) → Step 7 (tracker ticket) → Step 8 (loop or finalize).
### ui-design
End-to-end UI workflow. Phase 0 (complexity detection: full vs quick) → Phase 1 (context check) → Phase 2 (requirements) → Phase 3 (direction exploration) → Phase 4 (design system synthesis: `DESIGN.md`) → Phase 5 (HTML+Tailwind code generation) → Phase 6 (visual verification, optional MCP enhancements) → Phase 7 (user review) → Phase 8 (iteration). Has Applicability Check that refuses to run on non-UI projects.
### monorepo-* (suite-level)
Six skills for meta-repos: `monorepo-discover` (write/refresh `_docs/_repo-config.yaml`), `monorepo-document` (sync unified docs), `monorepo-cicd` (sync CI/compose/env templates), `monorepo-onboard` (atomic add-component), `monorepo-status` (read-only drift report), `monorepo-e2e` (sync suite-level integration harness). They never cross domains; each touches exactly one artifact class.
## Developer TODO (Project Mode) ## Developer TODO (Project Mode)
### BUILD The numbered list below mirrors greenfield-flow ordering. Existing-code projects start at `/document`, then enter the feature-cycle loop at `/new-task`. See `skills/autodev/flows/{greenfield,existing-code,meta-repo}.md` for the authoritative step tables.
### BUILD (greenfield)
``` ```
0. /problem — interactive interview → _docs/00_problem/ 1. /problem — interactive 4-phase interview → _docs/00_problem/
- problem.md (required) required: problem.md, restrictions.md, acceptance_criteria.md, input_data/
- restrictions.md (required) optional: security_approach.md
- acceptance_criteria.md (required) 2. /research — solution drafts (Mode A draft, Mode B assess) → _docs/01_solution/
- input_data/ (required) 3. /plan — glossary, architecture vision, architecture, data model, deployment, components,
- security_approach.md (optional) risks, ADRs (Step 4.5), test specs, epics → _docs/02_document/
(Step 1 invokes /test-spec internally)
1. /research — solution drafts → _docs/01_solution/ 4. /ui-design — HTML+Tailwind mockups (UI projects only) → _docs/02_document/ui_mockups/
Run multiple times: Mode A → draft, Mode B → assess & revise 5. /test-spec — produces 8 test-spec artifacts + traceability matrix → _docs/02_document/tests/
(already invoked from /plan Step 1; Step 5 here is the explicit autodev step)
2. /plan — architecture, data model, deployment, components, risks, tests, epics → _docs/02_document/ 6. /decompose — implementation tasks + module-layout + system-pipeline owner tasks →
_docs/02_tasks/todo/
3. /decompose — atomic task specs + dependency table → _docs/02_tasks/todo/ 7. /implement — sequential dependency-aware batches; per-batch code-review;
Product Completeness Gate + System-Pipeline Audit → _docs/03_implementation/
4. /implement — batched parallel agents, code review, commit per batch → _docs/03_implementation/ 8. (auto) Code Testability Revision — surgical refactor to make code runnable under tests
9. /decompose tests — test-only decomposition mode → _docs/02_tasks/todo/
10. /implement (tests) — implements test tasks
11. /test-run — full functional suite gate
12. /test-spec --cycle-update — append implementation-learned scenarios
13. /document --task — update affected component / module / architecture docs
14. /security — OWASP-based audit (optional gate)
15. /test-run --perf — perf/load tests (optional gate)
``` ```
### SHIP ### SHIP
``` ```
5. /deploy — containerization, CI/CD, environments, observability, procedures → _docs/04_deploy/ 16. /deploy — containerization, CI/CD, environments, observability, procedures, scripts → _docs/04_deploy/
17. /release — execute deploy artifacts in prod, smoke-test, watch, decide rollback → _docs/04_release/
``` ```
### EVOLVE ### EVOLVE
``` ```
6. /refactor — structured refactoring → _docs/04_refactoring/ 18. /retrospective — metrics + trends + lessons-log update → _docs/06_metrics/ + _docs/LESSONS.md
7. /retrospective — metrics, trends, improvement actions → _docs/06_metrics/ (cycle-end mode after release; incident mode auto-fires after 3-strike failure)
After greenfield completes, the state file is rewritten to point at the existing-code flow's
feature-cycle loop, which begins with /new-task and ends with /retrospective. The loop runs once
per feature with state.cycle incremented.
Off-cycle:
/refactor — full 8-phase refactor → _docs/04_refactoring/NN-<run-name>/
/document — full reverse-engineering of an unfamiliar codebase
``` ```
Or just use `/autodev` to run steps 0-5 automatically. Or just use `/autodev` to run all the above automatically — the orchestrator chooses the right flow, sequences steps, surfaces lessons, processes leftovers, and pauses only at BLOCKING gates and declared session boundaries.
## Available Skills ## Available Skills
| Skill | Triggers | Output | | Skill | Triggers | Output |
|-------|----------|--------| |-------|----------|--------|
| **autodev** | "autodev", "auto", "start", "continue", "what's next" | Orchestrates full workflow | | **autodev** | "autodev", "auto", "start", "continue", "what's next" | Orchestrates full workflow (3 flows) |
| **problem** | "problem", "define problem", "new project" | `_docs/00_problem/` | | **problem** | "problem", "define problem", "new project" | `_docs/00_problem/` |
| **research** | "research", "investigate" | `_docs/01_solution/` | | **research** | "research", "investigate" | `_docs/01_solution/` |
| **plan** | "plan", "decompose solution" | `_docs/02_document/` | | **plan** | "plan", "decompose solution" | `_docs/02_document/` (incl. ADRs) |
| **test-spec** | "test spec", "blackbox tests", "test scenarios" | `_docs/02_document/tests/` + `scripts/` | | **test-spec** | "test spec", "blackbox tests", "test scenarios" | `_docs/02_document/tests/` + `scripts/` |
| **decompose** | "decompose", "task decomposition" | `_docs/02_tasks/todo/` | | **decompose** | "decompose", "task decomposition", "decompose tests" | `_docs/02_tasks/todo/` + `_docs/02_document/module-layout.md` |
| **implement** | "implement", "start implementation" | `_docs/03_implementation/` | | **implement** | "implement", "start implementation" | `_docs/03_implementation/` (sequential — see `no-subagents.mdc`) |
| **test-run** | "run tests", "test suite", "verify tests" | Test results + verdict | | **test-run** | "run tests", "test suite", "verify tests", "perf test" | Test results + verdict |
| **code-review** | "code review", "review code" | Verdict: PASS / FAIL / PASS_WITH_WARNINGS | | **code-review** | "code review", "review code" | Verdict: PASS / FAIL / PASS_WITH_WARNINGS (7 phases) |
| **new-task** | "new task", "add feature", "new functionality" | `_docs/02_tasks/todo/` | | **new-task** | "new task", "add feature", "new functionality" | `_docs/02_tasks/todo/` |
| **ui-design** | "design a UI", "mockup", "design system" | `_docs/02_document/ui_mockups/` | | **ui-design** | "design a UI", "mockup", "design system" | `_docs/02_document/ui_mockups/` |
| **refactor** | "refactor", "improve code" | `_docs/04_refactoring/` | | **refactor** | "refactor", "improve code", "testability" | `_docs/04_refactoring/NN-<run-name>/` |
| **security** | "security audit", "OWASP" | `_docs/05_security/` | | **security** | "security audit", "OWASP", "vulnerability scan" | `_docs/05_security/` |
| **document** | "document", "document codebase", "reverse-engineer docs" | `_docs/02_document/` + `_docs/00_problem/` + `_docs/01_solution/` | | **document** | "document", "document codebase", "reverse-engineer docs" | `_docs/02_document/` + `_docs/00_problem/` + `_docs/01_solution/` |
| **deploy** | "deploy", "CI/CD", "observability" | `_docs/04_deploy/` | | **deploy** | "deploy", "CI/CD", "observability", "containerize" | `_docs/04_deploy/` (plans + scripts) |
| **retrospective** | "retrospective", "retro" | `_docs/06_metrics/` | | **release** | "release", "ship", "go live", "rollback" | `_docs/04_release/` (executed deploy + verdict) |
| **retrospective** | "retrospective", "retro", "metrics review" | `_docs/06_metrics/` + `_docs/LESSONS.md` |
| **monorepo-discover** | "discover monorepo", "scan submodules" | `_docs/_repo-config.yaml` |
| **monorepo-document** | "sync monorepo docs" | unified `_docs/*.md` |
| **monorepo-cicd** | "sync compose", "sync ci" | suite-level CI/compose/env templates |
| **monorepo-onboard** | "onboard component", "register submodule" | atomic component addition |
| **monorepo-status** | "monorepo status", "drift report" | read-only drift report |
| **monorepo-e2e** | "suite e2e", "integration harness" | `e2e/docker-compose.suite-e2e.yml` and fixtures |
## Tools > The `.cursor/agents/` directory is intentionally empty. Per `.cursor/rules/no-subagents.mdc` the main agent does not delegate to subagents in this workspace; `/implement` runs tasks sequentially.
| Tool | Type | Purpose |
|------|------|---------|
| `implementer` | Subagent | Implements a single task. Launched by `/implement`. |
## Project Folder Structure ## Project Folder Structure
``` ```
_project.md — project-specific config (tracker type, project key, etc.)
_docs/ _docs/
├── _autodev_state.md — autodev orchestrator state (progress, decisions, session context) ├── _autodev_state.md — autodev orchestrator state (≤30 lines; pointer only)
├── 00_problem/ — problem definition, restrictions, AC, input data ├── _process_leftovers/ — deferred tracker writes replayed at next /autodev (per tracker.mdc)
├── _repo-config.yaml — meta-repo only; produced by monorepo-discover
├── LESSONS.md — ring buffer of last 15 actionable lessons (consumed by autodev/new-task/plan/decompose)
├── 00_problem/ — problem definition, restrictions, AC, input data + expected_results/
├── 00_research/ — intermediate research artifacts ├── 00_research/ — intermediate research artifacts
├── 01_solution/ — solution drafts, tech stack, security analysis ├── 01_solution/ — solution drafts, tech stack, security analysis
├── 02_document/ ├── 02_document/
│ ├── architecture.md │ ├── architecture.md — includes ## Architecture Vision (user-confirmed)
│ ├── glossary.md — user-confirmed terminology
│ ├── system-flows.md │ ├── system-flows.md
│ ├── data_model.md │ ├── data_model.md
│ ├── module-layout.md — per-component Owns/Imports-from/Public API (decompose Step 1.5)
│ ├── architecture_compliance_baseline.md — existing-code baseline scan output
│ ├── risk_mitigations.md │ ├── risk_mitigations.md
│ ├── adr/[NNN]_[decision_slug].md — Architectural Decision Records (plan Step 4.5)
│ ├── components/[##]_[name]/ — description.md + tests.md per component │ ├── components/[##]_[name]/ — description.md + tests.md per component
│ ├── contracts/<component>/<name>.md — versioned public-API contracts
│ ├── common-helpers/ │ ├── common-helpers/
│ ├── tests/ — environment, test data, blackbox, performance, resilience, security, traceability │ ├── tests/ — environment, test-data, blackbox, performance, resilience, security, resource-limit, traceability matrix
│ ├── deployment/ — containerization, CI/CD, environments, observability, procedures
│ ├── ui_mockups/ — HTML+CSS mockups, DESIGN.md (ui-design skill) │ ├── ui_mockups/ — HTML+CSS mockups, DESIGN.md (ui-design skill)
│ ├── diagrams/ │ ├── diagrams/
│ └── FINAL_report.md │ └── FINAL_report.md
@@ -192,12 +255,13 @@ _docs/
│ ├── backlog/ — parked tasks (not scheduled yet) │ ├── backlog/ — parked tasks (not scheduled yet)
│ └── done/ — completed/archived tasks │ └── done/ — completed/archived tasks
├── 02_task_plans/ — per-task research artifacts (new-task skill) ├── 02_task_plans/ — per-task research artifacts (new-task skill)
├── 03_implementation/ — batch reports, implementation_report_*.md ├── 03_implementation/ — batch_*_cycle*.md, implementation_report_*.md, implementation_completeness_cycle*.md, cumulative_review_*.md
│ └── reviews/ — code review reports per batch │ └── reviews/ — code review reports per batch
├── 04_deploy/ — containerization, CI/CD, environments, observability, procedures, scripts ├── 04_deploy/ — containerization, CI/CD, environments, observability, procedures, deploy_scripts.md, reports/
├── 04_refactoring/ — baseline, discovery, analysis, execution, hardening ├── 04_refactoring/NN-<run-name>/ — baseline_metrics, discovery, analysis, test_specs, execution_log, test_sync, verification, FINAL_report (one folder per refactor run)
├── 05_security/ — dependency scan, SAST, OWASP review, security report ├── 04_release/ — release_<version>.md (one per /release invocation), rollback_<version>.md
└── 06_metrics/ — retro_[YYYY-MM-DD].md ├── 05_security/ — dependency_scan, static_analysis, owasp_review, infrastructure_review, security_report
└── 06_metrics/ — retro_<YYYY-MM-DD>.md, structure_<YYYY-MM-DD>.md, perf_<YYYY-MM-DD>_<run-label>.md, incident_<YYYY-MM-DD>_<skill>.md
``` ```
## Standalone Mode ## Standalone Mode
-105
@@ -1,105 +0,0 @@
---
name: implementer
description: |
Implements a single task from its spec file. Use when implementing tasks from _docs/02_tasks/todo/.
Reads the task spec, analyzes the codebase, implements the feature with tests, and verifies acceptance criteria.
Launched by the /implement skill as a subagent.
---
You are a professional software developer implementing a single task.
## Input
You receive from the `/implement` orchestrator:
- Path to a task spec file (e.g., `_docs/02_tasks/todo/[TRACKER-ID]_[short_name].md`)
- Files OWNED (exclusive write access — only you may modify these)
- Files READ-ONLY (shared interfaces, types — read but do not modify)
- Files FORBIDDEN (other agents' owned files — do not touch)
## Context (progressive loading)
Load context in this order, stopping when you have enough:
1. Read the task spec thoroughly — acceptance criteria, scope, constraints, dependencies
2. Read `_docs/02_tasks/_dependencies_table.md` to understand where this task fits
3. Read project-level context:
- `_docs/00_problem/problem.md`
- `_docs/00_problem/restrictions.md`
- `_docs/01_solution/solution.md`
4. Analyze the specific codebase areas related to your OWNED files and task dependencies
## Boundaries
**Always:**
- Run tests before reporting done
- Follow existing code conventions and patterns
- Implement error handling per the project's strategy
- Stay within the task spec's Scope/Included section
**Ask first:**
- Adding new dependencies or libraries
- Creating files outside your OWNED directories
- Changing shared interfaces that other tasks depend on
**Never:**
- Modify files in the FORBIDDEN list
- Skip writing tests
- Change database schema unless the task spec explicitly requires it
- Commit secrets, API keys, or passwords
- Modify CI/CD configuration unless the task spec explicitly requires it
## Process
1. Read the task spec thoroughly — understand every acceptance criterion
2. Analyze the existing codebase: conventions, patterns, related code, shared interfaces
3. Research best implementation approaches for the tech stack if needed
4. If the task has a dependency on an unimplemented component, create a minimal interface mock
5. Implement the feature following existing code conventions
6. Implement error handling per the project's defined strategy
7. Implement unit tests (use Arrange / Act / Assert section comments in language-appropriate syntax)
8. Implement integration tests — analyze existing tests, add to them or create new
9. Run all tests, fix any failures
10. Verify every acceptance criterion is satisfied — trace each AC with evidence
## Stop Conditions
- If the same fix fails 3+ times with different approaches, stop and report as blocker
- If blocked on an unimplemented dependency, create a minimal interface mock and document it
- If the task scope is unclear, stop and ask rather than assume
## Completion Report
Report using this exact structure:
```
## Implementer Report: [task_name]
**Status**: Done | Blocked | Partial
**Task**: [TRACKER-ID]_[short_name]
### Acceptance Criteria
| AC | Satisfied | Evidence |
|----|-----------|----------|
| AC-1 | Yes/No | [test name or description] |
| AC-2 | Yes/No | [test name or description] |
### Files Modified
- [path] (new/modified)
### Test Results
- Unit: [X/Y] passed
- Integration: [X/Y] passed
### Mocks Created
- [path and reason, or "None"]
### Blockers
- [description, or "None"]
```
## Principles
- Follow SOLID, KISS, DRY
- Dumb code, smart data
- No unnecessary comments or logs (only exceptions)
- Ask if requirements are ambiguous — do not assume
+2
@@ -45,3 +45,5 @@ alwaysApply: true
- Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task - Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
- Never force-push to main or dev branches - Never force-push to main or dev branches
- For new projects, place source code under `src/` (this works for all stacks including .NET). For existing projects, follow the established directory structure. Keep project-level config, tests, and tooling at the repo root. - For new projects, place source code under `src/` (this works for all stacks including .NET). For existing projects, follow the established directory structure. Keep project-level config, tests, and tooling at the repo root.
- **Never run e2e or CI tests in quiet mode (`-q`).** Always use `-v --tb=short` (or equivalent verbosity flags) in all Dockerfiles, compose files, and scripts that invoke pytest. Full test output must be visible so failures can be diagnosed without re-running. This applies to both Tier-1 (Colima) and Tier-2 (Jetson) harnesses.
- **Never substitute real algorithm execution with a data passthrough to make tests pass.** If a test is designed to validate output from a specific pipeline (e.g. VIO estimation, sensor fusion, inference), the implementation MUST actually run that pipeline — not bypass it by returning the input data directly as output. Tests that pass by skipping the component they are supposed to exercise create false confidence and hide the fact that the component is not integrated. If the real integration cannot be completed in this session, STOP and report the blocker to the user explicitly. A failing test with an honest explanation is always better than a passing test that proves nothing.
+6 -5
@@ -19,7 +19,7 @@ globs: [".cursor/**"]
- Kebab-case filenames - Kebab-case filenames
## Agent Files (.cursor/agents/) ## Agent Files (.cursor/agents/)
- Must have `name` and `description` in frontmatter - The `.cursor/agents/` directory is intentionally empty. Per `.cursor/rules/no-subagents.mdc`, the main agent does not delegate to subagents in this workspace. Do not add agent files here without a corresponding rule change.
## Security ## Security
- All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc) - All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)
@@ -30,10 +30,11 @@ All rules and skills must reference the single source of truth below. Do NOT res
| Concern | Threshold | Enforcement | | Concern | Threshold | Enforcement |
|---------|-----------|-------------| |---------|-----------|-------------|
| Test coverage on business logic | 75% | Aim (warn below); 100% on critical paths | | Test coverage on business logic | 75% | Aim (warn below); critical-path floor enforced separately (next row) |
| Test coverage on critical paths | 90% floor / 100% aim | **90% is the enforcement floor** in CI gates, refactor verification, and release pre-flight. **100% is the aim** — drift below 100% but at-or-above 90% is acceptable; drift below 90% blocks. Critical paths = code paths where a bug would cause data loss, security breach, financial error, or system outage; identify from `acceptance_criteria.md` (must-have) and `_docs/00_problem/security_approach.md`. |
| Test scenario coverage (vs AC + restrictions) | 75% | Blocking in test-spec Phase 1 and Phase 3 | | Test scenario coverage (vs AC + restrictions) | 75% | Blocking in test-spec Phase 1 and Phase 3 |
| CI coverage gate | 75% | Fail build below | | CI coverage gate | 75% overall, 90% critical-path | Fail build below either threshold |
| Lint errors (Critical/High) | 0 | Blocking pre-commit | | Lint errors (Critical/High) | 0 | Blocking pre-commit |
| Code-review auto-fix | Low + Medium (Style/Maint/Perf) + High (Style/Scope) | Critical and Security always escalate | | Code-review auto-fix | Low + Medium (Style/Maint/Perf) + High (Style/Scope) | Critical and Security always escalate. Full categorization: see `.cursor/skills/implement/SKILL.md` § "Auto-Fix eligibility matrix" |
When a skill or rule needs to cite a threshold, link to this table instead of hardcoding a different number. When a skill or rule needs to cite a threshold, link to this table instead of hardcoding a different number. The full auto-fix eligibility matrix (severity × category) lives in `implement/SKILL.md`; cite that file rather than re-tabulating the matrix.
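A minimal sketch of how a CI script might apply the two coverage floors from the table above. The names `check_coverage` and `GateResult`, and the assumption that coverage arrives as two plain percentages, are illustrative only; they are not part of any skill or rule in this repository.

```python
from dataclasses import dataclass

OVERALL_FLOOR = 75.0   # CI coverage gate, overall (blocking)
CRITICAL_FLOOR = 90.0  # critical-path enforcement floor (blocking)
CRITICAL_AIM = 100.0   # critical-path aim (warn only)


@dataclass(frozen=True)
class GateResult:
    passed: bool
    messages: tuple[str, ...]


def check_coverage(overall_pct: float, critical_pct: float) -> GateResult:
    """Hypothetical gate: fail below either floor, warn below the aim."""
    passed = True
    messages = []
    if overall_pct < OVERALL_FLOOR:
        passed = False
        messages.append(f"FAIL: overall {overall_pct:.1f}% < {OVERALL_FLOOR:.0f}%")
    if critical_pct < CRITICAL_FLOOR:
        passed = False
        messages.append(f"FAIL: critical-path {critical_pct:.1f}% < {CRITICAL_FLOOR:.0f}%")
    elif critical_pct < CRITICAL_AIM:
        # Between the floor and the aim: acceptable drift, surfaced as a warning.
        messages.append(f"WARN: critical-path {critical_pct:.1f}% below the 100% aim")
    return GateResult(passed, tuple(messages))
```

Keeping the three numbers as named constants in one place mirrors the single-source-of-truth intent of the table: a script would import them rather than restate them.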
+20
@@ -4,6 +4,26 @@ alwaysApply: true
--- ---
# Agent Meta Rules # Agent Meta Rules
## Real Results, Not Simulated Ones
**The goal is a working product, not the appearance of one.**
- If something does not work, STOP and report it honestly. Do not find a way around it.
- Never produce results by bypassing, faking, stubbing, or passthrough-ing the component that is supposed to produce them. A passing test that skips the real pipeline is worse than a failing test — it hides the truth.
- If the real implementation is not ready, say so. A clear "this is not implemented yet, here is what is missing" is always the right answer.
- Do not measure success by whether the output looks correct. Measure it by whether the output was produced by the real system under test.
- Workarounds that produce the right answer via the wrong path are defects, not solutions.
### When a test reveals missing production code — STOP
This is the specific failure mode that produced the GPS-passthrough scaffold in `runtime_root._run_replay_loop` (May 2026). Generalised so it never repeats:
- If, while implementing or running a test, you discover that the production code path the test is supposed to exercise does not exist (no caller, no integration, no main loop, etc.), **STOP immediately**.
- Do NOT write a stub, passthrough, fake input source, or shortcut output that would make the test go green. Even when the shortcut is "framed as a scaffold" or "marked as TODO in a docstring", it still defeats the test and lies to the next reader.
- Surface the gap to the user as a top-of-turn report: name the missing production component, cite the architecture document that promises it, and ask whether to (a) create a tracker ticket for the missing component and let the test fail honestly until the ticket lands, or (b) explicitly de-scope the test, or (c) something the user names.
- The default outcome is (a): a failing test plus a new tracker ticket. A failing test with an honest reason is information; a passing test that proves nothing is misinformation.
- Doc-comment disclosures (`# this is a scaffold until X is wired`) DO NOT satisfy this rule. The user must be told in the assistant message, not in code.
## Execution Safety ## Execution Safety
- Run the full test suite automatically when you believe code changes are complete (as required by coderule.mdc). For other long-running/resource-heavy/security-risky operations (builds, Docker commands, deployments, performance tests), ask the user first — unless explicitly stated in a skill or the user already asked to do so. - Run the full test suite automatically when you believe code changes are complete (as required by coderule.mdc). For other long-running/resource-heavy/security-risky operations (builds, Docker commands, deployments, performance tests), ask the user first — unless explicitly stated in a skill or the user already asked to do so.
+1 -1
@@ -8,7 +8,7 @@ globs: ["**/*test*", "**/*spec*", "**/*Test*", "**/tests/**", "**/test/**"]
- One assertion per test when practical; name tests descriptively: `MethodName_Scenario_ExpectedResult` - One assertion per test when practical; name tests descriptively: `MethodName_Scenario_ExpectedResult`
- Test boundary conditions, error paths, and happy paths - Test boundary conditions, error paths, and happy paths
- Use mocks only for external dependencies; prefer real implementations for internal code - Use mocks only for external dependencies; prefer real implementations for internal code
- Aim for 75%+ coverage on business logic; 100% on critical paths (code paths where a bug would cause data loss, security breaches, financial errors, or system outages — identify from acceptance criteria marked as must-have or from security_approach.md). The 75% threshold is canonical — see `cursor-meta.mdc` Quality Thresholds. - Aim for 75%+ coverage on business logic; **90% floor / 100% aim on critical paths** (code paths where a bug would cause data loss, security breaches, financial errors, or system outages — identify from acceptance criteria marked as must-have or from `security_approach.md`). 90% is the enforcement floor (blocking in CI / refactor verification / release pre-flight); 100% is the aspirational aim — drift below 100% but at-or-above 90% is acceptable. Both numbers are canonical — see `cursor-meta.mdc` Quality Thresholds.
- Integration tests use real database (Postgres testcontainers or dedicated test DB) - Integration tests use real database (Postgres testcontainers or dedicated test DB)
- Never use `Thread.Sleep` or fixed delays in tests; use polling or async waits - Never use `Thread.Sleep` or fixed delays in tests; use polling or async waits
- Keep test data factories/builders for reusable test setup - Keep test data factories/builders for reusable test setup
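As an illustration of the polling rule above, here is a small sketch of a wait helper a Python test might use instead of a fixed delay. The helper name `wait_until` and its default timeout and interval are assumptions made for this example, not an existing utility in the project.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    The short sleep only paces the polling loop; the test proceeds the
    moment the condition is met rather than after a fixed delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then assert on the return value, for example `assert wait_until(lambda: queue_is_empty())`, where `queue_is_empty` stands in for whatever condition the test is waiting on.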
+3 -3
@@ -1,9 +1,9 @@
--- ---
name: autodev name: autodev
description: | description: |
Auto-chaining orchestrator that drives the full BUILD-SHIP workflow from problem gathering through deployment. Auto-chaining orchestrator that drives the full BUILD → SHIP → EVOLVE workflow from problem gathering through release and retrospective.
Detects current project state from _docs/ folder, resumes from where it left off, and flows through Detects current project state from _docs/ folder, resumes from where it left off, and flows through
problem → research → plan → test specs → decompose → implement → tests → docs sync → deploy without manual skill invocation. problem → research → plan (incl. ADRs) → test specs → decompose → implement → tests → docs sync → deploy → release → retrospective without manual skill invocation.
Maximizes work per conversation by auto-transitioning between skills. Maximizes work per conversation by auto-transitioning between skills.
Trigger phrases: Trigger phrases:
- "autodev", "auto", "start", "continue" - "autodev", "auto", "start", "continue"
@@ -15,7 +15,7 @@ disable-model-invocation: true
# Autodev Orchestrator # Autodev Orchestrator
Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autodev` once — the engine handles sequencing, transitions, and re-entry. Auto-chaining execution engine that drives the full BUILD → SHIP → EVOLVE workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autodev` once — the engine handles sequencing, transitions, and re-entry.
## File Index ## File Index
+39 -9
@@ -3,7 +3,7 @@
Workflow for projects with an existing codebase. Structurally it has **two phases**: Workflow for projects with an existing codebase. Structurally it has **two phases**:
- **Phase A — One-time baseline setup (Steps 1–8)**: runs exactly once per codebase. Documents the code, produces test specs, makes the code testable, writes and runs the initial test suite, optionally refactors with that safety net. - **Phase A — One-time baseline setup (Steps 1–8)**: runs exactly once per codebase. Documents the code, produces test specs, makes the code testable, writes and runs the initial test suite, optionally refactors with that safety net.
- **Phase B — Feature cycle (Steps 9–17, loops)**: runs once per new feature. After Step 17 (Retrospective), the flow loops back to Step 9 (New Task) with `state.cycle` incremented. - **Phase B — Feature cycle (Steps 9–17, loops)**: runs once per new feature. After Step 17 (Retrospective), the flow loops back to Step 9 (New Task) with `state.cycle` incremented. Step 16.5 (Release) sits between Deploy (16) and Retrospective (17).
A first-time run executes Phase A then Phase B; every subsequent invocation re-enters Phase B. A first-time run executes Phase A then Phase B; every subsequent invocation re-enters Phase B.
@@ -34,6 +34,7 @@ A first-time run executes Phase A then Phase B; every subsequent invocation re-e
| 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) | | 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
| 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) | | 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) |
| 16 | Deploy | deploy/SKILL.md | Step 1–7 | | 16 | Deploy | deploy/SKILL.md | Step 1–7 |
| 16.5 | Release | release/SKILL.md | Phase 1–6 |
| 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 | | 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 |
After Step 17, the feature cycle completes and the flow loops back to Step 9 with `state.cycle + 1` — see "Re-Entry After Completion" below. After Step 17, the feature cycle completes and the flow loops back to Step 9 with `state.cycle + 1` — see "Re-Entry After Completion" below.
@@ -287,21 +288,43 @@ State-driven: reached by auto-chain from Step 15 (completed or skipped).
Action: Read and execute `.cursor/skills/deploy/SKILL.md`. Action: Read and execute `.cursor/skills/deploy/SKILL.md`.
After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 17 (Retrospective). After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 16.5 (Release).
---
**Step 16.5 — Release**
State-driven: reached by auto-chain from Step 16, for the current `state.cycle`.
Action: Read and execute `.cursor/skills/release/SKILL.md`. The release skill owns its own user interaction (Phase 1 pre-release gate, Phase 2 strategy select, Phase 6 escalation). Autodev does NOT add a wrapping A/B/C gate. Pass cycle context (`cycle: state.cycle`).
After the release skill exits, route on the verdict:
- **Verdict `Released`** → mark Step 16.5 `completed` and auto-chain to Step 17 (Retrospective in cycle-end mode).
- **Verdict `Released-with-override`** → mark Step 16.5 `completed` AND auto-chain to Step 17 (Retrospective in **incident mode**).
- **Verdict `Rolled-Back`** → mark Step 16.5 `failed`. Auto-chain to Step 17 (Retrospective in **incident mode**). The cycle does NOT loop back to Step 9.
- **Verdict `Aborted`** → mark Step 16.5 `not_started` (no live-system change) OR `failed` (live-system touched before abort). Surface the abort reason and STOP. Next `/autodev` invocation re-evaluates Phase B from the failed step.
--- ---
**Step 17 — Retrospective** **Step 17 — Retrospective**
State-driven: reached by auto-chain from Step 16, for the current `state.cycle`. State-driven: reached by auto-chain from Step 16.5 with a `Released`, `Released-with-override`, or `Rolled-Back` verdict, for the current `state.cycle`.
Action: Read and execute `.cursor/skills/retrospective/SKILL.md` in **cycle-end mode**. Pass cycle context (`cycle: state.cycle`) so the retro report and LESSONS.md entries record which feature cycle they came from. Action: Read and execute `.cursor/skills/retrospective/SKILL.md`. Mode selection:
After retrospective completes, mark Step 17 as `completed` and enter "Re-Entry After Completion" evaluation. - Step 16.5 verdict `Released` → cycle-end mode
- Step 16.5 verdict `Released-with-override` or `Rolled-Back` → incident mode
Pass cycle context (`cycle: state.cycle`) so the retro report and LESSONS.md entries record which feature cycle they came from.
After retrospective completes:
- If Step 16.5 verdict was `Released` or `Released-with-override` → mark Step 17 as `completed` and enter "Re-Entry After Completion" evaluation (loop back to Step 9 for cycle N+1).
- If Step 16.5 verdict was `Rolled-Back` → mark Step 17 as `completed` but do NOT loop back. Surface the incident retro path and STOP.
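The verdict routing described for Steps 16.5 and 17 in the existing-code flow can be read as a small decision table. The sketch below is one way to express it; the `Verdict` enum, the `Route` tuple, and the `live_system_touched` flag are hypothetical names introduced for illustration, and the autodev engine itself is a prose-driven skill rather than this code.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Verdict(Enum):
    RELEASED = "Released"
    RELEASED_WITH_OVERRIDE = "Released-with-override"
    ROLLED_BACK = "Rolled-Back"
    ABORTED = "Aborted"


class Route(NamedTuple):
    step_status: str           # status recorded for Step 16.5
    retro_mode: Optional[str]  # retrospective mode; None means STOP, no retro
    loops_back: bool           # whether the cycle re-enters Step 9 afterwards


def route_release(verdict: Verdict, live_system_touched: bool = False) -> Route:
    """Map a release verdict to the Step 16.5 status and the next action."""
    if verdict is Verdict.RELEASED:
        return Route("completed", "cycle-end", True)
    if verdict is Verdict.RELEASED_WITH_OVERRIDE:
        return Route("completed", "incident", True)
    if verdict is Verdict.ROLLED_BACK:
        # Retrospective still runs, but the cycle does not loop back.
        return Route("failed", "incident", False)
    # Aborted: status depends on whether the live system was touched.
    status = "failed" if live_system_touched else "not_started"
    return Route(status, None, False)
```

Writing the rules this way makes the asymmetry easy to see: `Released-with-override` and `Rolled-Back` share the incident retrospective, but only the former lets the cycle continue.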
--- ---
**Re-Entry After Completion** **Re-Entry After Completion**
State-driven: `state.step == done` OR Step 17 (Retrospective) is completed for `state.cycle`. State-driven: `state.step == done` OR Step 17 (Retrospective) is completed for `state.cycle` AND Step 16.5 verdict was `Released` or `Released-with-override`. A `Rolled-Back` cycle does NOT trigger Re-Entry — the user must explicitly invoke `/autodev` again.
Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation: Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:
@@ -316,7 +339,7 @@ Action: The project completed a full cycle. Print the status banner and automati
Set `step: 9`, `status: not_started`, and **increment `cycle`** (`cycle: state.cycle + 1`) in the state file, then auto-chain to Step 9 (New Task). Reset `sub_step` to `phase: 0, name: awaiting-invocation, detail: ""` and `retry_count: 0`. Set `step: 9`, `status: not_started`, and **increment `cycle`** (`cycle: state.cycle + 1`) in the state file, then auto-chain to Step 9 (New Task). Reset `sub_step` to `phase: 0, name: awaiting-invocation, detail: ""` and `retry_count: 0`.
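The state reset above amounts to a handful of field updates. A sketch, assuming the state file has already been parsed into a plain dictionary (the real file is Markdown, and both the parsing step and the function name `reenter_cycle` are hypothetical):

```python
import copy


def reenter_cycle(state: dict) -> dict:
    """Return a new state dict positioned at Step 9 of the next cycle."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    new_state["step"] = 9
    new_state["status"] = "not_started"
    new_state["cycle"] = state["cycle"] + 1
    new_state["sub_step"] = {
        "phase": 0,
        "name": "awaiting-invocation",
        "detail": "",
    }
    new_state["retry_count"] = 0
    return new_state
```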
Note: the loop (Steps 9 → 17 → 9) ensures every feature cycle includes: New Task → Implement → Run Tests → Test-Spec Sync → Update Docs → Security → Performance → Deploy → Retrospective. Note: the loop (Steps 9 → 17 → 9) ensures every feature cycle includes: New Task → Implement → Run Tests → Test-Spec Sync → Update Docs → Security → Performance → Deploy → Release → Retrospective. The cycle only completes (and loops back to Step 9) on a `Released` or `Released-with-override` verdict; rolled-back or aborted releases stop the cycle.
## Auto-Chain Rules ## Auto-Chain Rules
@@ -344,8 +367,13 @@ Note: the loop (Steps 9 → 17 → 9) ensures every feature cycle includes: New
| Update Docs (13) | Auto-chain → Security Audit choice (14) | | Update Docs (13) | Auto-chain → Security Audit choice (14) |
| Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) | | Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) |
| Performance Test (15, done or skipped) | Auto-chain → Deploy (16) | | Performance Test (15, done or skipped) | Auto-chain → Deploy (16) |
| Deploy (16) | Auto-chain → Retrospective (17) | | Deploy (16) | Auto-chain → Release (16.5) |
| Retrospective (17) | **Cycle complete** — loop back to New Task (9) with incremented cycle counter | | Release (16.5, verdict Released) | Auto-chain → Retrospective (17, cycle-end mode) |
| Release (16.5, verdict Released-with-override) | Auto-chain → Retrospective (17, **incident mode**) |
| Release (16.5, verdict Rolled-Back) | Auto-chain → Retrospective (17, **incident mode**); cycle does NOT loop back |
| Release (16.5, verdict Aborted) | STOP — surface abort reason; do not auto-chain |
| Retrospective (17, after Released / Released-with-override) | **Cycle complete** — loop back to New Task (9) with incremented cycle counter |
| Retrospective (17, after Rolled-Back) | Cycle remains incomplete — STOP and surface incident retro path |
## Status Summary — Step List ## Status Summary — Step List
@@ -381,6 +409,7 @@ Flow-specific slot values:
| 14 | Security Audit | — | | 14 | Security Audit | — |
| 15 | Performance Test | — | | 15 | Performance Test | — |
| 16 | Deploy | — | | 16 | Deploy | — |
| 16.5 | Release | `DONE (Released \| Released-with-override \| Rolled-Back \| Aborted)` |
| 17 | Retrospective | — | | 17 | Retrospective | — |
All rows accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 2, 4, 8, 12, 13, 14, 15 additionally accept `SKIPPED`. All rows accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 2, 4, 8, 12, 13, 14, 15 additionally accept `SKIPPED`.
@@ -406,5 +435,6 @@ Row rendering format (renders with a phase separator between Step 8 and Step 9):
Step 14 Security Audit [<state token>] Step 14 Security Audit [<state token>]
Step 15 Performance Test [<state token>] Step 15 Performance Test [<state token>]
Step 16 Deploy [<state token>] Step 16 Deploy [<state token>]
Step 16.5 Release [<state token>]
Step 17 Retrospective [<state token>] Step 17 Retrospective [<state token>]
``` ```
+35 -7
@@ -1,6 +1,6 @@
# Greenfield Workflow # Greenfield Workflow
Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement + Product Completeness Gate → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy → Retrospective. Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement + Product Completeness Gate → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy → Release → Retrospective.
## Step Reference Table ## Step Reference Table
@@ -8,7 +8,7 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
|------|------|-----------|-------------------| |------|------|-----------|-------------------|
| 1 | Problem | problem/SKILL.md | Phase 1–4 | | 1 | Problem | problem/SKILL.md | Phase 1–4 |
| 2 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 | | 2 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
| 3 | Plan | plan/SKILL.md | Step 1–6 + Final | | 3 | Plan | plan/SKILL.md | Step 1, 2, 3, 4, 4.5 (ADR Capture), 5, 6 + Final |
| 4 | UI Design | ui-design/SKILL.md | Phase 0–8 (conditional — UI projects only) | | 4 | UI Design | ui-design/SKILL.md | Phase 0–8 (conditional — UI projects only) |
| 5 | Test Spec | test-spec/SKILL.md | Phases 1–4 | | 5 | Test Spec | test-spec/SKILL.md | Phases 1–4 |
| 6 | Decompose | decompose/SKILL.md (implementation task decomposition) | Step 1 + Step 1.5 + Step 2 + Step 4 | | 6 | Decompose | decompose/SKILL.md (implementation task decomposition) | Step 1 + Step 1.5 + Step 2 + Step 4 |
@@ -22,6 +22,7 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
| 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) | | 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
| 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) | | 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) |
| 16 | Deploy | deploy/SKILL.md | Step 1–7 | | 16 | Deploy | deploy/SKILL.md | Step 1–7 |
| 16.5 | Release | release/SKILL.md | Phase 1–6 |
| 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 | | 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 |
## Detection Rules ## Detection Rules
@@ -284,21 +285,42 @@ State-driven: reached by auto-chain from Step 15 (after Step 15 is completed or
Action: Read and execute `.cursor/skills/deploy/SKILL.md`. Action: Read and execute `.cursor/skills/deploy/SKILL.md`.
After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 17 (Retrospective). After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 16.5 (Release).
---
**Step 16.5 — Release**
State-driven: reached by auto-chain from Step 16.
Action: Read and execute `.cursor/skills/release/SKILL.md`. The release skill is responsible for selecting the target environment, executing the deploy artifacts, smoke-testing, watching the rollout, and producing a definitive verdict (`Released`, `Released-with-override`, `Rolled-Back`, or `Aborted`).
The release skill has its own internal BLOCKING gates (Phase 1 pre-release gate, Phase 2 strategy select, Phase 6 user confirmation when soft regression escalates). Autodev does NOT add a wrapping A/B/C gate — the release skill owns its own user interaction.
After the release skill exits:
- **Verdict `Released`** → mark Step 16.5 `completed` and auto-chain to Step 17 (Retrospective in cycle-end mode).
- **Verdict `Released-with-override`** → mark Step 16.5 `completed` AND auto-chain to Step 17 (Retrospective in **incident mode**) — the override is itself an incident the retrospective must analyze.
- **Verdict `Rolled-Back`** → mark Step 16.5 `failed`. Auto-chain to Step 17 (Retrospective in **incident mode**). Do NOT consider the project "Done" — the user owns the next move (re-run /implement on a fix branch, re-run /deploy, re-run /release).
- **Verdict `Aborted`** → mark Step 16.5 `not_started` (the release was never started) OR `failed` if the abort came after Phase 3 had already touched the live system. Surface the abort reason and STOP — do not auto-chain to retrospective.
--- ---
**Step 17 — Retrospective** **Step 17 — Retrospective**
State-driven: reached by auto-chain from Step 16. State-driven: reached by auto-chain from Step 16.5 with a `Released` or `Released-with-override` verdict, OR from a `Rolled-Back` verdict (in incident mode).
Action: Read and execute `.cursor/skills/retrospective/SKILL.md` in **cycle-end mode**. This closes the cycle's feedback loop by folding metrics into `_docs/06_metrics/retro_<date>.md` and appending the top-3 lessons to `_docs/LESSONS.md`. Action: Read and execute `.cursor/skills/retrospective/SKILL.md`. Mode selection:
- Step 16.5 verdict `Released` → cycle-end mode
- Step 16.5 verdict `Released-with-override` or `Rolled-Back` → incident mode
The retrospective closes the cycle's feedback loop by folding metrics into `_docs/06_metrics/retro_<date>.md` (or `incident_<date>_release.md` in incident mode) and appending the top-3 lessons to `_docs/LESSONS.md`.
After retrospective completes, mark Step 17 as `completed` and enter "Done" evaluation. After retrospective completes, mark Step 17 as `completed` and enter "Done" evaluation.
--- ---
**Done** **Done**
State-driven: reached by auto-chain from Step 17. (Sanity check: `_docs/04_deploy/` should contain all expected artifacts — containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md, deploy_scripts.md.) State-driven: reached by auto-chain from Step 17. (Sanity check: `_docs/04_deploy/` should contain all expected artifacts — containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md, deploy_scripts.md. `_docs/04_release/` should contain at least one `release_<version>_<env>_<timestamp>.md` with a `Released` verdict — or the user has explicitly chosen to handle release outside autodev.)
Action: Report project completion with summary. Then **rewrite the state file** so the next `/autodev` invocation enters the feature-cycle loop in the existing-code flow: Action: Report project completion with summary. Then **rewrite the state file** so the next `/autodev` invocation enters the feature-cycle loop in the existing-code flow:
@@ -337,7 +359,11 @@ On the next invocation, Flow Resolution rule 1 reads `flow: existing-code` and r
| Update Docs (13, done or skipped) | Auto-chain → Security Audit choice (14) | | Update Docs (13, done or skipped) | Auto-chain → Security Audit choice (14) |
| Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) | | Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) |
| Performance Test (15, done or skipped) | Auto-chain → Deploy (16) | | Performance Test (15, done or skipped) | Auto-chain → Deploy (16) |
| Deploy (16) | Auto-chain → Retrospective (17) | | Deploy (16) | Auto-chain → Release (16.5) |
| Release (16.5, verdict Released) | Auto-chain → Retrospective (17, cycle-end mode) |
| Release (16.5, verdict Released-with-override) | Auto-chain → Retrospective (17, **incident mode**) |
| Release (16.5, verdict Rolled-Back) | Auto-chain → Retrospective (17, **incident mode**); do NOT enter Done |
| Release (16.5, verdict Aborted) | STOP — surface abort reason; do not auto-chain |
| Retrospective (17) | Report completion; rewrite state to existing-code flow, step 9 | | Retrospective (17) | Report completion; rewrite state to existing-code flow, step 9 |
## Status Summary — Step List ## Status Summary — Step List
@@ -362,6 +388,7 @@ Flow name: `greenfield`. Render using the banner template in `protocols.md` →
| 14 | Security Audit | — | | 14 | Security Audit | — |
| 15 | Performance Test | — | | 15 | Performance Test | — |
| 16 | Deploy | — | | 16 | Deploy | — |
| 16.5 | Release | `DONE (Released \| Released-with-override \| Rolled-Back \| Aborted)` |
| 17 | Retrospective | — | | 17 | Retrospective | — |
All rows also accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4, 12, 13, 14, 15 additionally accept `SKIPPED`. All rows also accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4, 12, 13, 14, 15 additionally accept `SKIPPED`.
@@ -385,5 +412,6 @@ Row rendering format (step-number column is right-padded to 2 characters for ali
Step 14 Security Audit [<state token>] Step 14 Security Audit [<state token>]
Step 15 Performance Test [<state token>] Step 15 Performance Test [<state token>]
Step 16 Deploy [<state token>] Step 16 Deploy [<state token>]
Step 16.5 Release [<state token>]
Step 17 Retrospective [<state token>] Step 17 Retrospective [<state token>]
``` ```
+1 -1
@@ -13,7 +13,7 @@ The autodev persists its position to `_docs/_autodev_state.md`. This is a lightw
## Current Step ## Current Step
flow: [greenfield | existing-code | meta-repo] flow: [greenfield | existing-code | meta-repo]
step: [1-17 for greenfield, 1-17 for existing-code, 1-6 for meta-repo (incl. fractional 2.5 and 3.5), or "done"] step: [1-17 for greenfield (incl. fractional 16.5), 1-17 for existing-code (incl. fractional 16.5), 1-6 for meta-repo (incl. fractional 2.5 and 3.5), or "done"]
name: [step name from the active flow's Step Reference Table] name: [step name from the active flow's Step Reference Table]
status: [not_started / in_progress / completed / skipped / failed] status: [not_started / in_progress / completed / skipped / failed]
sub_step: sub_step:
+10 -4
@@ -2,7 +2,7 @@
name: code-review name: code-review
description: | description: |
Multi-phase code review against task specs with structured findings output. Multi-phase code review against task specs with structured findings output.
6-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency. 7-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency, architecture compliance.
Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict. Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict.
Invoked by /implement skill after each batch, or manually. Invoked by /implement skill after each batch, or manually.
Trigger phrases: Trigger phrases:
@@ -106,11 +106,12 @@ When multiple tasks were implemented in the same batch:
## Phase 7: Architecture Compliance ## Phase 7: Architecture Compliance
Verify the implemented code respects the architecture documented in `_docs/02_document/architecture.md` and the component boundaries declared in `_docs/02_document/module-layout.md`. Verify the implemented code respects the architecture documented in `_docs/02_document/architecture.md`, the component boundaries declared in `_docs/02_document/module-layout.md`, and the **accepted Architectural Decision Records** under `_docs/02_document/adr/`.
**Inputs**: **Inputs**:
- `_docs/02_document/architecture.md` — layering, allowed dependencies, patterns - `_docs/02_document/architecture.md` — layering, allowed dependencies, patterns
- `_docs/02_document/module-layout.md` — per-component directories, Public API surface, `Imports from` lists, Allowed Dependencies table - `_docs/02_document/module-layout.md` — per-component directories, Public API surface, `Imports from` lists, Allowed Dependencies table
- `_docs/02_document/adr/` — every `Status: Accepted` ADR is an enforceable structural rule. `Status: Proposed`, `Status: Deprecated`, and `Status: Superseded` ADRs are NOT enforced (Proposed = not yet ratified; Deprecated/Superseded = a later ADR overturned it). If the directory does not exist or has only the index file, ADRs are skipped — log this skip in the report so the absence is visible.
- The cumulative list of changed files (for per-batch invocation) or the full codebase (for baseline invocation)
**Checks**:
@@ -125,6 +126,11 @@ Verify the implemented code respects the architecture documented in `_docs/02_do
5. **Cross-cutting concerns not locally re-implemented**: if a file under a component directory contains logic that should live in `shared/<concern>/` (e.g., custom logging setup, config loader, error envelope), flag it. Severity: Medium. Category: Architecture.
6. **ADR compliance**: for each `Status: Accepted` ADR, confirm the changed code does not contradict the ADR's `Decision`. Two failure modes are flagged:
- **ADR-Violation**: the changed code does the opposite of an Accepted ADR's `Decision`. Example: ADR-002 says "We will use Postgres for transactional data" and the changed code introduces a SQLite dependency for a transactional path. Severity: **Critical**. Category: Architecture. The finding cites the ADR by `NNN_<slug>` and the offending file/line.
- **ADR-Drift**: the changed code does something the ADR did not anticipate AND that materially affects the ADR's `Consequences` (positive or negative). Example: ADR-004 says "Event-driven cross-component comms" and a changed file introduces a new synchronous HTTP call between two components. Severity: **High**. Category: Architecture. The finding either proposes "Update ADR-NNN to acknowledge the new pattern" or "Remove the drift to align with ADR-NNN" — never silently accepts.
The check skips ADRs that are explicitly out of scope of the changed batch (e.g., ADR-001 about deployment pipeline when the batch only touches business-logic files). Use the ADR's `Evidence` section to determine scope: if no Evidence path overlaps with any changed file, skip the ADR for this batch.
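The scoping rule above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the ADR file layout (a `Status:` line plus an `## Evidence` section listing backticked paths), the `Adr` class, and both function names are assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Adr:
    name: str                      # hypothetical id in NNN_<slug> form
    status: str                    # "Accepted", "Proposed", "Deprecated", ...
    evidence: list[str] = field(default_factory=list)


def parse_adr(name: str, text: str) -> Adr:
    """Parse an ADR under an ASSUMED layout: a 'Status: X' line and an
    '## Evidence' section whose lines contain backticked paths."""
    match = re.search(r"^Status:\s*(\w+)", text, flags=re.MULTILINE)
    status = match.group(1) if match else "Unknown"
    evidence: list[str] = []
    in_evidence = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Entering a new section; only '## Evidence' is collected.
            in_evidence = line.strip().lower() == "## evidence"
        elif in_evidence:
            evidence.extend(re.findall(r"`([^`]+)`", line))
    return Adr(name, status, evidence)


def adrs_in_scope(adrs: list[Adr], changed_files: list[str]) -> list[Adr]:
    """Keep Accepted ADRs whose Evidence paths overlap a changed file."""
    def overlaps(path: str) -> bool:
        prefix = path.rstrip("/")
        return any(f == prefix or f.startswith(prefix + "/") for f in changed_files)

    return [
        adr for adr in adrs
        if adr.status == "Accepted" and any(overlaps(p) for p in adr.evidence)
    ]
```

An ADR that is filtered out here should still be named in the report as skipped, so that its absence from the findings is visible.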
**Detection approach (per language)**:
- Python: parse `import` / `from ... import` statements; optionally AST with `ast` module for reliable symbol resolution.
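For the Python case, a rough sketch of the `ast`-based approach might look like the following. The component names and the `layering_violations` helper are hypothetical; the real check would take the allowed dependencies from `module-layout.md`. Relative imports are skipped here on the assumption that they stay inside one component.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the dotted module names that a Python source file imports."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are ignored.
            modules.add(node.module)
    return modules


def layering_violations(
    source: str,
    component: str,
    allowed: set[str],
    all_components: set[str],
) -> set[str]:
    """Flag imports of other components that are not in the allowed set."""
    violations: set[str] = set()
    for module in imported_modules(source):
        top_level = module.split(".")[0]
        if top_level in all_components and top_level != component and top_level not in allowed:
            violations.add(module)
    return violations
```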
@@ -197,7 +203,7 @@ Produce a structured report with findings deduplicated and sorted by severity:
Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope, Architecture
`Architecture` findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, or cross-cutting concerns re-implemented locally. `Architecture` findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, cross-cutting concerns re-implemented locally, **ADR-Violation** (changed code contradicts an `Accepted` ADR's Decision — Critical), or **ADR-Drift** (changed code introduces a pattern that materially affects an `Accepted` ADR's Consequences without superseding it — High).
## Verdict Logic
@@ -232,7 +238,7 @@ The implement skill invokes code-review by:
1. Reading `.cursor/skills/code-review/SKILL.md`
2. Providing the inputs above as context (read the files, pass content to the review phases)
3. Executing all 6 phases sequentially 3. Executing all 7 phases sequentially
4. Consuming the verdict from the output
### Outputs (returned to the implement skill)
+17
@@ -65,6 +65,7 @@ Announce the selected entrypoint and resolved paths to the user before proceedin
| 1 Bootstrap Structure | `steps/01_bootstrap-structure.md` | ✓ | — | — |
| 1t Test Infrastructure | `steps/01t_test-infrastructure.md` | — | — | ✓ |
| 1.5 Module Layout | `steps/01-5_module-layout.md` | ✓ | — | — |
| 1.7 System-Pipeline Tasks | `steps/01-7_system-pipeline-tasks.md` | ✓ | — | — |
| 2 Task Decomposition | `steps/02_task-decomposition.md` | ✓ | ✓ | — |
| 3 Blackbox Test Tasks | `steps/03_blackbox-test-decomposition.md` | — | — | ✓ |
| 4 Cross-Verification | `steps/04_cross-verification.md` | ✓ | — | ✓ |
@@ -191,6 +192,20 @@ Read and follow `steps/01-5_module-layout.md`.
---
### Step 1.7: System-Pipeline Tasks (implementation mode only)
Read and follow `steps/01-7_system-pipeline-tasks.md`.
This step exists because per-component task decomposition (Step 2)
produces one task per component but NEVER produces a task whose
deliverable is "the production code that drives the end-to-end
pipeline by calling each component in order against real inputs".
The architecture document describes the loop; nobody owns it. The
GPS-passthrough incident (May 2026) is the canonical failure this
step prevents.
---
### Step 2: Task Decomposition (implementation and single component modes)
Read and follow `steps/02_task-decomposition.md`.
@@ -243,6 +258,8 @@ Read and follow `steps/04_cross-verification.md`.
│ [BLOCKING: user confirms structure] │
│ 1.5 Module Layout → steps/01-5_module-layout.md │
│ [BLOCKING: user confirms layout] │
│ 1.7 System-Pipeline → steps/01-7_system-pipeline-tasks.md │
│ [BLOCKING: user confirms pipeline owners] │
│ 2. Component Tasks → steps/02_task-decomposition.md │
│ 4. Cross-Verification → steps/04_cross-verification.md │
│ [BLOCKING: user confirms dependencies] │
@@ -16,7 +16,8 @@
3. Each component owns ONE top-level directory. Shared code goes under `<root>/shared/` (or language equivalent).
4. Public API surface = files in the layout's `public:` list for each component; everything else is internal and MUST NOT be imported from other components.
5. Cross-cutting concerns (logging, error handling, config, telemetry, auth middleware, feature flags, i18n) each get ONE entry under Shared / Cross-Cutting; per-component tasks consume them (see Step 2 cross-cutting rule).
6. Write `_docs/02_document/module-layout.md` using `templates/module-layout.md` format. 6. **ADR cross-check**: if `_docs/02_document/adr/` exists, read every `Status: Accepted` ADR. For each, confirm the proposed module layout does not contradict the ADR's `Decision` (e.g., an ADR mandating an event-bus boundary between two components must show up as a `Imports from` exclusion in the layout; an ADR locking a layering style must show up in the Layering table). If an ADR conflicts with the language-conventional layout from step 2, the ADR wins — record the conflict in a `## ADR-driven exceptions to the conventional layout` section of `module-layout.md` with `See ADR NNN_<slug>` references. If the ADR conflict is irreconcilable (the ADR demands something the language genuinely cannot express), STOP and ask the user A/B/C: (A) update the ADR via plan Step 4.5 supersede flow, (B) accept a layered exception with documented rationale, (C) re-open architecture.
7. Write `_docs/02_document/module-layout.md` using `templates/module-layout.md` format. Each Per-Component Mapping entry that is governed by an ADR includes a trailing `> See ADR NNN_<slug>` line.
## Self-verification
@@ -26,6 +27,8 @@
- [ ] No component's `Imports from` list points at a higher layer
- [ ] Paths follow the detected language's convention
- [ ] No two components own overlapping paths
- [ ] If `_docs/02_document/adr/` exists with Accepted ADRs, every layout decision that an ADR governs has a trailing `> See ADR NNN_<slug>` reference
- [ ] No Accepted ADR is contradicted by the layout without a documented exception
## Save action
@@ -0,0 +1,72 @@
# Step 1.7: System-Pipeline Tasks (implementation mode only)
**Role**: Professional software architect, integration-focused.
**Goal**: For every end-to-end pipeline named in `_docs/02_document/architecture.md` and `_docs/02_document/system-flows.md`, ensure there is exactly ONE explicit task that owns the production code that drives that pipeline against real inputs. This step prevents the failure mode where every individual component is "complete" but no production code wires them together (May 2026 GPS-passthrough incident — see `meta-rule.mdc` "When a test reveals missing production code").
**Constraints**:
- This step produces *integration* tasks, not per-component tasks. Per-component tasks come from Step 2.
- An integration task's owner is typically the composition root, runtime root, main loop, or whichever component the module layout (Step 1.5) names as the "system spine". It is NEVER a leaf component.
- Each integration task must be sized at 5 points or fewer. If the pipeline is too large for one task, split it into per-stage integration tasks (e.g. "wire ingress → C1", then "wire C1 → C5") rather than one giant task.
## Inputs
| File | Purpose |
|------|---------|
| `_docs/02_document/architecture.md` | Source of named end-to-end pipelines and their component sequences |
| `_docs/02_document/system-flows.md` | Source of operational flows (per-frame loop, request lifecycle, batch job, etc.) |
| `_docs/02_document/module-layout.md` | Produced by Step 1.5. Names the "system spine" component(s) — typically `runtime_root`, `app`, `main`, `composition`, or equivalent. |
| `_docs/02_document/components/*/description.md` | Per-component contracts so you can tell which side of a seam each method lives on |
## Steps
1. **Enumerate end-to-end pipelines.** Read `architecture.md` and `system-flows.md`. For each named pipeline / flow that spans 2+ components, record:
- The pipeline name (e.g. "per-frame nav loop", "tile-cache build", "operator pre-flight verification").
- The ordered sequence of components it touches (e.g. `frame_source → c1_vio → c2_vpr → ... → c5_state → replay_sink`).
- The trigger (per-frame, per-request, scheduled, manual).
- The output (what the pipeline emits and to whom).
2. **For each pipeline, locate the owner.** Use `module-layout.md` to find the component that owns the orchestration (the "spine"). If `module-layout.md` does not name one, STOP and ASK the user which component owns the pipeline. Do NOT silently default to the bootstrap structure task — bootstrap is about project skeleton, not behavior.
3. **Check whether the pipeline is already covered by an existing task spec or by the bootstrap-structure task.** A pipeline is "covered" only if:
- A task spec's `Outcome` or `Acceptance Criteria` section explicitly names "drives the {pipeline_name} end-to-end against real production components", AND
- That task's owned files include the orchestration code (typically the spine component's main loop / entrypoint).
4. **For every uncovered pipeline, create a system-integration task spec** in `_docs/02_tasks/todo/` using `.cursor/skills/decompose/templates/task.md`:
- **Component**: the spine component from step 2 (e.g. `runtime_root`).
- **Outcome**: the production callsite that drives the pipeline exists and runs end-to-end on real inputs.
- **Scope / Included**: the orchestration code (loop body, dispatcher, scheduler, entrypoint); explicit list of every component it must call in order; the data type at each seam.
- **Acceptance Criteria** (write each as testable):
- At least one production caller of every component method in the pipeline can be found by grep — name the methods explicitly.
- The orchestration runs against the real production component instances (NOT mocks, NOT a passthrough that bypasses them).
- At least one integration test exercises the orchestration end-to-end against real inputs.
- **Dependencies**: every per-component task whose component appears in the pipeline.
- **Complexity points**: ≤5; split the pipeline if it doesn't fit.
- **Tracker**: create a ticket immediately (per `decompose/SKILL.md` "Tracker inline" principle); rename the file to `[TRACKER-ID]_pipeline_<name>.md`.
5. **Add the integration task to the `Dependencies` of the integration test task.** If `tests-only` decomposition has already produced an e2e/integration test task for this pipeline, append the new integration task to its `Dependencies` field so the test cannot be "made green" before the integration ships.
## Anti-patterns this step explicitly blocks
- **"compose_root returns a wired runtime"** prose interpreted as "the loop exists". Composition assembles the graph; it is NOT the loop. The loop is the code that pulls inputs, drives each node, and emits outputs. If grep finds zero callers of the leaf components, the loop does not exist regardless of what compose_root does.
- **Treating the bootstrap-structure task as the home of the main loop.** Bootstrap is project skeleton (package layout, CLI scaffold, build files). It is NOT the main loop. Main loop is its own task.
- **Per-component tasks claiming integration scope.** A C1 VIO task's deliverable is "C1 works in isolation against unit tests". A C1 task's acceptance criteria MUST NOT include "C1 is wired into the runtime" — that's the integration task's job.
## Self-verification
- [ ] Every pipeline named in `architecture.md` / `system-flows.md` is listed in your enumeration.
- [ ] Every enumerated pipeline either (a) has an existing covered task, or (b) has a new integration task in `todo/`.
- [ ] No integration task exceeds 5 complexity points.
- [ ] Every integration task names every component in the pipeline as a `Dependencies` entry.
- [ ] No integration task is owned by a leaf component — every owner is named in `module-layout.md` as a spine / orchestrator.
- [ ] Every integration task has a tracker ticket created and the filename renamed to `[TRACKER-ID]_pipeline_<name>.md`.
## Save action
Write the new integration task files into `_docs/02_tasks/todo/`. They will be picked up by Step 2 (Task Decomposition's dependency-table writer) and by Step 4 (Cross-Verification).
## Blocking
**BLOCKING**: Present the pipeline enumeration + the list of new integration tasks to the user. Do NOT proceed to Step 2 until the user confirms:
- The enumeration matches what they expect from the architecture documents.
- Every uncovered pipeline now has an integration task.
- The chosen spine owners are correct.
If the user identifies a pipeline you missed, add it before proceeding. If the user names a different spine owner, update the task and re-run self-verification.
@@ -29,7 +29,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`.
### Test
- Unit tests: [framework and command]
- Blackbox tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths - Coverage threshold: 75% overall, 90% critical-path floor (100% aim) — per `.cursor/rules/cursor-meta.mdc` Quality Thresholds
- Coverage report published as pipeline artifact
### Security
+30 -3
@@ -291,19 +291,46 @@ For each completed product task:
- **BLOCKED**: production code exists but cannot be fully verified due to external hardware/data/license/runtime prerequisites; the blocker is explicit and tests report blocked/skipped with reason.
- **FAIL**: promised production behavior is missing, only scaffolded, or only represented in tests/reports.
#### 15.b System-Pipeline Check (runs ONCE per gate invocation, after per-task classification)
The per-task classification above (steps 1–8) operates on `_docs/02_tasks/done/`. It catches missing component-local behavior but it CANNOT catch a missing *integration* — there is no task to fail if no task ever owned the integration in the first place. The GPS-passthrough incident (May 2026) escaped this gate because every per-component task in `done/` was honestly complete; the missing piece was the cross-component loop, which had no owning task.
The system-pipeline check fixes that by walking the architecture documents directly, independent of `done/`.
**Inputs**:
- `_docs/02_document/architecture.md`
- `_docs/02_document/system-flows.md`
- Full source tree under the project's production directory (e.g. `src/`).
**Procedure**:
1. **Enumerate end-to-end pipelines.** Read `architecture.md` and `system-flows.md`. For each named pipeline / operational flow that spans 2+ components, record the ordered component sequence and the trigger (per-frame, per-request, scheduled, manual).
2. **Grep for production callers of each seam method.** For each adjacent pair `A → B` in a pipeline, find a production source file (not under `tests/`, not under a `bench/` package, not a doc) that calls `A`'s public output method AND passes the result into `B`'s public input method.
3. **Classify the pipeline**:
- **WIRED**: a production caller exists and the chain is complete from the first to the last component in the sequence.
- **PARTIALLY WIRED**: some adjacent pairs have callers but at least one seam is missing.
- **NOT WIRED**: no production code calls the pipeline's components in order. Bench tools, unit tests, and microbenchmarks do NOT count as "wiring".
4. **Distinguish "wired but stubbed" from "wired with real components"**: a caller that invokes a passthrough / GPS-from-tlog / mock-output-generator instead of the real component is `NOT WIRED` for the purposes of this gate. The seam exists in the source file but the production behavior is faked. Grep for the same scaffold markers Step 15 already enumerates (`placeholder`, `stub`, `passthrough`, `scaffold until`, etc.) inside the caller's body.
5. **Output**: append a `## System Pipeline Audit` section to `_docs/03_implementation/implementation_completeness_cycle[N]_report.md`. Per-pipeline row: name, sequence, classification, evidence file (the caller, or "NONE FOUND"), remediation suggestion if not `WIRED`.
**Pipeline classification feeds the combined gate below.** Any pipeline that is not `WIRED` is a system-level FAIL that the per-task gate cannot rescue.
**Why this is here and not only in decompose**: decompose Step 1.7 creates integration tasks up front; this check verifies the integration tasks actually got implemented (or, if they were never created, surfaces the gap before the cycle closes). The two layers are belt-and-suspenders by design.
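The classification in the procedure above could be approximated as follows. This is a sketch under stated assumptions: `seam_callers` is a hypothetical mapping from each adjacent pair to the body text of the production caller found by grep, the marker list is only the subset named in this section, and a caller containing a scaffold marker is treated as a missing seam. Plain substring matching is crude and will over-flag identifiers that merely contain a marker.

```python
from enum import Enum


class Wiring(Enum):
    WIRED = "WIRED"
    PARTIALLY_WIRED = "PARTIALLY WIRED"
    NOT_WIRED = "NOT WIRED"


# Subset of the scaffold markers named in Step 15; the full list lives there.
SCAFFOLD_MARKERS = ("placeholder", "stub", "passthrough", "scaffold until")


def classify_pipeline(
    sequence: list[str],
    seam_callers: dict[tuple[str, str], str],
) -> Wiring:
    """Classify a pipeline from the production callers found for each seam."""
    seams = list(zip(sequence, sequence[1:]))

    def is_real(seam: tuple[str, str]) -> bool:
        body = seam_callers.get(seam)
        if body is None:
            return False  # no production caller found for this seam
        lowered = body.lower()
        # ASSUMPTION: a faked caller counts the same as a missing one.
        return not any(marker in lowered for marker in SCAFFOLD_MARKERS)

    real = [is_real(seam) for seam in seams]
    if seams and all(real):
        return Wiring.WIRED
    if any(real):
        return Wiring.PARTIALLY_WIRED
    return Wiring.NOT_WIRED
```

Both non-`WIRED` outcomes stop the gate, so the choice between them affects the report row and the remediation suggestion, not the verdict.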
Save the audit to `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` with:
- Per-task classification
- Evidence files/symbols checked
- Any unresolved scaffold/native placeholders
- Any named promised technologies not integrated
- **System Pipeline Audit table** (per pipeline: name, sequence, WIRED / PARTIALLY WIRED / NOT WIRED, evidence file, remediation suggestion)
- Required remediation task suggestions, each sized to 5 points or less
Gate:
- If every product task is `PASS` or `BLOCKED` with explicit prerequisite evidence, continue to Final Test Run. - If every product task is `PASS` or `BLOCKED` with explicit prerequisite evidence, AND every enumerated pipeline is `WIRED`, continue to Final Test Run.
- If any product task is `FAIL`, STOP. Do not write the final product implementation report and do not proceed to any downstream autodev step. Completed original task files remain in `done/`; the missing work is represented by remediation tasks. Present a Choose block: - If any product task is `FAIL` OR any pipeline is `PARTIALLY WIRED` / `NOT WIRED`, STOP. Do not write the final product implementation report and do not proceed to any downstream autodev step. Completed original task files remain in `done/`; the missing work is represented by remediation tasks. Present a Choose block:
- A) Create remediation tasks now and return to implementation - A) Create remediation tasks now and return to implementation. (For pipeline FAILs the remediation task is a NEW integration task owned by the spine component per `_docs/02_document/module-layout.md`; it is NOT a test task and NOT a doc task; its deliverable is production code that drives the pipeline against real components.)
- B) Mark the missing behavior explicitly out of scope in task/docs, then re-run this gate
- C) Abort for manual correction
- Recommendation must normally be A unless the user deliberately accepts reduced scope.
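As a rough illustration, the combined gate reduces to a single predicate over the two result sets. The function name and the plain-string result values are assumptions for this sketch, and it does not model the requirement that a `BLOCKED` task carry explicit prerequisite evidence.

```python
def completeness_gate(
    task_results: dict[str, str],
    pipeline_results: dict[str, str],
) -> tuple[bool, list[str]]:
    """Return (may_continue, failures) for the combined completeness gate."""
    failures = [
        f"task {task}: {result}"
        for task, result in task_results.items()
        if result not in ("PASS", "BLOCKED")
    ]
    failures += [
        f"pipeline {name}: {result}"
        for name, result in pipeline_results.items()
        if result != "WIRED"
    ]
    # Any failure means STOP and present the Choose block (A/B/C).
    return (not failures, failures)
```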
+56 -10
@@ -84,29 +84,66 @@ Assess the change along these dimensions:
- **Novelty**: does it involve libraries, protocols, or patterns not already in the codebase?
- **Risk**: could it break existing functionality or require architectural changes?
Classification: ### 2a. Complexity-Points Estimate
Project policy (per the workspace user-rule on ADO points): aim for tasks at 2–3 points (rarely 5). Tasks at 8 points are high risk; tasks at 13 are too complex and MUST be broken down. The new-task skill enforces this here, before producing a single-file task spec.
Map the Scope/Novelty/Risk profile to a points estimate using this table:
| Profile | Points | Examples |
|---------|--------|----------|
| All three low | **1–2** | One-line config change; trivial CRUD field addition |
| Two low + one medium | **3** | Localized refactor; add one well-understood endpoint |
| One low + two medium, OR all medium | **5** | New small feature touching 2–3 components; integration with a known library |
| Any high, OR two medium + one high | **8** | Cross-cutting concern across 4+ components; integration with an unfamiliar protocol; significant architectural change |
| Two or three high | **13** | New subsystem; unfamiliar tech across the stack; multiple unknown unknowns |
If a relevant LESSONS.md entry biases the estimate (e.g., "auth-related changes historically take 2× estimate"), apply the multiplier and round up to the next discrete point on the scale (1, 2, 3, 5, 8, 13).
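The table and the rounding rule can be sketched as below. Three readings are assumptions of this example, not statements of policy: exactly one `high` maps to 8 while two or more map to 13, the "1–2" row is represented by its upper bound, and a multiplied estimate above 13 is capped at 13.

```python
SCALE = (1, 2, 3, 5, 8, 13)
LEVELS = ("low", "medium", "high")


def base_points(scope: str, novelty: str, risk: str) -> int:
    """Map a Scope/Novelty/Risk profile to a points estimate."""
    levels = [scope, novelty, risk]
    if any(level not in LEVELS for level in levels):
        raise ValueError(f"levels must be one of {LEVELS}")
    highs = levels.count("high")
    mediums = levels.count("medium")
    if highs >= 2:
        return 13
    if highs == 1:
        return 8
    if mediums >= 2:
        return 5
    if mediums == 1:
        return 3
    return 2  # ASSUMPTION: the "1-2" row is represented by its upper bound


def apply_lessons_multiplier(points: int, multiplier: float) -> int:
    """Scale an estimate and round up to the next discrete point."""
    scaled = points * multiplier
    for step in SCALE:
        if step >= scaled:
            return step
    return SCALE[-1]  # ASSUMPTION: anything above 13 is capped at 13
```

For instance, a 3-point estimate with a 2× multiplier from LESSONS.md becomes 6, which rounds up to 8 and therefore changes the routing.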
### 2b. Routing by Complexity
| Estimate | Default routing | Override path |
|----------|-----------------|---------------|
| **1–5** | Continue this skill at Step 3 (Research) or Step 4 (Codebase Analysis) — see classification below | — |
| **8** | **STOP this skill and recommend handoff to `/decompose @<feature_description>`** (single-component decompose mode if the affected scope fits inside one component, default mode if it does not). The user may override and proceed in `/new-task`, but the override must be explicitly chosen. | C) Proceed in /new-task anyway with the user's acknowledgement that the resulting task is high-risk and may need to be re-decomposed mid-implementation |
| **13** | **STOP this skill — auto-handoff is mandatory.** A 13-point feature cannot be a single task spec. Invoke `/decompose @<feature_description>` (default mode) before writing any task file. Surface the handoff to the user with no override path; this is a hard policy gate. | None — must decompose |
For the auto-handoff path:
1. Render a one-paragraph description of the feature suitable to feed `/decompose` (combine Step 1's verbatim user description with the complexity-points reasoning).
2. Save it to `_docs/02_task_plans/<feature_slug>/feature-description.md` so the decompose skill has a stable input file.
3. Either (a) directly auto-chain into `.cursor/skills/decompose/SKILL.md` in default mode with this file as input, or (b) report the handoff to the user along with the exact `/decompose` invocation and stop. Pick (a) only if the user has explicitly enabled auto-chain across skills (e.g., we are inside an `/autodev` invocation); otherwise pick (b).
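A compact way to read the routing table together with the handoff steps is the sketch below. The return labels and parameter names are made up for illustration, and it assumes an accepted 8-point handoff uses the same (a)/(b) mechanics as the mandatory 13-point one.

```python
VALID_POINTS = (1, 2, 3, 5, 8, 13)


def route(points: int, auto_chain: bool = False, user_override: bool = False) -> str:
    """Pick the next action for a new task from its points estimate."""
    if points not in VALID_POINTS:
        raise ValueError(f"points must be one of {VALID_POINTS}")
    if points <= 5:
        return "continue"  # on to the research / skip-research classification
    if points == 8 and user_override:
        # Allowed only with the user's explicit risk acknowledgement.
        return "continue-high-risk"
    # 13 points ignores user_override: the handoff is a hard policy gate.
    return "decompose-auto-chain" if auto_chain else "decompose-report-and-stop"
```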
### 2c. Research vs Skip Research (only for ≤5 estimates)
Classification (independent of points; runs only when points ≤ 5 and Step 2b chose Continue):
| Category | Criteria | Action |
|----------|----------|--------|
| **Needs research** | New libraries/frameworks, unfamiliar protocols, significant architectural change, multiple unknowns | Proceed to Step 3 (Research) | | **Needs research** | New libraries/frameworks, unfamiliar protocols, multiple unknowns | Proceed to Step 3 (Research) |
| **Skip research** | Extends existing functionality, uses patterns already in codebase, straightforward new component with known tech | Skip to Step 4 (Codebase Analysis) |
Present the assessment to the user: Present the full assessment to the user:
```
══════════════════════════════════════
COMPLEXITY ASSESSMENT
══════════════════════════════════════
Scope: [low / medium / high]
Novelty: [low / medium / high]
Risk: [low / medium / high]
Points: [1 / 2 / 3 / 5 / 8 / 13] (project aim: 2–3, rarely 5)
Routing: [Continue in /new-task | Hand off to /decompose]
══════════════════════════════════════
Recommendation: [Research needed / Skip research] Recommendation: [Research needed | Skip research | Decompose required]
Reason: [one-line justification] Reason: [one-line justification, including any LESSONS.md influence]
══════════════════════════════════════
```
**BLOCKING**: Ask the user to confirm or override the recommendation before proceeding. **BLOCKING**:
- If points ≤ 5 → ask the user to confirm or override the research recommendation before proceeding.
- If points = 8 → ask the user to choose between hand-off to /decompose (recommended) and continuing in /new-task with explicit risk acknowledgement.
- If points = 13 → STOP and present the handoff plan; do not offer a continue-anyway override.
---
@@ -203,7 +240,13 @@ Apply the four shared-task triggers from `.cursor/skills/decompose/SKILL.md` Ste
2. Add the layout edit to the task's deliverables; the implementer writes it alongside the code change.
3. If `module-layout.md` does not exist, STOP and instruct the user to run `/document` first (existing-code flow) or `/decompose` default mode (greenfield). Do not guess.
Record the classification and any contract/layout deliverables in the working notes; they feed Step 5 (Validate Assumptions) and Step 6 (Create Task). - **ADR cross-check** — runs unconditionally for every new-task in any of the three classifications above:
1. If `_docs/02_document/adr/` exists, scan every `Status: Accepted` ADR. For each, ask: "would the proposed task either contradict this ADR's `Decision` or materially affect its `Consequences`?"
2. **Conflict** (task contradicts an Accepted ADR) → STOP and Choose A/B/C: **A)** Re-scope the task to comply with the ADR, **B)** Propose superseding the ADR — the task spec then includes a deliverable to invoke `/plan --adr-only` (or the next `/plan` cycle's Step 4.5) with `Supersedes: ADR-NNN`, and the new task does NOT proceed until that supersede ADR is `Accepted`, **C)** Park the task in `backlog/` with a `Blocked-By: ADR-NNN review` note. Do not silently approve a contradictory task.
3. **Drift** (task changes assumptions an ADR depends on but does not directly contradict it) → record the affected ADR(s) under a new `### ADR Impact` section in the task spec with `> Affects ADR NNN_<slug>: <one-line summary>`. The implementer surfaces this at code-review Phase 7 (which then classifies it as ADR-Drift if not addressed).
4. **Aligned** (task implements something an Accepted ADR mandates) → cite the ADR(s) under `### ADR Compliance` in the task spec with `> Implements ADR NNN_<slug>`. Code-review Phase 7 then expects matching evidence in the implemented code.
Record the classification, any contract/layout deliverables, and any ADR cross-check outcomes in the working notes; they feed Step 5 (Validate Assumptions) and Step 6 (Create Task).
**BLOCKING**: none — this step surfaces findings; the user confirms them in Step 5.
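The Drift and Aligned outcomes of the ADR cross-check each add a section to the task spec. A minimal sketch of rendering those sections, with a hypothetical helper name and illustrative ADR ids, might be:

```python
def adr_sections(impacts: dict[str, str], implements: list[str]) -> str:
    """Render the ADR Impact / ADR Compliance sections of a task spec.

    impacts maps an ADR id (NNN_<slug>) to a one-line summary of the drift;
    implements lists the ADR ids the task carries out.
    """
    lines: list[str] = []
    if impacts:
        lines.append("### ADR Impact")
        lines += [f"> Affects ADR {adr}: {summary}" for adr, summary in impacts.items()]
    if implements:
        if lines:
            lines.append("")  # blank line between the two sections
        lines.append("### ADR Compliance")
        lines += [f"> Implements ADR {adr}" for adr in implements]
    return "\n".join(lines)
```

A Conflict outcome is deliberately absent: it stops the step with an A/B/C choice rather than producing a section.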
@@ -263,6 +306,9 @@ Present using the Choose format for each decision that has meaningful alternativ
- [ ] If Step 4.5 classified the task as producer, the `## Contract` section exists and points at a contract file
- [ ] If Step 4.5 classified the task as consumer, `### Document Dependencies` lists the relevant contract file
- [ ] If Step 4.5 flagged a layout delta, the task's Scope.Included names the `module-layout.md` edit
- [ ] If Step 4.5 flagged an ADR conflict, the task is either re-scoped (A), explicitly blocked on a supersede ADR (B), or parked in backlog (C) — never silently bypassed
- [ ] If Step 4.5 flagged ADR drift, the task spec has an `### ADR Impact` section listing the affected ADR(s)
- [ ] If Step 4.5 flagged ADR alignment, the task spec has an `### ADR Compliance` section citing the implemented ADR(s)
--- ---
@@ -15,7 +15,7 @@ disable-model-invocation: true
# Solution Planning
Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and work item epics through a systematic 6-step workflow. Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, ADRs, tests, and work item epics through a systematic workflow with seven step files (1, 2, 3, 4, 4.5, 5, 6) plus a Final quality checklist.
## Core Principles
@@ -55,7 +55,7 @@ Read `steps/01_artifact-management.md` for directory structure, save timing, sav
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 6 plus Final). Update status as each step completes. At the start of execution, create a TodoWrite with all steps (1, 2, 3, 4, 4.5, 5, 6 plus Final). Update status as each step completes. The fractional Step 4.5 (ADR Capture) sits between Architecture Review (Step 4) and Test Specifications (Step 5).
## Workflow
@@ -85,6 +85,16 @@ Read and follow `steps/04_review-risk.md`.
--- ---
### Step 4.5: Architecture Decision Records (ADRs)
Read and follow `steps/04-5_adr-capture.md`.
This step captures the architecture and tech-stack decisions that were made (or revised) in Steps 2–4 as durable, dated, immutable records under `_docs/02_document/adr/`. ADRs are the single thing in `_docs/` that explains the **why** of each major decision after the conversation history is gone. They are consumed by `decompose` (when bootstrapping module layout), `new-task` (when assessing a new feature against existing decisions), `refactor` (when proposing replacements), and any future code-review cycle that needs to confirm a structural choice was deliberate.
This step is **BLOCKING**: the ADR set must be reviewed and confirmed by the user before Step 5 begins.
---
### Step 5: Test Specifications
Read and follow `steps/05_test-specifications.md`.
@@ -120,7 +130,7 @@ Read and follow `steps/07_quality-checklist.md`.
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — planning cannot proceed |
| Ambiguous requirements | ASK user |
| Input data coverage below 75% | Search internet for supplementary data, ASK user to validate | | Input data coverage below the canonical threshold (`cursor-meta.mdc` Quality Thresholds) | Search internet for supplementary data, ASK user to validate |
| Technology choice with multiple valid options | ASK user |
| Component naming | PROCEED, confirm at next BLOCKING gate |
| File structure within templates | PROCEED |
@@ -146,6 +156,8 @@ Read and follow `steps/07_quality-checklist.md`.
│ [BLOCKING: user confirms components] │
│ 4. Review & Risk → risk register, iterations │
│ [BLOCKING: user confirms mitigations] │
│ 4.5 ADR Capture → _docs/02_document/adr/NNN_*.md │
│ [BLOCKING: user confirms ADR set] │
│ 5. Test Specifications → per-component test specs │
│ 6. Work Item Epics → epic per component + bootstrap │
│ ───────────────────────────────────────────────── │
@@ -26,6 +26,10 @@ DOCUMENT_DIR/
│ └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (iterative, ## as sequence)
├── adr/
│ ├── 001_[decision_slug].md
│ ├── 002_[decision_slug].md
│ └── ...
├── components/
│ ├── 01_[name]/
│ │ ├── description.md
@@ -66,6 +70,8 @@ DOCUMENT_DIR/
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
| Step 4 | Risk assessment complete | `risk_mitigations.md` |
| Step 4.5 | Each ADR captured | `adr/NNN_[decision_slug].md` |
| Step 4.5 | ADR index updated | `adr/README.md` |
| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
| Step 6 | Epics created in work item tracker | Tracker via MCP |
| Final | All steps complete | `FINAL_report.md` |
@@ -85,3 +91,15 @@ If DOCUMENT_DIR already contains artifacts:
2. Identify the last completed step based on which artifacts exist
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
#### Step 4.5 (ADR Capture) resumption rule
ADR files have a `Status` field that disambiguates "step in progress" from "step done":
- `Status: Proposed` → Step 4.5 is **in progress**. The user has not yet hit the BLOCKING gate (or hit it and chose B/C/D, which kept files at `Proposed`). Resume Step 4.5 at Phase 4.5f and re-present the BLOCKING Choose to the user. Do NOT skip to Step 5.
- `Status: Accepted` AND `adr/README.md` index exists AND every Accepted ADR is referenced in the index → Step 4.5 is **done**. Skip to Step 5.
- `Status: Accepted` but `adr/README.md` is missing or out of date → Step 4.5 is **partially complete**. Resume at Phase 4.5d (Maintain the ADR Index) before moving on.
- Mixed `Proposed` + `Accepted` files in the same directory → Step 4.5 is **in progress** with prior partial confirmations. Resume at Phase 4.5f and re-present only the still-`Proposed` ADRs.
- Empty `adr/` directory or no `adr/` directory → Step 4.5 has not started yet. Begin at Phase 4.5a.
The `Date` field on every Accepted ADR is the date the user confirmed it; do not regenerate it during resumption.
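The resumption rule reduces to a small decision function over the ADR statuses and the state of the index. A minimal sketch, assuming the statuses and index contents have already been read from disk; `AdrFile` and `resume_point` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class AdrFile:
    number: str  # e.g. "001"
    status: str  # "Proposed" or "Accepted"


def resume_point(adrs: list[AdrFile], index_exists: bool, indexed_numbers: set[str]) -> str:
    """Return where to resume Step 4.5, following the rule above."""
    if not adrs:
        return "Phase 4.5a"  # empty or missing adr/ directory: not started
    if any(a.status == "Proposed" for a in adrs):
        return "Phase 4.5f"  # in progress, including mixed Proposed + Accepted
    accepted = {a.number for a in adrs if a.status == "Accepted"}
    if not index_exists or not accepted <= indexed_numbers:
        return "Phase 4.5d"  # index missing or out of date
    return "Step 5"  # every Accepted ADR is indexed: done
```

Checking for `Proposed` first means the mixed case falls through to Phase 4.5f without a separate branch. How statuses other than `Proposed` and `Accepted` should be treated is not covered by the rule, so the sketch ignores them.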
@@ -0,0 +1,187 @@
# Step 4.5: Architecture Decision Records (ADRs)
**Role**: Architect / technical writer
**Goal**: Capture every major architecture, tech-stack, data-model, and integration decision made during Steps 2–4 as a durable, dated, immutable record under `_docs/02_document/adr/`.
**Constraints**: ADRs only — do not re-open architecture; do not make new decisions in this step. Document what has been decided, not what is still open.
ADRs are the single thing in `_docs/` that explains the **why** of each major decision after the conversation history is gone. They are consumed by:
- `decompose` Step 1.5 (`steps/01-5_module-layout.md`) — every Accepted ADR is cross-checked against the module-layout proposal; conflicts trigger an explicit Choose between supersede / exception / re-open.
- `new-task` Step 4.5 (`SKILL.md` § "Step 4.5: Contract & Layout Check") — every new task is classified against Accepted ADRs as Conflict / Drift / Aligned; conflicts STOP the task with a Choose A/B/C; drift adds an `### ADR Impact` section; alignment adds an `### ADR Compliance` section.
- `refactor` Phase 2b.1 (`phases/02-analysis.md`) — every Accepted ADR is diffed against the proposed roadmap; Violations trigger a BLOCKING supersede gate that produces a `supersede_adr_NNN.md` task before any refactor task is created.
- `code-review` Phase 7 (`SKILL.md` § "Phase 7: Architecture Compliance") — every changed-files batch is checked against Accepted ADRs; ADR-Violation findings are Critical, ADR-Drift findings are High.
Discipline that still relies on the human: when a downstream skill detects a Drift case, the resulting task spec MUST land its `## ADR Impact` / `## ADR Compliance` section; the implementer must address it; the next code-review batch then has the context it needs. Drift left undocumented is the silent-failure path — every consumer hook above is designed to make it visible.
## Inputs
- `_docs/02_document/architecture.md` (incl. confirmed `## Architecture Vision`)
- `_docs/02_document/glossary.md`
- `_docs/02_document/data_model.md`
- `_docs/02_document/system-flows.md`
- `_docs/02_document/risk_mitigations.md` (and any `risk_mitigations_NN.md` iterations from Step 4)
- `_docs/02_document/components/[##]_[name]/description.md`
- `_docs/02_document/deployment/` (CI/CD, environments, observability)
- `_docs/00_problem/restrictions.md` and `_docs/00_problem/acceptance_criteria.md` (each ADR must reference relevant constraints / AC by ID)
- Optional: `_docs/01_solution/solution.md` and `_docs/01_solution/tech_stack.md` (research output)
- Optional: `_docs/LESSONS.md` — surface any lesson categories of `architecture` / `dependencies` that bias the recommendation
## What is an ADR (and what is not)
Capture an ADR when **all** of the following hold:
1. The decision picks between two or more genuinely valid approaches with meaningful trade-offs.
2. The decision has **downstream consequences** that other decisions, code, or tasks inherit from.
3. The decision is **non-obvious** to a future reader who only sees the final code — they would ask "why was it built this way?" rather than discovering the answer by reading the source.
Do NOT create an ADR for:
- Naming, formatting, or purely cosmetic choices.
- A choice that is fully implied by a single explicit restriction (`restrictions.md` is itself the record — link to it from the architecture doc instead).
- A choice the team has not actually made yet — open questions live in `risk_mitigations.md` or `_docs/_process_leftovers/`, not in ADRs.
- A technology selection where research already produced an exact-fit selection with one viable option (the research doc is the record — link to the relevant `solution_draft*.md` section).
## Process
### Phase 4.5a: Decision Inventory
Walk the inputs and list candidate decisions. For each candidate, record a one-liner:
```
- [decision] — [trade-off summary] — [downstream consumers] — [evidence file:section]
```
Inspect at minimum:
| Inspection target | Typical decisions surfaced |
|-------------------|----------------------------|
| `architecture.md` § layering | Layering style (clean vs hex vs n-tier), which layer owns transactions, how cross-cutting concerns enter |
| `architecture.md` § Architecture Vision | The North Star principle (e.g., "edge-first, sync-second"); ADR captures the implication for one specific subsystem |
| `data_model.md` | Datastore choice (Postgres vs Mongo), partitioning, soft vs hard deletes, schema evolution strategy |
| `system-flows.md` | Sync vs async boundaries, idempotency strategy, retry policy ownership, error envelope shape |
| `components/*/description.md` § interfaces | Public-API style (REST vs RPC vs event), versioning strategy, auth/authorization placement |
| `deployment/containerization.md` | Single container vs sidecar vs init container, base image lineage |
| `deployment/ci_cd_pipeline.md` | Trunk-based vs feature-branch, gate ordering, deploy strategy (blue-green / canary / all-at-once) |
| `deployment/observability.md` | Logging stack, metric backend, sampling rate decisions, retention |
| `risk_mitigations.md` | Risk-acceptance trade-offs (e.g., "we accept N% data loss in exchange for sub-100ms p99") |
| Tech-stack from `_docs/01_solution/tech_stack.md` | Anything where research recorded ≥2 candidates and a winner |
Drop any candidate that fails the three "what is an ADR" criteria above. Keep the rest.
### Phase 4.5b: Numbering and Slugs
ADRs are numbered globally per project, monotonically, never re-used.
1. List existing files under `_docs/02_document/adr/` matching `^[0-9]{3}_.+\.md$`.
2. The next ADR number is `max(existing) + 1`, zero-padded to 3 digits; if no ADR files exist yet, start at `001`.
3. The slug is kebab-case, ≤6 words, derived from the decision summary. Examples: `001_use-postgres-for-transactional-data.md`, `004_event-driven-cross-component-comms.md`.
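The numbering and slug steps above can be sketched in a few lines. This is an illustration under stated assumptions: the helper names (`next_adr_number`, `make_slug`, `adr_filename`) are hypothetical, and truncating the summary to its first six words is only one simple way to meet the length limit; a hand-written slug will often read better.

```python
import re

# Same pattern as step 1: three digits, underscore, anything, ".md".
ADR_FILE = re.compile(r"^([0-9]{3})_.+\.md$")


def next_adr_number(filenames):
    """max(existing) + 1, zero-padded to 3 digits; "001" when none exist."""
    numbers = [int(m.group(1)) for name in filenames if (m := ADR_FILE.match(name))]
    return f"{max(numbers, default=0) + 1:03d}"


def make_slug(summary, max_words=6):
    """Lowercase kebab-case slug built from the first `max_words` words."""
    words = re.findall(r"[a-z0-9]+", summary.lower())
    return "-".join(words[:max_words])


def adr_filename(filenames, summary):
    return f"{next_adr_number(filenames)}_{make_slug(summary)}.md"
```

Files that do not match the pattern, such as `README.md`, are skipped when computing the maximum. Note that a hyphenated term like "event-driven" counts as two words in this sketch.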
### Phase 4.5c: Render One ADR Per Decision
For each kept candidate, render the ADR using `templates/adr.md`. Required sections (do NOT omit any):
| Section | Content |
|---------|---------|
| **Number** | `NNN` |
| **Title** | One-line decision statement (matches slug) |
| **Status** | `Proposed` (only during Step 4.5 iteration) → `Accepted` (after user confirmation at the BLOCKING gate) |
| **Date** | YYYY-MM-DD (the date the user confirmed) |
| **Deciders** | The user (project owner) — the AI is not a decider |
| **Context** | The problem this decision addresses, including links to AC IDs, restriction IDs, risks, and (where relevant) the research draft section |
| **Decision** | The chosen approach in one sentence, then the supporting detail |
| **Alternatives Considered** | Each alternative with a one-line "rejected because…" |
| **Consequences** | Positive (what becomes easier / cheaper / faster) and negative (what becomes harder / locked in / costly to undo). Be honest — every decision has a downside. |
| **Supersedes / Superseded by** | Empty initially; updated when a future ADR overturns this one |
| **Evidence** | File-and-section pointers into `_docs/` showing where the decision is reflected (architecture.md § layering, components/02_*/description.md § interface, etc.) |
After rendering, write each file to `_docs/02_document/adr/NNN_<slug>.md`. Keep `Status: Proposed` until the BLOCKING gate.
### Phase 4.5d: Maintain the ADR Index
Write or update `_docs/02_document/adr/README.md` with this exact shape:
```markdown
# Architecture Decision Records
This index lists every ADR for this project, in number order. ADRs are immutable once `Accepted`;
new decisions that overturn a prior ADR are recorded as new ADRs whose `Supersedes` field points
back, and the original ADR's `Superseded by` field is updated.
| # | Title | Status | Date | Supersedes |
|---|-------|--------|------|------------|
| 001 | Use Postgres for transactional data | Accepted | 2026-05-21 | — |
| 002 | Event-driven cross-component comms | Accepted | 2026-05-21 | — |
| ... | ... | ... | ... | ... |
```
Sort by `#` ascending. Include all ADRs ever written, even superseded ones — the audit trail is the point.
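Generating the table rows from ADR metadata keeps the index and the files from drifting apart. A minimal sketch, assuming the metadata has already been parsed out of each ADR file; `AdrMeta` and `render_index_table` are hypothetical names:

```python
from dataclasses import dataclass

NONE_MARK = "\u2014"  # the dash the index uses for "supersedes nothing"


@dataclass
class AdrMeta:
    number: str
    title: str
    status: str
    date: str
    supersedes: str = NONE_MARK


def render_index_table(adrs):
    """Render the index table, sorted by ADR number ascending."""
    lines = [
        "| # | Title | Status | Date | Supersedes |",
        "|---|-------|--------|------|------------|",
    ]
    for a in sorted(adrs, key=lambda a: int(a.number)):
        lines.append(f"| {a.number} | {a.title} | {a.status} | {a.date} | {a.supersedes} |")
    return "\n".join(lines)
```

Superseded ADRs are passed in like any other record, so they stay in the table as the rule above requires.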
### Phase 4.5e: Cross-Link from architecture.md
In `architecture.md`, every section that reflects an ADR decision gets a one-line trailing reference:
```markdown
> See ADR 001 (Use Postgres for transactional data), ADR 002 (Event-driven cross-component comms).
```
Place the reference at the end of the section, after the prose. This lets a future reader of `architecture.md` jump straight to the rationale.
### Phase 4.5f: BLOCKING Gate — User Confirmation
Present the ADR set to the user using the Choose format from `.cursor/skills/autodev/protocols.md` (or plain text if AskQuestion is unavailable):
```
══════════════════════════════════════
DECISION REQUIRED: ADR set captured (N records)
══════════════════════════════════════
001 — [title]
002 — [title]
...
══════════════════════════════════════
A) Accept all ADRs as written
B) Edit specific ADRs (numbers and edits)
C) Add a missed decision (description)
D) Remove an ADR (number and reason)
══════════════════════════════════════
Recommendation: A — review the rendered set and confirm; corrections are quick on Round 2
══════════════════════════════════════
```
Loop:
- **A** → flip every ADR's `Status` from `Proposed` to `Accepted`, set `Date` to today's date, save, exit step.
- **B** → apply edits, re-present the modified ADRs, loop.
- **C** → run Phases 4.5a–4.5e for the missed decision only, append to the set, re-present, loop.
- **D** → confirm with the user that the candidate fails the three "what is an ADR" criteria, remove the file, update the index, loop.
Do NOT mark `Accepted` without an explicit user A.
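The Choose A branch is a mechanical edit of two header fields. A minimal sketch, assuming the ADR header uses the `- **Status**: …` and `- **Date**: …` lines from `templates/adr.md`; the function name `accept_adr` is hypothetical, and it should only ever be called after an explicit user A:

```python
import re
from datetime import date


def accept_adr(text, today=None):
    """Flip Status from Proposed to Accepted and stamp the confirmation date."""
    today = today or date.today()
    status_line = r"^(- \*\*Status\*\*: )Proposed[ \t]*$"
    if not re.search(status_line, text, flags=re.MULTILINE):
        # Refuse to touch ADRs that are already Accepted (their Date is final).
        raise ValueError("ADR is not in Proposed status")
    text = re.sub(status_line, r"\g<1>Accepted", text, count=1, flags=re.MULTILINE)
    text = re.sub(
        r"^(- \*\*Date\*\*: ).*$",
        rf"\g<1>{today.isoformat()}",
        text,
        count=1,
        flags=re.MULTILINE,
    )
    return text
```

Raising on anything other than `Proposed` keeps the sketch from overwriting the date of an ADR that was confirmed in an earlier round.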
## Self-verification
- [ ] Every kept candidate from Phase 4.5a has a corresponding file under `adr/`
- [ ] Every ADR has all required sections (none empty except `Supersedes` / `Superseded by`)
- [ ] `Decision` sections are one-sentence-then-detail, not "we'll figure it out"
- [ ] `Alternatives Considered` lists at least one rejected alternative per ADR
- [ ] `Consequences` lists both positive AND negative consequences (an ADR with no negatives is suspect)
- [ ] `Evidence` points at real `_docs/` sections that exist on disk
- [ ] `adr/README.md` index lists every file in the directory and matches their `Status` / `Date`
- [ ] `architecture.md` has a trailing `See ADR …` reference at every section that an ADR reflects
- [ ] The user confirmed the set via Choose A; every ADR is `Accepted` with today's date
## Common mistakes
- **Re-opening architecture**: Step 4.5 records, it does not decide. If a candidate decision turns out to be unsettled, that's a Step 2 / Step 4 gap — return there, do not paper over it with a wishy-washy ADR.
- **Decision-of-the-week**: do not write an ADR for every minor pattern choice. The bar is "non-obvious to a future reader". 5–15 ADRs is typical for a planning round; 40+ is over-capture.
- **Negative consequences left empty**: every real decision has costs. If you cannot name one, the decision was not actually weighed.
- **Vague evidence**: `architecture.md` is not enough — point at the specific section. `architecture.md § Layering` is stronger evidence than a bare `architecture.md`.
- **Numbering reuse**: never recycle a number from a deleted ADR. The audit trail is more important than tidy numbering.
- **Superseding without recording**: when a later cycle overturns an ADR, the new ADR must point at the old one via `Supersedes`, AND the old ADR's `Superseded by` field must be updated. Index reflects both. (This is enforced when `decompose` or `refactor` later updates ADRs.)
## Escalation
| Situation | Action |
|-----------|--------|
| Candidate decision is unsettled (the team has not actually decided) | Return to the originating step (2 / 3 / 4); do NOT write a placeholder ADR |
| Two candidates in Phase 4.5a turn out to be the same decision phrased differently | Merge into one ADR, list both phrasings in `Context` |
| User picks D (remove an ADR) and the AI judges the decision is genuinely worth recording | Surface the disagreement, ASK why the user wants it removed, defer to user |
| Existing `adr/` directory has files but `adr/README.md` is missing or stale | Rebuild the index from the directory before adding new ADRs |
@@ -2,7 +2,7 @@
**Role**: Professional Quality Assurance Engineer
**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage **Goal**: Write test specs for each component achieving the canonical minimum acceptance-criteria coverage (currently 75% — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds; do not restate a different number here)
**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.
@@ -0,0 +1,67 @@
# ADR-{NNN}: {decision-title}
- **Status**: {Proposed | Accepted | Deprecated | Superseded}
- **Date**: {YYYY-MM-DD}
- **Deciders**: {user / project owner}
- **Supersedes**: {ADR-NNN | —}
- **Superseded by**: {ADR-NNN | —}
## Context
What problem does this decision address? Cite the relevant constraint(s), acceptance criterion / criteria, and risk(s) by ID.
- Acceptance criteria addressed: AC-{ID-1}, AC-{ID-2}
- Restrictions addressed: R-{ID-1}, R-{ID-2}
- Risks addressed: RISK-{ID-1}
- Research source (if any): `_docs/01_solution/solution_draftN.md` § {section}
A short paragraph (3–6 sentences) explaining why a choice is required now and what makes it non-trivial. Do not pre-announce the decision here — that goes in `Decision`. Focus on the forces at play (load, scale, team familiarity, hardware constraints, regulatory drivers, third-party limits).
## Decision
One declarative sentence: **"We will …"** Then 1–3 paragraphs of supporting detail explaining how the decision will be implemented at the boundaries between components.
Be specific. "We will use Postgres" is too thin; "We will use Postgres 16 with logical replication for read scaling, restricting JSONB columns to top-level metadata only, with all transactional data in normalized tables" is the right resolution.
## Alternatives Considered
| Alternative | Rejected because |
|-------------|------------------|
| {Alt 1 — short label} | {one line: the cost / mismatch / risk that ruled it out, ideally referencing a measurable criterion} |
| {Alt 2 — short label} | {one line} |
| {Alt 3 — short label} | {one line} |
At least one rejected alternative is mandatory. If only one option was ever considered, this is not an ADR — link to the source restriction or research selection from the parent doc instead.
## Consequences
### Positive
- {What becomes easier / cheaper / faster, with concrete examples where possible}
- {…}
### Negative
- {What becomes harder / locked in / costly to undo}
- {…}
Every real decision has both. If the negatives section is hard to fill, the alternatives were probably not weighed seriously — return to the prior step.
### Neutral / Open
- {What is unchanged but worth flagging for future readers (e.g., "this does not change the auth boundary; auth remains in component 02_user_management as decided in ADR-003")}
## Evidence
Where this decision is reflected on disk. Use `file:section` links so future readers can jump.
- `_docs/02_document/architecture.md` § {section}
- `_docs/02_document/data_model.md` § {section}
- `_docs/02_document/components/{##_name}/description.md` § {section}
- `_docs/02_document/system-flows.md` § {flow name}
- `_docs/02_document/deployment/{file}.md` § {section}
- {add more as needed}
## Notes
Optional. Use for caveats that did not fit above, links to external research, or follow-ups that the team agreed to revisit on a known trigger ("re-evaluate after 6 months in production" / "re-evaluate when load exceeds 10× baseline").
@@ -1,6 +1,6 @@
# Final Planning Report Template
Use this template after completing all 6 steps and the quality checklist. Save as `_docs/02_document/FINAL_report.md`. Use this template after completing all steps (1, 2, 3, 4, 4.5, 5, 6) and the quality checklist. Save as `_docs/02_document/FINAL_report.md`.
--- ---
@@ -39,6 +39,44 @@ Write `RUN_DIR/analysis/research_findings.md`:
4. Prioritize changes by impact and effort
5. Reject or escalate any proposed refactor that improves code structure while weakening required behavior, integration contracts, runtime constraints, safety/security posture, or acceptance criteria
### 2b.1. ADR Superseding Gate (BLOCKING)
A refactor that improves code structure while overturning a documented architecture decision belongs to the silent-drift class of failure that has repeatedly burned this project (see `meta-rule.mdc` § GPS-passthrough postmortem and the auto-lessons it produced). This gate makes drift visible and forces a deliberate ADR update.
1. **List candidate ADRs**: read every `Status: Accepted` file in `_docs/02_document/adr/`. If the directory does not exist or contains only the index, log `No ADRs in scope` to `RUN_DIR/analysis/adr_impact.md` and skip the rest of this gate.
2. **Diff each candidate against the proposed refactor roadmap**: for each ADR, ask the same two questions as code-review Phase 7:
- **Violation**: does any roadmap item do the *opposite* of the ADR's `Decision`?
- **Drift**: does any roadmap item materially affect the ADR's `Consequences` (positive or negative) without contradicting the Decision outright?
3. **Classify each impacted ADR** in `RUN_DIR/analysis/adr_impact.md`:
| ADR | Roadmap item | Impact | Required action |
|-----|--------------|--------|-----------------|
| NNN | `roadmap-item-NN` | Violation / Drift / Aligned | (filled by Choose A/B/C below) |
4. **For every Violation row, present a BLOCKING Choose**:
```
══════════════════════════════════════
DECISION REQUIRED: Refactor would violate ADR-NNN (<title>)
══════════════════════════════════════
A) Update the ADR via supersede: the refactor produces a NEW ADR
(`Supersedes: NNN`) capturing the new Decision, and ADR-NNN's
`Superseded by` field is updated. The supersede ADR is itself a
deliverable of this refactor run (added to RUN_DIR/analysis/adr_impact.md
and to TASKS_DIR as a task) and must be `Accepted` before Phase 4.
B) Reduce the refactor scope to NOT violate ADR-NNN
C) Re-evaluate ADR-NNN: keep the refactor but only after ADR-NNN is
formally re-opened in a new /plan Step 4.5 round
══════════════════════════════════════
Recommendation: A — supersede is the only path that keeps the audit
trail intact while letting the refactor land
══════════════════════════════════════
```
5. **For every Drift row**: do not block, but the roadmap item must include a `## ADR Impact` section in its task spec citing the affected ADR(s). The implementer surfaces this at code-review Phase 7, which would otherwise classify the change as ADR-Drift (High) without context.
6. **For every Aligned row**: cite the ADR in the roadmap item's task spec under `## ADR Compliance`. No further action.
7. **Self-supersede deliverable**: any Choose A path adds a `[##]_supersede_adr_NNN.md` task file to the refactor run's TASKS_DIR with the new ADR text drafted (using `.cursor/skills/plan/templates/adr.md`). The task's only Acceptance Criterion is "ADR file exists at `_docs/02_document/adr/<next>_<slug>.md` with `Status: Accepted`, ADR-NNN's `Superseded by` field updated, and `_docs/02_document/adr/README.md` index reflects both."
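The gate logic above can be expressed as a check over the rows of `adr_impact.md`. A minimal sketch, assuming the table has already been parsed into records; `ImpactRow`, `unresolved_violations`, and `required_section` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImpactRow:
    adr: str                      # e.g. "003"
    roadmap_item: str             # e.g. "roadmap-item-02"
    impact: str                   # "Violation" | "Drift" | "Aligned"
    choice: Optional[str] = None  # "A" | "B" | "C", only for Violation rows


def unresolved_violations(rows):
    """Violation rows with no chosen path; any such row blocks task creation."""
    return [r for r in rows if r.impact == "Violation" and r.choice not in {"A", "B", "C"}]


def required_section(row):
    """Task-spec section a non-blocking row must carry, or None for Violations."""
    return {"Drift": "## ADR Impact", "Aligned": "## ADR Compliance"}.get(row.impact)
```

An empty result from `unresolved_violations` corresponds to the gate being clear; Drift and Aligned rows never block, they only add a section to the task spec.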
Present optional hardening tracks for user to include in the roadmap:
``` ```
@@ -67,6 +105,8 @@ Write `RUN_DIR/analysis/refactoring_roadmap.md`:
**BLOCKING applicability gate**: Before 2c and 2d, every recommendation in the roadmap must be `Selected`. Items marked `Rejected` are excluded. Items marked `Experimental only` or `Needs user decision` require a user decision before task creation.
**BLOCKING ADR-supersede gate**: Before 2c and 2d, every Violation row in `RUN_DIR/analysis/adr_impact.md` (from 2b.1) must be resolved via Choose A, B, or C. A Violation row with no chosen path blocks task creation.
## 2c. Create Epic
Create a work item tracker epic for this refactoring run:
@@ -111,6 +151,10 @@ Convert the finalized `RUN_DIR/list-of-changes.md` into implementable task files
- [ ] Task dependencies are consistent (no circular dependencies)
- [ ] `_dependencies_table.md` includes all refactoring tasks
- [ ] Every task has a work item ticket (or PENDING placeholder)
- [ ] If `_docs/02_document/adr/` exists with Accepted ADRs, `RUN_DIR/analysis/adr_impact.md` has been written and every Violation row is resolved (A/B/C) — no implicit overrides
- [ ] For every Violation resolved via Choose A, a `[##]_supersede_adr_NNN.md` task exists in TASKS_DIR with the drafted supersede ADR
- [ ] For every Drift row, the corresponding roadmap-item task spec has a `## ADR Impact` section
- [ ] For every Aligned row, the corresponding roadmap-item task spec has a `## ADR Compliance` section
**Save action**: Write analysis artifacts to RUN_DIR, task files to TASKS_DIR
@@ -15,9 +15,9 @@ Before designing or implementing any new tests, check what already exists:
1. Scan the project for existing test files (unit tests, integration tests, blackbox tests)
2. Run the existing test suite — record pass/fail counts
3. Measure current coverage against the areas being refactored (from `RUN_DIR/list-of-changes.md` file paths)
4. Assess coverage against thresholds: 4. Assess coverage against thresholds (canonical: see `.cursor/rules/cursor-meta.mdc` Quality Thresholds — never hardcode a different number):
- Minimum overall coverage: 75%
- Critical path coverage: 90% - Critical path coverage: **90% floor / 100% aim** — 90% is the enforcement floor (blocks Phase 4 if not met); 100% is the aspirational target. Refactors are NOT permitted to drop below 90% on the critical paths covered by the in-scope changes.
- All public APIs must have blackbox tests
- All error handling paths must be tested
@@ -47,7 +47,7 @@ For each uncovered critical area, write test specs to `RUN_DIR/test_specs/[##]_[
4. Document any discovered issues
**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths) across existing + new tests - [ ] Coverage requirements met (75% overall, 90% critical-path floor — 100% aim — per canonical `cursor-meta.mdc` Quality Thresholds) across existing + new tests
- [ ] All tests pass on current codebase
- [ ] All public APIs in refactoring scope have blackbox tests
- [ ] Test data fixtures are configured
@@ -45,7 +45,7 @@ Write `RUN_DIR/test_sync/new_tests.md`:
- [ ] All obsolete tests removed or merged
- [ ] All pre-existing tests pass after updates
- [ ] New code from Phase 4 has test coverage
- [ ] Overall coverage meets or exceeds Phase 3 baseline (75% overall, 90% critical paths) - [ ] Overall coverage meets or exceeds Phase 3 baseline (75% overall, 90% critical-path floor / 100% aim — per `.cursor/rules/cursor-meta.mdc` Quality Thresholds)
- [ ] No tests reference removed or renamed code
**Save action**: Write test_sync artifacts; implemented tests go into the project's test folder
@@ -0,0 +1,290 @@
---
name: release
description: |
Executes the deployment plan produced by /deploy against a target environment.
Closes the loop between "we have a plan" and "the new version is running in production with a verdict on disk."
6-phase workflow: pre-release gate, strategy select, execute, smoke test, watch window, commit-or-rollback.
Outputs _docs/04_release/release_<version>.md with a definitive Released / Rolled-Back / Aborted verdict.
Trigger phrases:
- "release", "ship", "go live", "release this version"
- "deploy to prod", "promote to staging", "roll out"
- "rollback", "abort the release"
category: ship
tags: [release, deployment, rollback, smoke-test, observability, production]
disable-model-invocation: true
---
# Release Execution
The `/deploy` skill produces a plan and scripts. The `/release` skill **runs** them, verifies the live system, watches it for a defined window, and produces a definitive verdict on disk.
## Core Principles
- **Real execution, not simulation**: every phase must actually run against the target environment. If a phase cannot be executed (missing scripts, no SSH access, disabled secrets, registry auth failure), STOP — do not pretend a step succeeded. See `meta-rule.mdc` § "Real Results, Not Simulated Ones".
- **Verifiable rollback path**: the release does not start until rollback is proven viable for this version. "We can roll back" without evidence is not a rollback path.
- **Quiet failure is a release failure**: a deploy script that exits 0 but emits no observable signal in the watch window is treated as a regression, not a success.
- **One release per invocation**: a single `/release` execution targets exactly one version against exactly one environment. Multi-stage promotion (staging → prod) is two invocations, not one.
- **Never skip the watch window**: even successful deploys can degrade after 5 to 60 minutes (cache warm-up, scheduled jobs, downstream backpressure). The watch window is mandatory.
- **Autonomous rollback on hard regressions**: critical health-check failure, error-rate spike above threshold, or smoke-test failure → automatic rollback. Soft regressions (latency drift, capacity warnings) escalate to the user.
## Context Resolution
Fixed paths:
- DEPLOY_DIR: `_docs/04_deploy/`
- RELEASE_DIR: `_docs/04_release/`
- SCRIPTS_DIR: `scripts/`
- DEPLOY_SCRIPT: `scripts/deploy.sh`
- HEALTH_SCRIPT: `scripts/health-check.sh`
- ENV_TEMPLATE: `.env.example`
- OBSERVABILITY_DOC: `_docs/04_deploy/observability.md`
- ENVIRONMENT_DOC: `_docs/04_deploy/environment_strategy.md`
- PROCEDURES_DOC: `_docs/04_deploy/deployment_procedures.md`
- ARCHITECTURE: `_docs/02_document/architecture.md`
- RESTRICTIONS: `_docs/00_problem/restrictions.md`
Announce the resolved paths and the **target environment + version + strategy** to the user before any phase that touches the live system.
## Inputs (BLOCKING prerequisites)
| Input | Required | Source |
|-------|----------|--------|
| Target environment | Yes — ASK user | `environment_strategy.md` enumerates valid options |
| Target version / image tag | Yes — ASK user | Must exist in the registry; verified in Phase 1 |
| Rollback target version | Yes — ASK user | Defaults to currently-deployed version if discoverable |
| `scripts/deploy.sh` | Yes | Produced by `/deploy` Step 7. STOP if missing → run `/deploy` first |
| `scripts/health-check.sh` | Yes | Same |
| `_docs/04_deploy/deployment_procedures.md` | Yes | Defines per-environment runbook, manual approval rules, change-window restrictions |
| `_docs/04_deploy/observability.md` | Yes | Defines watch metrics, thresholds, and dashboards |
| `_docs/04_deploy/environment_strategy.md` | Yes | Defines target hostnames, registries, secrets, deploy strategy per env |
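
The file-based rows of the table above can be checked mechanically before asking the user for the remaining inputs. The following is a minimal sketch, not part of the skill itself; `missing_prerequisites` and `REQUIRED_ARTIFACTS` are hypothetical names introduced for illustration, and the paths are copied from the table.

```python
from pathlib import Path

# Required on-disk artifacts, copied from the Inputs table.
REQUIRED_ARTIFACTS = [
    "scripts/deploy.sh",
    "scripts/health-check.sh",
    "_docs/04_deploy/deployment_procedures.md",
    "_docs/04_deploy/observability.md",
    "_docs/04_deploy/environment_strategy.md",
]


def missing_prerequisites(repo_root: Path) -> list[str]:
    """Return the required artifacts that do not exist under repo_root.

    Hypothetical helper: the skill only says to STOP when an input is missing.
    """
    return [rel for rel in REQUIRED_ARTIFACTS if not (repo_root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites(Path.cwd())
    if missing:
        print("STOP: run /deploy first; missing:", ", ".join(missing))
    else:
        print("All file prerequisites present")
```

An empty result only means the files exist; target environment, version, and rollback target still have to be asked of the user.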
## Outputs
```
RELEASE_DIR/
├── release_<version>_<env>_<YYYY-MM-DD-HHmm>.md (mandatory; one per invocation)
├── rollback_<version>_<env>_<YYYY-MM-DD-HHmm>.md (only when rollback fires; pairs with the release file)
└── manual_approvals/
└── approval_<version>_<env>.md (when restrictions require manual approval, written before Phase 3)
```
The release report (`templates/release-report.md`) is appended to as each phase completes — it is durable across phase failures and reflects partial progress so the next operator can resume or audit.
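
The naming scheme in the tree above can be sketched as a small helper, assuming the timestamp is the invocation start time in local time. `report_filename` is a hypothetical name; only the `<kind>_<version>_<env>_<YYYY-MM-DD-HHmm>.md` pattern comes from this document.

```python
from datetime import datetime


def report_filename(kind: str, version: str, env: str, started_at: datetime) -> str:
    """Build a release or rollback report filename.

    kind is "release" or "rollback"; the timestamp keeps re-attempts of the
    same version visible side by side.
    """
    if kind not in ("release", "rollback"):
        raise ValueError(f"unsupported report kind: {kind!r}")
    stamp = started_at.strftime("%Y-%m-%d-%H%M")
    return f"{kind}_{version}_{env}_{stamp}.md"
```

Passing the same `started_at` to both calls keeps a rollback file paired with its release file.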
## Phases
```
┌────────────────────────────────────────────────────────────────┐
│ Release Execution (6-Phase Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: deploy artifacts on disk; tests green at HEAD │
│ │
│ 1. Pre-Release Gate → AC + change summary + readiness │
│ [BLOCKING: user confirms or aborts] │
│ 2. Strategy Select → all-at-once / blue-green / canary │
│ [BLOCKING: user picks strategy] │
│ 3. Execute → run deploy.sh, capture exit + logs │
│ [AUTO-ROLLBACK on non-zero exit] │
│ 4. Smoke Test → /test-run prod-smoke in target env │
│ [AUTO-ROLLBACK on failure] │
│ 5. Watch Window → poll observability for N minutes │
│ [AUTO-ROLLBACK on hard threshold breach] │
│ 6. Commit or Rollback → finalize verdict, update tracker │
│ [BLOCKING: user confirms only if soft regression escalated] │
├────────────────────────────────────────────────────────────────┤
│ Verdicts: Released · Rolled-Back · Aborted │
└────────────────────────────────────────────────────────────────┘
```
### Phase 1: Pre-Release Gate
**Goal**: Refuse to start if the system is not ready for a real release.
1. **Acceptance criteria check**: read `_docs/00_problem/acceptance_criteria.md`. If any AC is marked unmet OR if any AC has no associated test marked `Passed` in the latest `test-run` report, STOP and surface the unmet items. Do not let the user override with "ship anyway" without a recorded reason in the release report.
2. **Test status check**: read the most recent `_docs/06_metrics/perf_*.md` (if perf is required by restrictions) and the latest functional test report. Any failing or skipped test that maps to a critical-path AC blocks the release.
3. **Change summary**: read the git log between the version-tag-of-last-release and HEAD (or, if no prior release exists, from the project root commit). Render a short list grouped by component: features, fixes, breaking changes, security fixes. Cross-reference against the latest implementation reports under `_docs/03_implementation/`.
4. **Rollback readiness**:
- Confirm the previous version's image is still pullable from the registry (do not deploy without this).
- Confirm `scripts/deploy.sh --rollback` works as documented (read the script; if `--rollback` flag is missing, STOP — that is a deploy-skill bug).
- Confirm a rollback target exists (e.g., previously-deployed image tag) and is recorded in the release report under `Rollback Plan`.
5. **Restrictions**: read `_docs/00_problem/restrictions.md` for change-window rules, manual-approval rules, blackout windows, regulatory requirements (e.g., 4-eyes review, ITAR controls). If any apply, gate accordingly — write a `manual_approvals/approval_<version>_<env>.md` file once received.
6. **Tracker check**: list tracker tickets in the release scope (per `tracker.mdc` rules). Any ticket still in `In Progress` or `Code Review` that maps to a change in the release scope blocks Phase 1. Move-and-deploy is not allowed.
**BLOCKING gate**: present the assembled summary to the user using Choose A/B/C:
```
══════════════════════════════════════
PRE-RELEASE GATE
══════════════════════════════════════
Target env: {env}
Target version: {version} ({git-sha})
Rollback target: {previous-version}
Changes: N tickets, M components
- {summary list}
Open risks: {summary or "none"}
Blocking issues: {summary or "none"}
══════════════════════════════════════
A) Proceed to Strategy Select
B) Abort — fix blocking issue and re-invoke
C) Edit release scope — exclude a ticket and reassemble
══════════════════════════════════════
```
If A → write Phase 1 section to release report, proceed. If B → write `Aborted` verdict to release report with reason, exit. If C → loop back into Phase 1 with edited scope.
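
The acceptance-criteria rule from step 1 can be expressed as a filter. This is a sketch under assumptions: the `AcceptanceCriterion` record and its field names are hypothetical, since the layout of `acceptance_criteria.md` and the test-run report is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AcceptanceCriterion:
    ac_id: str
    met: bool
    # Result of the associated test in the latest test-run report;
    # None means no test is mapped to this criterion (assumed encoding).
    test_result: Optional[str]


def blocking_criteria(criteria: list[AcceptanceCriterion]) -> list[str]:
    """Return IDs of criteria that block the release.

    A criterion blocks when it is unmet OR has no test marked "Passed".
    """
    return [ac.ac_id for ac in criteria if not ac.met or ac.test_result != "Passed"]
```

A non-empty result corresponds to the `Blocking issues` line of the gate summary.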
### Phase 2: Strategy Select
**Goal**: Pick the deployment strategy that fits the change risk and environment capability.
Read `environment_strategy.md` and `deployment_procedures.md` to learn which strategies the target env supports. Strategies and when each is appropriate:
| Strategy | When to pick | Risk if wrong |
|----------|--------------|---------------|
| **all-at-once** | Internal tools, low traffic, well-rehearsed change, env supports nothing else | All users hit the new version simultaneously — bug blast radius is 100% |
| **blue-green** | Stateless services with a load balancer, env has dual-stack capability | Cutover is binary — observability must be ready to detect issues fast |
| **canary** | Customer-facing, traffic-tier load balancer in place, gradual rollout possible | Canary metric thresholds must be well-tuned or canary fails for harmless reasons |
| **manual** | Non-automatable env (one-off VMs, regulated infrastructure, non-Docker host) | The whole release becomes a runbook and the watch window phases are operator-driven; the release skill records but does not execute |
Recommend a default based on:
- Risk level inferred from change summary (any breaking change → bias toward canary or blue-green)
- Restrictions (e.g., regulatory rules forcing manual approval at each step)
- Environment capability (some envs may only support all-at-once)
**BLOCKING gate**: Choose A/B/C/D between strategies. Record the choice in the release report.
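
One way the default recommendation could be derived is sketched below. Only the bias toward canary or blue-green on breaking changes comes from this section; the preference order for non-breaking changes and the function name `recommend_strategy` are assumptions made for illustration. The user still makes the final choice at the gate.

```python
def recommend_strategy(supported: set[str], has_breaking_change: bool,
                       manual_required: bool) -> str:
    """Suggest a default strategy from those the target env supports."""
    if not supported:
        raise ValueError("target environment lists no supported strategy")
    # Restrictions that force per-step human approval win when possible.
    if manual_required and "manual" in supported:
        return "manual"
    if has_breaking_change:
        # Stated in the skill: breaking changes bias toward canary or blue-green.
        preference = ["canary", "blue-green", "all-at-once", "manual"]
    else:
        # Assumption: prefer the simplest strategy when nothing is breaking.
        preference = ["all-at-once", "blue-green", "canary", "manual"]
    for strategy in preference:
        if strategy in supported:
            return strategy
    raise ValueError(f"no known strategy in {sorted(supported)}")
```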
### Phase 3: Execute
**Goal**: Actually run the deploy. Capture exit code and full stdout/stderr.
1. Validate environment file (`.env`) exists, all required vars from `.env.example` are set, no placeholder secrets remain.
2. Source the env file and run `scripts/deploy.sh` against the target host. The script produced by `/deploy` Step 7 is the point of execution; do NOT bypass it. If a strategy-specific flag is needed (e.g., `--canary 5%`), pass it through.
3. Stream stdout/stderr to the release report, with timestamps, in a fenced code block under `## Phase 3: Execute`.
4. Capture exit code.
5. **AUTO-ROLLBACK trigger**: non-zero exit code → immediately invoke Phase 6 with verdict `Rolled-Back: deploy script failure`. Do NOT continue to Phase 4.
If `deploy.sh` emits no output for more than the configured idle threshold (default 5 minutes; check `deployment_procedures.md` for an explicit value), treat it as hung — capture a snapshot of what's running on the target, kill the script, and AUTO-ROLLBACK with reason `Deploy hung — manual investigation required`.
**Manual strategy**: if Phase 2 picked `manual`, write a checklist of operator steps from `deployment_procedures.md` to the release report and pause until the user types `done` or `failed`. Phase 3 then records the user's report verbatim.
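
Step 1 of this phase (environment file validation) could look roughly like the sketch below. The simple `KEY=value` parsing and the `PLACEHOLDER_MARKERS` list are assumptions; a real project may define its own placeholder convention in `.env.example`.

```python
# Assumed markers of an unfilled secret; not defined by the skill.
PLACEHOLDER_MARKERS = ("changeme", "placeholder", "<")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def env_problems(example_text: str, env_text: str) -> list[str]:
    """List variables required by .env.example that are unset or placeholders."""
    required = parse_env(example_text)
    actual = parse_env(env_text)
    problems = []
    for key in required:
        value = actual.get(key, "")
        if not value:
            problems.append(f"{key}: missing or empty")
        elif any(marker in value.lower() for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{key}: placeholder value")
    return problems
```

Any non-empty result should stop the phase before `scripts/deploy.sh` is invoked.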
### Phase 4: Smoke Test
**Goal**: Verify the new version is *actually serving traffic correctly* in the target environment.
1. Resolve the smoke-test command from `_docs/02_document/tests/blackbox-tests.md` § Production Smoke Tests, OR delegate to `/test-run` in `--prod-smoke` mode against the target environment.
2. The smoke-test set must (a) hit each public endpoint of each component, (b) include at least one read AND one write per public endpoint where applicable, and (c) complete in under 5 minutes total.
3. Capture pass/fail per case to the release report.
4. **AUTO-ROLLBACK trigger**: any smoke-test failure → invoke Phase 6 with verdict `Rolled-Back: smoke test failure: <test-name>`.
If smoke tests are **missing** for the target environment (no production-mode test set), STOP: write a leftover entry to `_docs/_process_leftovers/` per `tracker.mdc`, write `Aborted: smoke tests missing for prod-mode target` to the release report, and ASK the user. Do not proceed to the watch window without smoke coverage.
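
The mapping from smoke results to a verdict can be sketched as follows. `smoke_verdict` is a hypothetical helper, and representing results as a name-to-pass/fail dictionary is an assumption; the verdict strings are the ones quoted in this phase.

```python
from typing import Optional


def smoke_verdict(results: dict[str, bool]) -> Optional[str]:
    """Return a terminal verdict string, or None when Phase 5 may start.

    results maps each smoke-test name to True (pass) or False (fail).
    """
    if not results:
        # No production-mode test set exists for this target.
        return "Aborted: smoke tests missing for prod-mode target"
    for name, passed in results.items():
        if not passed:
            # Any single failure triggers the auto-rollback.
            return f"Rolled-Back: smoke test failure: {name}"
    return None
```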
### Phase 5: Watch Window
**Goal**: Observe the live system for a defined window to catch latent regressions.
1. Read `observability.md` for the project's metrics, dashboards, and threshold definitions. Required watch metrics for any production target (per cursor-meta convention) include error rate, request rate, p99 latency, and saturation (CPU/memory/queue-depth).
2. Compute the watch-window duration from `deployment_procedures.md`. If unspecified, default to **15 minutes** for staging and **60 minutes** for production.
3. Poll the observability backend at 1-minute intervals (or the configured cadence). For each interval, record metric snapshots to the release report.
4. Threshold rules:
- **Hard breach** (auto-rollback): error-rate ≥ 2× baseline, p99 latency ≥ 3× baseline, any health-check failure persisting for 2 consecutive intervals.
- **Soft breach** (escalate): metric drift from 1.5× up to (but not including) 2× baseline, single-interval health blip, queue-depth steady but elevated.
- **No data** (escalate): if metrics are not flowing within the first 3 minutes, treat the absence as a hard breach — observability is itself broken.
5. **AUTO-ROLLBACK trigger**: hard breach at any interval. Move to Phase 6 with verdict `Rolled-Back: <metric> breached <multiplier>× baseline at T+<minutes>`.
6. **ESCALATE trigger**: soft breach. Pause polling, surface the metric, and ask the user A/B/C:
- A) Continue watch — accept current drift, keep polling
- B) Roll back now — treat soft drift as hard
- C) Extend watch window by N minutes
7. End of watch window with no breach → proceed to Phase 6.
The watch window cannot be skipped. If the user explicitly demands skipping (e.g., emergency rollforward), record the override reason in the release report and continue, but mark the verdict as `Released-with-override` — this triggers an automatic incident retrospective per `retrospective/SKILL.md`.
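
The threshold rules in step 4 can be sketched as a per-interval classifier. The ratios are observed value divided by baseline. Two points are assumptions rather than rules from this document: p99 latency between 1.5× and 3× baseline is treated as soft, and queue depth is left out because no numeric rule is given for it.

```python
from enum import Enum


class Breach(Enum):
    NONE = "none"
    SOFT = "soft"
    HARD = "hard"


def classify_interval(error_ratio: float, p99_ratio: float,
                      consecutive_health_failures: int) -> Breach:
    """Classify one polling interval against the watch-window thresholds."""
    # Hard breach: error rate >= 2x, p99 >= 3x, or health failing 2 intervals in a row.
    if error_ratio >= 2.0 or p99_ratio >= 3.0 or consecutive_health_failures >= 2:
        return Breach.HARD
    # Soft breach: drift of 1.5x or more that stayed below the hard limits,
    # or a single-interval health blip.
    if error_ratio >= 1.5 or p99_ratio >= 1.5 or consecutive_health_failures == 1:
        return Breach.SOFT
    return Breach.NONE
```

A `HARD` result maps to the auto-rollback trigger and a `SOFT` result to the A/B/C escalation.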
### Phase 6: Commit or Rollback
**Goal**: Finalize the release with a definitive verdict on disk.
**Path A — Commit (clean release)**:
1. Update tracker tickets: every ticket in scope moves to `Released` (or `Done`, per project convention defined in `tracker.mdc` / `_docs/_repo-config.yaml`).
2. Tag the git HEAD with `release/<version>` (or the project's tag convention from `deployment_procedures.md`).
3. Write the final `Released` verdict to the release report with a summary table.
4. Trigger `/retrospective --cycle-end` with this release as the cycle terminus.
5. Auto-chain to autodev's next step (Retrospective in greenfield, or feature-cycle loop start in existing-code).
**Path B — Rollback (auto-fired or user-elected)**:
1. Run `scripts/deploy.sh --rollback` with the rollback target captured in Phase 1.
2. Stream output to a new file `RELEASE_DIR/rollback_<version>_<env>_<YYYY-MM-DD-HHmm>.md` AND append a summary to the original release report under `## Rollback`.
3. Re-run Phase 4 (smoke test) and a 5-minute mini watch window against the rolled-back version. If THAT also fails, escalate immediately — the system is in an unknown state and needs human takeover.
4. Update tracker tickets back to `Ready for Release` (or the project's pre-release status).
5. Write the final `Rolled-Back` verdict with full reason chain.
6. Auto-trigger `/retrospective --incident` with this release as the incident anchor (per `retrospective/SKILL.md` incident mode).
7. Do NOT auto-chain to anything else — the user owns the next step.
**Path C — Aborted**:
Reached only via Phase 1 Choose B, Phase 4 smoke-tests-missing escalation, or any phase that detects a precondition violation. Write `Aborted: <reason>` to the release report. Do not auto-chain.
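
The three paths differ mainly in which follow-up they trigger, which can be summarized as a lookup. This is a sketch: `follow_up` is a hypothetical helper, and mapping `Released-with-override` to `/retrospective --incident` is an assumption based on the statement that an override triggers an incident retrospective.

```python
from typing import Optional

# Follow-up command per verdict; None means the user owns the next step.
FOLLOW_UP = {
    "Released": "/retrospective --cycle-end",
    "Released-with-override": "/retrospective --incident",  # assumed flag
    "Rolled-Back": "/retrospective --incident",
    "Aborted": None,
}


def follow_up(verdict: str) -> Optional[str]:
    """Map a verdict line such as 'Rolled-Back: <reason>' to its follow-up."""
    base = verdict.split(":", 1)[0].strip()
    if base not in FOLLOW_UP:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return FOLLOW_UP[base]
```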
## Self-verification
- [ ] Release report exists at `RELEASE_DIR/release_<version>_<env>_<timestamp>.md` with verdict (Released / Released-with-override / Rolled-Back / Aborted)
- [ ] Every phase that ran has a section in the release report with timestamps and tool output
- [ ] On Released: tracker tickets moved to release status; git tag pushed (if convention)
- [ ] On Rolled-Back: rollback report exists at `RELEASE_DIR/rollback_<version>_<env>_<timestamp>.md`; tracker tickets moved back to pre-release status; incident retrospective scheduled
- [ ] On Aborted: reason recorded; no live-system changes attempted; no tracker movement
- [ ] No phase was skipped without an explicit reason recorded in the release report
## Escalation Rules
| Situation | Action |
|-----------|--------|
| `scripts/deploy.sh` missing or `--rollback` unsupported | STOP — return to `/deploy` Step 7, do not patch the script in `/release` |
| Registry auth failure during pre-release | STOP — fix credentials at infra layer (per `coderule.mdc`); do not embed creds in the script |
| Smoke tests missing for prod target | STOP — write a leftover; do not improvise smoke tests in `/release` |
| Observability backend unreachable | STOP — observability blindness is itself a release blocker |
| User asks to skip the watch window | Record override, mark verdict `Released-with-override`, fire incident retro |
| Rollback also fails its smoke test | ESCALATE to user — system is in unknown state; do not loop deploys |
| Tracker MCP returns Unauthorized during ticket movement | Per `tracker.mdc`, write a leftover entry; do NOT silently continue without confirming the move |
| Multiple environments named in user request | STOP — one release per invocation; ask user to pick one |
| Production smoke test would touch real customer data | STOP — that is a `coderule.mdc` violation; ask user to define a smoke endpoint or test account |
## Common Mistakes
- **Skipping the watch window when "everything looks fine after deploy"** — a deploy that exited 0 is not a release that's stable. Watch is mandatory.
- **Faking smoke tests** to pass the gate when the prod test set is incomplete. STOP and surface the gap; do not embed prod URLs into ad-hoc curl commands.
- **Rolling forward through a failure** ("the next deploy will fix it"). Roll back first, fix the cause, then deploy a real fix.
- **Treating the release report as optional** when only an internal tool changed. Every release writes a report — the audit trail is the value, not the prose volume.
- **Approving manual gates yourself** without the user's input when restrictions require human approval. The release skill records, the human approves.
- **Reusing `release_<version>` filenames** across attempted releases. Always include the timestamp in the filename so re-attempts are visible side-by-side.
- **Letting tracker drift silently** between release attempts. If Phase 6 cannot move tickets, the release is not complete — write a leftover and stop.
## Project Mode vs Standalone
- **Project mode** (default): autodev invokes `/release` after `/deploy`. State writes occur under `_docs/_autodev_state.md`. Full integration with retrospective and feature-cycle loop.
- **Standalone mode**: `/release` invoked directly with `@<artifact>` (rare; usually only for re-running a rollback against a specific version). All outputs still go to `RELEASE_DIR/`.
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Release (6 phases, 3 verdicts) │
├────────────────────────────────────────────────────────────────┤
│ Phase 1 Pre-Release Gate │
│ AC + tests + change summary + rollback path │
│ [BLOCKING — user A/B/C] │
│ Phase 2 Strategy Select │
│ all-at-once · blue-green · canary · manual │
│ [BLOCKING — user picks] │
│ Phase 3 Execute │
│ scripts/deploy.sh, capture exit code + logs │
│ [AUTO-ROLLBACK on non-zero or hang] │
│ Phase 4 Smoke Test │
│ /test-run --prod-smoke against target │
│ [AUTO-ROLLBACK on any failure] │
│ Phase 5 Watch Window │
│ Poll observability for N minutes │
│ [AUTO-ROLLBACK on hard breach; escalate on soft] │
│ Phase 6 Commit or Rollback │
│ Released → tracker, tag, retrospective │
│ Rolled-Back → tracker reset, incident retrospective │
│ Aborted → no live-system change │
├────────────────────────────────────────────────────────────────┤
│ Principles: real execution · verifiable rollback · │
│ quiet failure = release failure · │
│ watch window mandatory │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,114 @@
# Release Report — {version} → {env}
- **Date**: {YYYY-MM-DD HH:MM} {timezone}
- **Operator**: {user}
- **Strategy**: {all-at-once | blue-green | canary | manual}
- **Verdict**: {Released | Released-with-override | Rolled-Back | Aborted}
- **Verdict reason**: {one-line summary}
## Pre-Release Gate (Phase 1)
### Acceptance Criteria
| AC ID | Status | Evidence |
|-------|--------|----------|
| AC-001 | Met / Unmet | path:section, test report, etc. |
### Test Status
| Suite | Pass | Fail | Skip | Source |
|-------|------|------|------|--------|
| Functional | N | N | N | _docs/03_implementation/{batch}.md |
| Performance | N | N | N | _docs/06_metrics/perf_*.md |
### Change Summary
| Component | Tickets | Type |
|-----------|---------|------|
| {component} | TKT-001, TKT-002 | feature / fix / breaking / security |
### Rollback Plan
- Previous version: `{previous-version}` (registry digest: `{sha}`)
- Rollback script: `scripts/deploy.sh --rollback`
- Rollback target verified pullable: yes / no
- Rollback target verified bootable in target env: yes / no
### Restrictions / Approvals
- Change-window restrictions: {none | description}
- Manual approvals required: {none | reference to approval file}
### Tracker State at Gate
- Tickets in scope: {N}
- Tickets blocking release: {0 — list any}
## Strategy Select (Phase 2)
- Recommended: {strategy} — reasoning
- Chosen: {strategy} — reasoning (if differs from recommended)
## Execute (Phase 3)
- Start: {timestamp}
- End: {timestamp}
- Exit code: {0 / non-zero}
```
<scripts/deploy.sh stdout/stderr stream, with timestamps>
```
## Smoke Test (Phase 4)
- Mode: {/test-run --prod-smoke | manual smoke set}
- Start: {timestamp}
- End: {timestamp}
| Test | Result | Notes |
|------|--------|-------|
| {name} | Pass / Fail | response time, status, etc. |
## Watch Window (Phase 5)
- Duration: {minutes}
- Cadence: {minutes per poll}
- Backend: {observability source — Prometheus, CloudWatch, Datadog, etc.}
| T+min | error_rate | rps | p99_latency | saturation | health | notes |
|-------|------------|-----|-------------|------------|--------|-------|
| 0 | … | … | … | … | OK | … |
| 1 | … | … | … | … | OK | … |
| … | … | … | … | … | … | … |
### Threshold breaches
- {None | "p99 latency 1.7× baseline at T+8 — soft breach, user accepted continuation"}
## Commit or Rollback (Phase 6)
### If Released
- Tracker tickets moved: {list}
- Git tag pushed: {tag} → {sha}
- Retrospective scheduled: yes — {/retrospective --cycle-end output path}
### If Rolled-Back
- Trigger: {auto / user-elected}
- Reason: {phase + one-line cause}
- Rollback start: {timestamp}
- Rollback end: {timestamp}
- Post-rollback smoke: pass / fail
- Tracker tickets moved back: {list}
- Incident retrospective scheduled: yes — {/retrospective --incident output path}
### If Aborted
- Phase that aborted: {1 / 2 / 3 / 4 / 5}
- Reason: {one-line cause}
- No live-system changes attempted: yes / no (if live changes, document under Phase 3 above and treat as Rolled-Back instead)
## Lessons (one-liners; full incident retro if Rolled-Back / Released-with-override)
- {Optional: short one-liner observations the operator wants the next /retrospective to consider}
+4 -4
@@ -2,9 +2,9 @@
name: retrospective name: retrospective
description: | description: |
Collect metrics from implementation batch reports and code review findings, analyze trends across cycles, Collect metrics from implementation batch reports and code review findings, analyze trends across cycles,
and produce improvement reports with actionable recommendations. and produce improvement reports plus a lessons-log update with actionable recommendations.
3-step workflow: collect metrics, analyze trends, produce report. 4-step workflow: collect metrics, analyze trends, produce report, update lessons log.
Outputs to _docs/06_metrics/. Outputs to _docs/06_metrics/ and appends to _docs/LESSONS.md (ring buffer, last 15).
Trigger phrases: Trigger phrases:
- "retrospective", "retro", "run retro" - "retrospective", "retro", "run retro"
- "metrics review", "feedback loop" - "metrics review", "feedback loop"
@@ -232,7 +232,7 @@ Present the report summary to the user.
``` ```
┌────────────────────────────────────────────────────────────────┐ ┌────────────────────────────────────────────────────────────────┐
│ Retrospective (3-Step Method) │ │ Retrospective (4-Step Method) │
├────────────────────────────────────────────────────────────────┤ ├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist in _docs/03_implementation/ │ │ PREREQ: batch reports exist in _docs/03_implementation/ │
│ │ │ │
+4 -3
@@ -202,12 +202,12 @@ If invoked in `cycle-update` mode (see "Invocation Modes" above), read and follo
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed | | Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template | | Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user | | Ambiguous requirements | ASK user |
| Input data coverage below 75% (Phase 1) | Search internet for supplementary data, ASK user to validate | | Input data coverage below the canonical threshold (Phase 1) | Search internet for supplementary data, ASK user to validate. See `.cursor/rules/cursor-meta.mdc` Quality Thresholds for the canonical 75% number — do not hardcode a different threshold here. |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding | | Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent | | Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md | | System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test | | Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 75% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec | | Final coverage below the canonical threshold after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec (see `cursor-meta.mdc` Quality Thresholds) |
## Common Mistakes ## Common Mistakes
@@ -252,7 +252,8 @@ When the user wants to:
│ │ │ │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │ │ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │
│ → phases/03-data-validation-gate.md │ │ → phases/03-data-validation-gate.md │
│ [BLOCKING: coverage ≥ 75% required to pass] │ [BLOCKING: coverage ≥ canonical threshold required to pass —
│ see cursor-meta.mdc Quality Thresholds (75%)] │
│ │ │ │
│ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4) │ │ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4) │
│ → phases/hardware-assessment.md │ │ → phases/hardware-assessment.md │
@@ -1,7 +1,7 @@
# Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) # Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer **Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%. **Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above the canonical threshold (currently 75% — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds; never hardcode a different number in any phase).
**Constraints**: This phase is MANDATORY and cannot be skipped. **Constraints**: This phase is MANDATORY and cannot be skipped.
## Step 1 — Build the requirements checklist ## Step 1 — Build the requirements checklist
+5 -3
@@ -1,8 +1,8 @@
# Dependencies Table # Dependencies Table
**Date**: 2026-05-19 (refreshed late-morning after 11:27 Jetson Tier-2 e2e run for AZ-618 — surfaced a NEW gap: replay-mode `Config` lacks `c6_tile_cache` block, so `build_pre_constructed → _build_c6_descriptor_index → _c6_config` raises `KeyError` for AC-1/2/5/6. Follow-up filed as AZ-687 (2pt) under E-AZ-602 with guard at the bootstrap layer (NOT silent fallback in `_c6_config`). Earlier same-day mid-day after AZ-618 split: per the spec author's own Sizing-note recommendation + user-rule cap on PBI complexity, AZ-618 was split into 6 subtasks AZ-619..AZ-624 in Jira (subtasks of AZ-618; epic AZ-602 stays grandparent). AZ-618 retained at 0pt as the umbrella tracker; aggregate actionable work is 16pt across the subtasks (vs. AZ-618's original 5pt filing — author's "likely a true 8" caveat was understated due to c5_isam2_graph_handle ordering + GPU builder unknowns). Earlier same-day refresh at start of Step-7 rewind for AZ-618 — Step-11 Jetson tier-2 e2e gate identified missing internal product implementation: `runtime_root.main()` does not build the airborne `pre_constructed` infrastructure dict before `compose_root()`; AZ-618 = 5pt cross-cutting follow-up to AZ-591, lives under E-AZ-602; all 12 dep tasks are in `done/`. 
Earlier 2026-05-16 (cycle-1 completeness-gate post-mortem): AZ-589 + AZ-590 closed Won't Fix — were wrong abstraction (OKVIS v1 `ThreadedKFVio` API doesn't exist in OKVIS2 upstream; VINS-Mono `cpp/vins_mono/upstream/` submodule never existed; the actual production gap is the empty central `_STRATEGY_REGISTRY` affecting EVERY component with a strategy-selecting config field, not just c1_vio); replaced by AZ-591 (cross-cutting compose_root per-binary bootstrap, todo/, 5pt) + AZ-592 (AZ-332 Tier-2 validation bundle, backlog/, 5pt placeholder) + AZ-593 (AZ-333 Tier-2 validation bundle, backlog/, 5pt placeholder); AZ-332 + AZ-333 re-classified in gate report from FAIL to BLOCKED-on-Tier-2 per the original tasks' Implementation Notes deferral handles; earlier same-day after end of cycle-1 gate: AZ-589 + AZ-590 created (now closed); earlier same-day after end of Batch 64: AZ-558 implementation closed — `MavlinkTransport` seam now routes every C8 outbound MAVLink byte; AZ-401 AC-9 + AZ-404 AC-4b unskipped together; encoder helpers extracted to `_outbound_mavlink_payloads.py`; live-mode `compose_root` injection deferred to whichever future batch registers AP/iNav strategies in an airborne binary; earlier 2026-05-14: refreshed at start of Batch 63: AZ-559 closed Won't Fix — gap was illusory; `TileSource.ONBOARD_INGEST` + `TileMetadata.quality_metadata` + `write_tile`'s `FreshnessRejectionError` already cover the AZ-389 mid-flight ingest semantic without any new API; AZ-389 dep restored to AZ-303; earlier same-day after Batch 61: AZ-558 follow-up added — routes C8 outbound encoder bytes through `MavlinkTransport` seam; closes AZ-401 AC-9 deferred during batch 61 due to encoder-side routing not being in the AZ-401 task envelope; earlier same-day after cumulative review batches 52-54: AZ-528 hygiene PBI added for c1_vio strategy facade orchestration-spine 3-way duplication (Medium); earlier same-day after Batch 53: AZ-333 VINS-Mono landed — first c1_vio strategy after the 
AZ-332 OKVIS2 production-default; consolidation hygiene for the strategy-facade duplication deferred to a post-AZ-334 PBI; earlier same-day after Batch 51: AZ-527 hygiene PBI added from cumulative review batches 49-51 F1; 2026-05-13: AZ-526 hygiene PBI added from cumulative review batches 46-48 F1+F3; same-day refresh after Batch 44 SRP refactor: AZ-317 superseded; AZ-329 + AZ-330 specs rewritten; AZ-523 + AZ-524 audit-trail tickets added; E-C12 epic renamed `Operator Pre-flight Tooling``Operator Pre-flight Orchestrator`; earlier same-day refresh: AZ-507 + AZ-508 hygiene PBIs from cumulative review batches 31-33; 2026-05-11: AZ-489 + AZ-490 ADR-010 operator-origin path) **Date**: 2026-05-21 (cycle-3 Step 9 New Task — added AZ-776 (3pt open-loop ESKF composition profile via `c4_pose.enabled` flag, no deps, epic AZ-602) + AZ-777 (5pt Derkachi C6 reference tile cache + FAISS descriptor index from OSM/CARTO basemap, depends on AZ-776, epic AZ-602). Both unblock the 7 currently-`@xfail`-masked Derkachi e2e tests on Jetson; AZ-776 unblocks 5 (AC-1, AC-2, AC-5, AC-6 realtime, AC-6 asap), AZ-777 unblocks the remaining 2 (AC-3 + AZ-699 real-flight verdict). Earlier 2026-05-19 (refreshed late-morning after 11:27 Jetson Tier-2 e2e run for AZ-618 — surfaced a NEW gap: replay-mode `Config` lacks `c6_tile_cache` block, so `build_pre_constructed → _build_c6_descriptor_index → _c6_config` raises `KeyError` for AC-1/2/5/6. Follow-up filed as AZ-687 (2pt) under E-AZ-602 with guard at the bootstrap layer (NOT silent fallback in `_c6_config`). Earlier same-day mid-day after AZ-618 split: per the spec author's own Sizing-note recommendation + user-rule cap on PBI complexity, AZ-618 was split into 6 subtasks AZ-619..AZ-624 in Jira (subtasks of AZ-618; epic AZ-602 stays grandparent). AZ-618 retained at 0pt as the umbrella tracker; aggregate actionable work is 16pt across the subtasks (vs. 
AZ-618's original 5pt filing — author's "likely a true 8" caveat was understated due to c5_isam2_graph_handle ordering + GPU builder unknowns). Earlier same-day refresh at start of Step-7 rewind for AZ-618 — Step-11 Jetson tier-2 e2e gate identified missing internal product implementation: `runtime_root.main()` does not build the airborne `pre_constructed` infrastructure dict before `compose_root()`; AZ-618 = 5pt cross-cutting follow-up to AZ-591, lives under E-AZ-602; all 12 dep tasks are in `done/`. Earlier 2026-05-16 (cycle-1 completeness-gate post-mortem): AZ-589 + AZ-590 closed Won't Fix — were wrong abstraction (OKVIS v1 `ThreadedKFVio` API doesn't exist in OKVIS2 upstream; VINS-Mono `cpp/vins_mono/upstream/` submodule never existed; the actual production gap is the empty central `_STRATEGY_REGISTRY` affecting EVERY component with a strategy-selecting config field, not just c1_vio); replaced by AZ-591 (cross-cutting compose_root per-binary bootstrap, todo/, 5pt) + AZ-592 (AZ-332 Tier-2 validation bundle, backlog/, 5pt placeholder) + AZ-593 (AZ-333 Tier-2 validation bundle, backlog/, 5pt placeholder); AZ-332 + AZ-333 re-classified in gate report from FAIL to BLOCKED-on-Tier-2 per the original tasks' Implementation Notes deferral handles; earlier same-day after end of cycle-1 gate: AZ-589 + AZ-590 created (now closed); earlier same-day after end of Batch 64: AZ-558 implementation closed — `MavlinkTransport` seam now routes every C8 outbound MAVLink byte; AZ-401 AC-9 + AZ-404 AC-4b unskipped together; encoder helpers extracted to `_outbound_mavlink_payloads.py`; live-mode `compose_root` injection deferred to whichever future batch registers AP/iNav strategies in an airborne binary; earlier 2026-05-14: refreshed at start of Batch 63: AZ-559 closed Won't Fix — gap was illusory; `TileSource.ONBOARD_INGEST` + `TileMetadata.quality_metadata` + `write_tile`'s `FreshnessRejectionError` already cover the AZ-389 mid-flight ingest semantic without any new API; AZ-389 dep 
restored to AZ-303; earlier same-day after Batch 61: AZ-558 follow-up added — routes C8 outbound encoder bytes through `MavlinkTransport` seam; closes AZ-401 AC-9 deferred during batch 61 due to encoder-side routing not being in the AZ-401 task envelope; earlier same-day after cumulative review batches 52-54: AZ-528 hygiene PBI added for c1_vio strategy facade orchestration-spine 3-way duplication (Medium); earlier same-day after Batch 53: AZ-333 VINS-Mono landed — first c1_vio strategy after the AZ-332 OKVIS2 production-default; consolidation hygiene for the strategy-facade duplication deferred to a post-AZ-334 PBI; earlier same-day after Batch 51: AZ-527 hygiene PBI added from cumulative review batches 49-51 F1; 2026-05-13: AZ-526 hygiene PBI added from cumulative review batches 46-48 F1+F3; same-day refresh after Batch 44 SRP refactor: AZ-317 superseded; AZ-329 + AZ-330 specs rewritten; AZ-523 + AZ-524 audit-trail tickets added; E-C12 epic renamed `Operator Pre-flight Tooling``Operator Pre-flight Orchestrator`; earlier same-day refresh: AZ-507 + AZ-508 hygiene PBIs from cumulative review batches 31-33; 2026-05-11: AZ-489 + AZ-490 ADR-010 operator-origin path)
**Total Tasks**: 163 (122 product + 41 blackbox-test) — AZ-317 retained in the table marked SUPERSEDED for audit; AZ-523 (C11 gate removal) + AZ-524 (C12 rename) added as 2 closed audit-trail tasks; AZ-526 = 2pt clock-helper hygiene; AZ-527 = 2pt c2 engine-dim helper hygiene; AZ-528 = 3pt c1_vio facade-spine hygiene; AZ-558 = 3pt MavlinkTransport routing follow-up; AZ-559 closed Won't Fix; AZ-589 + AZ-590 closed Won't Fix (kept in table as 0pt audit-trail rows); AZ-591 = 5pt cross-cutting compose_root bootstrap (todo/); AZ-592 = 5pt OKVIS2 Tier-2 placeholder (backlog/); AZ-593 = 5pt VINS-Mono Tier-2 placeholder (backlog/); AZ-618 = 0pt umbrella (split into AZ-619..AZ-624 on 2026-05-19); AZ-619..AZ-624 = 6 subtasks of AZ-618 covering Phase A..F of the airborne `pre_constructed` assembly, summing to 16pt actionable work; AZ-687 = 2pt replay-mode guard follow-up surfaced by AZ-618 Tier-2 run on 2026-05-19 **Total Tasks**: 165 (124 product + 41 blackbox-test) — AZ-317 retained in the table marked SUPERSEDED for audit; AZ-523 (C11 gate removal) + AZ-524 (C12 rename) added as 2 closed audit-trail tasks; AZ-526 = 2pt clock-helper hygiene; AZ-527 = 2pt c2 engine-dim helper hygiene; AZ-528 = 3pt c1_vio facade-spine hygiene; AZ-558 = 3pt MavlinkTransport routing follow-up; AZ-559 closed Won't Fix; AZ-589 + AZ-590 closed Won't Fix (kept in table as 0pt audit-trail rows); AZ-591 = 5pt cross-cutting compose_root bootstrap (todo/); AZ-592 = 5pt OKVIS2 Tier-2 placeholder (backlog/); AZ-593 = 5pt VINS-Mono Tier-2 placeholder (backlog/); AZ-618 = 0pt umbrella (split into AZ-619..AZ-624 on 2026-05-19); AZ-619..AZ-624 = 6 subtasks of AZ-618 covering Phase A..F of the airborne `pre_constructed` assembly, summing to 16pt actionable work; AZ-687 = 2pt replay-mode guard follow-up surfaced by AZ-618 Tier-2 run on 2026-05-19
**Total Complexity Points**: 535 (402 product + 133 blackbox-test) — AZ-523 = 3pt, AZ-524 = 2pt, AZ-526 = 2pt, AZ-527 = 2pt, AZ-528 = 3pt, AZ-558 = 3pt, AZ-589 + AZ-590 retained at 5pt each but closed Won't Fix (treated as 0 effective pts going forward), AZ-591 = 5pt, AZ-592 = 5pt placeholder, AZ-593 = 5pt placeholder, AZ-618 = 0pt umbrella post-split, AZ-619 = 2pt, AZ-620 = 3pt, AZ-621 = 3pt, AZ-622 = 3pt, AZ-623 = 3pt, AZ-624 = 2pt, AZ-687 = 2pt **Total Complexity Points**: 543 (410 product + 133 blackbox-test) — +3pt AZ-776 + 5pt AZ-777 added 2026-05-21 — AZ-523 = 3pt, AZ-524 = 2pt, AZ-526 = 2pt, AZ-527 = 2pt, AZ-528 = 3pt, AZ-558 = 3pt, AZ-589 + AZ-590 retained at 5pt each but closed Won't Fix (treated as 0 effective pts going forward), AZ-591 = 5pt, AZ-592 = 5pt placeholder, AZ-593 = 5pt placeholder, AZ-618 = 0pt umbrella post-split, AZ-619 = 2pt, AZ-620 = 3pt, AZ-621 = 3pt, AZ-622 = 3pt, AZ-623 = 3pt, AZ-624 = 2pt, AZ-687 = 2pt
Dependencies columns list only the tracker-ID portion (descriptive tail
text in each task spec is omitted here for table-readability). The
@@ -183,6 +183,8 @@ are all declared and documented below under **Cycle Check**.
| AZ-700 | T4: Replay map visualization (estimated vs ground-truth tracks) | 3 | AZ-699 | AZ-696 |
| AZ-701 | T5: HTTP Replay API service (POST tlog+video, return GPS fixes + map) | 5 | AZ-699, AZ-700 | AZ-696 |
| AZ-702 | T6: Topotek KHP20S30 camera calibration (factory-sheet approximation) | 1 | None | AZ-696 |
| AZ-776 | Open-loop ESKF composition profile (c4_pose.enabled flag) | 3 | None | AZ-602 |
| AZ-777 | Derkachi C6 reference tile cache + FAISS descriptor index (OSM/CARTO) | 5 | AZ-776 | AZ-602 |
## Notes
@@ -0,0 +1,168 @@
# Open-loop ESKF composition profile for the airborne binary
**Task**: AZ-776_eskf_open_loop_composition_profile
**Name**: Make the c5_state=eskf path runnable end-to-end by allowing c4_pose to be excluded from the airborne composition
**Description**: Introduce an explicit `c4_pose.enabled: bool` config flag (default True). When False, the airborne bootstrap skips C4 wiring entirely, the C5 ESKF estimator composes alongside C1 (and any other configured components) with no iSAM2 graph handle, and `gps-denied-replay` exits 0 on the Derkachi fixture with one EstimatorOutput per video frame. Unblocks five of the seven `@xfail`-masked Derkachi e2e tests on Jetson.
**Complexity**: 3 points
**Dependencies**: None
**Component**: runtime_root / c4_pose / c5_state
**Tracker**: AZ-776
**Epic**: AZ-602
## Problem
The c5_state strategy `eskf` is registered (`_STRATEGY_REGISTRY`),
documented as the IT-12 mandatory simple baseline (`eskf_baseline.py`
line 30), and has unit tests
(`tests/unit/c5_state/test_az386_eskf_baseline.py`). It has NEVER been
composition-tested end-to-end. When the airborne binary is configured
with `config.components.c5_state.strategy = eskf`, the bootstrap fails
at compose time:
```
PoseEstimatorConfigError: build_pose_estimator: isam2_graph_handle does
not satisfy the C4 ISam2GraphHandle Protocol (missing get_pose_key /
update / compute_marginals / last_anchor_age_ms?)
```
Root cause: `EskfStateEstimator._build_eskf_state_estimator` returns
`(estimator, handle=None)` by design — the ESKF has no GTSAM graph.
`airborne_bootstrap._build_c5_state_estimator_pair` (line 779) seeds
`pre_constructed['c5_isam2_graph_handle'] = None`.
`_c4_pose_wrapper` (line 371) extracts the None.
`pose_factory.build_pose_estimator` (line 127) rejects it.
The IT-12 "ESKF is the mandatory simple baseline" mandate has no
executable production path today — the baseline exists at the component
layer but not at the system layer. Every Derkachi e2e attempt with
`c5_state=eskf` exits non-zero at compose time, producing 0 JSONL rows
and masking every downstream behaviour the heavy ACs are supposed to
exercise.
## Outcome
- The airborne bootstrap can compose a binary with `c1_vio` + `c5_state(strategy=eskf)` end-to-end against a real `gps-denied-replay` invocation without composing `c4_pose` and without a stub or no-op pose estimator masking the absence.
- The composition-mode selection is explicit in YAML — operator sets `config.components.c4_pose.enabled = False` (or omits the block entirely) to opt into the open-loop profile.
- `_run_replay_loop` in `runtime_root/__init__.py` is exercised end-to-end on Jetson by a non-`xfail` integration test (AC-1, AC-2, AC-5, AC-6 realtime, AC-6 asap in `tests/e2e/replay/test_derkachi_1min.py` un-xfail and pass).
- The replay protocol document records the new open-loop ESKF variant alongside the existing full-GTSAM path so future readers do not have to reverse-engineer it from the code.
- AZ-602 (E2E Tier-1 harness rehabilitation epic) advances by 5/7 of the Derkachi xfails removed; the remaining 2 (AC-3 and AZ-699) are AZ-777 territory.
## Scope
### Included
- Add an `enabled: bool = True` field to `C4PoseConfig` (and the corresponding YAML schema in `config/schema.py`).
- In `airborne_bootstrap.build_pre_constructed`: when `_replay_omits_component_block(config, "c4_pose")` OR `config.components["c4_pose"].enabled is False`, skip `_build_c5_state_estimator_pair`'s iSAM2-handle requirement; the C5 ESKF estimator still builds, but the handle slot is absent from the returned dict.
- In `airborne_bootstrap._AIRBORNE_REGISTRATIONS` (or wherever `compose_root` walks the component slugs): exclude `c4_pose` from the topological walk when the config flag says so. The downstream edge `_C5_STATE_DEPENDS_ON = ("c1_vio", "c4_pose")` becomes `("c1_vio",)` for the open-loop profile (or the dependency-walker treats a deselected slug as a no-op edge).
- Validation in `compose_root`: when `c4_pose.enabled is False`, refuse `c5_state.strategy = gtsam_isam2` with a clear `CompositionError` — the gtsam_isam2 estimator needs a real handle, so the open-loop profile is only valid against ESKF. Symmetric refusal when `c4_pose.enabled is True` AND `c5_state.strategy = eskf` (the ESKF still returns `handle=None`, so c4 would still reject).
- `tests/unit/runtime_root/`: composition test that exercises the open-loop profile end-to-end with `compose_root`, asserting (i) the returned components dict contains `c1_vio` and `c5_state` but NOT `c4_pose`; (ii) `gps-denied-replay --config <open-loop yaml>` exits 0 against a minimal fixture and emits one EstimatorOutput per video frame.
- `tests/unit/runtime_root/`: regression test that the live (full-GTSAM) path still composes identically after the flag is added — `c4_pose.enabled` unset defaults to True and the existing topological walk is unchanged.
- `_docs/02_document/contracts/replay/replay_protocol.md`: add an `## Open-loop ESKF variant` section documenting the per-frame loop pseudocode (it mirrors the existing GTSAM loop minus the C2–C4 stages) plus the YAML shape that selects it.
- `_docs/02_document/adr/`: create the ADR folder if absent, then write `ADR-012_open_loop_eskf_composition.md` recording the `c4_pose.enabled` flag, the strategy-pairing rules, and why the per-component flag was chosen over a top-level `composition_profile` knob.
- `tests/e2e/replay/test_derkachi_1min.py`: remove the `@pytest.mark.xfail` decorators on AC-1 (line 61), AC-2 (line 138), AC-5 (line 413), AC-6 realtime (line 453), AC-6 asap (line 479). Update the test conftest or fixture to drive the open-loop YAML profile.
- `_docs/02_document/components/06_c4_pose.md`: amend the component doc to document the `enabled` flag.
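The flag and the "deselected slug is a no-op edge" option above can be sketched as follows. This is a minimal illustration, not the bootstrap code: `C4PoseConfigSketch`, `DEPENDS_ON`, `disabled_slugs`, and `composition_order` are hypothetical names, and only the `c5_state` edge `("c1_vio", "c4_pose")` is taken from this spec; the other edges are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class C4PoseConfigSketch:
    """Hypothetical stand-in for C4PoseConfig; only the new field is shown."""

    enabled: bool = True  # default keeps every existing config unchanged


# Hypothetical dependency table. Only the c5_state edge comes from the spec;
# the c1_vio and c4_pose entries are placeholders for illustration.
DEPENDS_ON = {
    "c1_vio": (),
    "c4_pose": (),
    "c5_state": ("c1_vio", "c4_pose"),
}


def disabled_slugs(c4_pose: C4PoseConfigSketch) -> set:
    """Slugs the walk must skip for the given C4 config."""
    return set() if c4_pose.enabled else {"c4_pose"}


def composition_order(depends_on: dict, disabled: set) -> list:
    """Return slugs in dependency order; a disabled slug is a no-op edge."""
    order = []
    visiting = set()

    def visit(slug: str) -> None:
        if slug in disabled or slug in order:
            return
        if slug in visiting:
            raise ValueError(f"dependency cycle at {slug!r}")
        visiting.add(slug)
        for dep in depends_on[slug]:
            visit(dep)
        visiting.discard(slug)
        order.append(slug)

    for slug in depends_on:
        visit(slug)
    return order
```

With the default config the walk is unchanged; with `enabled=False` the `c4_pose` slug drops out and `c5_state` still composes after `c1_vio`. Whether the real walker rewrites the edge tuple or skips the slug at visit time is left open by the bullet above; the sketch takes the second route.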
### Excluded
- Building the Derkachi C6 reference tile cache + descriptor index (AZ-777 territory).
- Implementing C2/C3/C4 anchoring against Derkachi imagery (depends on AZ-777).
- Un-xfailing `test_ac3_within_100m_80pct_of_ticks` (`test_derkachi_1min.py` line 174) — that test needs satellite anchoring, requires AZ-777.
- Un-xfailing `test_az699_real_flight_validation_emits_verdict_and_report` (`test_derkachi_real_tlog.py` line 174) — also requires AZ-777 for accuracy gate.
- Changing the production default — this task makes ESKF *runnable*, not *the default*.
- Renaming any existing config field, YAML block, or factory function (per AZ-618 umbrella's "MUST NOT touch any per-component factory signature" constraint).
## Acceptance Criteria
**AC-1: Open-loop profile composes**
Given a YAML config with `config.components.c4_pose.enabled = False` and `config.components.c5_state.strategy = eskf`
When `compose_root(config, pre_constructed=build_pre_constructed(config))` runs
Then it returns a components dict containing `c1_vio` and `c5_state` but NOT `c4_pose`, and no `PoseEstimatorConfigError` is raised
**AC-2: Full GTSAM profile unchanged**
Given a YAML config with `config.components.c4_pose` absent or `enabled = True` and `config.components.c5_state.strategy = gtsam_isam2`
When `compose_root` runs
Then the returned components dict contains `c1_vio`, `c4_pose`, `c5_state` exactly as before this task
**AC-3: Invalid pairing rejected**
Given a YAML config that pairs `c4_pose.enabled = True` with `c5_state.strategy = eskf`, OR `c4_pose.enabled = False` with `c5_state.strategy = gtsam_isam2`
When `compose_root` runs
Then it raises `CompositionError` naming both the conflicting fields and the valid pairings
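A minimal sketch of the AC-3 pairing check, assuming nothing about where it sits inside `compose_root`. The `CompositionError` class below is a local stand-in for the project's exception, and `check_c4_c5_pairing` is a hypothetical helper name; only the two refusal rules and the requirement to name both fields come from this spec.

```python
class CompositionError(Exception):
    """Local stand-in for the project's CompositionError."""


def check_c4_c5_pairing(c4_pose_enabled: bool, c5_strategy: str) -> None:
    """Refuse the two invalid pairings named in AC-3; accept everything else."""
    valid = (
        "c4_pose.enabled=True with c5_state.strategy=gtsam_isam2; "
        "c4_pose.enabled=False with c5_state.strategy=eskf"
    )
    conflict = (
        (not c4_pose_enabled and c5_strategy == "gtsam_isam2")
        or (c4_pose_enabled and c5_strategy == "eskf")
    )
    if conflict:
        # Name both conflicting fields and the valid pairings, per AC-3.
        raise CompositionError(
            f"c4_pose.enabled={c4_pose_enabled} conflicts with "
            f"c5_state.strategy={c5_strategy}; valid pairings: {valid}"
        )
```

The check is a pure function of two config values, so it can run before any component is built and costs nothing per frame.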
**AC-4: Replay binary exits 0 on Derkachi open-loop**
Given the Derkachi 1-min fixture and a YAML config selecting the open-loop ESKF profile
When `gps-denied-replay --config <open-loop.yaml> --video <derkachi> --tlog <derkachi> --output <out.jsonl>` runs on Jetson
Then it exits with code 0 and `<out.jsonl>` contains one EstimatorOutput line per video frame (±10 % to allow for VIO INIT-state skips on the first few frames)
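The ±10 % row-count tolerance in AC-4 could be checked along these lines. This is a sketch under assumptions: `count_rows` and `row_count_within_tolerance` are hypothetical helpers, and the sketch treats every non-blank line of the output as one EstimatorOutput row without validating its schema.

```python
def count_rows(lines) -> int:
    """Count non-blank lines; each is assumed to be one EstimatorOutput row."""
    return sum(1 for line in lines if line.strip())


def row_count_within_tolerance(rows: int, frames: int, tolerance: float = 0.10) -> bool:
    """True when the row count is within +/- tolerance of the frame count."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return abs(rows - frames) <= tolerance * frames
```

In a test, `count_rows` would be fed the open file handle of `<out.jsonl>` and `frames` would come from the video metadata.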
**AC-5: Replay protocol documents the variant**
Given the updated `_docs/02_document/contracts/replay/replay_protocol.md`
When a reader looks for the open-loop ESKF composition
Then there is a dedicated `## Open-loop ESKF variant` section with per-frame loop pseudocode and the YAML shape that selects it
**AC-6: ADR records the design choice**
Given the new `_docs/02_document/adr/ADR-012_open_loop_eskf_composition.md`
When a reader looks for why the per-component flag was chosen
Then ADR-012 records the alternatives considered (per-component flag vs. top-level profile vs. implicit rule) and the rationale for the per-component flag
**AC-7: Five Derkachi e2e xfails removed**
Given `tests/e2e/replay/test_derkachi_1min.py` after this task
When the file is read
Then the `@pytest.mark.xfail` decorators on AC-1 (line 61), AC-2 (line 138), AC-5 (line 413), AC-6 realtime (line 453), AC-6 asap (line 479) are removed and the underlying tests pass on the Jetson harness
## Non-Functional Requirements
**Performance**
- No measurable latency regression on the full-GTSAM path (the new flag check is O(1) at compose time, zero overhead per-frame).
- Open-loop profile per-frame latency budget remains the same 400 ms p95 (the path is strictly shorter — fewer components — so this should improve, not regress).
**Compatibility**
- Default `c4_pose.enabled = True` so every existing config file behaves identically without modification.
- No changes to `compose_root` public signature, `build_pre_constructed` public signature, or any per-component factory signature (per AZ-618 umbrella constraint).
**Reliability**
- Invalid YAML pairings (open-loop + gtsam_isam2 OR full-GTSAM + eskf) MUST fail loud at compose time, never silently fall back. An operator who misconfigures the profile sees a clear error referencing both conflicting fields, not a downstream `KeyError` or stack trace.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | `compose_root` with open-loop YAML → returned components dict | Contains `c1_vio`, `c5_state`; no `c4_pose` key |
| AC-1 | `build_pre_constructed` with open-loop YAML → no `c5_isam2_graph_handle` key in result | Key absent from dict |
| AC-2 | `compose_root` with default full-GTSAM YAML → returned components dict | Contains `c1_vio`, `c4_pose`, `c5_state` (regression guard) |
| AC-3 | `compose_root` with `c4_pose.enabled=False` + `c5_state.strategy=gtsam_isam2` | `CompositionError` raised naming both fields |
| AC-3 | `compose_root` with `c4_pose.enabled=True` + `c5_state.strategy=eskf` | `CompositionError` raised naming both fields |
| C4PoseConfig | `enabled` field defaults to `True` when unset in YAML | Round-trips through `load_config` |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|--------------|-------------------|----------------|
| AC-4 | Derkachi 1-min fixture + open-loop YAML profile | `gps-denied-replay` invoked on Jetson via `scripts/run-tests-jetson.sh` with `RUN_REPLAY_E2E=1 GPS_DENIED_TIER=2` | Exit 0; one EstimatorOutput line per video frame ±10 % | Perf, Compat |
| AC-7 | `test_derkachi_1min.py` AC-1, AC-2, AC-5, AC-6 realtime, AC-6 asap | Tests run on Jetson after this task | All five pass (xfail decorators removed) | Reliability |
## Constraints
- The flag MUST be `c4_pose.enabled: bool`, not a top-level `composition_profile` knob and not an implicit ADR-only rule. User chose this shape during AZ-776 scoping (cycle-3 Step 9, 2026-05-21). Rationale recorded in ADR-012.
- The flag MUST NOT change the C4 estimator's own runtime behaviour — it ONLY affects whether the component is wired into the composition graph. The C4 estimator itself remains a fully functional component.
- The `_run_replay_loop` warning at `runtime_root/__init__.py` (`replay_loop.satellite_anchoring_not_wired`, emitted every 25 frames) is the existing honest signal for the open-loop path. Do NOT remove it; AZ-777 will replace the wiring it warns about. The warning remains visible in the JSONL output during the AZ-776 → AZ-777 transition.
- ADR file naming follows the existing pattern (`ADR-NNN_<slug>.md`). Increment past the highest current ADR number; if `_docs/02_document/adr/` is empty (as today), start at ADR-012 to leave room for the ADR-001..ADR-011 references already in code+docs.
## Risks & Mitigation
**Risk 1: Topological-walk regression on the live path**
- *Risk*: Modifying `_AIRBORNE_REGISTRATIONS` or the walker logic could silently break the live GTSAM composition.
- *Mitigation*: AC-2 is a regression guard. Run the full `tests/unit/runtime_root/` suite before declaring done.
**Risk 2: ESKF estimator surfaces a latent bug under the open-loop wiring**
- *Risk*: Until this task lands, `EskfStateEstimator` has never been driven by a real composition root + replay binary. Latent bugs (state initialization, IMU preintegration handoff, FDR write paths) may surface only at AC-4.
- *Mitigation*: Reproduce on Tier-1 Colima first (no GPU needed for ESKF). Use the Tier-2 Jetson run only as the AC-4 confirmation, not as the debug environment. Any non-trivial ESKF fix surfaced this way becomes a sibling ticket; do not let it creep into this task's scope.
**Risk 3: Replay protocol doc inconsistency**
- *Risk*: The new `## Open-loop ESKF variant` section may contradict the existing "Wire C1–C5 + C6 + C7 + C13 exactly as in the live composition" line at line 188.
- *Mitigation*: Amend line 188 to say "Wire C1–C5 + C6 + C7 + C13 exactly as in the live composition **for the full-GTSAM profile**; see `## Open-loop ESKF variant` for the open-loop case". One sentence; no ambiguity.
### ADR Impact
> Affects ADR-001 (composition root is single registration site): unchanged — still one registration site. The new flag selects *what* is registered, not *where*.
> Affects ADR-002 (build-flag gate is the lazy-loading boundary): unchanged — `BUILD_*` flags still gate physical linkage; `c4_pose.enabled` is a *runtime* selection on top of a built strategy.
> Affects ADR-009 (interface-first DI): unchanged — the open-loop path still resolves through interfaces, just with the C4 interface unbound.
> Affects ADR-011 (replay is a configuration, not a separate composition root): unchanged — open-loop is a configuration of the same airborne composition root; the new flag is orthogonal to the live/replay split.
@@ -0,0 +1,193 @@
# Derkachi C6 reference tile cache + descriptor index (OSM/CARTO basemap)
**Task**: AZ-777_derkachi_c6_reference_fixture
**Name**: Build the C6 reference tile cache + FAISS descriptor index for the Derkachi flight bbox so the full-protocol C1+C2+C3+C4+C5 pipeline can produce satellite anchors during e2e replay
**Description**: Add a reproducible build script that downloads OSM/CARTO basemap tiles for the Derkachi flight bbox (approx 50.05–50.15 lat, 36.05–36.15 lon), pre-computes feature descriptors via the same C7 backbone the airborne binary uses (DINOv2 or the configured VPR backbone), populates the C6 tile store + FAISS HNSW index, and integrates them into the e2e replay harness. Unblocks the two remaining `@xfail`-masked Derkachi tests on Jetson (`test_ac3_within_100m_80pct_of_ticks` and `test_az699_real_flight_validation_emits_verdict_and_report`) and produces the first honest AZ-699 accuracy verdict.
**Complexity**: 5 points
**Dependencies**: AZ-776_eskf_open_loop_composition_profile
**Component**: c6_tile_cache / e2e fixtures / input_data
**Tracker**: AZ-777
**Epic**: AZ-602
## Problem
The Derkachi e2e fixture
(`_docs/00_problem/input_data/flight_derkachi/`) ships the real
flight inputs (video, tlog, IMU, camera calibration) but DOES NOT
ship the C6 tile-cache artifacts that the replay protocol requires
the operator's pre-flight C10 stage to produce:
- `c6_tile_store` — persistent JPEG tiles covering the flight area at the chosen zoom levels
- `c6_descriptor_index` — FAISS index of VPR-backbone descriptors over those tiles
Without these artifacts:
- C2 VPR has no haystack to look up against — `c2_vpr.lookup` returns empty.
- C3 matcher has nothing to match against (depends on C2 candidates).
- C4 pose has no anchors — cannot estimate satellite-frame pose.
- C5 state has no anchors to fuse — runs open-loop on VIO only.
When `c5_state.strategy = gtsam_isam2` (the default that AZ-699's e2e
exercises), the composition reaches the per-frame loop but
`iSAM2.update` crashes at frame 1 with:
```
EstimatorFatalError: compute_marginals failed: Attempting to at the
key 'x2', which does not exist in the Values.
```
— because no C4 anchor was ever inserted (C2/C3/C4 have nothing to
match against).
AZ-776 (sibling, prerequisite) makes the open-loop C1+C5(ESKF)
composition runnable, but that path skips C2–C4 entirely and accepts
unbounded drift. To validate the FULL protocol-compliant pipeline
against Derkachi — i.e. AC-3 (`≤100 m for 80 % of ticks`) and the
AZ-699 horizontal-error verdict — we need real C6 fixtures.
The replay protocol (`replay_protocol.md` line 214) explicitly states
"`BUILD_FAISS_INDEX` is ON in the airborne binary (live and replay
alike). C2 in replay queries the **real** C6 `FaissDescriptorIndex`,
populated by the pre-flight C10 build. This is the architectural
change vs. v1.0.0 of this contract." We have no such build for
Derkachi.
## Outcome
- A reproducible build script under `scripts/` produces the C6 artifacts (`tile_store` + `descriptor_index`) given the Derkachi bbox + zoom levels + camera calibration, deterministically on a clean checkout, in under 30 minutes on a developer workstation.
- Reference imagery source is OSM-tile-server-distributed basemap (CARTO Voyager or equivalent CC-BY-licensed source). Each tile carries the source URL + license attribution in its metadata sidecar.
- The Derkachi fixture directory documents the build invocation; tiles + index are EITHER committed to the repo (if total size ≤ 100 MB) OR built on-demand from the script (if larger) — decision recorded in the fixture README.
- `tests/e2e/replay/conftest.py`'s `operator_pre_flight_setup` fixture is replaced (or extended) to mount the prebuilt artifacts into the e2e-runner container. The mock-suite-sat-service stub is retired for the C6-served paths (it remains for the C12 operator-workflow AC-8).
- After this task ships (with AZ-776), un-xfail `test_ac3_within_100m_80pct_of_ticks` (`test_derkachi_1min.py` line 174) AND `test_az699_real_flight_validation_emits_verdict_and_report` (`test_derkachi_real_tlog.py` line 174); both pass on the Jetson harness.
- The first honest AZ-699 verdict lands at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the full horizontal-error distribution. Whether the verdict is PASS or FAIL is the honest finding — this task's success is that the verdict is *produced* against the real pipeline, not that it is necessarily green.
## Scope
### Included
- `scripts/build_derkachi_c6_fixture.py` (or equivalent module under `e2e/fixtures/derkachi_c6/`): reproducible build pipeline that:
- Reads the Derkachi bbox + zoom levels from a small YAML config (`tests/fixtures/derkachi_c6/bbox.yaml`).
- Downloads OSM/CARTO basemap tiles into `<output>/tiles/{zoom}/{x}/{y}.jpg` mirroring `satellite-provider`'s on-disk layout (per architecture principle #5).
- Computes per-tile descriptors via the same C7 backbone the airborne binary uses (configurable; defaults to whatever `config.components.c2_vpr.strategy`'s feature dimension is — e.g. UltraVPR or NetVLAD).
- Builds a FAISS HNSW index over the descriptors, writes via `faiss.write_index` + atomicwrites + SHA-256 content-hash gate (per D-C10-3).
- Emits a manifest JSON recording tile count, bbox, zoom levels, backbone, descriptor dimension, FAISS index parameters, source URL template, license, and the SHA-256 of every artifact.
- `tests/fixtures/derkachi_c6/bbox.yaml`: the bbox + zoom + backbone config consumed by the build script. Committed.
- `tests/fixtures/derkachi_c6/README.md`: how to rebuild + license attribution + estimated artifact size.
- Build the artifacts once, decide commit vs on-demand:
- If total size ≤ 100 MB → commit to `_docs/00_problem/input_data/flight_derkachi/c6_cache/` (under LFS).
- If > 100 MB → keep build-on-demand only, document the build invocation in the fixture README, and add a `scripts/run-tests-jetson.sh` pre-step that builds if absent.
- `tests/e2e/replay/conftest.py`: replace `operator_pre_flight_setup`'s mock with a real fixture that mounts the prebuilt artifacts into the e2e-runner container at the expected paths (`/opt/tiles/`, `/opt/descriptor_index.index`).
- `docker-compose.test.yml` + `docker-compose.test.jetson.yml`: mount the artifacts into the `e2e-runner` service (bind mount or named volume), set `c6_tile_store.path` + `c6_descriptor_index.path` env vars.
- `tests/e2e/replay/test_derkachi_1min.py`: remove the `@pytest.mark.xfail` decorator on AC-3 (line 174).
- `tests/e2e/replay/test_derkachi_real_tlog.py`: remove the `@pytest.mark.xfail` decorator on AZ-699 (line 174).
- `_docs/00_problem/input_data/flight_derkachi/README.md`: document the new C6 artifacts + build invocation + license attribution.
- `_docs/02_document/contracts/c6_tile_cache/`: if a contract file exists for the descriptor-index format, append a Consumer entry naming this fixture; if not, no new contract needed.
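The tile-enumeration step of the build pipeline can be sketched with the standard Web Mercator (slippy-map) tile formula. This is a minimal sketch, assuming the basemap uses the usual XYZ scheme; `tile_xy`, `tiles_for_bbox`, and `tile_path` are hypothetical helper names, and only the `{zoom}/{x}/{y}.jpg` layout comes from this spec.

```python
import math


def tile_xy(lat_deg: float, lon_deg: float, zoom: int) -> tuple:
    """Map WGS84 lat/lon to XYZ tile indices at the given zoom."""
    n = 1 << zoom
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge stay inside the valid index range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(south: float, west: float, north: float, east: float, zoom: int):
    """Yield (zoom, x, y) for every tile that intersects the bbox."""
    x_min, y_max = tile_xy(south, west, zoom)  # tile y grows southwards
    x_max, y_min = tile_xy(north, east, zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield zoom, x, y


def tile_path(zoom: int, x: int, y: int) -> str:
    """Relative on-disk path in the {zoom}/{x}/{y}.jpg layout."""
    return f"{zoom}/{x}/{y}.jpg"
```

For the Derkachi case, the bbox from the task description (approx 50.05 to 50.15 lat, 36.05 to 36.15 lon) and the zoom levels from `bbox.yaml` would be passed in; the zoom levels themselves are not fixed by this spec.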
### Excluded
- Multi-flight fixtures — just Derkachi. (Other flights would each need their own C6 build invocation.)
- Online tile download at test time — the e2e harness MUST remain offline (per replay protocol Invariant 5 / RESTRICT-SAT-1 / NFT-SEC-02; the docker compose `internal: true` network). The build script downloads tiles AT BUILD TIME from the developer workstation; the e2e harness only sees the prebuilt artifacts.
- Replacing the mock-suite-sat-service stub for the C12 operator-workflow `test_ac8_operator_workflow` test — that test exercises the D-PROJ-2 ingest contract which is parent-suite work, not in scope here.
- Building tiles for any backbone other than the airborne-default. If the operator wants a different backbone, they re-run the script with a different `--backbone` flag; this task only commits the default-backbone artifacts.
- Switching the airborne C6 backend from Postgres-mirroring to anything else — the build script writes the same on-disk layout the production C6 expects.
- AZ-776 (sibling): this task does NOT introduce the `c4_pose.enabled` flag or the open-loop composition profile. AZ-776 must land first to unblock the open-loop xfails (AC-1, AC-2, AC-5, AC-6); this task targets the full-GTSAM xfails (AC-3, AZ-699).
## Acceptance Criteria
**AC-1: Reproducible build**
Given a clean checkout
When `python scripts/build_derkachi_c6_fixture.py --output tests/fixtures/derkachi_c6/out --bbox tests/fixtures/derkachi_c6/bbox.yaml` runs
Then it produces a `tiles/` directory in the documented `{zoom}/{x}/{y}.jpg` layout, a FAISS `.index` file with a SHA-256-verified content hash, and a `manifest.json` recording tile count, bbox, backbone, descriptor dimension, FAISS parameters, source URL template, license, and per-artifact SHA-256, in under 30 minutes on a developer workstation
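One way the manifest with per-artifact SHA-256 could be produced, as a sketch only. `sha256_of`, `build_manifest`, and `write_manifest` are hypothetical helpers; the manifest key names are assumptions, and the atomic-write step required by D-C10-3 is omitted here for brevity.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large indexes need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, **fields) -> dict:
    """Hash every artifact under root; extra fields (bbox, license, ...) pass through."""
    artifacts = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != "manifest.json"
    }
    tile_count = sum(1 for name in artifacts if name.startswith("tiles/"))
    return {**fields, "tile_count": tile_count, "artifacts": artifacts}


def write_manifest(root: Path, **fields) -> dict:
    """Write manifest.json next to the artifacts and return its content."""
    manifest = build_manifest(root, **fields)
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Sorting both the file walk and the JSON keys keeps the manifest byte-stable across runs, which helps the "deterministically on a clean checkout" requirement.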
**AC-2: License attribution**
Given the produced artifacts
When the manifest is inspected
Then it records the tile source URL template, the license name (CC-BY-3.0 or CC-BY-4.0 as applicable), and the attribution string the operator must surface in any derived publication
**AC-3: Offline e2e harness**
Given the prebuilt C6 artifacts mounted into the e2e-runner container
When `scripts/run-tests-jetson.sh` runs on Jetson with `RUN_REPLAY_E2E=1 GPS_DENIED_TIER=2` and the Docker compose network is `internal: true`
Then the test harness never reaches out to any external host; all C6 queries are served from the mounted artifacts
**AC-4: Full-protocol e2e passes**
Given AZ-776 has landed AND the C6 artifacts are mounted AND the YAML config selects `c5_state.strategy = gtsam_isam2` with `c4_pose.enabled = True`
When `gps-denied-replay` runs the Derkachi 1-min fixture on Jetson
Then it exits with code 0, emits one EstimatorOutput per video frame, `test_ac3_within_100m_80pct_of_ticks` un-xfails and passes (≥80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not the existing `satellite_anchoring_not_wired` warning)
**AC-5: AZ-699 produces an honest verdict**
Given AZ-776 has landed AND the C6 artifacts are mounted AND the real flight video + factory calibration are present (already are)
When `test_az699_real_flight_validation_emits_verdict_and_report` runs on Jetson
Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥80 % within 100 m gate
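The "≥80 % within 100 m" gate can be illustrated with a great-circle error computation. This is a sketch, not the AZ-699 test's actual metric code: `haversine_m`, `fraction_within`, and `passes_gate` are hypothetical names, and the sketch assumes horizontal error is the haversine distance between estimated and ground-truth lat/lon on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(errors_m, threshold_m: float = 100.0) -> float:
    """Fraction of per-tick horizontal errors at or below the threshold."""
    errors = list(errors_m)
    if not errors:
        raise ValueError("no ticks to evaluate")
    return sum(1 for e in errors if e <= threshold_m) / len(errors)


def passes_gate(errors_m, threshold_m: float = 100.0, min_fraction: float = 0.80) -> bool:
    """True when at least min_fraction of ticks are within threshold_m."""
    return fraction_within(errors_m, threshold_m) >= min_fraction
```

Raising on an empty tick list keeps a run that produced no output from being reported as a pass, in line with the "honest verdict" intent of AC-5.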
**AC-6: Fixture README documents rebuild**
Given the updated `_docs/00_problem/input_data/flight_derkachi/README.md`
When a new contributor reads it
Then it documents (i) what C6 artifacts now exist, (ii) the exact `python scripts/build_derkachi_c6_fixture.py …` invocation to rebuild, (iii) the license attribution operators must propagate, and (iv) the size-on-disk decision (committed vs. build-on-demand)
## Non-Functional Requirements
**Performance**
- Build script completes in ≤ 30 minutes on a developer workstation (Apple Silicon or x86 Linux, no GPU required for OSM tile download + descriptor pre-compute via the CPU-fallback path of the backbone).
- Built artifacts do not regress the airborne C2 lookup latency budget — the FAISS HNSW parameters MUST match what production C6 expects (M, efConstruction, efSearch); the index is built once and never rebuilt at runtime.
**Compatibility**
- Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST match `satellite-provider`'s layout exactly (architecture principle #5) so that a future post-landing upload would be byte-identical.
- FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex` impl without code changes.
- Descriptor dimension MUST match the configured C7 backbone's output dimension — the build script asserts this at start.
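The `{zoom}/{x}/{y}.jpg` layout can be illustrated with the standard Web Mercator (slippy-map) tile formula. This is a sketch under the assumption that `satellite-provider` uses the conventional XYZ scheme, which the spec does not state explicitly; the function names are hypothetical.

```python
import math
from pathlib import PurePosixPath


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Map WGS84 coordinates to XYZ tile indices (Web Mercator, assumed scheme)."""
    n = 2 ** zoom
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so edge inputs (lon = 180, polar latitudes) stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_relpath(zoom: int, x: int, y: int) -> PurePosixPath:
    """Relative on-disk path in the documented {zoom}/{x}/{y}.jpg layout."""
    return PurePosixPath(str(zoom)) / str(x) / f"{y}.jpg"
```

Enumerating the tiles for a bbox then amounts to converting its two corners and iterating over the inclusive x and y ranges between them.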
**Reliability**
- Build script MUST fail loudly on partial downloads (network error, HTTP 429/500, malformed tile) rather than silently producing an incomplete tile store. Resume-from-partial is allowed, but each resumed run re-verifies the SHA-256 of every committed tile.
- The SHA-256 content-hash gate on the FAISS index (per D-C10-3) MUST be enforced so that an operator can verify a downloaded fixture matches what was built.
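The re-verification step on a resumed run could look roughly like the following. `TileIntegrityError` and the `expected` mapping (relative path to hex digest) are assumptions made for illustration; the spec only requires that a mismatch or missing tile stops the build rather than passing silently.

```python
import hashlib
from pathlib import Path


class TileIntegrityError(RuntimeError):
    """Hypothetical error type: raised instead of continuing with a bad tile store."""


def verify_tiles(root: Path, expected: dict[str, str]) -> None:
    """Re-hash every recorded tile; raise on the first missing or altered file."""
    for relpath, want in expected.items():
        path = root / relpath
        if not path.is_file():
            raise TileIntegrityError(f"missing tile: {relpath}")
        got = hashlib.sha256(path.read_bytes()).hexdigest()
        if got != want:
            raise TileIntegrityError(
                f"SHA-256 mismatch for {relpath}: expected {want}, got {got}"
            )
```

Raising on the first failure keeps the behavior loud; a variant that collects all failures before raising would give the operator a fuller report at the cost of a longer run.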
**Security**
- Reference imagery URLs MUST be HTTPS. Tile metadata MUST record the exact source URL so license auditors can verify attribution.
- No API keys committed to the repo — if the chosen tile source requires registration, the build script reads the key from an env var and documents the env var name in the fixture README.
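A small sketch of the two security checks. The environment variable name `TILE_SOURCE_API_KEY` and the example host are placeholders chosen for illustration; the real name is whatever the fixture README ends up documenting.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Hypothetical name; the fixture README is the source of truth for the real one.
API_KEY_ENV_VAR = "TILE_SOURCE_API_KEY"


def require_https(url_template: str) -> str:
    """Reject any tile URL template whose scheme is not HTTPS."""
    if urlsplit(url_template).scheme != "https":
        raise ValueError(f"tile source must use HTTPS: {url_template}")
    return url_template


def read_api_key(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read the key from the environment only; None means the source needs no key."""
    return environ.get(API_KEY_ENV_VAR) or None
```

Passing `environ` as a parameter keeps the function testable without mutating the process environment.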
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Build script produces `tiles/`, `descriptor_index.index`, `manifest.json` on a small mock bbox | All three artifacts exist, manifest fields populated |
| AC-1 | SHA-256 of `descriptor_index.index` recorded in manifest matches actual file hash | Hashes match |
| AC-2 | Manifest records source URL template + license + attribution | All three fields non-empty |
| AC-2 | License field matches the source's documented license | Value round-trips through the allowed-license enum (CC-BY-3.0 or CC-BY-4.0) |
| AC-6 | Fixture README documents the build invocation | `grep` finds the exact invocation string in the README |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|--------------|-------------------|----------------|
| AC-3 | Prebuilt C6 artifacts + e2e-runner with `internal: true` network | Run `scripts/run-tests-jetson.sh` end-to-end | No outbound network calls observed by Docker network logs; all C6 queries return from local index | Security, Reliability |
| AC-4 | AZ-776 landed + C6 artifacts mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed | Test passes (≥80 % of ticks within 100 m); `satellite_anchor_inserted` log lines visible | Performance, Compatibility |
| AC-5 | AZ-776 landed + C6 artifacts mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test runs to completion ≤ 15 min, verdict report written to `_docs/06_metrics/` | Performance |
## Constraints
- Reference imagery source MUST be OSM/CARTO basemap (CC-BY-licensed). Operator chose this during AZ-777 scoping (cycle-3 Step 9, 2026-05-21) over Maxar Open Data (license uncertainty for in-repo redistribution) and video-self-orthorectification (self-referential; it would turn the AC-3 accuracy gate, `test_ac3_within_100m_80pct_of_ticks`, into a smoke test rather than a real accuracy gate; note this is not this task's AC-3). The trade-off is that lower-resolution reference imagery may produce a higher residual on that gate's horizontal-error metric than satellite imagery would; this is an HONEST finding the AZ-699 verdict will surface.
- The build script MUST NOT depend on `satellite-provider` running. The script's only network dependency is the chosen OSM/CARTO tile server (HTTPS, public, no auth).
- The committed artifact size budget (if AC-6 chooses commit-to-repo) is 100 MB total across `tiles/` + `descriptor_index.index`. Over budget → switch to build-on-demand, document in README.
- The `mock-suite-sat-service` stub stays in place for `test_ac8_operator_workflow` — that test exercises the D-PROJ-2 contract which this task does not address.
- Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner. The build script runs on the developer workstation; the harness only sees prebuilt artifacts.
## Risks & Mitigation
**Risk 1: OSM basemap residual is too coarse for the AC-3 threshold**
- *Risk*: The AC-3 accuracy gate (`test_ac3_within_100m_80pct_of_ticks`, ≤100 m for 80 % of ticks; not this task's AC-3) may be physically unmeetable when the reference imagery is an OSM rasterized basemap (street-level features, not satellite features): the visual descriptors may not lock against the aerial nav-camera frames at all.
- *Mitigation*: This is an honest discovery. If the gate still fails after this task lands, the failure mode shifts from "no anchors at all" (current) to "anchors exist but VPR similarity is too low to produce ≥80 % within 100 m". The AZ-699 verdict report will surface the actual horizontal-error distribution; if it lands at e.g. p50 = 250 m, that becomes evidence for a follow-up ticket to switch to satellite imagery. The xfail is removed in either case because the test now exercises the real pipeline, so the verdict, not the xfail, becomes the honest signal.
**Risk 2: Tile source rate-limits or goes offline mid-build**
- *Risk*: Public OSM/CARTO tile servers may rate-limit or temporarily go down, breaking reproducibility on a re-build.
- *Mitigation*: Build script implements exponential backoff + resume-from-partial. Document the chosen tile-server URL in the fixture README so an operator can swap to a mirror if needed. If commit-to-repo is chosen for the artifacts, future re-builds are unnecessary — the committed artifacts are the source of truth.
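The exponential-backoff part of this mitigation might be sketched as follows. `TransientFetchError`, the default attempt count, and the delay bounds are assumptions; the caller is expected to wrap whatever HTTP client the build script uses and raise `TransientFetchError` on a 429/5xx response or network error.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientFetchError(Exception):
    """Hypothetical marker for retryable failures (HTTP 429/5xx, network errors)."""


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `fetch`, doubling the delay each time; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientFetchError:
            if attempt == max_attempts - 1:
                raise  # fail loudly rather than leave a silent gap in the tile store
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Up to 10 % jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0.0, delay * 0.1))
    raise ValueError("max_attempts must be >= 1")
```

Injecting `sleep` lets a unit test record the delays instead of actually waiting.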
**Risk 3: Repo size pressure if artifacts are committed**
- *Risk*: Tile store + FAISS index could exceed 100 MB depending on bbox + zoom levels; committing them under LFS still costs LFS storage and bandwidth.
- *Mitigation*: First build run measures the size. If under 100 MB → commit. If over → build-on-demand documented in README + `scripts/run-tests-jetson.sh` pre-step. Either choice is acceptable per AC-6.
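The size measurement after the first build run can be sketched like this. It assumes "100 MB" means 100 × 10^6 bytes and treats a total exactly at the budget as committable; the spec leaves both points open, and the function names are illustrative.

```python
from pathlib import Path

# Assumption: decimal megabytes. Use 100 * 1024 * 1024 if MiB was intended.
SIZE_BUDGET_BYTES = 100 * 1000 * 1000


def artifact_size_bytes(root: Path, index_name: str = "descriptor_index.index") -> int:
    """Total size of tiles/ plus the FAISS index file, in bytes."""
    tiles = sum(p.stat().st_size for p in (root / "tiles").rglob("*") if p.is_file())
    index = root / index_name
    return tiles + (index.stat().st_size if index.is_file() else 0)


def storage_decision(total_bytes: int, budget: int = SIZE_BUDGET_BYTES) -> str:
    """Return 'commit' when within budget, otherwise 'build-on-demand'."""
    return "commit" if total_bytes <= budget else "build-on-demand"
```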
**Risk 4: Backbone descriptor dimension mismatch**
- *Risk*: If the operator changes the airborne C2 backbone (UltraVPR → NetVLAD, etc.) without rebuilding the index, the FAISS load will fail at runtime with a dimension mismatch.
- *Mitigation*: Manifest records the descriptor dimension. C6 loader asserts the manifest's dimension matches the configured backbone's output dimension at compose time; mismatch surfaces as an `AirborneBootstrapError` naming both numbers + the rebuild invocation.
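A sketch of the compose-time check described in this mitigation. The `AirborneBootstrapError` class below is a local stand-in for the project's real error type, and the manifest key `descriptor_dim` is a hypothetical name; the rebuild command is passed in by the caller rather than hard-coded.

```python
class AirborneBootstrapError(RuntimeError):
    """Stand-in for the project's error type; the real class may differ."""


def check_descriptor_dim(manifest: dict, backbone_dim: int, rebuild_cmd: str) -> None:
    """Fail at compose time if the index was built for a different backbone."""
    manifest_dim = manifest["descriptor_dim"]  # hypothetical manifest key
    if manifest_dim != backbone_dim:
        raise AirborneBootstrapError(
            f"descriptor dimension mismatch: manifest={manifest_dim}, "
            f"backbone={backbone_dim}; rebuild with: {rebuild_cmd}"
        )
```

Naming both numbers and the rebuild invocation in the message means the operator does not have to open the manifest to diagnose the failure.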
### ADR Impact
> Affects ADR-001 (composition root is single registration site): unchanged — C6 is built outside the composition root by the operator-side build script; the airborne binary still just loads what's on disk.
> Implements architecture principle #4 (no in-air network I/O) and principle #5 (all persistent imagery in `satellite-provider` on-disk layout) — this is the FIRST executable artifact that demonstrates both principles end-to-end against a real flight.
+5 -5
@@ -2,14 +2,14 @@
 ## Current Step
 flow: existing-code
-step: 12
-name: Test-Spec Sync
-status: not_started
+step: 10
+name: Implement
+status: in_progress
 sub_step:
 phase: 0
 name: awaiting-invocation
-detail: ""
+detail: "cycle 3 implement: AZ-776 first (no deps), then AZ-777 (depends on AZ-776)"
 retry_count: 0
-cycle: 2
+cycle: 3
 tracker: jira
 last_completed_batch: 102
@@ -1,11 +1,11 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-21T12:14+03:00 (Europe/Kyiv) — replay re-checked
-at start of next `/autodev` invocation (~19h after prior check at 2026-05-20
-17:34). PyPI re-queried via `pip index versions gtsam`: only `gtsam 4.2`
-is published. Replay condition (numpy>=2 stable wheels) still NOT met.
-Leftover remains open.
+**Last replay attempt**: 2026-05-21T13:10+03:00 (Europe/Kyiv) — replay re-checked
+at start of next `/autodev` invocation (~56 min after prior check at
+2026-05-21 12:14). PyPI re-queried via `python3 -m pip index versions
+gtsam`: only `gtsam 4.2` is published. Replay condition (numpy>=2 stable
+wheels) still NOT met. Leftover remains open.
 **Status**: deferred-non-user (replay when upstream gtsam wheels target numpy>=2)
 ## What is blocked