admin/.cursor/skills/autopilot/protocols.md

# Autopilot Protocols

## User Interaction Protocol

Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:

- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)

### When to Ask (MUST ask)

- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs

### When NOT to Ask (auto-transition)

- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer

### Choice Format

Always present decisions in this format:

```
══════════════════════════════════════
 DECISION REQUIRED: [brief context]
══════════════════════════════════════
 A) [Option A — short description]
 B) [Option B — short description]
 C) [Option C — short description, if applicable]
 D) [Option D — short description, if applicable]
══════════════════════════════════════
 Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```

Rules:
1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive
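
As an illustration of the format and rules 1–4, here is a minimal sketch of a renderer for the decision block. The function name `render_choice`, its parameters, and the rule width are assumptions made for this example; the protocol itself only prescribes the printed layout.

```python
RULE = "═" * 38  # separator width is an assumption; match it to the template


def render_choice(context, options, recommended, reason):
    """Render a decision block in the Choose A / B / C / D layout."""
    # Rule 1: between 2 and 4 concrete options, never an open question.
    if not 2 <= len(options) <= 4:
        raise ValueError("provide between 2 and 4 options")
    # Rule 4: only as many letters as there are real options.
    letters = "ABCD"[: len(options)]
    # Rule 2: the recommendation must point at an offered option.
    if recommended not in letters:
        raise ValueError("recommendation must be one of the offered letters")
    lines = [RULE, f" DECISION REQUIRED: {context}", RULE]
    lines += [f" {letter}) {text}" for letter, text in zip(letters, options)]
    # \u2014 is the dash that the template places before the reason.
    lines += [RULE, f" Recommendation: {recommended} \u2014 {reason}", RULE]
    return "\n".join(lines)


print(render_choice(
    "Research round finished",
    ["Run another research round", "Proceed to Plan"],
    "B",
    "open questions are resolved",
))
```

Rules 5–7 (notification sound, recording the decision, proceeding without re-confirmation) are side effects of the surrounding workflow and are deliberately left out of this sketch.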

## Work Item Tracker Authentication

Several workflow steps create work items (epics, tasks, links). The system supports **Jira MCP** and **Azure DevOps MCP** as interchangeable backends. Detect which is configured by listing available MCP servers.

### Tracker Detection

1. Check for available MCP servers: Jira MCP (`user-Jira-MCP-Server`) or Azure DevOps MCP (`user-AzureDevops`)
2. If both are available, ask the user which to use (Choose format)
3. Record the choice in the state file: `tracker: jira` or `tracker: ado`
4. If neither is available, set `tracker: local` and proceed without external tracking
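
The detection steps above can be sketched as a small pure function. The server identifiers come from step 1; the function name `detect_tracker` and the `"ask"` return value (meaning "present the Choose format") are assumptions for illustration.

```python
JIRA_SERVER = "user-Jira-MCP-Server"
ADO_SERVER = "user-AzureDevops"


def detect_tracker(available_servers):
    """Map the list of available MCP servers to a tracker setting.

    Returns "jira", "ado", or "local", or "ask" when both backends are
    present and the user has to choose.
    """
    has_jira = JIRA_SERVER in available_servers
    has_ado = ADO_SERVER in available_servers
    if has_jira and has_ado:
        return "ask"
    if has_jira:
        return "jira"
    if has_ado:
        return "ado"
    return "local"


print(detect_tracker(["user-AzureDevops", "some-other-server"]))
```

Whatever value results (after the user answers, in the `"ask"` case) is what gets recorded as `tracker:` in the state file.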

### Steps That Require Work Item Tracker

| Flow | Step | Sub-Step | Tracker Action |
|------|------|----------|----------------|
| greenfield | 3 (Plan) | Step 6 — Epics | Create epics for each component |
| greenfield | 5 (Decompose) | Step 1–3 — All tasks | Create ticket per task, link to epic |
| existing-code | 3 (Decompose Tests) | Step 1t + Step 3 — All test tasks | Create ticket per task, link to epic |
| existing-code | 7 (New Task) | Step 7 — Ticket | Create ticket per task, link to epic |

### Authentication Gate

Before entering a step that requires work item tracking (see table above) for the first time, the autopilot must:

1. Call `mcp_auth` on the detected tracker's MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** or authentication fails → present using Choose format:

```
══════════════════════════════════════
 Tracker authentication failed
══════════════════════════════════════
 A) Retry authentication (retry mcp_auth)
 B) Continue without tracker (tasks saved locally only)
══════════════════════════════════════
 Recommendation: A — Tracker IDs drive task referencing,
 dependency tracking, and implementation batching.
 Without tracker, task files use numeric prefixes instead.
══════════════════════════════════════
```

If the user picks **B** (continue without tracker):
- Set a flag in the state file: `tracker: local`
- All skills that would create tickets instead save metadata locally in the task/epic files with `Tracker: pending` status
- Task files keep numeric prefixes (e.g., `01_initial_structure.md`) instead of tracker ID prefixes
- The workflow proceeds normally in all other respects
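
The naming difference between the two modes can be shown with a short sketch. The helper `task_filename` and the tracker ID `PROJ-42` are hypothetical; the protocol does not specify the exact shape of tracker ID prefixes, only that local mode keeps two-digit numeric prefixes such as `01_initial_structure.md`.

```python
def task_filename(index, slug, tracker_id=None):
    """Build a task file name.

    With a tracker ID (hypothetical format) the ID becomes the prefix;
    in local mode the zero-padded position in the task list is used.
    """
    prefix = tracker_id if tracker_id else f"{index:02d}"
    return f"{prefix}_{slug}.md"


print(task_filename(1, "initial_structure"))
print(task_filename(1, "initial_structure", tracker_id="PROJ-42"))
```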

### Re-Authentication

If the tracker MCP was already authenticated in a previous invocation (verify by listing available tools beyond `mcp_auth`), skip the auth gate.

## Error Handling

All error situations that require user input MUST use the **Choose A / B / C / D** format.

| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |

## Skill Failure Retry Protocol

Sub-skills can return a **failed** result. Failures are often caused by missing user input, environment issues, or transient errors that resolve on retry. The autopilot auto-retries before escalating.

### Retry Flow

```
Skill execution → FAILED
  │
  ├─ retry_count < 3 ?
  │    YES → increment retry_count in state file
  │         → log failure reason in state file (Retry Log section)
  │         → re-read the sub-skill's SKILL.md
  │         → re-execute from the current sub_step
  │         → (loop back to check result)
  │
  │    NO (retry_count = 3) →
  │         → set status: failed in Current Step
  │         → add entry to Blockers section:
  │             "[Skill Name] failed 3 consecutive times at sub_step [M].
  │              Last failure: [reason]. Auto-retry exhausted."
  │         → present warning to user (see Escalation below)
  │         → do NOT auto-retry again until user intervenes
```

### Retry Rules

1. **Auto-retry immediately**: when a skill fails, retry it without asking the user, because the failure is often transient (a missing user confirmation in a prior step, Docker not running, a file lock, etc.)
2. **Preserve sub_step**: retry from the last recorded `sub_step`, not from the beginning of the skill — unless the failure indicates corruption, in which case restart from sub_step 1
3. **Increment `retry_count`**: update `retry_count` in the state file's `Current Step` section on each retry attempt
4. **Log each failure**: append the failure reason and timestamp to the state file's `Retry Log` section
5. **Reset on success**: when the skill eventually succeeds, reset `retry_count: 0` and clear the `Retry Log` for that step
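
The retry flow and rules 3–5 can be sketched as two state updates. The state file is Markdown; the dictionary layout below (`current_step`, `retry_log`, `blockers`) and the function names are assumptions that mirror its sections, not a prescribed schema.

```python
from datetime import datetime, timezone

MAX_RETRIES = 3


def record_failure(state, reason, now=None):
    """Apply one FAILED result; return "retry" or "escalate"."""
    step = state["current_step"]
    now = now or datetime.now(timezone.utc)
    if step["retry_count"] < MAX_RETRIES:
        # Rules 3 and 4: bump the counter and log the reason with a timestamp.
        step["retry_count"] += 1
        state["retry_log"].append({"at": now.isoformat(), "reason": reason})
        return "retry"
    # Retries exhausted: mark the step failed and add a blocker entry.
    step["status"] = "failed"
    state["blockers"].append(
        f"{step['skill']} failed {MAX_RETRIES} consecutive times at "
        f"sub_step {step['sub_step']}. Last failure: {reason}. "
        "Auto-retry exhausted."
    )
    return "escalate"


def record_success(state):
    """Rule 5: reset the counter and clear the retry log for this step."""
    state["current_step"]["retry_count"] = 0
    state["retry_log"].clear()


state = {
    "current_step": {"skill": "plan", "sub_step": 2,
                     "retry_count": 0, "status": "in_progress"},
    "retry_log": [],
    "blockers": [],
}
print([record_failure(state, "file lock") for _ in range(4)])
```

Following the flow literally, the first three failures each trigger a retry and the next failure escalates. Rule 2 (resuming from the recorded `sub_step`) belongs to the caller that re-executes the skill and is not modeled here.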

### Escalation (after 3 consecutive failures)

After 3 failed auto-retries of the same skill, the failure is unlikely to be transient. Stop retrying and escalate:

1. Update the state file:
   - Set `status: failed` in `Current Step`
   - Set `retry_count: 3`
   - Add a blocker entry describing the repeated failure
2. Play the notification sound (per `human-attention-sound.mdc`)
3. Present using Choose format:

```
══════════════════════════════════════
 SKILL FAILED: [Skill Name] — 3 consecutive failures
══════════════════════════════════════
 Step: [N] — [Name]
 SubStep: [M] — [sub-step name]
 Last failure reason: [reason]
══════════════════════════════════════
 A) Retry with fresh context (new conversation)
 B) Skip this step with warning
 C) Abort — investigate and fix manually
══════════════════════════════════════
 Recommendation: A — fresh context often resolves
 persistent failures
══════════════════════════════════════
```

### Re-Entry After Failure

On the next autopilot invocation (new conversation), if the state file shows `status: failed` and `retry_count: 3`:

- Present the blocker to the user before attempting execution
- If the user chooses to retry → reset `retry_count: 0`, set `status: in_progress`, and re-execute
- If the user chooses to skip → mark step as `skipped`, proceed to next step
- Do NOT silently auto-retry — the user must acknowledge the persistent failure first
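
A minimal sketch of the re-entry decision, assuming the `Current Step` section has been parsed into a dictionary. The function name `resolve_reentry`, the `user_choice` values, and the returned action labels are illustrative only.

```python
def resolve_reentry(step, user_choice=None):
    """Decide what to do when a conversation starts on a given step.

    user_choice is None until the user has answered the blocker prompt,
    then "retry" or "skip".
    """
    blocked = step["status"] == "failed" and step["retry_count"] >= 3
    if not blocked:
        return "execute"
    if user_choice == "retry":
        step["retry_count"] = 0
        step["status"] = "in_progress"
        return "execute"
    if user_choice == "skip":
        step["status"] = "skipped"
        return "next_step"
    # No acknowledgement yet: never auto-retry silently.
    return "present_blocker"


step = {"status": "failed", "retry_count": 3}
print(resolve_reentry(step))
print(resolve_reentry(step, "retry"), step)
```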

## Error Recovery Protocol

### Stuck Detection

When executing a sub-skill, monitor for these signals:

- Same artifact overwritten 3+ times without meaningful change
- Sub-skill repeatedly asks the same question after receiving an answer
- No new artifacts saved for an extended period despite active execution
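
The first signal can be approximated mechanically. The sketch below treats "no meaningful change" as "identical after collapsing whitespace", which is an assumption; the class name `OverwriteMonitor` and its threshold parameter are likewise hypothetical. The other two signals depend on judgment and are not modeled.

```python
import hashlib
from collections import defaultdict


class OverwriteMonitor:
    """Flag artifacts that keep being rewritten with the same content."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._last_digest = {}
        self._repeats = defaultdict(int)

    def record_write(self, path, content):
        """Record one write; return True when `path` looks stuck."""
        # Collapse whitespace so pure formatting changes do not count.
        normalized = " ".join(content.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if self._last_digest.get(path) == digest:
            self._repeats[path] += 1
        else:
            self._last_digest[path] = digest
            self._repeats[path] = 0
        return self._repeats[path] >= self.threshold


monitor = OverwriteMonitor()
for attempt in range(4):
    print(attempt, monitor.record_write("_docs/plan.md", "same text"))
```

The first write creates the artifact; the flag is raised on the third overwrite that leaves the normalized content unchanged.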

### Recovery Actions (ordered)

1. **Re-read state**: read `_docs/_autopilot_state.md` and cross-check against `_docs/` folders
2. **Retry current sub-step**: re-read the sub-skill's SKILL.md and restart from the current sub-step
3. **Escalate**: after 2 failed retries, present a diagnostic summary to the user using Choose format:

```
══════════════════════════════════════
 RECOVERY: [skill name] stuck at [sub-step]
══════════════════════════════════════
 A) Retry with fresh context (new conversation)
 B) Skip this sub-step with warning
 C) Abort and fix manually
══════════════════════════════════════
 Recommendation: A — fresh context often resolves stuck loops
══════════════════════════════════════
```

### Circuit Breaker

If the same autopilot step fails 3 consecutive times across conversations:

- Record the failure pattern in the state file's `Blockers` section
- Do NOT auto-retry on next invocation
- Present the blocker and ask the user for guidance before attempting again

## Context Management Protocol

### Principle

Disk is memory. Never rely on in-context accumulation — read from `_docs/` artifacts, not from conversation history.

### Minimal Re-Read Set Per Skill

When re-entering a skill (new conversation or context refresh):

- Always read: `_docs/_autopilot_state.md`
- Always read: the active skill's `SKILL.md`
- Conditionally read: only the `_docs/` artifacts the current sub-step requires (listed in each skill's Context Resolution section)
- Never bulk-read: do not load all `_docs/` files at once

### Mid-Skill Interruption

If context is filling up during a long skill (e.g., document, implement):

1. Save current sub-step progress to the skill's artifact directory
2. Update `_docs/_autopilot_state.md` with exact sub-step position
3. Suggest a new conversation: "Context is getting long — recommend continuing in a fresh conversation for better results"
4. On re-entry, the skill's resumability protocol picks up from the saved sub-step

### Large Artifact Handling

When a skill needs to read large files (e.g., full solution.md, architecture.md):

- Read only the sections relevant to the current sub-step
- Use search tools (Grep, SemanticSearch) to find specific sections rather than reading entire files
- Summarize key decisions from prior steps in the state file so they don't need to be re-read

### Context Budget Heuristic

Agents cannot programmatically query context window usage. Use these heuristics to avoid degradation:

| Zone | Indicators | Action |
|------|-----------|--------|
| **Safe** | State file + SKILL.md + 2–3 focused artifacts loaded | Continue normally |
| **Caution** | 5+ artifacts loaded, or 3+ large files (architecture, solution, discovery), or conversation has 20+ tool calls | Complete current sub-step, then suggest session break |
| **Danger** | Repeated truncation in tool output, tool calls failing unexpectedly, responses becoming shallow or repetitive | Save immediately, update state file, force session boundary |

**Skill-specific guidelines**:

| Skill | Recommended session breaks |
|-------|---------------------------|
| **document** | After every ~5 modules in Step 1; between Step 4 (Verification) and Step 5 (Solution Extraction) |
| **implement** | Each batch is a natural checkpoint; if more than 2 batches completed in one session, suggest break |
| **plan** | Between Step 5 (Test Specifications) and Step 6 (Epics) for projects with many components |
| **research** | Between Mode A rounds; between Mode A and Mode B |

**How to detect the Caution or Danger zone without an API**:

1. Count the tool calls made so far: once the count approaches 20, context is likely filling up
2. If reading a file returns truncated content, context is under pressure
3. If the agent starts producing shorter or less detailed responses than earlier in the conversation, context quality is degrading
4. When in doubt, save and suggest a new conversation — re-entry is cheap thanks to the state file
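
The zone table can be read as a simple classifier. This is a sketch under the assumption that the agent keeps its own running counts; the function name `context_zone` and its parameters are hypothetical, and the thresholds are taken directly from the table.

```python
def context_zone(artifacts_loaded, large_files_loaded, tool_calls,
                 truncation_seen=False, unexpected_tool_failures=False,
                 degraded_responses=False):
    """Classify the session as "safe", "caution", or "danger"."""
    # Danger indicators are qualitative and override the counts.
    if truncation_seen or unexpected_tool_failures or degraded_responses:
        return "danger"
    if artifacts_loaded >= 5 or large_files_loaded >= 3 or tool_calls >= 20:
        return "caution"
    return "safe"


print(context_zone(artifacts_loaded=3, large_files_loaded=1, tool_calls=8))
print(context_zone(artifacts_loaded=6, large_files_loaded=1, tool_calls=8))
```

In the caution zone the current sub-step is finished before suggesting a break; in the danger zone progress is saved immediately.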

## Rollback Protocol

### Implementation Steps (git-based)

Handled by the `/implement` skill: each batch commit is a rollback checkpoint that can be undone with `git revert`.

### Planning/Documentation Steps (artifact-based)

For steps that produce `_docs/` artifacts (problem, research, plan, decompose, document):

1. **Before overwriting**: if re-running a step that already has artifacts, the sub-skill's prerequisite check asks the user (resume/overwrite/skip)
2. **Rollback to previous step**: use Choose format:

```
══════════════════════════════════════
 ROLLBACK: Re-run [step name]?
══════════════════════════════════════
 A) Re-run the step (overwrites current artifacts)
 B) Stay on current step
══════════════════════════════════════
 Warning: This will overwrite files in _docs/[folder]/
══════════════════════════════════════
```

3. **Git safety net**: artifacts are committed with each autopilot step completion. To roll back: `git log --oneline _docs/` to find the commit, then `git checkout <commit> -- _docs/<folder>/`
4. **State file rollback**: when rolling back artifacts, also update `_docs/_autopilot_state.md` to reflect the rolled-back step (set it to `in_progress`, clear completed date)
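
Steps 3 and 4 can be sketched together. The git invocation is the one given in step 3, built as an argument list suitable for `subprocess.run`; the helper names and the `completed` field are assumptions, since the exact layout of the state file is not shown here.

```python
def restore_command(commit, folder):
    """Build the git command that restores one _docs/ folder from a commit."""
    if not commit or commit.startswith("-"):
        raise ValueError("commit must be a revision, not an option")
    return ["git", "checkout", commit, "--", f"_docs/{folder.strip('/')}/"]


def roll_back_step_state(step):
    """Mirror the artifact rollback in the state entry for that step."""
    step["status"] = "in_progress"
    step["completed"] = None  # hypothetical field holding the completed date
    return step


print(restore_command("abc1234", "plan"))
print(roll_back_step_state({"status": "completed", "completed": "2024-01-01"}))
```

Building the command as a list rather than a shell string avoids quoting problems if a folder name contains spaces.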

## Status Summary

On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback). Use the Status Summary Template from the active flow file (`flows/greenfield.md` or `flows/existing-code.md`).

For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section