[AZ-137] [AZ-138] Decompose test tasks and scaffold E2E test infrastructure

Made-with: Cursor
2026-04-23 05:56:32 +00:00 · 2026-03-23 14:07:54 +02:00
parent 091d9a8fb0
commit 86d8e7e22d
47 changed files with 1883 additions and 88 deletions
@@ -62,22 +62,26 @@ Every invocation follows this sequence:
 6. Present Status Summary (format in protocols.md)
 7. Execute:
   a. Delegate to current skill (see Skill Delegation below)
-   b. When skill completes → update state file (rules in state.md)
-   c. Re-detect next step from the active flow's detection rules
-   d. If next skill is ready → auto-chain (go to 7a with next skill)
-   e. If session boundary reached → update state, suggest new conversation (rules in state.md)
-   f. If all steps done → update state → report completion
+   b. If skill returns FAILED → apply Skill Failure Retry Protocol (see protocols.md):
+      - Auto-retry the same skill (failure may be caused by missing user input or environment issue)
+      - If 3 consecutive auto-retries fail → record in state file Blockers, warn user, stop auto-retry
+   c. When skill completes successfully → reset retry counter, update state file (rules in state.md)
+   d. Re-detect next step from the active flow's detection rules
+   e. If next skill is ready → auto-chain (go to 7a with next skill)
+   f. If session boundary reached → update state, suggest new conversation (rules in state.md)
+   g. If all steps done → update state → report completion
 ```

 ## Skill Delegation

 For each step, the delegation pattern is:

-1. Update state file: set `step` to the autopilot step number, status to `in_progress`, set `sub_step` to the sub-skill's current internal step/phase
+1. Update state file: set `step` to the autopilot step number, status to `in_progress`, set `sub_step` to the sub-skill's current internal step/phase, reset `retry_count: 0`
 2. Announce: "Starting [Skill Name]..."
 3. Read the skill file: `.cursor/skills/[name]/SKILL.md`
 4. Execute the skill's workflow exactly as written, including all BLOCKING gates, self-verification checklists, save actions, and escalation rules. Update `sub_step` in state each time the sub-skill advances.
-5. When complete: mark step `completed`, record date + key outcome, add key decisions to state file, return to auto-chain rules (from active flow file)
+5. If the skill **fails**: follow the Skill Failure Retry Protocol in `protocols.md` — increment `retry_count`, auto-retry up to 3 times, then escalate.
+6. When complete (success): reset `retry_count: 0`, mark step `completed`, record date + key outcome, add key decisions to state file, return to auto-chain rules (from active flow file)

 Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer.

@@ -106,6 +106,76 @@ All error situations that require user input MUST use the **Choose A / B / C / D
 | User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
 | User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |

+## Skill Failure Retry Protocol
+
+Sub-skills can return a **failed** result. Failures are often caused by missing user input, environment issues, or transient errors that resolve on retry. The autopilot auto-retries before escalating.
+
+### Retry Flow
+
+```
+Skill execution → FAILED
+  │
+  ├─ retry_count < 3 ?
+  │    YES → increment retry_count in state file
+  │         → log failure reason in state file (Retry Log section)
+  │         → re-read the sub-skill's SKILL.md
+  │         → re-execute from the current sub_step
+  │         → (loop back to check result)
+  │
+  │    NO (retry_count = 3) →
+  │         → set status: failed in Current Step
+  │         → add entry to Blockers section:
+  │             "[Skill Name] failed 3 consecutive times at sub_step [M].
+  │              Last failure: [reason]. Auto-retry exhausted."
+  │         → present warning to user (see Escalation below)
+  │         → do NOT auto-retry again until user intervenes
+```
+
+### Retry Rules
+
+1. **Auto-retry immediately**: when a skill fails, retry it without asking the user, because the failure is often transient (missing user confirmation in a prior step, Docker not running, file lock, etc.)
+2. **Preserve sub_step**: retry from the last recorded `sub_step`, not from the beginning of the skill — unless the failure indicates corruption, in which case restart from sub_step 1
+3. **Increment `retry_count`**: update `retry_count` in the state file's `Current Step` section on each retry attempt
+4. **Log each failure**: append the failure reason and timestamp to the state file's `Retry Log` section
+5. **Reset on success**: when the skill eventually succeeds, reset `retry_count: 0` and clear the `Retry Log` for that step
+
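The retry flow and rules above can be sketched as a small loop. This is a minimal illustration, not the autopilot's actual implementation: `CurrentStep`, `run_with_retries`, and the `run_skill` callback are hypothetical names, and the state file is modeled as an in-memory object rather than Markdown on disk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Tuple

MAX_RETRIES = 3  # limit stated in the protocol


@dataclass
class CurrentStep:
    """Hypothetical in-memory mirror of the state file sections used here."""
    name: str
    sub_step: str
    status: str = "in_progress"
    retry_count: int = 0
    retry_log: List[Tuple[int, str, str]] = field(default_factory=list)
    blockers: List[str] = field(default_factory=list)


def run_with_retries(
    state: CurrentStep,
    run_skill: Callable[[str], Tuple[bool, str]],
) -> bool:
    """Run a skill from the recorded sub_step, auto-retrying on failure.

    run_skill is an assumed callback returning (succeeded, failure_reason).
    Returns True on success, False once auto-retries are exhausted.
    """
    while True:
        ok, reason = run_skill(state.sub_step)
        if ok:
            # Rule 5: reset the counter and clear the log on success.
            state.retry_count = 0
            state.retry_log.clear()
            state.status = "completed"
            return True
        if state.retry_count >= MAX_RETRIES:
            # "NO" branch of the flow: record a blocker and stop retrying.
            state.status = "failed"
            state.blockers.append(
                f"{state.name} failed 3 consecutive times at sub_step "
                f"{state.sub_step}. Last failure: {reason}. "
                "Auto-retry exhausted."
            )
            return False
        # "YES" branch: increment, log the failure, then loop to re-execute.
        state.retry_count += 1
        stamp = datetime.now(timezone.utc).isoformat()
        state.retry_log.append((state.retry_count, reason, stamp))
```

Read literally, the flow allows one initial attempt plus three auto-retries before escalation, which is what this loop does. Re-reading `SKILL.md` before each retry is left to the `run_skill` callback.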
+### Escalation (after 3 consecutive failures)
+
+After 3 consecutive failed auto-retries of the same skill, the failure is unlikely to be transient. Stop retrying and escalate:
+
+1. Update the state file:
+   - Set `status: failed` in `Current Step`
+   - Set `retry_count: 3`
+   - Add a blocker entry describing the repeated failure
+2. Play notification sound (per `human-input-sound.mdc`)
+3. Present using Choose format:
+
+```
+══════════════════════════════════════
+ SKILL FAILED: [Skill Name] — 3 consecutive failures
+══════════════════════════════════════
+ Step: [N] — [Name]
+ SubStep: [M] — [sub-step name]
+ Last failure reason: [reason]
+══════════════════════════════════════
+ A) Retry with fresh context (new conversation)
+ B) Skip this step with warning
+ C) Abort — investigate and fix manually
+══════════════════════════════════════
+ Recommendation: A — fresh context often resolves
+ persistent failures
+══════════════════════════════════════
+```
+
+### Re-Entry After Failure
+
+On the next autopilot invocation (new conversation), if the state file shows `status: failed` and `retry_count: 3`:
+
+- Present the blocker to the user before attempting execution
+- If the user chooses to retry → reset `retry_count: 0`, set `status: in_progress`, and re-execute
+- If the user chooses to skip → mark step as `skipped`, proceed to next step
+- Do NOT silently auto-retry — the user must acknowledge the persistent failure first
+
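The re-entry rules can be expressed as a small decision function. This is a sketch under assumptions: `resume_after_failure` and the `"retry"` / `"skip"` choice strings are hypothetical, and the `Current Step` section is modeled as a plain dictionary.

```python
from typing import Dict


def resume_after_failure(step: Dict[str, object], user_choice: str) -> Dict[str, object]:
    """Return updated Current Step fields after the user answers the blocker prompt.

    step is an assumed dict with at least 'status' and 'retry_count' keys.
    user_choice is an assumed string: 'retry', 'skip', or anything else.
    """
    updated = dict(step)
    blocked = step.get("status") == "failed" and step.get("retry_count") == 3
    if not blocked:
        # Nothing to acknowledge; leave the state untouched.
        return updated
    if user_choice == "retry":
        updated["retry_count"] = 0
        updated["status"] = "in_progress"
    elif user_choice == "skip":
        # The source does not say whether retry_count is reset on skip,
        # so it is left unchanged here.
        updated["status"] = "skipped"
    # Any other answer keeps the step blocked: no silent auto-retry.
    return updated
```

Returning a copy instead of mutating the input keeps the "present the blocker first" step separate from the state update that follows the user's decision.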
 ## Error Recovery Protocol

 ### Stuck Detection
@@ -211,17 +281,18 @@ On every invocation, before executing any skill, present a status summary built
 ═══════════════════════════════════════════════════
 AUTOPILOT STATUS (greenfield)
 ═══════════════════════════════════════════════════
- Step 0   Problem             [DONE / IN PROGRESS / NOT STARTED]
- Step 1   Research            [DONE (N drafts) / IN PROGRESS / NOT STARTED]
- Step 2   Plan                [DONE / IN PROGRESS / NOT STARTED]
- Step 3   Decompose           [DONE (N tasks) / IN PROGRESS / NOT STARTED]
- Step 4   Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
- Step 5   Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
- Step 5b  Security Audit      [DONE / SKIPPED / IN PROGRESS / NOT STARTED]
- Step 6   Deploy              [DONE / IN PROGRESS / NOT STARTED]
+ Step 0   Problem             [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 1   Research            [DONE (N drafts) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2   Plan                [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 3   Decompose           [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 4   Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
+ Step 5   Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 5b  Security Audit      [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 6   Deploy              [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 ═══════════════════════════════════════════════════
 Current: Step N — Name
 SubStep: M — [sub-skill internal step name]
+ Retry:   [N/3 if retrying, omit if 0]
 Action:  [what will happen next]
 ═══════════════════════════════════════════════════
 ```
@@ -232,19 +303,20 @@ On every invocation, before executing any skill, present a status summary built
 ═══════════════════════════════════════════════════
 AUTOPILOT STATUS (existing-code)
 ═══════════════════════════════════════════════════
- Pre      Document            [DONE / IN PROGRESS / NOT STARTED]
- Step 2b  Blackbox Test Spec  [DONE / IN PROGRESS / NOT STARTED]
- Step 2c  Decompose Tests     [DONE (N tasks) / IN PROGRESS / NOT STARTED]
- Step 2d  Implement Tests     [DONE / IN PROGRESS (batch M) / NOT STARTED]
- Step 2e  Refactor            [DONE / IN PROGRESS (phase N) / NOT STARTED]
- Step 2f  New Task            [DONE (N tasks) / IN PROGRESS / NOT STARTED]
- Step 2g  Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
- Step 2h  Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
- Step 2hb Security Audit      [DONE / SKIPPED / IN PROGRESS / NOT STARTED]
- Step 2i  Deploy              [DONE / IN PROGRESS / NOT STARTED]
+ Pre      Document            [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2b  Blackbox Test Spec  [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2c  Decompose Tests     [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2d  Implement Tests     [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
+ Step 2e  Refactor            [DONE / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
+ Step 2f  New Task            [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2g  Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
+ Step 2h  Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2hb Security Audit      [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2i  Deploy              [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 ═══════════════════════════════════════════════════
 Current: Step N — Name
 SubStep: M — [sub-skill internal step name]
+ Retry:   [N/3 if retrying, omit if 0]
 Action:  [what will happen next]
 ═══════════════════════════════════════════════════
 ```
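As an illustration of how the new `FAILED (retry N/3)` label and the optional `Retry:` line might be produced from state values, here is a minimal sketch. The function names and the mapping from `status` values to display labels (for example `completed` to `DONE`) are assumptions, not part of the protocol files.

```python
from typing import Optional

# Assumed mapping from state-file status values to summary labels.
LABELS = {
    "not_started": "NOT STARTED",
    "in_progress": "IN PROGRESS",
    "completed": "DONE",
    "skipped": "SKIPPED",
}


def status_label(status: str, retry_count: int = 0, detail: str = "") -> str:
    """Build the bracketed label for one step row of the status summary."""
    if status == "failed":
        return f"FAILED (retry {retry_count}/3)"
    label = LABELS[status]
    return f"{label} ({detail})" if detail else label


def retry_line(retry_count: int) -> Optional[str]:
    """Return the 'Retry:' line, or None when it should be omitted (count is 0)."""
    if retry_count <= 0:
        return None
    return f" Retry:   {retry_count}/3"
```

For example, `status_label("completed", detail="12 tasks")` yields `DONE (12 tasks)`, and `retry_line(0)` returns `None` so the caller can drop the line entirely.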
@@ -12,8 +12,9 @@ The autopilot persists its state to `_docs/_autopilot_state.md`. This file is th
 ## Current Step
 step: [0-6 or "2b" / "2c" / "2d" / "2e" / "2f" / "2g" / "2h" / "2hb" / "2i" or "5b" or "done"]
 name: [Problem / Research / Plan / Blackbox Test Spec / Decompose Tests / Implement Tests / Refactor / New Task / Implement / Run Tests / Security Audit / Deploy / Decompose / Done]
-status: [not_started / in_progress / completed / skipped]
+status: [not_started / in_progress / completed / skipped / failed]
 sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]
+retry_count: [0-3 — number of consecutive auto-retry attempts for current step, reset to 0 on success]

 ## Step ↔ SubStep Reference
 (include the step reference table from the active flow file)
@@ -21,11 +22,19 @@ sub_step: [optional — sub-skill internal step number + name if interrupted mid
 When updating `Current Step`, always write it as:
  step: N          ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2hb/2i or 5b)
  sub_step: M      ← sub-skill's own internal step/phase number + name
+  retry_count: 0   ← reset on new step or success; increment on each failed retry
 Example:
  step: 2
  name: Plan
  status: in_progress
  sub_step: 4 — Architecture Review & Risk Assessment
+  retry_count: 0
+Example (failed after 3 retries):
+  step: 2b
+  name: Blackbox Test Spec
+  status: failed
+  sub_step: 1b — Test Case Generation
+  retry_count: 3

 ## Completed Steps

@@ -45,6 +54,14 @@ ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
 reason: [completed step / session boundary / user paused / context limit]
 notes: [any context for next session]

+## Retry Log
+| Attempt | Step | Name | SubStep | Failure Reason | Timestamp |
+|---------|------|------|---------|----------------|-----------|
+| 1 | [step] | [name] | [sub_step] | [reason] | [date-time] |
+| ... | ... | ... | ... | ... | ... |
+
+(Clear this table when the step succeeds or user resets. Append a row on each failed auto-retry.)
+
 ## Blockers
 - [blocker 1, if any]
 - [none]
@@ -53,10 +70,12 @@ notes: [any context for next session]
 ### State File Rules

 1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
-2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
+2. **Update** the state file after every step completion, every session boundary, every BLOCKING gate confirmation, and every failed retry attempt
 3. **Read** the state file as the first action on every invocation — before folder scanning
 4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 2 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
 5. **Never delete** the state file. It accumulates history across the entire project lifecycle
+6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` when the step succeeds or the user manually resets. If `retry_count` reaches 3, set `status: failed` and add an entry to `Blockers`
+7. **Failed state on re-entry**: if the state file shows `status: failed` with `retry_count: 3`, do NOT auto-retry — present the blocker to the user and wait for their decision before proceeding
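
Rules 2 and 6 imply reading `Current Step` and appending a `Retry Log` row on every failed retry. The sketch below shows one way to do that for the template above. `parse_current_step` and `retry_log_row` are hypothetical helpers, and the parser assumes the simple `key: value` layout shown in the examples.

```python
import re
from typing import Dict

# Fields of the Current Step section, per the template.
FIELD = re.compile(r"^\s*(step|name|status|sub_step|retry_count):\s*(.*)$")


def parse_current_step(text: str) -> Dict[str, str]:
    """Extract key/value fields from the '## Current Step' section."""
    fields: Dict[str, str] = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Track whether we are inside the Current Step section.
            in_section = line.strip() == "## Current Step"
            continue
        if in_section:
            match = FIELD.match(line)
            if match:
                fields[match.group(1)] = match.group(2).strip()
    return fields


def retry_log_row(attempt: int, fields: Dict[str, str], reason: str, timestamp: str) -> str:
    """Format one Retry Log table row in the column order of the template."""
    safe_reason = reason.replace("|", "\\|")  # keep the Markdown table intact
    return (
        f"| {attempt} | {fields['step']} | {fields['name']} | "
        f"{fields['sub_step']} | {safe_reason} | {timestamp} |"
    )
```

Escaping `|` in the failure reason is a defensive choice so that an error message cannot break the table layout.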

 ## State Detection