[AZ-137] [AZ-138] Decompose test tasks and scaffold E2E test infrastructure

Made-with: Cursor
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-23 14:07:54 +02:00
parent 091d9a8fb0
commit 86d8e7e22d
47 changed files with 1883 additions and 88 deletions
+90 -18
View File
@@ -106,6 +106,76 @@ All error situations that require user input MUST use the **Choose A / B / C / D
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |
## Skill Failure Retry Protocol
Sub-skills can return a **failed** result. Failures are often caused by missing user input, environment issues, or transient errors that resolve on retry. The autopilot auto-retries before escalating.
### Retry Flow
```
Skill execution → FAILED
├─ retry_count < 3 ?
│ YES → increment retry_count in state file
│ → log failure reason in state file (Retry Log section)
│ → re-read the sub-skill's SKILL.md
│ → re-execute from the current sub_step
│ → (loop back to check result)
│ NO (retry_count = 3) →
│ → set status: failed in Current Step
│ → add entry to Blockers section:
│ "[Skill Name] failed 3 consecutive times at sub_step [M].
│ Last failure: [reason]. Auto-retry exhausted."
│ → present warning to user (see Escalation below)
│ → do NOT auto-retry again until user intervenes
```
### Retry Rules
1. **Auto-retry immediately**: when a skill fails, retry it without asking the user — the failure is often transient (missing user confirmation in a prior step, docker not running, file lock, etc.)
2. **Preserve sub_step**: retry from the last recorded `sub_step`, not from the beginning of the skill — unless the failure indicates corruption, in which case restart from sub_step 1
3. **Increment `retry_count`**: update `retry_count` in the state file's `Current Step` section on each retry attempt
4. **Log each failure**: append the failure reason and timestamp to the state file's `Retry Log` section
5. **Reset on success**: when the skill eventually succeeds, reset `retry_count: 0` and clear the `Retry Log` for that step
### Escalation (after 3 consecutive failures)
After 3 failed auto-retries of the same skill, the failure is likely not user-related. Stop retrying and escalate:
1. Update the state file:
- Set `status: failed` in `Current Step`
- Set `retry_count: 3`
- Add a blocker entry describing the repeated failure
2. Play notification sound (per `human-input-sound.mdc`)
3. Present using Choose format:
```
══════════════════════════════════════
SKILL FAILED: [Skill Name] — 3 consecutive failures
══════════════════════════════════════
Step: [N] — [Name]
SubStep: [M] — [sub-step name]
Last failure reason: [reason]
══════════════════════════════════════
A) Retry with fresh context (new conversation)
B) Skip this step with warning
C) Abort — investigate and fix manually
══════════════════════════════════════
Recommendation: A — fresh context often resolves
persistent failures
══════════════════════════════════════
```
### Re-Entry After Failure
On the next autopilot invocation (new conversation), if the state file shows `status: failed` and `retry_count: 3`:
- Present the blocker to the user before attempting execution
- If the user chooses to retry → reset `retry_count: 0`, set `status: in_progress`, and re-execute
- If the user chooses to skip → mark step as `skipped`, proceed to next step
- Do NOT silently auto-retry — the user must acknowledge the persistent failure first
## Error Recovery Protocol
### Stuck Detection
@@ -211,17 +281,18 @@ On every invocation, before executing any skill, present a status summary built
═══════════════════════════════════════════════════
AUTOPILOT STATUS (greenfield)
═══════════════════════════════════════════════════
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
Step 5b Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED]
Step 6 Deploy [DONE / IN PROGRESS / NOT STARTED]
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
Step 5 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 5b Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 6 Deploy [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Retry: [N/3 if retrying, omit if 0]
Action: [what will happen next]
═══════════════════════════════════════════════════
```
@@ -232,19 +303,20 @@ On every invocation, before executing any skill, present a status summary built
═══════════════════════════════════════════════════
AUTOPILOT STATUS (existing-code)
═══════════════════════════════════════════════════
Pre Document [DONE / IN PROGRESS / NOT STARTED]
Step 2b Blackbox Test Spec [DONE / IN PROGRESS / NOT STARTED]
Step 2c Decompose Tests [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 2d Implement Tests [DONE / IN PROGRESS (batch M) / NOT STARTED]
Step 2e Refactor [DONE / IN PROGRESS (phase N) / NOT STARTED]
Step 2f New Task [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 2g Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 2h Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
Step 2hb Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED]
Step 2i Deploy [DONE / IN PROGRESS / NOT STARTED]
Pre Document [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2b Blackbox Test Spec [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2c Decompose Tests [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2d Implement Tests [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
Step 2e Refactor [DONE / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
Step 2f New Task [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2g Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
Step 2h Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2hb Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 2i Deploy [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Retry: [N/3 if retrying, omit if 0]
Action: [what will happen next]
═══════════════════════════════════════════════════
```