Enhance autopilot documentation and workflows: Add assumptions regarding single project per workspace, update notification sound references, and introduce context budget heuristics for managing session limits. Revise various skill documents to reflect changes in task management, including ticketing and testing processes, ensuring clarity and consistency across the system.

2026-06-22 16:21:07 +00:00 · 2026-03-24 05:56:12 +02:00
parent 749217bbb6
commit a5fc4fe073
14 changed files with 341 additions and 57 deletions
@@ -37,6 +37,7 @@ Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Det
 - **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
 - **Sound on pause**: follow `.cursor/rules/human-attention-sound.mdc` — play a notification sound before every pause that requires human input
 - **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically
+- **Single project per workspace**: all `_docs/` paths are relative to the workspace root; in a monorepo, each service needs its own Cursor workspace

 ## Flow Resolution

@@ -83,13 +83,13 @@ If `_docs/03_implementation/` has batch reports, the implement skill detects com
 ---

 **Step 2e — Refactor**
-Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactor/FINAL_refactor_report.md` does not exist
+Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist

 Action: Read and execute `.cursor/skills/refactor/SKILL.md`

 The refactor skill runs the full 6-phase method using the implemented tests as a safety net.

-If `_docs/04_refactor/` has phase reports, the refactor skill detects completed phases and continues.
+If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues.
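
The updated Step 2e condition is a three-part check. A minimal sketch, assuming the workspace root is known and the "Step 2d completed" flag has already been read from the autopilot state file; the function name `refactor_step_is_due` and its parameters are hypothetical and not part of the skill:

```python
from pathlib import Path


def refactor_step_is_due(workspace_root: Path, tests_step_completed: bool) -> bool:
    """Return True when the Step 2e (Refactor) condition holds.

    `tests_step_completed` stands in for "the autopilot state shows
    Step 2d (Implement Tests) is completed"; parsing the state file
    is out of scope for this sketch.
    """
    docs = workspace_root / "_docs"
    impl_report = docs / "03_implementation" / "FINAL_implementation_report.md"
    refactor_report = docs / "04_refactoring" / "FINAL_report.md"
    return (
        impl_report.exists()
        and tests_step_completed
        and not refactor_report.exists()
    )
```

Because the check looks only at the renamed `_docs/04_refactoring/FINAL_report.md` path, a report left under the old `_docs/04_refactor/` directory would not count as a finished refactor.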

 ---

@@ -147,8 +147,8 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au

 Action: Run the full test suite to verify the implementation before deployment.

-1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
+1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests

 If all tests pass → auto-chain to Step 2hb (Security Audit).
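
The script-first lookup with a manual fallback can be sketched as follows. This is an illustration only: the marker-file table is an assumption (the skill text does not say how a runner is detected), and `choose_test_command` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical mapping from a marker file (glob pattern) to a test command.
# Real projects may need different or additional markers.
RUNNER_MARKERS = [
    ("pyproject.toml", ["pytest"]),
    ("*.sln", ["dotnet", "test"]),
    ("Cargo.toml", ["cargo", "test"]),
    ("package.json", ["npm", "test"]),
]


def choose_test_command(root: Path) -> Optional[List[str]]:
    """Prefer scripts/run-tests.sh; otherwise guess a runner from marker files."""
    script = root / "scripts" / "run-tests.sh"
    if script.is_file():
        return ["bash", str(script)]
    for pattern, command in RUNNER_MARKERS:
        # glob() also matches plain file names, so one loop covers both cases.
        if any(root.glob(pattern)):
            return command
    return None  # nothing detected; the caller has to decide what to do
```

The returned list can be passed to `subprocess.run(command, cwd=root)`. The blackbox step (`docker-compose.test.yml`) is left out of the sketch.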
@@ -208,12 +208,11 @@ Action: Present using Choose format:
 ```

 - If user picks A → Run performance tests:
-  1. Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios
-  2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks)
-  3. Execute performance test scenarios against the running system
-  4. Present results vs acceptance criteria thresholds
-  5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-  6. After completion, auto-chain to Step 2i (Deploy)
+  1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+  2. Otherwise, check whether `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect an appropriate load-testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute the performance test scenarios against the running system
+  3. Present results vs acceptance criteria thresholds
+  4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
+  5. After completion, auto-chain to Step 2i (Deploy)
 - If user picks B → Mark Step 2hc as `skipped` in the state file, auto-chain to Step 2i (Deploy).

 ---
@@ -132,8 +132,8 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t

 Action: Run the full test suite to verify the implementation before deployment.

-1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
+1. If `scripts/run-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+2. Otherwise, detect the project's test runner manually (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests; if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests

 If all tests pass → auto-chain to Step 5b (Security Audit).
@@ -193,12 +193,11 @@ Action: Present using Choose format:
 ```

 - If user picks A → Run performance tests:
-  1. Check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios
-  2. Detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks)
-  3. Execute performance test scenarios against the running system
-  4. Present results vs acceptance criteria thresholds
-  5. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-  6. After completion, auto-chain to Step 6 (Deploy)
+  1. If `scripts/run-performance-tests.sh` exists (generated by the test-spec skill Phase 4), execute it
+  2. Otherwise, check whether `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect an appropriate load-testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute the performance test scenarios against the running system
+  3. Present results vs acceptance criteria thresholds
+  4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
+  5. After completion, auto-chain to Step 6 (Deploy)
 - If user picks B → Mark Step 5c as `skipped` in the state file, auto-chain to Step 6 (Deploy).
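
The "results vs acceptance criteria thresholds" comparison could look roughly like the sketch below. It assumes every threshold is an upper bound on a numeric metric; the metric names, the dictionary shapes, and the helper `failed_thresholds` are all hypothetical, since the skill does not define a results format.

```python
from typing import Dict, List


def failed_thresholds(results: Dict[str, float], max_allowed: Dict[str, float]) -> List[str]:
    """List every threshold that is exceeded or has no measurement."""
    failures = []
    for metric, limit in max_allowed.items():
        measured = results.get(metric)
        if measured is None:
            # A missing measurement is treated as a failure, not a pass.
            failures.append(f"{metric}: no measurement (limit {limit})")
        elif measured > limit:
            failures.append(f"{metric}: {measured} exceeds limit {limit}")
    return failures
```

An empty list means the flow can auto-chain to Deploy; a non-empty list is what would be shown before the A) Fix and re-run, B) Proceed anyway, C) Abort choice.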

 ---
@@ -46,7 +46,7 @@ Rules:
 2. Always include a recommendation with a brief justification
 3. Keep option descriptions to one line each
 4. If only 2 options make sense, use A/B only — do not pad with filler options
-5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
+5. Play the notification sound (per `human-attention-sound.mdc`) before presenting the choice
 6. Record every user decision in the state file's `Key Decisions` section
 7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive

@@ -154,7 +154,7 @@ After 3 failed auto-retries of the same skill, the failure is likely not user-re
   - Set `status: failed` in `Current Step`
   - Set `retry_count: 3`
   - Add a blocker entry describing the repeated failure
-2. Play notification sound (per `human-input-sound.mdc`)
+2. Play notification sound (per `human-attention-sound.mdc`)
 3. Present using Choose format:

 ```
@@ -251,6 +251,32 @@ When a skill needs to read large files (e.g., full solution.md, architecture.md)
 - Use search tools (Grep, SemanticSearch) to find specific sections rather than reading entire files
 - Summarize key decisions from prior steps in the state file so they don't need to be re-read

+### Context Budget Heuristic
+
+Agents cannot programmatically query context window usage. Use these heuristics to avoid degradation:
+
+| Zone | Indicators | Action |
+|------|-----------|--------|
+| **Safe** | State file + SKILL.md + 2–3 focused artifacts loaded | Continue normally |
+| **Caution** | 5+ artifacts loaded, or 3+ large files (architecture, solution, discovery), or conversation has 20+ tool calls | Complete the current sub-step, then suggest a session break |
+| **Danger** | Repeated truncation in tool output, tool calls failing unexpectedly, responses becoming shallow or repetitive | Save immediately, update state file, force session boundary |
+
+**Skill-specific guidelines**:
+
+| Skill | Recommended session breaks |
+|-------|---------------------------|
+| **document** | After every ~5 modules in Step 1; between Step 4 (Verification) and Step 5 (Solution Extraction) |
+| **implement** | Each batch is a natural checkpoint; if more than 2 batches have been completed in one session, suggest a break |
+| **plan** | Between Step 5 (Test Specifications) and Step 6 (Epics) for projects with many components |
+| **research** | Between Mode A rounds; between Mode A and Mode B |
+
+**How to detect the caution or danger zone without an API**:
+
+1. Count the tool calls made so far; once the count approaches 20, the context is likely filling up
+2. If reading a file returns truncated content, context is under pressure
+3. If the agent starts producing shorter or less detailed responses than earlier in the conversation, context quality is degrading
+4. When in doubt, save and suggest a new conversation — re-entry is cheap thanks to the state file
+
 ## Rollback Protocol

 ### Implementation Steps (git-based)
@@ -20,7 +20,7 @@ retry_count: [0-3 — number of consecutive auto-retry attempts for current step
 (include the step reference table from the active flow file)

 When updating `Current Step`, always write it as:
-  step: N          ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2hb/2i or 5b)
+  step: N          ← autopilot step (0–6 or 2b/2c/2d/2e/2ea/2f/2g/2h/2hb/2hc/2i or 5b/5c)
  sub_step: M      ← sub-skill's own internal step/phase number + name
  retry_count: 0   ← reset on new step or success; increment on each failed retry
 Example: