Update test results directory structure and enhance Docker configurations

- Modified `.gitignore` to reflect the new path for test results.
- Updated `docker-compose.test.yml` to mount the correct test results directory.
- Adjusted `Dockerfile.test` to set the `PYTHONPATH` and ensure test results are saved in the updated location.
- Added `boto3` and `netron` to `requirements-test.txt` to support new functionalities.
- Updated `pytest.ini` to include the new `pythonpath` for test discovery.

These changes streamline the testing process and ensure compatibility with the updated directory structure.
2026-04-22 11:36:36 +00:00 · 2026-03-28 00:13:08 +02:00
parent c20018745b
commit 243b69656b
48 changed files with 707 additions and 581 deletions
@@ -2,81 +2,51 @@

 ## State File: `_docs/_autopilot_state.md`

-The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.
+The autopilot persists its position to `_docs/_autopilot_state.md`. This is a lightweight pointer — only the current step. All history lives in `_docs/` artifacts and git log. Folder scanning is the fallback when the state file doesn't exist.

-### Format
+### Template

 ```markdown
 # Autopilot State

 ## Current Step
 flow: [greenfield | existing-code]
-step: [1-10 for greenfield, 1-12 for existing-code, or "done"]
+step: [1-10 for greenfield, 1-13 for existing-code, or "done"]
 name: [step name from the active flow's Step Reference Table]
 status: [not_started / in_progress / completed / skipped / failed]
-sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]
-retry_count: [0-3 — number of consecutive auto-retry attempts for current step, reset to 0 on success]
+sub_step: [0, or sub-skill internal step number + name if interrupted mid-step]
+retry_count: [0-3 — consecutive auto-retry attempts, reset to 0 on success]
+```

-When updating `Current Step`, always write it as:
-  flow: existing-code   ← active flow
-  step: N               ← autopilot step (sequential integer)
-  sub_step: M           ← sub-skill's own internal step/phase number + name
-  retry_count: 0        ← reset on new step or success; increment on each failed retry
-Example:
-  flow: greenfield
-  step: 3
-  name: Plan
-  status: in_progress
-  sub_step: 4 — Architecture Review & Risk Assessment
-  retry_count: 0
-Example (failed after 3 retries):
-  flow: existing-code
-  step: 2
-  name: Test Spec
-  status: failed
-  sub_step: 1b — Test Case Generation
-  retry_count: 3
+### Examples

-## Completed Steps
+```
+flow: greenfield
+step: 3
+name: Plan
+status: in_progress
+sub_step: 4 — Architecture Review & Risk Assessment
+retry_count: 0
+```

-| Step | Name | Completed | Key Outcome |
-|------|------|-----------|-------------|
-| 1 | [name] | [date] | [one-line summary] |
-| 2 | [name] | [date] | [one-line summary] |
-| ... | ... | ... | ... |
-
-## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision N]
-
-## Last Session
-date: [date]
-ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
-reason: [completed step / session boundary / user paused / context limit]
-notes: [any context for next session]
-
-## Retry Log
-| Attempt | Step | Name | SubStep | Failure Reason | Timestamp |
-|---------|------|------|---------|----------------|-----------|
-| 1 | [step] | [name] | [sub_step] | [reason] | [date-time] |
-| ... | ... | ... | ... | ... | ... |
-
-(Clear this table when the step succeeds or user resets. Append a row on each failed auto-retry.)
-
-## Blockers
- [blocker 1, if any]
- [none]
+```
+flow: existing-code
+step: 2
+name: Test Spec
+status: failed
+sub_step: 1b — Test Case Generation
+retry_count: 3
 ```

 ### State File Rules

-1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 1)
-2. **Update** the state file after every step completion, every session boundary, every BLOCKING gate confirmation, and every failed retry attempt
-3. **Read** the state file as the first action on every invocation — before folder scanning
-4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 3 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
-5. **Never delete** the state file. It accumulates history across the entire project lifecycle
-6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` when the step succeeds or the user manually resets. If `retry_count` reaches 3, set `status: failed` and add an entry to `Blockers`
-7. **Failed state on re-entry**: if the state file shows `status: failed` with `retry_count: 3`, do NOT auto-retry — present the blocker to the user and wait for their decision before proceeding
+1. **Create** on the first autopilot invocation (after state detection determines Step 1)
+2. **Update** after every step completion, session boundary, or failed retry
+3. **Read** as the first action on every invocation — before folder scanning
+4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
+5. **Never delete** the state file
+6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` on success. If `retry_count` reaches 3, set `status: failed`
+7. **Failed state on re-entry**: if `status: failed` with `retry_count: 3`, do NOT auto-retry — present the issue to the user first
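
The retry rules (6 and 7) can be sketched in Python as follows. This is a minimal illustration, not part of the autopilot itself: the `AutopilotState` class and the helpers `parse_state`, `record_failure`, `record_success`, and `may_auto_retry` are hypothetical names, and setting `status: completed` on success is an assumption based on the status values listed in the template.

```python
from dataclasses import dataclass, fields

MAX_RETRIES = 3  # rule 6: reaching 3 sets status to "failed"


@dataclass
class AutopilotState:
    """Hypothetical in-memory view of the `Current Step` block."""
    flow: str
    step: str            # "1".."13" or "done", kept as text
    name: str
    status: str
    sub_step: str = "0"
    retry_count: int = 0


def parse_state(text: str) -> AutopilotState:
    """Read `key: value` lines; headings and unknown keys are ignored."""
    known = {f.name for f in fields(AutopilotState)}
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in known:
            values[key.strip()] = value.strip()
    values["retry_count"] = int(values.get("retry_count", 0))
    return AutopilotState(**values)


def record_failure(state: AutopilotState) -> AutopilotState:
    """Rule 6: increment on each failed auto-retry; mark failed at the limit."""
    state.retry_count = min(state.retry_count + 1, MAX_RETRIES)
    if state.retry_count >= MAX_RETRIES:
        state.status = "failed"
    return state


def record_success(state: AutopilotState) -> AutopilotState:
    """Rule 6: reset the counter on success (status value is an assumption)."""
    state.retry_count = 0
    state.status = "completed"
    return state


def may_auto_retry(state: AutopilotState) -> bool:
    """Rule 7: never auto-retry a step that failed after the final attempt."""
    return not (state.status == "failed" and state.retry_count >= MAX_RETRIES)
```

On re-entry, a caller would run `parse_state` on the file contents and check `may_auto_retry` before resuming; when it returns `False`, the issue goes to the user instead.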

 ## State Detection

@@ -92,8 +62,8 @@ When the user invokes `/autopilot` and work already exists:

 1. Read `_docs/_autopilot_state.md`
 2. Cross-check against `_docs/` folder structure
-3. Present Status Summary with context from state file (key decisions, last session, blockers)
-4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery
+3. Present Status Summary (use the active flow's Status Summary Template)
+4. If the detected step has a sub-skill with built-in resumability, the sub-skill handles mid-step recovery
 5. Continue execution from detected state

 ## Session Boundaries
@@ -101,12 +71,11 @@ When the user invokes `/autopilot` and work already exists:
 After any decompose/planning step completes, **do not auto-chain to implement**. Instead:

 1. Update state file: mark the step as completed, set current step to the next implement step with status `not_started`
-   - Existing-code flow: After Step 3 (Decompose Tests) → set current step to 4 (Implement Tests)
-   - Existing-code flow: After Step 7 (New Task) → set current step to 8 (Implement)
+   - Existing-code flow: After Step 4 (Decompose Tests) → set current step to 5 (Implement Tests)
+   - Existing-code flow: After Step 8 (New Task) → set current step to 9 (Implement)
   - Greenfield flow: After Step 5 (Decompose) → set current step to 6 (Implement)
-2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
-3. Present a summary: number of tasks, estimated batches, total complexity points
-4. Use Choose format:
+2. Present a summary: number of tasks, estimated batches, total complexity points
+3. Use Choose format:

 ```
 ══════════════════════════════════════