Update test results directory structure and enhance Docker configurations

- Modified `.gitignore` to reflect the new path for test results.
- Updated `docker-compose.test.yml` to mount the correct test results directory.
- Adjusted `Dockerfile.test` to set the `PYTHONPATH` and ensure test results are saved in the updated location.
- Added `boto3` and `netron` to `requirements-test.txt` to support new functionalities.
- Updated `pytest.ini` to include the new `pythonpath` for test discovery.

These changes streamline the testing process and ensure compatibility with the updated directory structure.
2026-04-22 09:06:35 +00:00 · 2026-03-28 00:13:08 +02:00
parent c20018745b
commit 243b69656b
48 changed files with 707 additions and 581 deletions
@@ -1,6 +1,6 @@
 # Existing Code Workflow

-Workflow for projects with an existing codebase. Starts with documentation, produces test specs, decomposes and implements tests, verifies them, refactors with that safety net, then adds new functionality and deploys.
+Workflow for projects with an existing codebase. Starts with documentation, produces test specs, checks code testability (refactoring if needed), decomposes and implements tests, verifies them, refactors with that safety net, then adds new functionality and deploys.

 ## Step Reference Table

@@ -8,18 +8,19 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
 |------|------|-----------|-------------------|
 | 1 | Document | document/SKILL.md | Steps 1–8 |
 | 2 | Test Spec | test-spec/SKILL.md | Phase 1a–1b |
-| 3 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
-| 4 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
-| 5 | Run Tests | test-run/SKILL.md | Steps 1–4 |
-| 6 | Refactor | refactor/SKILL.md | Phases 0–6 (7-phase method) (optional) |
-| 7 | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
-| 8 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
-| 9 | Run Tests | test-run/SKILL.md | Steps 1–4 |
-| 10 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
-| 11 | Performance Test | (autopilot-managed) | Load/stress tests (optional) |
-| 12 | Deploy | deploy/SKILL.md | Step 1–7 |
+| 3 | Code Testability Revision | refactor/SKILL.md (guided mode) | Phases 0–7 (conditional) |
+| 4 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
+| 5 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
+| 6 | Run Tests | test-run/SKILL.md | Steps 1–4 |
+| 7 | Refactor | refactor/SKILL.md | Phases 0–7 (optional) |
+| 8 | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
+| 9 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
+| 10 | Run Tests | test-run/SKILL.md | Steps 1–4 |
+| 11 | Security Audit | security/SKILL.md | Phases 1–5 (optional) |
+| 12 | Performance Test | (autopilot-managed) | Load/stress tests (optional) |
+| 13 | Deploy | deploy/SKILL.md | Steps 1–7 |

-After Step 12, the existing-code workflow is complete.
+After Step 13, the existing-code workflow is complete.

 ## Detection Rules

@@ -35,7 +36,7 @@ Action: An existing codebase without documentation was detected. Read and execut
 ---

 **Step 2 — Test Spec**
-Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
+Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows `step >= 2` (Document already ran)

 Action: Read and execute `.cursor/skills/test-spec/SKILL.md`

@@ -43,20 +44,51 @@ This step applies when the codebase was documented via the `/document` skill. Te

 ---

-**Step 3 — Decompose Tests**
-Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
+**Step 3 — Code Testability Revision**
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND the autopilot state shows Test Spec (Step 2) is completed AND the autopilot state does NOT show Code Testability Revision (Step 3) as completed or skipped
+
+Action: Analyze the codebase against the test specs to determine whether the code can be tested as-is.
+
+1. Read `_docs/02_document/tests/traceability-matrix.md` and all test scenario files in `_docs/02_document/tests/`
+2. For each test scenario, check whether the code under test can be exercised in isolation. Look for:
+   - Hardcoded file paths or directory references
+   - Hardcoded configuration values (URLs, credentials, magic numbers)
+   - Global mutable state that cannot be overridden
+   - Tight coupling to external services without abstraction
+   - Missing dependency injection or non-configurable parameters
+   - Direct file system operations without path configurability
+   - Inline construction of heavy dependencies (models, clients)
+3. If ALL scenarios are testable as-is:
+   - Mark Step 3 as `completed` with outcome "Code is testable — no changes needed"
+   - Auto-chain to Step 4 (Decompose Tests)
+4. If testability issues are found:
+   - Create `_docs/04_refactoring/01-testability-refactoring/`
+   - Write `list-of-changes.md` in that directory using the refactor skill template (`.cursor/skills/refactor/templates/list-of-changes.md`), with:
+     - **Mode**: `guided`
+     - **Source**: `autopilot-testability-analysis`
+     - One change entry per testability issue found (change ID, file paths, problem, proposed change, risk, dependencies)
+   - Invoke the refactor skill in **guided mode**: read and execute `.cursor/skills/refactor/SKILL.md` with the `list-of-changes.md` as input
+   - The refactor skill will create RUN_DIR (`01-testability-refactoring`), create tasks in `_docs/02_tasks/`, delegate to the implement skill, and verify results
+   - Phase 3 (Safety Net) is automatically skipped by the refactor skill for testability runs
+   - After refactoring completes, mark Step 3 as `completed`
+   - Auto-chain to Step 4 (Decompose Tests)
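The Step 3 branching above can be sketched in Python as follows. This is only an illustration under stated assumptions: `ChangeEntry`, `plan_step3`, and the returned dictionary keys are hypothetical names, not part of the autopilot or the refactor skill. The entry fields mirror the ones listed for `list-of-changes.md`; the actual file layout is defined by the refactor skill template and is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Paths and labels taken from the Step 3 description.
RUN_DIR = "_docs/04_refactoring/01-testability-refactoring/"
MODE = "guided"
SOURCE = "autopilot-testability-analysis"


@dataclass
class ChangeEntry:
    """One testability issue (hypothetical structure for illustration)."""
    change_id: str
    file_paths: list[str]
    problem: str
    proposed_change: str
    risk: str
    dependencies: list[str] = field(default_factory=list)


def plan_step3(findings: list[ChangeEntry]) -> dict:
    """Decide what Step 3 does, given the issues found during analysis."""
    if not findings:
        # All scenarios are testable as-is: mark completed, chain to Step 4.
        return {"action": "mark_completed", "next_step": 4}
    # Issues found: hand the entries to the refactor skill in guided mode.
    return {
        "action": "refactor",
        "run_dir": RUN_DIR,
        "mode": MODE,
        "source": SOURCE,
        "changes": findings,
        "next_step": 4,
    }
```

Both branches end by chaining to Step 4; the only difference is whether a guided refactoring run happens first.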
+
+---
+
+**Step 4 — Decompose Tests**
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Step 3 (Code Testability Revision) is completed or skipped AND (`_docs/02_tasks/` does not exist or has no test task files)

 Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
 1. Run Step 1t (test infrastructure bootstrap)
 2. Run Step 3 (blackbox test task decomposition)
 3. Run Step 4 (cross-verification against test coverage)

-If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.
+If `_docs/02_tasks/` has some task files already (e.g., refactoring tasks from Step 3), the decompose skill's resumability handles it — it appends test tasks alongside existing refactoring tasks.

 ---

-**Step 4 — Implement Tests**
-Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 3 (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
+**Step 5 — Implement Tests**
+Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 4 (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

 Action: Read and execute `.cursor/skills/implement/SKILL.md`

@@ -66,8 +98,8 @@ If `_docs/03_implementation/` has batch reports, the implement skill detects com

 ---

-**Step 5 — Run Tests**
-Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 4 (Implement Tests) is completed AND the autopilot state does NOT show Step 5 (Run Tests) as completed
+**Step 6 — Run Tests**
+Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 5 (Implement Tests) is completed AND the autopilot state does NOT show Step 6 (Run Tests) as completed

 Action: Read and execute `.cursor/skills/test-run/SKILL.md`

@@ -75,8 +107,8 @@ Verifies the implemented test suite passes before proceeding to refactoring. The

 ---

-**Step 6 — Refactor (optional)**
-Condition: the autopilot state shows Step 5 (Run Tests) is completed AND the autopilot state does NOT show Step 6 (Refactor) as completed or skipped AND `_docs/04_refactoring/FINAL_report.md` does not exist
+**Step 7 — Refactor (optional)**
+Condition: the autopilot state shows Step 6 (Run Tests) is completed AND the autopilot state does NOT show Step 7 (Refactor) as completed or skipped AND no `_docs/04_refactoring/` run folder contains a `FINAL_report.md` for a non-testability run

 Action: Present using Choose format:

@@ -93,13 +125,13 @@ Action: Present using Choose format:
 ══════════════════════════════════════
 ```

- If user picks A → Read and execute `.cursor/skills/refactor/SKILL.md`. The refactor skill runs the full method using the implemented tests as a safety net. If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues. After completion, auto-chain to Step 7 (New Task).
- If user picks B → Mark Step 6 as `skipped` in the state file, auto-chain to Step 7 (New Task).
+- If user picks A → Read and execute `.cursor/skills/refactor/SKILL.md` in automatic mode. The refactor skill creates a new run folder in `_docs/04_refactoring/` (e.g., `02-coupling-refactoring`) and runs the full method using the implemented tests as a safety net. After completion, auto-chain to Step 8 (New Task).
+- If user picks B → Mark Step 7 as `skipped` in the state file, auto-chain to Step 8 (New Task).

 ---

-**Step 7 — New Task**
-Condition: the autopilot state shows Step 6 (Refactor) is completed or skipped AND the autopilot state does NOT show Step 7 (New Task) as completed
+**Step 8 — New Task**
+Condition: the autopilot state shows Step 7 (Refactor) is completed or skipped AND the autopilot state does NOT show Step 8 (New Task) as completed

 Action: Read and execute `.cursor/skills/new-task/SKILL.md`

@@ -107,26 +139,26 @@ The new-task skill interactively guides the user through defining new functional

 ---

-**Step 8 — Implement**
-Condition: the autopilot state shows Step 7 (New Task) is completed AND `_docs/03_implementation/` does not contain a FINAL report covering the new tasks (check state for distinction between test implementation and feature implementation)
+**Step 9 — Implement**
+Condition: the autopilot state shows Step 8 (New Task) is completed AND `_docs/03_implementation/` does not contain a FINAL report covering the new tasks (check state for distinction between test implementation and feature implementation)

 Action: Read and execute `.cursor/skills/implement/SKILL.md`

-The implement skill reads the new tasks from `_docs/02_tasks/` and implements them. Tasks already implemented in Step 4 are skipped (the implement skill tracks completed tasks in batch reports).
+The implement skill reads the new tasks from `_docs/02_tasks/` and implements them. Tasks already implemented in Step 5 are skipped (the implement skill tracks completed tasks in batch reports).

 If `_docs/03_implementation/` has batch reports from this phase, the implement skill detects completed tasks and continues.

 ---

-**Step 9 — Run Tests**
-Condition: the autopilot state shows Step 8 (Implement) is completed AND the autopilot state does NOT show Step 9 (Run Tests) as completed
+**Step 10 — Run Tests**
+Condition: the autopilot state shows Step 9 (Implement) is completed AND the autopilot state does NOT show Step 10 (Run Tests) as completed

 Action: Read and execute `.cursor/skills/test-run/SKILL.md`

 ---

-**Step 10 — Security Audit (optional)**
-Condition: the autopilot state shows Step 9 (Run Tests) is completed AND the autopilot state does NOT show Step 10 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 11 — Security Audit (optional)**
+Condition: the autopilot state shows Step 10 (Run Tests) is completed AND the autopilot state does NOT show Step 11 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Present using Choose format:

@@ -141,13 +173,13 @@ Action: Present using Choose format:
 ══════════════════════════════════════
 ```

- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 11 (Performance Test).
- If user picks B → Mark Step 10 as `skipped` in the state file, auto-chain to Step 11 (Performance Test).
+- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 12 (Performance Test).
+- If user picks B → Mark Step 11 as `skipped` in the state file, auto-chain to Step 12 (Performance Test).

 ---

-**Step 11 — Performance Test (optional)**
-Condition: the autopilot state shows Step 10 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 11 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 12 — Performance Test (optional)**
+Condition: the autopilot state shows Step 11 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 12 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Present using Choose format:

@@ -168,13 +200,13 @@ Action: Present using Choose format:
  2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system
  3. Present results vs acceptance criteria thresholds
  4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-  5. After completion, auto-chain to Step 12 (Deploy)
- If user picks B → Mark Step 11 as `skipped` in the state file, auto-chain to Step 12 (Deploy).
+  5. After completion, auto-chain to Step 13 (Deploy)
+- If user picks B → Mark Step 12 as `skipped` in the state file, auto-chain to Step 13 (Deploy).

 ---

-**Step 12 — Deploy**
-Condition: the autopilot state shows Step 9 (Run Tests) is completed AND (Step 10 is completed or skipped) AND (Step 11 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 13 — Deploy**
+Condition: the autopilot state shows Step 10 (Run Tests) is completed AND (Step 11 is completed or skipped) AND (Step 12 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete)

 Action: Read and execute `.cursor/skills/deploy/SKILL.md`
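As a rough illustration, the Step 13 condition can be read as a boolean predicate over the recorded step statuses. The sketch below assumes the state is available as a mapping from step number to a status string (`completed`, `skipped`, `not_started`); that representation and the function name `deploy_is_due` are hypothetical. Because the text does not define when `_docs/04_deploy/` counts as incomplete, the caller supplies that judgment as a flag.

```python
from __future__ import annotations


def deploy_is_due(statuses: dict[int, str], deploy_docs_complete: bool) -> bool:
    """Return True when the Step 13 (Deploy) condition holds.

    statuses: hypothetical mapping of step number -> status string.
    deploy_docs_complete: True if `_docs/04_deploy/` exists and is complete.
    """
    done_or_skipped = {"completed", "skipped"}
    return (
        statuses.get(10) == "completed"          # Step 10 (Run Tests) must have run
        and statuses.get(11) in done_or_skipped  # Security Audit is optional
        and statuses.get(12) in done_or_skipped  # Performance Test is optional
        and not deploy_docs_complete             # nothing left to do otherwise
    )
```

Note that Step 10 cannot be skipped, while the two optional steps satisfy the condition either way.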

@@ -183,7 +215,7 @@ After deployment completes, the existing-code workflow is done.
 ---

 **Re-Entry After Completion**
-Condition: the autopilot state shows `step: done` OR all steps through 12 (Deploy) are completed
+Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed

 Action: The project completed a full cycle. Present status and loop back to New Task:

@@ -199,7 +231,7 @@ Action: The project completed a full cycle. Present status and loop back to New
 ══════════════════════════════════════
 ```

- If user picks A → set `step: 7`, `status: not_started` in the state file, then auto-chain to Step 7 (New Task). Previous cycle history stays in Completed Steps.
+- If user picks A → set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
 - If user picks B → report final project status and exit.

 ## Auto-Chain Rules
@@ -207,17 +239,18 @@ Action: The project completed a full cycle. Present status and loop back to New
 | Completed Step | Next Action |
 |---------------|-------------|
 | Document (1) | Auto-chain → Test Spec (2) |
-| Test Spec (2) | Auto-chain → Decompose Tests (3) |
-| Decompose Tests (3) | **Session boundary** — suggest new conversation before Implement Tests |
-| Implement Tests (4) | Auto-chain → Run Tests (5) |
-| Run Tests (5, all pass) | Auto-chain → Refactor choice (6) |
-| Refactor (6, done or skipped) | Auto-chain → New Task (7) |
-| New Task (7) | **Session boundary** — suggest new conversation before Implement |
-| Implement (8) | Auto-chain → Run Tests (9) |
-| Run Tests (9, all pass) | Auto-chain → Security Audit choice (10) |
-| Security Audit (10, done or skipped) | Auto-chain → Performance Test choice (11) |
-| Performance Test (11, done or skipped) | Auto-chain → Deploy (12) |
-| Deploy (12) | **Workflow complete** — existing-code flow done |
+| Test Spec (2) | Auto-chain → Code Testability Revision (3) |
+| Code Testability Revision (3) | Auto-chain → Decompose Tests (4) |
+| Decompose Tests (4) | **Session boundary** — suggest new conversation before Implement Tests |
+| Implement Tests (5) | Auto-chain → Run Tests (6) |
+| Run Tests (6, all pass) | Auto-chain → Refactor choice (7) |
+| Refactor (7, done or skipped) | Auto-chain → New Task (8) |
+| New Task (8) | **Session boundary** — suggest new conversation before Implement |
+| Implement (9) | Auto-chain → Run Tests (10) |
+| Run Tests (10, all pass) | Auto-chain → Security Audit choice (11) |
+| Security Audit (11, done or skipped) | Auto-chain → Performance Test choice (12) |
+| Performance Test (12, done or skipped) | Auto-chain → Deploy (13) |
+| Deploy (13) | **Workflow complete** — existing-code flow done |
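The auto-chain table above amounts to a small transition map. A minimal Python sketch follows, assuming a passing test run for Steps 6 and 10; `Transition`, `AUTO_CHAIN`, and `next_action` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Transition(NamedTuple):
    next_step: Optional[int]        # None means the workflow is complete
    session_boundary: bool = False  # True: suggest a new conversation first


# Completed step -> what happens next, per the auto-chain table.
AUTO_CHAIN = {
    1: Transition(2),
    2: Transition(3),
    3: Transition(4),
    4: Transition(5, session_boundary=True),  # before Implement Tests
    5: Transition(6),
    6: Transition(7),    # assumes all tests passed
    7: Transition(8),
    8: Transition(9, session_boundary=True),  # before Implement
    9: Transition(10),
    10: Transition(11),  # assumes all tests passed
    11: Transition(12),
    12: Transition(13),
    13: Transition(None),
}


def next_action(completed_step: int) -> str:
    """Describe the follow-up for a completed (or skipped) step."""
    transition = AUTO_CHAIN[completed_step]
    if transition.next_step is None:
        return "workflow complete"
    if transition.session_boundary:
        return f"session boundary before step {transition.next_step}"
    return f"auto-chain to step {transition.next_step}"
```

For example, `next_action(2)` yields the chain into the new Code Testability Revision step, and `next_action(4)` yields a session boundary rather than an automatic hand-off.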

 ## Status Summary Template

@@ -225,18 +258,19 @@ Action: The project completed a full cycle. Present status and loop back to New
 ═══════════════════════════════════════════════════
 AUTOPILOT STATUS (existing-code)
 ═══════════════════════════════════════════════════
- Step 1   Document            [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 2   Test Spec           [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 3   Decompose Tests     [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 4   Implement Tests     [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
- Step 5   Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 6   Refactor            [DONE / SKIPPED / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
- Step 7   New Task            [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 8   Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
- Step 9   Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 10  Security Audit      [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 11  Performance Test    [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
- Step 12  Deploy              [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 1   Document                 [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 2   Test Spec                [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 3   Code Testability Rev.    [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 4   Decompose Tests          [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 5   Implement Tests          [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
+ Step 6   Run Tests                [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 7   Refactor                 [DONE / SKIPPED / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
+ Step 8   New Task                 [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 9   Implement                [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
+ Step 10  Run Tests                [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 11  Security Audit           [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 12  Performance Test         [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+ Step 13  Deploy                   [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 ═══════════════════════════════════════════════════
 Current: Step N — Name
 SubStep: M — [sub-skill internal step name]