Update test results directory structure and enhance Docker configurations

- Modified `.gitignore` to reflect the new path for test results.
- Updated `docker-compose.test.yml` to mount the correct test results directory.
- Adjusted `Dockerfile.test` to set the `PYTHONPATH` and ensure test results are saved in the updated location.
- Added `boto3` and `netron` to `requirements-test.txt` to support new functionalities.
- Updated `pytest.ini` to include the new `pythonpath` for test discovery.

These changes streamline the testing process and ensure compatibility with the updated directory structure.
2026-04-22 11:36:36 +00:00 · 2026-03-28 00:13:08 +02:00
parent c20018745b
commit 243b69656b
48 changed files with 707 additions and 581 deletions
@@ -44,31 +44,48 @@ Present a summary:

 ```
 ══════════════════════════════════════
- TEST RESULTS: [N passed, M failed, K skipped]
+ TEST RESULTS: [N passed, M failed, K skipped, E errors]
 ══════════════════════════════════════
 ```

-### 4. Handle Outcome
+**Important**: Collection errors (import failures, missing dependencies, syntax errors) count as failures — they are not "skipped" or ignorable.
+
+### 4. Diagnose Failures
+
+Before presenting choices, list every failing/erroring test with a one-line root cause:
+
+```
+Failures:
+ 1. test_foo.py::test_bar — missing dependency 'netron' (not installed)
+ 2. test_baz.py::test_qux — AssertionError: expected 5, got 3 (logic error)
+ 3. test_old.py::test_legacy — ImportError: no module 'removed_module' (possibly obsolete)
+```
+
+Categorize each as: **missing dependency**, **broken import**, **logic/assertion error**, **possibly obsolete**, or **environment-specific**.
+
+### 5. Handle Outcome

 **All tests pass** → return success to the autopilot for auto-chain.

-**Tests fail** → present using Choose format:
+**Any test fails or errors** → this is a **blocking gate**. Never silently ignore or skip failures. Present using Choose format:

 ```
 ══════════════════════════════════════
- TEST RESULTS: [N passed, M failed, K skipped]
+ TEST RESULTS: [N passed, M failed, K skipped, E errors]
 ══════════════════════════════════════
- A) Fix failing tests and re-run
- B) Proceed anyway (not recommended)
- C) Abort — fix manually
+ A) Investigate and fix failing tests/code, then re-run
+ B) Remove obsolete tests (if diagnosis shows they are no longer relevant)
+ C) Leave as-is — acknowledged tech debt (not recommended)
+ D) Abort — fix manually
 ══════════════════════════════════════
 Recommendation: A — fix failures before proceeding
 ══════════════════════════════════════
 ```

- If user picks A → attempt to fix failures, then re-run (loop back to step 2)
- If user picks B → return success with warning to the autopilot
- If user picks C → return failure to the autopilot
+- If user picks A → investigate root causes, attempt fixes, then re-run (loop back to step 2)
+- If user picks B → confirm which tests to remove, delete them, then re-run (loop back to step 2)
+- If user picks C → require explicit user confirmation; log as acknowledged tech debt in the report, then return success with warning to the autopilot
+- If user picks D → return failure to the autopilot
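
The gate described in steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the skill's actual implementation: `Summary`, `guess_category`, `next_action`, and the returned action strings are hypothetical names, and the keyword matching is a rough first guess that cannot detect the **possibly obsolete** or **environment-specific** categories, which still need human judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Summary:
    """Counts for the TEST RESULTS banner (hypothetical helper type)."""
    passed: int = 0
    failed: int = 0
    skipped: int = 0
    errors: int = 0

    def banner(self) -> str:
        return (f"TEST RESULTS: [{self.passed} passed, {self.failed} failed, "
                f"{self.skipped} skipped, {self.errors} errors]")

    @property
    def blocking(self) -> bool:
        # Collection errors count as failures, so either one blocks.
        return self.failed > 0 or self.errors > 0


def guess_category(message: str) -> Optional[str]:
    """Rough first guess from the error text; None means triage by hand."""
    if "ModuleNotFoundError" in message or "No module named" in message:
        return "missing dependency"
    if "ImportError" in message:
        return "broken import"
    if "AssertionError" in message:
        return "logic/assertion error"
    return None


def next_action(summary: Summary, choice: Optional[str] = None,
                confirmed: bool = False) -> str:
    """Map the run outcome and the user's choice to a next step."""
    if not summary.blocking:
        return "success"
    if choice in ("A", "B"):
        # Both options loop back to step 2 after fixing or removing tests.
        return "rerun"
    if choice == "C":
        # Leaving failures as-is requires explicit user confirmation.
        return "success_with_warning" if confirmed else "needs_confirmation"
    if choice == "D":
        return "failure"
    return "ask_user"
```

With this sketch, a run with one failure and one collection error is blocking, and `next_action(summary, "C")` keeps returning `"needs_confirmation"` until the user confirms explicitly.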

 ## Trigger Conditions