Refactor autopilot workflows and documentation: Update .gitignore to include binary and media file types, enhance agent command references in documentation, and modify annotation class for improved accessibility. Adjust inference processing to handle batch sizes and streamline test specifications for clarity and consistency across the system.

Oleksandr Bezdieniezhnykh
2026-03-25 05:26:19 +02:00
parent a5fc4fe073
commit 4afa1a4eec
29 changed files with 447 additions and 362 deletions
@@ -59,16 +59,16 @@ Every input data item MUST have a corresponding expected result that defines wha
Expected results live inside `_docs/00_problem/input_data/` in one or both of:
-1. **Mapping file** (`input_data/expected_results.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
+1. **Mapping file** (`input_data/expected_results/results_report.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file
```
input_data/
-├── expected_results.md ← required: input→expected result mapping
-├── expected_results/ ← optional: complex reference files
-│ ├── image_01_detections.json
-│ └── batch_A_results.json
+├── expected_results/ ← required: expected results folder
+│ ├── results_report.md ← required: input→expected result mapping
+│ ├── image_01_expected.csv ← per-file expected detections
+│ └── video_01_expected.csv
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```
@@ -95,7 +95,7 @@ input_data/
1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
-4. `input_data/expected_results.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
+4. `input_data/expected_results/results_report.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results/results_report.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
@@ -161,12 +161,12 @@ At the start of execution, create a TodoWrite with all three phases. Update stat
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
-5. Read `input_data/expected_results.md` and any referenced files in `input_data/expected_results/`
+5. Read `input_data/expected_results/results_report.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
-7. Analyze `input_data/expected_results.md` completeness:
+7. Analyze `input_data/expected_results/results_report.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
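The four comparison methods named above (exact match, tolerance range, pattern match, threshold) can be sketched as a single dispatch function. This is a minimal illustration only — the function name and signature are hypothetical, not part of the skill:

```python
import re

def compare(actual, expected, method, tolerance=0.0):
    """Compare an actual value against a quantifiable expected result.

    Supports the four comparison methods from the mapping template:
    exact match, tolerance range, pattern match, and threshold.
    For "pattern", `expected` is a regular expression; for "threshold",
    the check is `actual >= expected`.
    """
    if method == "exact":
        return actual == expected
    if method == "tolerance":
        return abs(actual - expected) <= tolerance
    if method == "pattern":
        return re.fullmatch(expected, str(actual)) is not None
    if method == "threshold":
        return actual >= expected  # e.g. detection confidence >= 0.85
    raise ValueError(f"unknown comparison method: {method}")

print(compare(3, 3, "exact"))                            # True
print(compare(0.86, 0.85, "tolerance", tolerance=0.02))  # True
print(compare("image_01.jpg", r"image_\d+\.jpg", "pattern"))  # True
print(compare(0.9, 0.85, "threshold"))                   # True
```

A mapping row that names one of these methods can then be evaluated programmatically, which is exactly what the quantifiability checks in this phase require.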
@@ -178,7 +178,7 @@ At the start of execution, create a TodoWrite with all three phases. Update stat
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
-10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results.md`
+10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient.
@@ -205,7 +205,7 @@ Based on all acquired data, acceptance_criteria, and restrictions, form detailed
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results.md`
+- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results/results_report.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
@@ -251,7 +251,7 @@ For each row where **Input Provided?** is **No** OR **Expected Result Provided?*
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
-> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
+> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results/results_report.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
@@ -273,7 +273,7 @@ For each item where the user chose **Option A**:
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
-4. Verify the expected result exists in `input_data/expected_results.md` or as a referenced file in `input_data/expected_results/`
+4. Verify the expected result exists in `input_data/expected_results/results_report.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
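A simple heuristic lint for the quantifiability check above: flag expected-result cells that contain no number, tolerance, or threshold marker. This is a hypothetical sketch (the function and the vague-phrase list are illustrative, not defined by the skill), and a heuristic like this supplements, never replaces, the user review:

```python
import re

# Markers that suggest a programmatically comparable expected result:
# any digit, a tolerance sign, or a threshold operator.
QUANTIFIABLE = re.compile(r"\d|±|>=|<=|≥|≤")

# Phrases the skill explicitly rejects as vague.
VAGUE = {"works correctly", "returns result"}

def is_quantifiable(expected: str) -> bool:
    if expected.strip().lower() in VAGUE:
        return False
    return bool(QUANTIFIABLE.search(expected))

print(is_quantifiable("3 detections, confidence ≥ 0.85"))  # True
print(is_quantifiable("works correctly"))                  # False
```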
@@ -392,7 +392,7 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:
| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
-| Missing input_data/expected_results.md | **STOP** — ask user to provide expected results mapping using the template |
+| Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
@@ -1,7 +1,7 @@
# Expected Results Template
-Save as `_docs/00_problem/input_data/expected_results.md`.
-For complex expected outputs, create `_docs/00_problem/input_data/expected_results/` and place reference files there.
+Save as `_docs/00_problem/input_data/expected_results/results_report.md`.
+For complex expected outputs, place reference CSV files alongside it in `_docs/00_problem/input_data/expected_results/`.
Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`).
---