mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-23 01:36:34 +00:00
Refine coding standards and testing guidelines
- Updated the coding rule descriptions to emphasize readability, meaningful comments, and test verification.
- Revised guidelines to clarify the importance of avoiding boilerplate while maintaining readability.
- Enhanced the testing rules to set a minimum coverage threshold of 75% for business logic and specified criteria for test scenarios.
- Introduced a mechanism for handling skipped tests, categorizing them as legitimate or illegitimate, and outlined resolution steps.

These changes aim to improve code quality, maintainability, and testing effectiveness.
@@ -27,8 +27,11 @@ Analyze input data completeness and produce detailed black-box test specificatio
 - **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
 - **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
 - **Spec, don't code**: this workflow produces test specifications, never test implementation code
 - **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
-- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed
+- **Every test must have a pass/fail criterion**. Two acceptable shapes:
+- **Input/output shape**: concrete input data paired with a quantifiable expected result (exact value, tolerance, threshold, pattern, reference file). Typical for functional blackbox tests, performance tests with load data, data-processing pipelines.
+- **Behavioral shape**: a trigger condition + observable system behavior + quantifiable pass/fail criterion, with no input data required. Typical for startup/shutdown tests, retry/backoff policies, state transitions, logging/metrics emission, resilience scenarios. Example criteria: "startup logs `service ready` within 5s", "retry emits 3 attempts with exponential backoff (base 100ms ± 20ms)", "on SIGTERM, service drains in-flight requests within 30s grace period", "health endpoint returns 503 while migrations run".
+- For behavioral tests the observable (log line, metric value, state transition, emitted event, elapsed time) must still be quantifiable — the test must programmatically decide pass/fail.
+- A test that cannot produce a pass/fail verdict through either shape is not verifiable and must be removed.

 ## Context Resolution
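The comparison methods the new rules enumerate (exact value, tolerance, threshold, pattern) each reduce to a programmatic check. A minimal Python sketch; the helper names are hypothetical and not defined by the workflow:

```python
import re

# Hypothetical helpers illustrating the four quantifiable comparison
# methods named above; not part of the workflow itself.

def exact(actual, expected):
    """Exact value: actual must equal the known-correct answer."""
    return actual == expected

def tolerance(actual, expected, tol):
    """Tolerance: actual must fall within expected plus/minus tol."""
    return abs(actual - expected) <= tol

def threshold(actual, limit):
    """Threshold: actual must not exceed the stated limit."""
    return actual <= limit

def pattern(actual, regex):
    """Pattern: actual output must match a regular expression."""
    return re.search(regex, actual) is not None

# Mirroring the example criteria quoted above:
print(tolerance(112, 100, 20))                    # backoff base 100ms +/- 20ms
print(threshold(28.5, 30))                        # drain completes within 30s
print(pattern("INFO service ready", r"service ready"))
```

Whichever method a test uses, the point is the same: code, not a human judgment call, produces the verdict.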
@@ -177,7 +180,7 @@ At the start of execution, create a TodoWrite with all four phases. Update statu
 |------------|--------------------------|---------------|----------------|
 | [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |

-9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
+9. Threshold: at least 75% coverage of scenarios AND every covered scenario has a quantifiable expected result (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds table)
 10. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results/results_report.md`
 11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
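The threshold in item 9 is a plain ratio check. An illustrative Python sketch; the function name is hypothetical, not part of the spec:

```python
def coverage_ok(covered_scenarios, total_scenarios, minimum=0.75):
    """Return (ratio, passed): the fraction of acceptance-criteria
    scenarios that have input data and a quantifiable expected result,
    checked against the 75% minimum."""
    if total_scenarios == 0:
        return 0.0, False
    ratio = covered_scenarios / total_scenarios
    return ratio, ratio >= minimum

print(coverage_ok(9, 12))   # (0.75, True): exactly at the threshold, passes
print(coverage_ok(8, 12))   # below the threshold, gate fails
```

Note the comparison is inclusive: exactly 75% coverage passes the gate.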
@@ -232,18 +235,26 @@ Capture any new questions, findings, or insights that arise during test specific
 ### Phase 3: Test Data Validation Gate (HARD GATE)

 **Role**: Professional Quality Assurance Engineer
-**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
+**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%.
 **Constraints**: This phase is MANDATORY and cannot be skipped.

-#### Step 1 — Build the test-data and expected-result requirements checklist
+#### Step 1 — Build the requirements checklist

-Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, extract:
+Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, classify its shape (input/output or behavioral) and extract:

+**Input/output tests:**
+
 | # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
 |---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
 | 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |

-Present this table to the user.
+**Behavioral tests:**
+
+| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
+|---|-----------------|-----------|-------------------|--------------------|--------------------|---------------|
+| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |
+
+Present both tables to the user.

 #### Step 2 — Ask user to provide missing test data AND expected results
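A behavioral row like those tabulated above still resolves to a programmatic verdict. A hypothetical Python sketch for the retry/backoff criterion quoted earlier; the function name and timestamps are illustrative assumptions:

```python
def check_retry_backoff(attempt_times_ms, base_ms=100, tol_ms=20,
                        expected_attempts=3):
    """Verdict for the criterion 'retry emits 3 attempts with
    exponential backoff (base 100ms +/- 20ms)': the gap between
    consecutive attempts must double from the base, each gap
    landing within the stated tolerance."""
    if len(attempt_times_ms) != expected_attempts:
        return False
    gaps = [b - a for a, b in zip(attempt_times_ms, attempt_times_ms[1:])]
    return all(abs(gap - base_ms * 2**i) <= tol_ms
               for i, gap in enumerate(gaps))

# Observed attempt timestamps in ms: gaps of 110 and 195, both in range.
print(check_retry_backoff([0, 110, 305]))   # True
# Only two attempts observed: fails the criterion.
print(check_retry_backoff([0, 100]))        # False
```

The trigger (a transient failure), the observable (attempt timestamps), and the criterion (count plus timing bounds) map directly onto the behavioral table's columns, with no input data required.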
@@ -315,20 +326,20 @@ After all removals, recalculate coverage:
 **Decision**:

-- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
-- **Coverage < 70%** → Phase 3 **FAILED**. Report:
-> ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
+- **Coverage ≥ 75%** → Phase 3 **PASSED**. Present final summary to user.
+- **Coverage < 75%** → Phase 3 **FAILED**. Report:
+> ❌ Test coverage dropped to **X%** (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
 >
 > | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
 > |---|---|---|
 >
 > **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.

-**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.
+**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.

 #### Phase 3 Completion

-When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results:
+When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:

 1. Present the final coverage report
 2. List all removed tests (if any) with reasons
@@ -479,23 +490,23 @@ Create `scripts/run-performance-tests.sh` at the project root. The script must:
 | Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
 | Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
 | Ambiguous requirements | ASK user |
-| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
+| Input data coverage below 75% (Phase 1) | Search internet for supplementary data, ASK user to validate |
 | Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
 | Test scenario conflicts with restrictions | ASK user to clarify intent |
 | System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
 | Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
-| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
+| Final coverage below 75% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |

 ## Common Mistakes

 - **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
-- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
-- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like
-- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate
+- **Missing pass/fail criterion**: input/output tests without an expected result, OR behavioral tests without a measurable observable — both are unverifiable and must be removed
+- **Non-quantifiable criteria**: "should return good results", "works correctly", "behaves properly" — not verifiable. Use exact values, tolerances, thresholds, pattern matches, or timing bounds that code can evaluate.
+- **Forcing the wrong shape**: do not invent fake input data for a behavioral test (e.g., "input: SIGTERM signal") just to fit the input/output shape. Classify the test correctly and use the matching checklist.
 - **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
 - **Untraceable tests**: every test should trace to at least one AC or restriction
 - **Writing test code**: this skill produces specifications, never implementation code
 - **Tests without data**: every test scenario MUST have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed

 ## Trigger Conditions
@@ -516,7 +527,7 @@ When the user wants to:
 │ → verify AC, restrictions, input_data (incl. expected_results.md) │
 │                                                                   │
 │ Phase 1: Input Data & Expected Results Completeness Analysis │
-│ → assess input_data/ coverage vs AC scenarios (≥70%) │
+│ → assess input_data/ coverage vs AC scenarios (≥75%) │
 │ → verify every input has a quantifiable expected result │
 │ → present input→expected-result pairing assessment │
 │ [BLOCKING: user confirms input data + expected results coverage] │
@@ -538,8 +549,8 @@ When the user wants to:
 │ → validate input data (quality + quantity) │
 │ → validate expected results (quantifiable + comparison method) │
 │ → remove tests without data or expected result, warn user │
-│ → final coverage check (≥70% or FAIL + loop back) │
-│ [BLOCKING: coverage ≥ 70% required to pass] │
+│ → final coverage check (≥75% or FAIL + loop back) │
+│ [BLOCKING: coverage ≥ 75% required to pass] │
 │ │
 │ Phase 4: Test Runner Script Generation │
 │ → detect test runner + docker-compose + load tool │