diff --git a/.cursor/rules/human-input-sound.mdc b/.cursor/rules/human-attention-sound.mdc
similarity index 52%
rename from .cursor/rules/human-input-sound.mdc
rename to .cursor/rules/human-attention-sound.mdc
index e7e3aa3..1fbb3fc 100644
--- a/.cursor/rules/human-input-sound.mdc
+++ b/.cursor/rules/human-attention-sound.mdc
@@ -1,10 +1,12 @@
 ---
-description: "Play a notification sound whenever the AI agent needs human input, confirmation, or approval"
+description: "Play a notification sound when the AI agent needs human input or when AI generation is finished"
 alwaysApply: true
 ---
 
-# Sound Notification on Human Input
+# Sound Notification for Human Attention
 
-Whenever you are about to ask the user a question, request confirmation, present options for a decision, or otherwise pause and wait for human input, you MUST first run the appropriate shell command for the current OS:
+Play a notification sound whenever human attention is needed. This includes waiting for input AND completing generation.
+
+## Commands by OS
 - **macOS**: `afplay /System/Library/Sounds/Glass.aiff &`
 - **Linux**: `paplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null || aplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null || echo -e '\a' &`
@@ -12,13 +14,15 @@ Whenever you are about to ask the user a question, present
 
 Detect the OS from the user's system info or by running `uname -s` if unknown.
 
-This applies to:
+## When to play the sound
+
 - Asking clarifying questions
 - Presenting choices (e.g. via AskQuestion tool)
 - Requesting approval for destructive actions
 - Reporting that you are blocked and need guidance
 - Any situation where the conversation will stall without user response
+- **When AI generation is complete** — play the sound as the very last action before ending your turn, so the user knows the response is ready
 
-Do NOT play the sound when:
-- You are providing a final answer that doesn't require a response
-- You are in the middle of executing a multi-step task and just providing a status update
+## When NOT to play the sound
+
+- In the middle of executing a multi-step task and just providing a status update (more tool calls will follow)
diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md
index 1da7e83..2207644 100644
--- a/.cursor/skills/autopilot/flows/existing-code.md
+++ b/.cursor/skills/autopilot/flows/existing-code.md
@@ -7,13 +7,13 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
 | Step | Name                    | Sub-Skill                       | Internal SubSteps                     |
 |------|-------------------------|---------------------------------|---------------------------------------|
 | —    | Document (pre-step)     | document/SKILL.md               | Steps 1–8                             |
-| 2b   | Blackbox Test Spec      | blackbox-test-spec/SKILL.md     | Phase 1a–1b                           |
+| 2b   | Blackbox Test Spec      | test-spec/SKILL.md              | Phase 1a–1b                           |
 | 2c   | Decompose Tests         | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4             |
 | 2d   | Implement Tests         | implement/SKILL.md              | (batch-driven, no fixed sub-steps)    |
 | 2e   | Refactor                | refactor/SKILL.md               | Phases 0–5 (6-phase method)           |
 | 2f   | New Task                | new-task/SKILL.md               | Steps 1–8 (loop)                      |
 | 2g   | Implement               | implement/SKILL.md              | (batch-driven, no fixed sub-steps)    |
-| 2h   | Run Tests               | (autopilot-managed)             | Unit tests → Integration/blackbox tests |
+| 2h   | Run Tests               | (autopilot-managed)             | Unit tests → Blackbox tests           |
 | 2hb  | Security Audit          | security/SKILL.md               | Phase 1–5 (optional)                  |
 | 2i   | Deploy                  | deploy/SKILL.md                 | Steps 1–7                             |
@@ -49,20 +49,20 @@ Action: An existing codebase without documentation was detected. Present using C
 ---
 
 **Step 2b — Blackbox Test Spec**
 
-Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/integration_tests/traceability_matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
+Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
 
-Action: Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`
+Action: Read and execute `.cursor/skills/test-spec/SKILL.md`
 
 This step applies when the codebase was documented via the `/document` skill. Test specifications must be produced before refactoring or further development.
 
 ---
 
 **Step 2c — Decompose Tests**
 
-Condition: `_docs/02_document/integration_tests/traceability_matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
 
-Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/integration_tests/` as input). The decompose skill will:
+Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
 1. Run Step 1t (test infrastructure bootstrap)
-2. Run Step 3 (integration test task decomposition)
+2. Run Step 3 (blackbox test task decomposition)
 3. Run Step 4 (cross-verification against test coverage)
 
 If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.
@@ -117,7 +117,7 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au
 Action: Run the full test suite to verify the implementation before deployment.
 
 1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
+2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests
 
 If all tests pass → auto-chain to Step 2hb (Security Audit).
diff --git a/.cursor/skills/autopilot/flows/greenfield.md b/.cursor/skills/autopilot/flows/greenfield.md
index 807a0af..37c6523 100644
--- a/.cursor/skills/autopilot/flows/greenfield.md
+++ b/.cursor/skills/autopilot/flows/greenfield.md
@@ -11,7 +11,7 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
 | 2    | Plan           | plan/SKILL.md       | Step 1–6 + Final                        |
 | 3    | Decompose      | decompose/SKILL.md  | Step 1–4                                |
 | 4    | Implement      | implement/SKILL.md  | (batch-driven, no fixed sub-steps)      |
-| 5    | Run Tests      | (autopilot-managed) | Unit tests → Integration/blackbox tests |
+| 5    | Run Tests      | (autopilot-managed) | Unit tests → Blackbox tests             |
 | 5b   | Security Audit | security/SKILL.md   | Phase 1–5 (optional)                    |
 | 6    | Deploy         | deploy/SKILL.md     | Step 1–7                                |
@@ -100,7 +100,7 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t
 Action: Run the full test suite to verify the implementation before deployment.
 
 1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
+2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests
 
 If all tests pass → auto-chain to Step 5b (Security Audit).
diff --git a/.cursor/skills/blackbox-test-spec/SKILL.md b/.cursor/skills/blackbox-test-spec/SKILL.md
deleted file mode 100644
index 7ddf953..0000000
--- a/.cursor/skills/blackbox-test-spec/SKILL.md
+++ /dev/null
@@ -1,321 +0,0 @@
----
-name: blackbox-test-spec
-description: |
-  Black-box integration test specification skill. Analyzes input data completeness and produces
-  detailed E2E test scenarios (functional + non-functional) that treat the system as a black box.
-  3-phase workflow: input data completeness analysis, test scenario specification, test data validation gate.
-  Produces 5 artifacts under integration_tests/.
-  Trigger phrases:
-  - "blackbox test spec", "black box tests", "integration test spec"
-  - "test specification", "e2e test spec"
-  - "test scenarios", "black box scenarios"
-category: build
-tags: [testing, black-box, integration-tests, e2e, test-specification, qa]
-disable-model-invocation: true
----
-
-# Black-Box Test Scenario Specification
-
-Analyze input data completeness and produce detailed black-box integration test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
-
-## Core Principles
-
-- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
-- **Traceability**: every test traces to at least one acceptance criterion or restriction
-- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
-- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
-- **Spec, don't code**: this workflow produces test specifications, never test implementation code
-- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
-
-## Context Resolution
-
-Fixed paths — no mode detection needed:
-
-- PROBLEM_DIR: `_docs/00_problem/`
-- SOLUTION_DIR: `_docs/01_solution/`
-- DOCUMENT_DIR: `_docs/02_document/`
-- TESTS_OUTPUT_DIR: `_docs/02_document/integration_tests/`
-
-Announce the resolved paths to the user before proceeding.
-
-## Input Specification
-
-### Required Files
-
-| File | Purpose |
-|------|---------|
-| `_docs/00_problem/problem.md` | Problem description and context |
-| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
-| `_docs/00_problem/restrictions.md` | Constraints and limitations |
-| `_docs/00_problem/input_data/` | Reference data examples |
-| `_docs/01_solution/solution.md` | Finalized solution |
-
-### Optional Files (used when available)
-
-| File | Purpose |
-|------|---------|
-| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
-| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
-| `DOCUMENT_DIR/components/` | Component specs for interface identification |
-
-### Prerequisite Checks (BLOCKING)
-
-1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
-2. `restrictions.md` exists and is non-empty — **STOP if missing**
-3. `input_data/` exists and contains at least one file — **STOP if missing**
-4. `problem.md` exists and is non-empty — **STOP if missing**
-5. `solution.md` exists and is non-empty — **STOP if missing**
-6. Create TESTS_OUTPUT_DIR if it does not exist
-7. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
-
-## Artifact Management
-
-### Directory Structure
-
-```
-TESTS_OUTPUT_DIR/
-├── environment.md
-├── test_data.md
-├── functional_tests.md
-├── non_functional_tests.md
-└── traceability_matrix.md
-```
-
-### Save Timing
-
-| Phase | Save immediately after | Filename |
-|-------|------------------------|----------|
-| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
-| Phase 2 | Environment spec | `environment.md` |
-| Phase 2 | Test data spec | `test_data.md` |
-| Phase 2 | Functional tests | `functional_tests.md` |
-| Phase 2 | Non-functional tests | `non_functional_tests.md` |
-| Phase 2 | Traceability matrix | `traceability_matrix.md` |
-| Phase 3 | Updated test data spec (if data added) | `test_data.md` |
-| Phase 3 | Updated functional tests (if tests removed) | `functional_tests.md` |
-| Phase 3 | Updated non-functional tests (if tests removed) | `non_functional_tests.md` |
-| Phase 3 | Updated traceability matrix (if tests removed) | `traceability_matrix.md` |
-
-### Resumability
-
-If TESTS_OUTPUT_DIR already contains files:
-
-1. List existing files and match them to the save timing table above
-2. Identify which phase/artifacts are complete
-3. Resume from the next incomplete artifact
-4. Inform the user which artifacts are being skipped
-
-## Progress Tracking
-
-At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes.
-
-## Workflow
-
-### Phase 1: Input Data Completeness Analysis
-
-**Role**: Professional Quality Assurance Engineer
-**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
-**Constraints**: Analysis only — no test specs yet
-
-1. Read `_docs/01_solution/solution.md`
-2. Read `acceptance_criteria.md`, `restrictions.md`
-3. Read testing strategy from solution.md (if present)
-4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
-5. Analyze `input_data/` contents against:
-   - Coverage of acceptance criteria scenarios
-   - Coverage of restriction edge cases
-   - Coverage of testing strategy requirements
-6. Threshold: at least 70% coverage of the scenarios
-7. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/`
-8. Present coverage assessment to user
-
-**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.
-
----
-
-### Phase 2: Black-Box Test Scenario Specification
-
-**Role**: Professional Quality Assurance Engineer
-**Goal**: Produce detailed black-box test specifications covering functional and non-functional scenarios
-**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
-
-Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
-
-1. Define test environment using `.cursor/skills/plan/templates/integration-environment.md` as structure
-2. Define test data management using `.cursor/skills/plan/templates/integration-test-data.md` as structure
-3. Write functional test scenarios (positive + negative) using `.cursor/skills/plan/templates/integration-functional-tests.md` as structure
-4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `.cursor/skills/plan/templates/integration-non-functional-tests.md` as structure
-5. Build traceability matrix using `.cursor/skills/plan/templates/integration-traceability-matrix.md` as structure
-
-**Self-verification**:
-- [ ] Every acceptance criterion is covered by at least one test scenario
-- [ ] Every restriction is verified by at least one test scenario
-- [ ] Positive and negative scenarios are balanced
-- [ ] Consumer app has no direct access to system internals
-- [ ] Docker environment is self-contained (`docker compose up` sufficient)
-- [ ] External dependencies have mock/stub services defined
-- [ ] Traceability matrix has no uncovered AC or restrictions
-
-**Save action**: Write all files under TESTS_OUTPUT_DIR:
-- `environment.md`
-- `test_data.md`
-- `functional_tests.md`
-- `non_functional_tests.md`
-- `traceability_matrix.md`
-
-**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.
-
-Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
-
----
-
-### Phase 3: Test Data Validation Gate (HARD GATE)
-
-**Role**: Professional Quality Assurance Engineer
-**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
-**Constraints**: This phase is MANDATORY and cannot be skipped.
-
-#### Step 1 — Build the test-data requirements checklist
-
-Scan `functional_tests.md` and `non_functional_tests.md`. For every test scenario, extract:
-
-| # | Test Scenario ID | Test Name | Required Data Description | Required Data Quality | Required Data Quantity | Data Provided? |
-|---|-----------------|-----------|---------------------------|----------------------|----------------------|----------------|
-
-Present this table to the user.
-
-#### Step 2 — Ask user to provide test data
-
-For each row where **Data Provided?** is **No**, ask the user:
-
-> **Option A — Provide the data**: Supply the necessary test data files (with required quality and quantity as described in the table). Place them in `_docs/00_problem/input_data/` or indicate the location.
->
-> **Option B — Skip this test**: If you cannot provide the data, this test scenario will be **removed** from the specification.
-
-**BLOCKING**: Wait for the user's response for every missing data item.
-
-#### Step 3 — Validate provided data
-
-For each item where the user chose **Option A**:
-
-1. Verify the data file(s) exist at the indicated location
-2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
-3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
-4. If validation fails, report the specific issue and loop back to Step 2 for that item
-
-#### Step 4 — Remove tests without data
-
-For each item where the user chose **Option B**:
-
-1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data.`
-2. Remove the test scenario from `functional_tests.md` or `non_functional_tests.md`
-3. Remove corresponding rows from `traceability_matrix.md`
-4. Update `test_data.md` to reflect the removal
-
-**Save action**: Write updated files under TESTS_OUTPUT_DIR:
-- `test_data.md`
-- `functional_tests.md` (if tests removed)
-- `non_functional_tests.md` (if tests removed)
-- `traceability_matrix.md` (if tests removed)
-
-#### Step 5 — Final coverage check
-
-After all removals, recalculate coverage:
-
-1. Count remaining test scenarios that trace to acceptance criteria
-2. Count total acceptance criteria + restrictions
-3. Calculate coverage percentage: `covered_items / total_items * 100`
-
-| Metric | Value |
-|--------|-------|
-| Total AC + Restrictions | ? |
-| Covered by remaining tests | ? |
-| **Coverage %** | **?%** |
-
-**Decision**:
-
-- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
-- **Coverage < 70%** → Phase 3 **FAILED**. Report:
-  > ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
-  >
-  > | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
-  > |---|---|---|
-  >
-  > **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
-
-  **BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.
-
-#### Phase 3 Completion
-
-When coverage ≥ 70% and all remaining tests have validated data:
-
-1. Present the final coverage report
-2. List all removed tests (if any) with reasons
-3. Confirm all artifacts are saved and consistent
-
----
-
-## Escalation Rules
-
-| Situation | Action |
-|-----------|--------|
-| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
-| Ambiguous requirements | ASK user |
-| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
-| Test scenario conflicts with restrictions | ASK user to clarify intent |
-| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
-| Test data not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
-| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
-
-## Common Mistakes
-
-- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
-- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
-- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
-- **Untraceable tests**: every test should trace to at least one AC or restriction
-- **Writing test code**: this skill produces specifications, never implementation code
-- **Tests without data**: every test scenario MUST have concrete test data; a test spec without data is not executable and must be removed
-
-## Trigger Conditions
-
-When the user wants to:
-- Specify black-box integration tests before implementation or refactoring
-- Analyze input data completeness for test coverage
-- Produce E2E test scenarios from acceptance criteria
-
-**Keywords**: "blackbox test spec", "black box tests", "integration test spec", "test specification", "e2e test spec", "test scenarios"
-
-## Methodology Quick Reference
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│ Black-Box Test Scenario Specification (3-Phase)                 │
-├─────────────────────────────────────────────────────────────────┤
-│ PREREQ: Data Gate (BLOCKING)                                    │
-│   → verify AC, restrictions, input_data, solution exist         │
-│                                                                 │
-│ Phase 1: Input Data Completeness Analysis                       │
-│   → assess input_data/ coverage vs AC scenarios (≥70%)          │
-│   [BLOCKING: user confirms input data coverage]                 │
-│                                                                 │
-│ Phase 2: Black-Box Test Scenario Specification                  │
-│   → environment.md                                              │
-│   → test_data.md                                                │
-│   → functional_tests.md (positive + negative)                   │
-│   → non_functional_tests.md (perf, resilience, security, limits)│
-│   → traceability_matrix.md                                      │
-│   [BLOCKING: user confirms test coverage]                       │
-│                                                                 │
-│ Phase 3: Test Data Validation Gate (HARD GATE)                  │
-│   → build test-data requirements checklist                      │
-│   → ask user: provide data (Option A) or remove test (Option B) │
-│   → validate provided data (quality + quantity)                 │
-│   → remove tests without data, warn user                        │
-│   → final coverage check (≥70% or FAIL + loop back)             │
-│   [BLOCKING: coverage ≥ 70% required to pass]                   │
-├─────────────────────────────────────────────────────────────────┤
-│ Principles: Black-box only · Traceability · Save immediately    │
-│             Ask don't assume · Spec don't code                  │
-│             No test without data                                │
-└─────────────────────────────────────────────────────────────────┘
-```
diff --git a/.cursor/skills/code-review/SKILL.md b/.cursor/skills/code-review/SKILL.md
index 1c5bd4f..44c190c 100644
--- a/.cursor/skills/code-review/SKILL.md
+++ b/.cursor/skills/code-review/SKILL.md
@@ -46,7 +46,7 @@ For each task, verify implementation satisfies every acceptance criterion:
 
 - Walk through each AC (Given/When/Then) and trace it in the code
 - Check that unit tests cover each AC
-- Check that integration tests exist where specified in the task spec
+- Check that blackbox tests exist where specified in the task spec
 - Flag any AC that is not demonstrably satisfied as a **Spec-Gap** finding (severity: High)
 - Flag any scope creep (implementation beyond what the spec asked for) as a **Scope** finding (severity: Low)
diff --git a/.cursor/skills/decompose/SKILL.md b/.cursor/skills/decompose/SKILL.md
index 3837814..ac1cb2c 100644
--- a/.cursor/skills/decompose/SKILL.md
+++ b/.cursor/skills/decompose/SKILL.md
@@ -2,7 +2,7 @@
 name: decompose
 description: |
   Decompose planned components into atomic implementable tasks with bootstrap structure plan.
-  4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
+  4-step workflow: bootstrap structure plan, component task decomposition, blackbox test task decomposition, and cross-task verification.
   Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
   Trigger phrases:
   - "decompose", "decompose features", "feature decomposition"
@@ -36,7 +36,7 @@ Determine the operating mode based on invocation before any other logic runs.
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
-- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (integration tests) + Step 4 (cross-verification)
+- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (blackbox tests) + Step 4 (cross-verification)
 
 **Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):
 - DOCUMENT_DIR: `_docs/02_document/`
@@ -45,12 +45,12 @@
 - Ask user for the parent Epic ID
 - Runs Step 2 (that component only, appending to existing task numbering)
 
-**Tests-only mode** (provided file/directory is within `integration_tests/`, or `DOCUMENT_DIR/integration_tests/` exists and input explicitly requests test decomposition):
+**Tests-only mode** (provided file/directory is within `tests/`, or `DOCUMENT_DIR/tests/` exists and input explicitly requests test decomposition):
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
-- TESTS_DIR: `DOCUMENT_DIR/integration_tests/`
+- TESTS_DIR: `DOCUMENT_DIR/tests/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
-- Runs Step 1t (test infrastructure bootstrap) + Step 3 (integration test decomposition) + Step 4 (cross-verification against test coverage)
+- Runs Step 1t (test infrastructure bootstrap) + Step 3 (blackbox test decomposition) + Step 4 (cross-verification against test coverage)
 - Skips Step 1 (project bootstrap) and Step 2 (component decomposition) — the codebase already exists
 
 Announce the detected mode and resolved paths to the user before proceeding.
@@ -70,7 +70,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
 | `DOCUMENT_DIR/architecture.md` | Architecture from plan skill |
 | `DOCUMENT_DIR/system-flows.md` | System flows from plan skill |
 | `DOCUMENT_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
-| `DOCUMENT_DIR/integration_tests/` | Integration test specs from plan skill |
+| `DOCUMENT_DIR/tests/` | Blackbox test specs from plan skill |
 
 **Single component mode:**
 
@@ -84,10 +84,13 @@
 | File | Purpose |
 |------|---------|
 | `TESTS_DIR/environment.md` | Test environment specification (Docker services, networks, volumes) |
-| `TESTS_DIR/test_data.md` | Test data management (seed data, mocks, isolation) |
-| `TESTS_DIR/functional_tests.md` | Functional test scenarios (positive + negative) |
-| `TESTS_DIR/non_functional_tests.md` | Non-functional test scenarios (perf, resilience, security, limits) |
-| `TESTS_DIR/traceability_matrix.md` | AC/restriction coverage mapping |
+| `TESTS_DIR/test-data.md` | Test data management (seed data, mocks, isolation) |
+| `TESTS_DIR/blackbox-tests.md` | Blackbox functional scenarios (positive + negative) |
+| `TESTS_DIR/performance-tests.md` | Performance test scenarios |
+| `TESTS_DIR/resilience-tests.md` | Resilience test scenarios |
+| `TESTS_DIR/security-tests.md` | Security test scenarios |
+| `TESTS_DIR/resource-limit-tests.md` | Resource limit test scenarios |
+| `TESTS_DIR/traceability-matrix.md` | AC/restriction coverage mapping |
 | `_docs/00_problem/problem.md` | Problem context |
 | `_docs/00_problem/restrictions.md` | Constraints for test design |
 | `_docs/00_problem/acceptance_criteria.md` | Acceptance criteria being verified |
@@ -103,7 +106,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
 1. The provided component file exists and is non-empty — **STOP if missing**
 
 **Tests-only mode:**
-1. `TESTS_DIR/functional_tests.md` exists and is non-empty — **STOP if missing**
+1. `TESTS_DIR/blackbox-tests.md` exists and is non-empty — **STOP if missing**
 2. `TESTS_DIR/environment.md` exists — **STOP if missing**
 3. Create TASKS_DIR if it does not exist
 4. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
@@ -130,7 +133,7 @@ TASKS_DIR/
 | Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
 | Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
 | Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
-| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
+| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
 | Step 4 | Cross-task verification complete | `_dependencies_table.md` |
 
 ### Resumability
@@ -153,7 +156,7 @@ At the start of execution, create a TodoWrite with all applicable steps. Update
 **Goal**: Produce `01_test_infrastructure.md` — the first task describing the test project scaffold
 **Constraints**: This is a plan document, not code. The `/implement` skill executes it.
 
-1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test_data.md`
+1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test-data.md`
 2. Read problem.md, restrictions.md, acceptance_criteria.md for domain context
 3. Document the test infrastructure plan using `templates/test-infrastructure-task.md`
@@ -162,20 +165,20 @@ The test infrastructure bootstrap must include:
 - Mock/stub service definitions for each external dependency
 - `docker-compose.test.yml` structure from environment.md
 - Test runner configuration (framework, plugins, fixtures)
-- Test data fixture setup from test_data.md seed data sets
+- Test data fixture setup from test-data.md seed data sets
 - Test reporting configuration (format, output path)
 - Data isolation strategy
 
 **Self-verification**:
 - [ ] Every external dependency from environment.md has a mock service defined
 - [ ] Docker Compose structure covers all services from environment.md
-- [ ] Test data fixtures cover all seed data sets from test_data.md
+- [ ] Test data fixtures cover all seed data sets from test-data.md
 - [ ] Test runner configuration matches the consumer app tech stack from environment.md
 - [ ] Data isolation strategy is defined
 
 **Save action**: Write `01_test_infrastructure.md` (temporary numeric name)
 
-**Jira action**: Create a Jira ticket for this task under the "Integration Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
+**Jira action**: Create a Jira ticket for this task under the "Blackbox Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
 
 **Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
@@ -199,27 +202,27 @@ The bootstrap structure plan must include:
 - Shared models, interfaces, and DTOs
 - Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
 - `docker-compose.yml` for local development (all components + database + dependencies)
-- `docker-compose.test.yml` for integration test environment (black-box test runner)
+- `docker-compose.test.yml` for blackbox test environment (blackbox test runner)
 - `.dockerignore`
 - CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
 - Database migration setup and initial seed data scripts
 - Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
 - Environment variable documentation (`.env.example`)
-- Test structure with unit and integration test locations
+- Test structure with unit and blackbox test locations
 
 **Self-verification**:
 - [ ] All components have corresponding folders in the layout
 - [ ] All inter-component interfaces have DTOs defined
 - [ ] Dockerfile defined for each component
 - [ ] `docker-compose.yml` covers all components and dependencies
-- [ ] `docker-compose.test.yml` enables black-box integration testing
+- [ ] `docker-compose.test.yml` enables blackbox testing
 - [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
 - [ ] Database migration setup included
 - [ ] Health check endpoints specified for each service
 - [ ] Structured logging configuration included
 - [ ] `.env.example` with all required environment variables
 - [ ] Environment strategy covers dev, staging, production
-- [ ] Test structure includes unit and integration test locations
+- [ ] Test structure includes unit and blackbox test locations
 
 **Save action**: Write `01_initial_structure.md` (temporary numeric name)
@@ -265,33 +268,33 @@ For each component (or the single provided component):
 
 ---
 
-### Step 3: Integration Test Task Decomposition (default and tests-only modes)
+### Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
 
 **Role**: Professional Quality Assurance Engineer
-**Goal**: Decompose integration test specs into atomic, implementable task specs
+**Goal**: Decompose blackbox test specs into atomic, implementable task specs
 **Constraints**: Behavioral specs only — describe what, not how. No test code.
 
 **Numbering**:
 - In default mode: continue sequential numbering from where Step 2 left off.
 - In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).
 
-1. Read all test specs from `DOCUMENT_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
+1. Read all test specs from `DOCUMENT_DIR/tests/` (`blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md`)
 2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
-3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
+3. Each task should reference the specific test scenarios it implements and the environment/test-data specs
 4. Dependencies:
-   - In default mode: integration test tasks depend on the component implementation tasks they exercise
-   - In tests-only mode: integration test tasks depend on the test infrastructure bootstrap task (Step 1t)
+   - In default mode: blackbox test tasks depend on the component implementation tasks they exercise
+   - In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
 5. Write each task spec using `templates/task.md`
 6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
 7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
-8. **Immediately after writing each task file**: create a Jira ticket under the "Integration Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
+8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
 
 **Self-verification**:
-- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
-- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
+- [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
+- [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task
 - [ ] No task exceeds 5 complexity points
 - [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
-- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic
+- [ ] Every task has a Jira ticket linked to the "Blackbox Tests" epic
 
 **Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
@@ -306,7 +309,7 @@
 1. Verify task dependencies across all tasks are consistent
 2. Check no gaps:
    - In default mode: every interface in architecture.md has tasks covering it
-   - In tests-only mode: every test scenario in `traceability_matrix.md` is covered by a task
+   - In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task
 3. Check no overlaps: tasks don't duplicate work
 4. Check no circular dependencies in the task graph
 5.
Produce `_dependencies_table.md` using `templates/dependencies-table.md` @@ -320,7 +323,7 @@ Default mode: - [ ] `_dependencies_table.md` contains every task with correct dependencies Tests-only mode: -- [ ] Every test scenario from traceability_matrix.md "Covered" entries has a corresponding task +- [ ] Every test scenario from traceability-matrix.md "Covered" entries has a corresponding task - [ ] No circular dependencies in the task graph - [ ] Test task dependencies reference the test infrastructure bootstrap - [ ] `_dependencies_table.md` contains every task with correct dependencies @@ -366,14 +369,14 @@ Tests-only mode: │ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │ │ [BLOCKING: user confirms structure] │ │ 2. Component Tasks → [JIRA-ID]_[short_name].md each │ -│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │ +│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │ │ 4. Cross-Verification → _dependencies_table.md │ │ [BLOCKING: user confirms dependencies] │ │ │ │ TESTS-ONLY MODE: │ │ 1t. Test Infrastructure → [JIRA-ID]_test_infrastructure.md │ │ [BLOCKING: user confirms test scaffold] │ -│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │ +│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │ │ 4. 
Cross-Verification → _dependencies_table.md │ │ [BLOCKING: user confirms dependencies] │ │ │ diff --git a/.cursor/skills/decompose/templates/initial-structure-task.md b/.cursor/skills/decompose/templates/initial-structure-task.md index 9642f65..371e5e0 100644 --- a/.cursor/skills/decompose/templates/initial-structure-task.md +++ b/.cursor/skills/decompose/templates/initial-structure-task.md @@ -49,7 +49,7 @@ project-root/ | Build | Compile/bundle the application | Every push | | Lint / Static Analysis | Code quality and style checks | Every push | | Unit Tests | Run unit test suite | Every push | -| Integration Tests | Run integration test suite | Every push | +| Blackbox Tests | Run blackbox test suite | Every push | | Security Scan | SAST / dependency check | Every push | | Deploy to Staging | Deploy to staging environment | Merge to staging branch | diff --git a/.cursor/skills/decompose/templates/task.md b/.cursor/skills/decompose/templates/task.md index d8547a9..f36ea38 100644 --- a/.cursor/skills/decompose/templates/task.md +++ b/.cursor/skills/decompose/templates/task.md @@ -64,7 +64,7 @@ Then [expected result] |--------|-------------|-----------------| | AC-1 | [test subject] | [expected result] | -## Integration Tests +## Blackbox Tests | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References | |--------|------------------------|-------------|-------------------|----------------| diff --git a/.cursor/skills/decompose/templates/test-infrastructure-task.md b/.cursor/skills/decompose/templates/test-infrastructure-task.md index 49b18ca..a07cb42 100644 --- a/.cursor/skills/decompose/templates/test-infrastructure-task.md +++ b/.cursor/skills/decompose/templates/test-infrastructure-task.md @@ -9,10 +9,10 @@ Use this template for the test infrastructure bootstrap (Step 1t in tests-only m **Task**: [JIRA-ID]_test_infrastructure **Name**: Test Infrastructure -**Description**: Scaffold the E2E test project — test runner, mock services, 
Docker test environment, test data fixtures, reporting
+**Description**: Scaffold the blackbox test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: [3|5] points
**Dependencies**: None
-**Component**: Integration Tests
+**Component**: Blackbox Tests
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
@@ -124,6 +124,6 @@ Then a report file exists at the configured output path with correct columns
- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on test infrastructure decisions, not individual test implementations.
-- Reference environment.md and test_data.md from the test specs — don't repeat everything.
+- Reference test-environment.md and test-data.md from the test specs — don't repeat everything.
- Mock services must be deterministic: same input always produces same output.
- The Docker environment must be self-contained: `docker compose up` sufficient.
diff --git a/.cursor/skills/deploy/SKILL.md b/.cursor/skills/deploy/SKILL.md
index 022520f..d3bc3e6 100644
--- a/.cursor/skills/deploy/SKILL.md
+++ b/.cursor/skills/deploy/SKILL.md
@@ -20,7 +20,7 @@ Plan and document the full deployment lifecycle: check deployment status and env
## Core Principles
-- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
+- **Docker-first**: every component runs in a container; local dev, blackbox tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
@@ -157,7 +157,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7).
Upda
### Step 2: Containerization
**Role**: DevOps / Platform engineer
-**Goal**: Define Docker configuration for every component, local development, and integration test environments
+**Goal**: Define Docker configuration for every component, local development, and blackbox test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
@@ -176,7 +176,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env`)
-6. Define `docker-compose.test.yml` for integration tests:
+6. Define `docker-compose.test.yml` for blackbox tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
@@ -189,7 +189,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
-- [ ] docker-compose.test.yml enables black-box integration testing
+- [ ] docker-compose.test.yml enables blackbox testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
@@ -212,7 +212,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7).
Upda | Stage | Trigger | Steps | Quality Gate | |-------|---------|-------|-------------| | **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors | -| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage | +| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage | | **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs | | **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds | | **Push** | After build | Push to container registry | Push succeeds | @@ -458,7 +458,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda - **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts) - **Hardcoding secrets**: never include real credentials in deployment documents or scripts -- **Ignoring integration test containerization**: the test environment must be containerized alongside the app +- **Ignoring blackbox test containerization**: the test environment must be containerized alongside the app - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation - **Using `:latest` tags**: always pin base image versions - **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions diff --git a/.cursor/skills/deploy/templates/ci_cd_pipeline.md b/.cursor/skills/deploy/templates/ci_cd_pipeline.md index 57b8b41..16102e3 100644 --- a/.cursor/skills/deploy/templates/ci_cd_pipeline.md +++ b/.cursor/skills/deploy/templates/ci_cd_pipeline.md @@ -28,7 +28,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`. 
### Test - Unit tests: [framework and command] -- Integration tests: [framework and command, uses docker-compose.test.yml] +- Blackbox tests: [framework and command, uses docker-compose.test.yml] - Coverage threshold: 75% overall, 90% critical paths - Coverage report published as pipeline artifact @@ -54,7 +54,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`. - Automated rollback on health check failure ### Smoke Tests -- Subset of integration tests targeting staging environment +- Subset of blackbox tests targeting staging environment - Validates critical user flows - Timeout: [maximum duration] diff --git a/.cursor/skills/deploy/templates/containerization.md b/.cursor/skills/deploy/templates/containerization.md index d1025be..d6c7073 100644 --- a/.cursor/skills/deploy/templates/containerization.md +++ b/.cursor/skills/deploy/templates/containerization.md @@ -48,7 +48,7 @@ networks: [shared network] ``` -## Docker Compose — Integration Tests +## Docker Compose — Blackbox Tests ```yaml # docker-compose.test.yml structure diff --git a/.cursor/skills/new-task/templates/task.md b/.cursor/skills/new-task/templates/task.md index d8547a9..f36ea38 100644 --- a/.cursor/skills/new-task/templates/task.md +++ b/.cursor/skills/new-task/templates/task.md @@ -64,7 +64,7 @@ Then [expected result] |--------|-------------|-----------------| | AC-1 | [test subject] | [expected result] | -## Integration Tests +## Blackbox Tests | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References | |--------|------------------------|-------------|-------------------|----------------| diff --git a/.cursor/skills/plan/SKILL.md b/.cursor/skills/plan/SKILL.md index ef4b3a1..b1cc48d 100644 --- a/.cursor/skills/plan/SKILL.md +++ b/.cursor/skills/plan/SKILL.md @@ -59,9 +59,9 @@ At the start of execution, create a TodoWrite with all steps (1 through 6 plus F ## Workflow -### Step 1: Integration Tests +### Step 1: Blackbox Tests -Read and execute 
`.cursor/skills/blackbox-test-spec/SKILL.md`. +Read and execute `.cursor/skills/test-spec/SKILL.md`. Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3. @@ -111,7 +111,7 @@ Read and follow `steps/07_quality-checklist.md`. - **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input - **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output - **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register -- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3) +- **Ignoring blackbox test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3) ## Escalation Rules @@ -135,7 +135,7 @@ Read and follow `steps/07_quality-checklist.md`. │ PREREQ: Data Gate (BLOCKING) │ │ → verify AC, restrictions, input_data, solution exist │ │ │ -│ 1. Integration Tests → blackbox-test-spec/SKILL.md │ +│ 1. Blackbox Tests → test-spec/SKILL.md │ │ [BLOCKING: user confirms test coverage] │ │ 2. 
Solution Analysis → architecture, data model, deployment │ │ [BLOCKING: user confirms architecture] │ diff --git a/.cursor/skills/plan/steps/01_artifact-management.md b/.cursor/skills/plan/steps/01_artifact-management.md index 7e09a42..1a5a9cf 100644 --- a/.cursor/skills/plan/steps/01_artifact-management.md +++ b/.cursor/skills/plan/steps/01_artifact-management.md @@ -6,12 +6,15 @@ All artifacts are written directly under DOCUMENT_DIR: ``` DOCUMENT_DIR/ -├── integration_tests/ -│ ├── environment.md -│ ├── test_data.md -│ ├── functional_tests.md -│ ├── non_functional_tests.md -│ └── traceability_matrix.md +├── tests/ +│ ├── test-environment.md +│ ├── test-data.md +│ ├── blackbox-tests.md +│ ├── performance-tests.md +│ ├── resilience-tests.md +│ ├── security-tests.md +│ ├── resource-limit-tests.md +│ └── traceability-matrix.md ├── architecture.md ├── system-flows.md ├── data_model.md @@ -47,11 +50,14 @@ DOCUMENT_DIR/ | Step | Save immediately after | Filename | |------|------------------------|----------| -| Step 1 | Integration test environment spec | `integration_tests/environment.md` | -| Step 1 | Integration test data spec | `integration_tests/test_data.md` | -| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` | -| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` | -| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` | +| Step 1 | Blackbox test environment spec | `tests/test-environment.md` | +| Step 1 | Blackbox test data spec | `tests/test-data.md` | +| Step 1 | Blackbox tests | `tests/blackbox-tests.md` | +| Step 1 | Blackbox performance tests | `tests/performance-tests.md` | +| Step 1 | Blackbox resilience tests | `tests/resilience-tests.md` | +| Step 1 | Blackbox security tests | `tests/security-tests.md` | +| Step 1 | Blackbox resource limit tests | `tests/resource-limit-tests.md` | +| Step 1 | Blackbox traceability matrix | 
`tests/traceability-matrix.md` | | Step 2 | Architecture analysis complete | `architecture.md` | | Step 2 | System flows documented | `system-flows.md` | | Step 2 | Data model documented | `data_model.md` | diff --git a/.cursor/skills/plan/steps/02_solution-analysis.md b/.cursor/skills/plan/steps/02_solution-analysis.md index 74f1554..701f409 100644 --- a/.cursor/skills/plan/steps/02_solution-analysis.md +++ b/.cursor/skills/plan/steps/02_solution-analysis.md @@ -7,7 +7,7 @@ ### Phase 2a: Architecture & Flows 1. Read all input files thoroughly -2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests) +2. Incorporate findings, questions, and insights discovered during Step 1 (blackbox tests) 3. Research unknown or questionable topics via internet; ask user about ambiguities 4. Document architecture using `templates/architecture.md` as structure 5. Document system flows using `templates/system-flows.md` as structure @@ -17,7 +17,7 @@ - [ ] System flows cover all main user/system interactions - [ ] No contradictions with problem.md or restrictions.md - [ ] Technology choices are justified -- [ ] Integration test findings are reflected in architecture decisions +- [ ] Blackbox test findings are reflected in architecture decisions **Save action**: Write `architecture.md` and `system-flows.md` diff --git a/.cursor/skills/plan/steps/03_component-decomposition.md b/.cursor/skills/plan/steps/03_component-decomposition.md index daadd3c..c026e65 100644 --- a/.cursor/skills/plan/steps/03_component-decomposition.md +++ b/.cursor/skills/plan/steps/03_component-decomposition.md @@ -5,7 +5,7 @@ **Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly. 1. Identify components from the architecture; think about separation, reusability, and communication patterns -2. Use integration test scenarios from Step 1 to validate component boundaries +2. Use blackbox test scenarios from Step 1 to validate component boundaries 3. 
If additional components are needed (data preparation, shared helpers), create them 4. For each component, write a spec using `templates/component-spec.md` as structure 5. Generate diagrams: @@ -19,7 +19,7 @@ - [ ] All inter-component interfaces are defined (who calls whom, with what) - [ ] Component dependency graph has no circular dependencies - [ ] All components from architecture.md are accounted for -- [ ] Every integration test scenario can be traced through component interactions +- [ ] Every blackbox test scenario can be traced through component interactions **Save action**: Write: - each component `components/[##]_[name]/description.md` diff --git a/.cursor/skills/plan/steps/06_jira-epics.md b/.cursor/skills/plan/steps/06_jira-epics.md index 3195684..b9a1ecd 100644 --- a/.cursor/skills/plan/steps/06_jira-epics.md +++ b/.cursor/skills/plan/steps/06_jira-epics.md @@ -35,7 +35,7 @@ Do NOT create minimal epics with just a summary and short description. The Jira **Self-verification**: - [ ] "Bootstrap & Initial Structure" epic exists and is first in order -- [ ] "Integration Tests" epic exists +- [ ] "Blackbox Tests" epic exists - [ ] Every component maps to exactly one epic - [ ] Dependency order is respected (no epic depends on a later one) - [ ] Acceptance criteria are measurable @@ -43,6 +43,6 @@ Do NOT create minimal epics with just a summary and short description. The Jira - [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs - [ ] Epic descriptions are self-contained — readable without opening other files -7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`. +7. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`. 
**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs. diff --git a/.cursor/skills/plan/steps/07_quality-checklist.md b/.cursor/skills/plan/steps/07_quality-checklist.md index 0eff978..f883e88 100644 --- a/.cursor/skills/plan/steps/07_quality-checklist.md +++ b/.cursor/skills/plan/steps/07_quality-checklist.md @@ -2,8 +2,8 @@ Before writing the final report, verify ALL of the following: -### Integration Tests -- [ ] Every acceptance criterion is covered in traceability_matrix.md +### Blackbox Tests +- [ ] Every acceptance criterion is covered in traceability-matrix.md - [ ] Every restriction is verified by at least one test - [ ] Positive and negative scenarios are balanced - [ ] Docker environment is self-contained @@ -14,7 +14,7 @@ Before writing the final report, verify ALL of the following: - [ ] Covers all capabilities from solution.md - [ ] Technology choices are justified - [ ] Deployment model is defined -- [ ] Integration test findings are reflected in architecture decisions +- [ ] Blackbox test findings are reflected in architecture decisions ### Data Model - [ ] Every entity from architecture.md is defined @@ -35,7 +35,7 @@ Before writing the final report, verify ALL of the following: - [ ] No circular dependencies - [ ] All inter-component interfaces are defined and consistent - [ ] No orphan components (unused by any flow) -- [ ] Every integration test scenario can be traced through component interactions +- [ ] Every blackbox test scenario can be traced through component interactions ### Risks - [ ] All High/Critical risks have mitigations @@ -49,7 +49,7 @@ Before writing the final report, verify ALL of the following: ### Epics - [ ] "Bootstrap & Initial Structure" epic exists -- [ ] "Integration Tests" epic exists +- [ ] "Blackbox Tests" epic exists - [ ] Every component maps to an epic - [ ] Dependency order is correct - [ ] Acceptance criteria are measurable diff --git 
a/.cursor/skills/plan/templates/integration-functional-tests.md b/.cursor/skills/plan/templates/blackbox-tests.md similarity index 83% rename from .cursor/skills/plan/templates/integration-functional-tests.md rename to .cursor/skills/plan/templates/blackbox-tests.md index e57f7d4..d522698 100644 --- a/.cursor/skills/plan/templates/integration-functional-tests.md +++ b/.cursor/skills/plan/templates/blackbox-tests.md @@ -1,24 +1,24 @@ -# E2E Functional Tests Template +# Blackbox Tests Template -Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`. +Save as `DOCUMENT_DIR/tests/blackbox-tests.md`. --- ```markdown -# E2E Functional Tests +# Blackbox Tests ## Positive Scenarios ### FT-P-01: [Scenario Name] -**Summary**: [One sentence: what end-to-end use case this validates] +**Summary**: [One sentence: what black-box use case this validates] **Traces to**: AC-[ID], AC-[ID] **Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.] **Preconditions**: - [System state required before test] -**Input data**: [reference to specific data set or file from test_data.md] +**Input data**: [reference to specific data set or file from test-data.md] **Steps**: @@ -71,8 +71,8 @@ Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`. ## Guidance Notes -- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification. +- Blackbox tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification. - Positive scenarios validate the system does what it should. - Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept. - Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth." -- Input data references should point to specific entries in test_data.md. 
+- Input data references should point to specific entries in test-data.md. diff --git a/.cursor/skills/plan/templates/epic-spec.md b/.cursor/skills/plan/templates/epic-spec.md index 872e99e..3157a84 100644 --- a/.cursor/skills/plan/templates/epic-spec.md +++ b/.cursor/skills/plan/templates/epic-spec.md @@ -80,7 +80,7 @@ Link to architecture.md and relevant component spec.] ### Definition of Done - [ ] All in-scope capabilities implemented -- [ ] Automated tests pass (unit + integration + e2e) +- [ ] Automated tests pass (unit + blackbox) - [ ] Minimum coverage threshold met (75%) - [ ] Runbooks written (if applicable) - [ ] Documentation updated diff --git a/.cursor/skills/plan/templates/integration-non-functional-tests.md b/.cursor/skills/plan/templates/integration-non-functional-tests.md deleted file mode 100644 index 6bf4c54..0000000 --- a/.cursor/skills/plan/templates/integration-non-functional-tests.md +++ /dev/null @@ -1,97 +0,0 @@ -# E2E Non-Functional Tests Template - -Save as `DOCUMENT_DIR/integration_tests/non_functional_tests.md`. - ---- - -```markdown -# E2E Non-Functional Tests - -## Performance Tests - -### NFT-PERF-01: [Test Name] - -**Summary**: [What performance characteristic this validates] -**Traces to**: AC-[ID] -**Metric**: [what is measured — latency, throughput, frame rate, etc.] - -**Preconditions**: -- [System state, load profile, data volume] - -**Steps**: - -| Step | Consumer Action | Measurement | -|------|----------------|-------------| -| 1 | [action] | [what to measure and how] | - -**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms] -**Duration**: [how long the test runs] - ---- - -## Resilience Tests - -### NFT-RES-01: [Test Name] - -**Summary**: [What failure/recovery scenario this validates] -**Traces to**: AC-[ID] - -**Preconditions**: -- [System state before fault injection] - -**Fault injection**: -- [What fault is introduced — process kill, network partition, invalid input sequence, etc.] 
- -**Steps**: - -| Step | Action | Expected Behavior | -|------|--------|------------------| -| 1 | [inject fault] | [system behavior during fault] | -| 2 | [observe recovery] | [system behavior after recovery] | - -**Pass criteria**: [recovery time, data integrity, continued operation] - ---- - -## Security Tests - -### NFT-SEC-01: [Test Name] - -**Summary**: [What security property this validates] -**Traces to**: AC-[ID], RESTRICT-[ID] - -**Steps**: - -| Step | Consumer Action | Expected Response | -|------|----------------|------------------| -| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] | - -**Pass criteria**: [specific security outcome] - ---- - -## Resource Limit Tests - -### NFT-RES-LIM-01: [Test Name] - -**Summary**: [What resource constraint this validates] -**Traces to**: AC-[ID], RESTRICT-[ID] - -**Preconditions**: -- [System running under specified constraints] - -**Monitoring**: -- [What resources to monitor — memory, CPU, GPU, disk, temperature] - -**Duration**: [how long to run] -**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout] -``` - ---- - -## Guidance Notes - -- Performance tests should run long enough to capture steady-state behavior, not just cold-start. -- Resilience tests must define both the fault and the expected recovery — not just "system should recover." -- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities. -- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance. diff --git a/.cursor/skills/plan/templates/performance-tests.md b/.cursor/skills/plan/templates/performance-tests.md new file mode 100644 index 0000000..dfbcd14 --- /dev/null +++ b/.cursor/skills/plan/templates/performance-tests.md @@ -0,0 +1,35 @@ +# Performance Tests Template + +Save as `DOCUMENT_DIR/tests/performance-tests.md`. 
+ +--- + +```markdown +# Performance Tests + +### NFT-PERF-01: [Test Name] + +**Summary**: [What performance characteristic this validates] +**Traces to**: AC-[ID] +**Metric**: [what is measured — latency, throughput, frame rate, etc.] + +**Preconditions**: +- [System state, load profile, data volume] + +**Steps**: + +| Step | Consumer Action | Measurement | +|------|----------------|-------------| +| 1 | [action] | [what to measure and how] | + +**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms] +**Duration**: [how long the test runs] +``` + +--- + +## Guidance Notes + +- Performance tests should run long enough to capture steady-state behavior, not just cold-start. +- Define clear pass/fail thresholds with specific metrics (p50, p95, p99 latency, throughput, etc.). +- Include warm-up preconditions to separate initialization cost from steady-state performance. diff --git a/.cursor/skills/plan/templates/resilience-tests.md b/.cursor/skills/plan/templates/resilience-tests.md new file mode 100644 index 0000000..72890ae --- /dev/null +++ b/.cursor/skills/plan/templates/resilience-tests.md @@ -0,0 +1,37 @@ +# Resilience Tests Template + +Save as `DOCUMENT_DIR/tests/resilience-tests.md`. + +--- + +```markdown +# Resilience Tests + +### NFT-RES-01: [Test Name] + +**Summary**: [What failure/recovery scenario this validates] +**Traces to**: AC-[ID] + +**Preconditions**: +- [System state before fault injection] + +**Fault injection**: +- [What fault is introduced — process kill, network partition, invalid input sequence, etc.] + +**Steps**: + +| Step | Action | Expected Behavior | +|------|--------|------------------| +| 1 | [inject fault] | [system behavior during fault] | +| 2 | [observe recovery] | [system behavior after recovery] | + +**Pass criteria**: [recovery time, data integrity, continued operation] +``` + +--- + +## Guidance Notes + +- Resilience tests must define both the fault and the expected recovery — not just "system should recover." 
+- Include specific recovery time expectations and data integrity checks. +- Test both graceful degradation (partial failure) and full recovery scenarios. diff --git a/.cursor/skills/plan/templates/resource-limit-tests.md b/.cursor/skills/plan/templates/resource-limit-tests.md new file mode 100644 index 0000000..53779e3 --- /dev/null +++ b/.cursor/skills/plan/templates/resource-limit-tests.md @@ -0,0 +1,31 @@ +# Resource Limit Tests Template + +Save as `DOCUMENT_DIR/tests/resource-limit-tests.md`. + +--- + +```markdown +# Resource Limit Tests + +### NFT-RES-LIM-01: [Test Name] + +**Summary**: [What resource constraint this validates] +**Traces to**: AC-[ID], RESTRICT-[ID] + +**Preconditions**: +- [System running under specified constraints] + +**Monitoring**: +- [What resources to monitor — memory, CPU, GPU, disk, temperature] + +**Duration**: [how long to run] +**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout] +``` + +--- + +## Guidance Notes + +- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance. +- Define specific numeric limits that can be programmatically checked. +- Include both the monitoring method and the threshold in the pass criteria. diff --git a/.cursor/skills/plan/templates/security-tests.md b/.cursor/skills/plan/templates/security-tests.md new file mode 100644 index 0000000..b243404 --- /dev/null +++ b/.cursor/skills/plan/templates/security-tests.md @@ -0,0 +1,30 @@ +# Security Tests Template + +Save as `DOCUMENT_DIR/tests/security-tests.md`. + +--- + +```markdown +# Security Tests + +### NFT-SEC-01: [Test Name] + +**Summary**: [What security property this validates] +**Traces to**: AC-[ID], RESTRICT-[ID] + +**Steps**: + +| Step | Consumer Action | Expected Response | +|------|----------------|------------------| +| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] 
| + +**Pass criteria**: [specific security outcome] +``` + +--- + +## Guidance Notes + +- Security tests at blackbox level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities. +- Verify the system remains operational after security-related edge cases (no crash, no hang). +- Test authentication/authorization boundaries from the consumer's perspective. diff --git a/.cursor/skills/plan/templates/integration-test-data.md b/.cursor/skills/plan/templates/test-data.md similarity index 62% rename from .cursor/skills/plan/templates/integration-test-data.md rename to .cursor/skills/plan/templates/test-data.md index 1ee4afe..0cee7fa 100644 --- a/.cursor/skills/plan/templates/integration-test-data.md +++ b/.cursor/skills/plan/templates/test-data.md @@ -1,11 +1,11 @@ -# E2E Test Data Template +# Test Data Template -Save as `DOCUMENT_DIR/integration_tests/test_data.md`. +Save as `DOCUMENT_DIR/tests/test-data.md`. --- ```markdown -# E2E Test Data Management +# Test Data Management ## Seed Data Sets @@ -23,6 +23,12 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`. |-----------------|----------------|-------------|-----------------| | [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] | +## Expected Results Mapping + +| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source | +|-----------------|------------|-----------------|-------------------|-----------|----------------------| +| [test ID] | `input_data/[filename]` | [quantifiable expected output] | [exact / tolerance / pattern / threshold / file-diff] | [± value or N/A] | `input_data/expected_results/[filename]` or inline | + ## External Dependency Mocks | External Service | Mock/Stub | How Provided | Behavior | @@ -42,5 +48,8 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`. - Every seed data set should be traceable to specific test scenarios. 
- Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it. +- Every input data item MUST have a corresponding expected result in the Expected Results Mapping table. +- Expected results MUST be quantifiable: exact values, numeric tolerances, pattern matches, thresholds, or reference files. "Works correctly" is never acceptable. +- For complex expected outputs, provide machine-readable reference files (JSON, CSV) in `_docs/00_problem/input_data/expected_results/` and reference them in the mapping. - External mocks must be deterministic — same input always produces same output. - Data isolation must guarantee no test can affect another test's outcome. diff --git a/.cursor/skills/plan/templates/integration-environment.md b/.cursor/skills/plan/templates/test-environment.md similarity index 92% rename from .cursor/skills/plan/templates/integration-environment.md rename to .cursor/skills/plan/templates/test-environment.md index 9382dfa..b5d74fa 100644 --- a/.cursor/skills/plan/templates/integration-environment.md +++ b/.cursor/skills/plan/templates/test-environment.md @@ -1,16 +1,16 @@ -# E2E Test Environment Template +# Test Environment Template -Save as `DOCUMENT_DIR/integration_tests/environment.md`. +Save as `DOCUMENT_DIR/tests/environment.md`. --- ```markdown -# E2E Test Environment +# Test Environment ## Overview **System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.] -**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals. +**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating black-box use cases without access to internals. 
## Docker Environment diff --git a/.cursor/skills/plan/templates/test-spec.md b/.cursor/skills/plan/templates/test-spec.md index 2b6ee44..5b7b83e 100644 --- a/.cursor/skills/plan/templates/test-spec.md +++ b/.cursor/skills/plan/templates/test-spec.md @@ -17,7 +17,7 @@ Use this template for each component's test spec. Save as `components/[##]_[name --- -## Integration Tests +## Blackbox Tests ### IT-01: [Test Name] @@ -169,4 +169,4 @@ Use this template for each component's test spec. Save as `components/[##]_[name - If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2"). - Performance test targets should come from the NFR section in `architecture.md`. - Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component. -- Not every component needs all 4 test types. A stateless utility component may only need integration tests. +- Not every component needs all 4 test types. A stateless utility component may only need blackbox tests. diff --git a/.cursor/skills/plan/templates/integration-traceability-matrix.md b/.cursor/skills/plan/templates/traceability-matrix.md similarity index 82% rename from .cursor/skills/plan/templates/integration-traceability-matrix.md rename to .cursor/skills/plan/templates/traceability-matrix.md index 0d63d81..e0192ac 100644 --- a/.cursor/skills/plan/templates/integration-traceability-matrix.md +++ b/.cursor/skills/plan/templates/traceability-matrix.md @@ -1,11 +1,11 @@ -# E2E Traceability Matrix Template +# Traceability Matrix Template -Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`. +Save as `DOCUMENT_DIR/tests/traceability-matrix.md`. --- ```markdown -# E2E Traceability Matrix +# Traceability Matrix ## Acceptance Criteria Coverage @@ -34,7 +34,7 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`. 
| Item | Reason Not Covered | Risk | Mitigation | |------|-------------------|------|-----------| -| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] | +| [AC/Restriction ID] | [why it cannot be tested at blackbox level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] | ``` --- @@ -44,4 +44,4 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`. - Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason. - Every restriction must appear in the matrix. - NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware"). -- Coverage percentage should be at least 75% for acceptance criteria at the E2E level. +- Coverage percentage should be at least 75% for acceptance criteria at the blackbox test level. diff --git a/.cursor/skills/refactor/SKILL.md b/.cursor/skills/refactor/SKILL.md index e2124ff..1099328 100644 --- a/.cursor/skills/refactor/SKILL.md +++ b/.cursor/skills/refactor/SKILL.md @@ -155,7 +155,7 @@ Store in PROBLEM_DIR. 
| Metric Category | What to Capture | |----------------|-----------------| -| **Coverage** | Overall, unit, integration, critical paths | +| **Coverage** | Overall, unit, blackbox, critical paths | | **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio | | **Code Smells** | Total, critical, major | | **Performance** | Response times (P50/P95/P99), CPU/memory, throughput | @@ -279,11 +279,11 @@ Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`: Coverage requirements (must meet before refactoring): - Minimum overall coverage: 75% - Critical path coverage: 90% -- All public APIs must have integration tests +- All public APIs must have blackbox tests - All error handling paths must be tested For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`: -- Integration tests: summary, current behavior, input data, expected result, max expected time +- Blackbox tests: summary, current behavior, input data, expected result, max expected time - Acceptance tests: summary, preconditions, steps with expected results - Coverage analysis: current %, target %, uncovered critical paths @@ -297,7 +297,7 @@ For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_ **Self-verification**: - [ ] Coverage requirements met (75% overall, 90% critical paths) - [ ] All tests pass on current codebase -- [ ] All public APIs have integration tests +- [ ] All public APIs have blackbox tests - [ ] Test data fixtures are configured **Save action**: Write test specs; implemented tests go into the project's test folder @@ -332,7 +332,7 @@ Write `REFACTOR_DIR/coupling_analysis.md`: For each change in the decoupling strategy: 1. Implement the change -2. Run integration tests +2. Run blackbox tests 3. Fix any failures 4. 
Commit with descriptive message diff --git a/.cursor/skills/test-spec/SKILL.md b/.cursor/skills/test-spec/SKILL.md new file mode 100644 index 0000000..3c0892f --- /dev/null +++ b/.cursor/skills/test-spec/SKILL.md @@ -0,0 +1,411 @@ +--- +name: test-spec +description: | + Test specification skill. Analyzes input data and expected results completeness, + then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits) + that treat the system as a black box. Every test pairs input data with quantifiable expected results + so tests can verify correctness, not just execution. + 3-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate. + Produces 8 artifacts under tests/. + Trigger phrases: + - "test spec", "test specification", "test scenarios" + - "blackbox test spec", "black box tests", "blackbox tests" + - "performance tests", "resilience tests", "security tests" +category: build +tags: [testing, black-box, blackbox-tests, test-specification, qa] +disable-model-invocation: true +--- + +# Test Scenario Specification + +Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals. 
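To make the black-box boundary concrete, here is a minimal sketch of the style of test this skill specifies. It is illustrative only: the `/detect` endpoint, the stub function, and the expected values are hypothetical (loosely borrowed from the object-detection examples later in this document). The point is that the test drives the system solely through a public interface and asserts only on observable output.

```python
def call_detect_api(image_path):
    """Stand-in for an HTTP call to the system's public /detect endpoint.

    A real consumer app would issue an actual request; the stub keeps this
    sketch self-contained. Note what is absent: the test never imports the
    system's modules, queries its database, or inspects internal state.
    """
    return {"status": 200, "detection_count": 3}


def test_detect_returns_expected_count():
    response = call_detect_api("input_data/image_01.jpg")
    # Black-box assertions: observable response fields only.
    assert response["status"] == 200
    assert response["detection_count"] == 3


test_detect_returns_expected_count()
```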
+ +## Core Principles + +- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details +- **Traceability**: every test traces to at least one acceptance criterion or restriction +- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work +- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding +- **Spec, don't code**: this workflow produces test specifications, never test implementation code +- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed +- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed + +## Context Resolution + +Fixed paths — no mode detection needed: + +- PROBLEM_DIR: `_docs/00_problem/` +- SOLUTION_DIR: `_docs/01_solution/` +- DOCUMENT_DIR: `_docs/02_document/` +- TESTS_OUTPUT_DIR: `_docs/02_document/tests/` + +Announce the resolved paths to the user before proceeding. + +## Input Specification + +### Required Files + +| File | Purpose | +|------|---------| +| `_docs/00_problem/problem.md` | Problem description and context | +| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria | +| `_docs/00_problem/restrictions.md` | Constraints and limitations | +| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files | +| `_docs/01_solution/solution.md` | Finalized solution | + +### Expected Results Specification + +Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict. 
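As an illustration (not one of the required artifacts), "quantifiable" means a harness could compute the verdict with a function like the sketch below. The helper itself is hypothetical; the method names mirror the comparison methods defined in the expected-results template.

```python
import re


def verdict(actual, expected, method, tolerance=None):
    """Return True (pass) or False (fail) for one expected-result row."""
    if method == "exact":
        return actual == expected
    if method == "numeric_tolerance":
        return abs(actual - expected) <= tolerance
    if method == "threshold_min":
        return actual >= expected
    if method == "threshold_max":
        return actual <= expected
    if method == "regex":
        return re.search(expected, str(actual)) is not None
    raise ValueError(f"unknown comparison method: {method}")


# confidence >= 0.85, position within 10px, latency <= 500ms:
assert verdict(0.92, 0.85, "threshold_min")
assert verdict(124, 120, "numeric_tolerance", tolerance=10)
assert verdict(430, 500, "threshold_max")
```

A result that cannot be routed through some such comparison is, by definition, not quantifiable and must be reworked or removed.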
+ +Expected results live inside `_docs/00_problem/input_data/` in one or both of: + +1. **Mapping file** (`input_data/expected_results.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md` + +2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file + +``` +input_data/ +├── expected_results.md ← required: input→expected result mapping +├── expected_results/ ← optional: complex reference files +│ ├── image_01_detections.json +│ └── batch_A_results.json +├── image_01.jpg +├── empty_scene.jpg +└── data_parameters.md +``` + +**Quantifiability requirements** (see template for full format and examples): +- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`) +- Structured data: exact JSON/CSV values, or a reference file in `expected_results/` +- Counts: exact counts (e.g., "3 detections", "0 errors") +- Text/patterns: exact string or regex pattern to match +- Timing: threshold (e.g., "response ≤ 500ms") +- Error cases: expected error code, message pattern, or HTTP status + +### Optional Files (used when available) + +| File | Purpose | +|------|---------| +| `DOCUMENT_DIR/architecture.md` | System architecture for environment design | +| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage | +| `DOCUMENT_DIR/components/` | Component specs for interface identification | + +### Prerequisite Checks (BLOCKING) + +1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing** +2. `restrictions.md` exists and is non-empty — **STOP if missing** +3. `input_data/` exists and contains at least one file — **STOP if missing** +4. `input_data/expected_results.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. 
Please create `_docs/00_problem/input_data/expected_results.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."* +5. `problem.md` exists and is non-empty — **STOP if missing** +6. `solution.md` exists and is non-empty — **STOP if missing** +7. Create TESTS_OUTPUT_DIR if it does not exist +8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?** + +## Artifact Management + +### Directory Structure + +``` +TESTS_OUTPUT_DIR/ +├── environment.md +├── test-data.md +├── blackbox-tests.md +├── performance-tests.md +├── resilience-tests.md +├── security-tests.md +├── resource-limit-tests.md +└── traceability-matrix.md +``` + +### Save Timing + +| Phase | Save immediately after | Filename | +|-------|------------------------|----------| +| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — | +| Phase 2 | Environment spec | `environment.md` | +| Phase 2 | Test data spec | `test-data.md` | +| Phase 2 | Blackbox tests | `blackbox-tests.md` | +| Phase 2 | Performance tests | `performance-tests.md` | +| Phase 2 | Resilience tests | `resilience-tests.md` | +| Phase 2 | Security tests | `security-tests.md` | +| Phase 2 | Resource limit tests | `resource-limit-tests.md` | +| Phase 2 | Traceability matrix | `traceability-matrix.md` | +| Phase 3 | Updated test data spec (if data added) | `test-data.md` | +| Phase 3 | Updated test files (if tests removed) | respective test file | +| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` | + +### Resumability + +If TESTS_OUTPUT_DIR already contains files: + +1. List existing files and match them to the save timing table above +2. Identify which phase/artifacts are complete +3. Resume from the next incomplete artifact +4. 
Inform the user which artifacts are being skipped + +## Progress Tracking + +At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes. + +## Workflow + +### Phase 1: Input Data Completeness Analysis + +**Role**: Professional Quality Assurance Engineer +**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios +**Constraints**: Analysis only — no test specs yet + +1. Read `_docs/01_solution/solution.md` +2. Read `acceptance_criteria.md`, `restrictions.md` +3. Read testing strategy from solution.md (if present) +4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows +5. Read `input_data/expected_results.md` and any referenced files in `input_data/expected_results/` +6. Analyze `input_data/` contents against: + - Coverage of acceptance criteria scenarios + - Coverage of restriction edge cases + - Coverage of testing strategy requirements +7. Analyze `input_data/expected_results.md` completeness: + - Every input data item has a corresponding expected result row in the mapping + - Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result") + - Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template + - Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid +8. Present input-to-expected-result pairing assessment: + +| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) | +|------------|--------------------------|---------------|----------------| +| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] | + +9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result +10. 
If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results.md` +11. If expected results are missing or not quantifiable, ask user to provide them before proceeding + +**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient. + +--- + +### Phase 2: Test Scenario Specification + +**Role**: Professional Quality Assurance Engineer +**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios +**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built. + +Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios: + +1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure +2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure +3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure +4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure +5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure +6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure +7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure +8. 
Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure + +**Self-verification**: +- [ ] Every acceptance criterion is covered by at least one test scenario +- [ ] Every restriction is verified by at least one test scenario +- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results.md` +- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md` +- [ ] Positive and negative scenarios are balanced +- [ ] Consumer app has no direct access to system internals +- [ ] Docker environment is self-contained (`docker compose up` sufficient) +- [ ] External dependencies have mock/stub services defined +- [ ] Traceability matrix has no uncovered AC or restrictions + +**Save action**: Write all files under TESTS_OUTPUT_DIR: +- `environment.md` +- `test-data.md` +- `blackbox-tests.md` +- `performance-tests.md` +- `resilience-tests.md` +- `security-tests.md` +- `resource-limit-tests.md` +- `traceability-matrix.md` + +**BLOCKING**: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed. + +Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.). + +--- + +### Phase 3: Test Data Validation Gate (HARD GATE) + +**Role**: Professional Quality Assurance Engineer +**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%. +**Constraints**: This phase is MANDATORY and cannot be skipped. + +#### Step 1 — Build the test-data and expected-result requirements checklist + +Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. 
For every test scenario, extract: + +| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? | +|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------| +| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] | + +Present this table to the user. + +#### Step 2 — Ask user to provide missing test data AND expected results + +For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user: + +> **Option A — Provide the missing items**: Supply what is missing: +> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location. +> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance. +> +> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples: +> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px" +> - "HTTP 200 with JSON body matching `expected_response_01.json`" +> - "Processing time < 500ms" +> - "0 false positives in the output set" +> +> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification. + +**BLOCKING**: Wait for the user's response for every missing item. 
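The bounding-box example above ("± 10px") shows why Option A insists on explicit tolerances: once a tolerance is stated, the check is purely mechanical. A hedged sketch, with hypothetical function names, of how such a check could look:

```python
def bbox_within_tolerance(actual, expected, tol_px):
    """True if every coordinate of `actual` is within tol_px of `expected`."""
    return all(abs(a - e) <= tol_px for a, e in zip(actual, expected))


def check_detections(actual_bboxes, expected_bboxes, tol_px=10):
    """Exact count match, plus per-coordinate tolerance on each box."""
    if len(actual_bboxes) != len(expected_bboxes):
        return False
    return all(bbox_within_tolerance(a, e, tol_px)
               for a, e in zip(actual_bboxes, expected_bboxes))


# Within tolerance on every coordinate -> pass; one coordinate off by 80px -> fail.
assert check_detections([(122, 78, 338, 292)], [(120, 80, 340, 290)])
assert not check_detections([(200, 80, 340, 290)], [(120, 80, 340, 290)])
```

Without the "± 10px" annotation, neither verdict above could be computed, and the test would have to be removed under Option B.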
+ +#### Step 3 — Validate provided data and expected results + +For each item where the user chose **Option A**: + +**Input data validation**: +1. Verify the data file(s) exist at the indicated location +2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges) +3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants) + +**Expected result validation**: +4. Verify the expected result exists in `input_data/expected_results.md` or as a referenced file in `input_data/expected_results/` +5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of: + - Exact values (counts, strings, status codes) + - Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`) + - Pattern matches (regex, substring, JSON schema) + - Thresholds (e.g., `< 500ms`, `≤ 5% error rate`) + - Reference file for structural comparison (JSON diff, CSV diff) +6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple) +7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to + +If any validation fails, report the specific issue and loop back to Step 2 for that item. + +#### Step 4 — Remove tests without data or expected results + +For each item where the user chose **Option B**: + +1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.` +2. Remove the test scenario from the respective test file +3. Remove corresponding rows from `traceability-matrix.md` +4. 
Update `test-data.md` to reflect the removal + +**Save action**: Write updated files under TESTS_OUTPUT_DIR: +- `test-data.md` +- Affected test files (if tests removed) +- `traceability-matrix.md` (if tests removed) + +#### Step 5 — Final coverage check + +After all removals, recalculate coverage: + +1. Count remaining test scenarios that trace to acceptance criteria +2. Count total acceptance criteria + restrictions +3. Calculate coverage percentage: `covered_items / total_items * 100` + +| Metric | Value | +|--------|-------| +| Total AC + Restrictions | ? | +| Covered by remaining tests | ? | +| **Coverage %** | **?%** | + +**Decision**: + +- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user. +- **Coverage < 70%** → Phase 3 **FAILED**. Report: + > ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions: + > + > | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed | + > |---|---|---| + > + > **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply. + + **BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%. + +#### Phase 3 Completion + +When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results: + +1. Present the final coverage report +2. List all removed tests (if any) with reasons +3. Confirm every remaining test has: input data + quantifiable expected result + comparison method +4. 
Confirm all artifacts are saved and consistent + +--- + +## Escalation Rules + +| Situation | Action | +|-----------|--------| +| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed | +| Missing input_data/expected_results.md | **STOP** — ask user to provide expected results mapping using the template | +| Ambiguous requirements | ASK user | +| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate | +| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding | +| Test scenario conflicts with restrictions | ASK user to clarify intent | +| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md | +| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test | +| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec | + +## Common Mistakes + +- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test +- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values +- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like +- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate +- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests +- **Untraceable tests**: every test should trace to at least one AC or restriction +- **Writing test code**: this skill produces specifications, never implementation code +- **Tests without data**: every test scenario MUST 
have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed + +## Trigger Conditions + +When the user wants to: +- Specify blackbox tests before implementation or refactoring +- Analyze input data completeness for test coverage +- Produce test scenarios from acceptance criteria + +**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios" + +## Methodology Quick Reference + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ Test Scenario Specification (3-Phase) │ +├──────────────────────────────────────────────────────────────────────┤ +│ PREREQ: Data Gate (BLOCKING) │ +│ → verify AC, restrictions, input_data (incl. expected_results.md) │ +│ │ +│ Phase 1: Input Data & Expected Results Completeness Analysis │ +│ → assess input_data/ coverage vs AC scenarios (≥70%) │ +│ → verify every input has a quantifiable expected result │ +│ → present input→expected-result pairing assessment │ +│ [BLOCKING: user confirms input data + expected results coverage] │ +│ │ +│ Phase 2: Test Scenario Specification │ +│ → environment.md │ +│ → test-data.md (with expected results mapping) │ +│ → blackbox-tests.md (positive + negative) │ +│ → performance-tests.md │ +│ → resilience-tests.md │ +│ → security-tests.md │ +│ → resource-limit-tests.md │ +│ → traceability-matrix.md │ +│ [BLOCKING: user confirms test coverage] │ +│ │ +│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │ +│ → build test-data + expected-result requirements checklist │ +│ → ask user: provide data+result (A) or remove test (B) │ +│ → validate input data (quality + quantity) │ +│ → validate expected results (quantifiable + comparison method) │ +│ → remove tests without data or expected result, warn user │ +│ → final coverage check (≥70% or FAIL + loop back) │ +│ [BLOCKING: coverage ≥ 70% required to pass] │ 
+├──────────────────────────────────────────────────────────────────────┤ +│ Principles: Black-box only · Traceability · Save immediately │ +│ Ask don't assume · Spec don't code │ +│ No test without data · No test without expected result │ +└──────────────────────────────────────────────────────────────────────┘ +``` diff --git a/.cursor/skills/test-spec/templates/expected-results.md b/.cursor/skills/test-spec/templates/expected-results.md new file mode 100644 index 0000000..0700733 --- /dev/null +++ b/.cursor/skills/test-spec/templates/expected-results.md @@ -0,0 +1,135 @@ +# Expected Results Template + +Save as `_docs/00_problem/input_data/expected_results.md`. +For complex expected outputs, create `_docs/00_problem/input_data/expected_results/` and place reference files there. +Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`). + +--- + +```markdown +# Expected Results + +Maps every input data item to its quantifiable expected result. +Tests use this mapping to compare actual system output against known-correct answers. 
+ +## Result Format Legend + +| Result Type | When to Use | Example | +|-------------|-------------|---------| +| Exact value | Output must match precisely | `status_code: 200`, `detection_count: 3` | +| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05`, `bbox_x: 120 ± 10px` | +| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.85` | +| Pattern match | Output must match a string/regex pattern | `error_message contains "invalid format"` | +| File reference | Complex output compared against a reference file | `match expected_results/case_01.json` | +| Schema match | Output structure must conform to a schema | `response matches DetectionResultSchema` | +| Set/count | Output must contain specific items or counts | `classes ⊇ {"car", "person"}`, `detections.length == 5` | + +## Comparison Methods + +| Method | Description | Tolerance Syntax | +|--------|-------------|-----------------| +| `exact` | Actual == Expected | N/A | +| `numeric_tolerance` | abs(actual - expected) ≤ tolerance | `± <value>` or `± <percent>%` | +| `range` | min ≤ actual ≤ max | `[min, max]` | +| `threshold_min` | actual ≥ threshold | `≥ <value>` | +| `threshold_max` | actual ≤ threshold | `≤ <value>` | +| `regex` | actual matches regex pattern | regex string | +| `substring` | actual contains substring | substring | +| `json_diff` | structural comparison against reference JSON | diff tolerance per field | +| `set_contains` | actual output set contains expected items | subset notation | +| `file_reference` | compare against reference file in expected_results/ | file path | + +## Input → Expected Result Mapping + +### [Scenario Group Name, e.g.
"Single Image Detection"] + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | +|---|-------|-------------------|-----------------|------------|-----------|---------------| +| 1 | `[file or parameters]` | [what this input represents] | [quantifiable expected output] | [method from table above] | [± value, range, or N/A] | [path in expected_results/ or N/A] | + +#### Example — Object Detection + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | +|---|-------|-------------------|-----------------|------------|-----------|---------------| +| 1 | `image_01.jpg` | Aerial photo, 3 vehicles visible | `detection_count: 3`, classes: `["ArmorVehicle", "ArmorVehicle", "Truck"]` | exact (count), set_contains (classes) | N/A | N/A | +| 2 | `image_01.jpg` | Same image, bbox positions | bboxes: `[(120,80,340,290), (400,150,580,310), (50,400,200,520)]` | numeric_tolerance | ± 15px per coordinate | `expected_results/image_01_detections.json` | +| 3 | `image_01.jpg` | Same image, confidence scores | confidences: `[0.94, 0.88, 0.91]` | threshold_min | each ≥ 0.85 | N/A | +| 4 | `empty_scene.jpg` | Aerial photo, no objects | `detection_count: 0`, empty detections array | exact | N/A | N/A | +| 5 | `corrupted.dat` | Invalid file format | HTTP 400, body contains `"error"` key | exact (status), substring (body) | N/A | N/A | + +#### Example — Performance + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | +|---|-------|-------------------|-----------------|------------|-----------|---------------| +| 1 | `standard_image.jpg` | 1920x1080 single image | Response time | threshold_max | ≤ 2000ms | N/A | +| 2 | `large_image.jpg` | 8000x6000 tiled image | Response time | threshold_max | ≤ 10000ms | N/A | + +#### Example — Error Handling + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | 
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 1 | `POST /detect` with no file | Missing required input | HTTP 422, message matches `"file.*required"` | exact (status), regex (message) | N/A | N/A |
+| 2 | `POST /detect` with `probability_threshold: 5.0` | Out-of-range config | HTTP 422 or clamped to valid range | exact (status) or range [0.0, 1.0] | N/A | N/A |
+
+## Expected Result Reference Files
+
+When the expected output is too complex for an inline table cell (e.g., full JSON response with nested objects), place a reference file in `_docs/00_problem/input_data/expected_results/`.
+
+### File Naming Convention
+
+`[input_name]_[result_description].[ext]`
+
+Examples:
+- `image_01_detections.json`
+- `batch_A_results.csv`
+- `video_01_annotations.json`
+
+### Reference File Requirements
+
+- Must be machine-readable (JSON, CSV, YAML — not prose)
+- Must contain only the expected output structure and values
+- Must include tolerance annotations where applicable (as metadata fields or comments)
+- Must be valid and parseable by standard libraries
+
+### Reference File Example (JSON)
+
+File: `expected_results/image_01_detections.json`
+
+​```json
+{
+  "input": "image_01.jpg",
+  "expected": {
+    "detection_count": 3,
+    "detections": [
+      {
+        "class": "ArmorVehicle",
+        "confidence": { "min": 0.85 },
+        "bbox": { "x1": 120, "y1": 80, "x2": 340, "y2": 290, "tolerance_px": 15 }
+      },
+      {
+        "class": "ArmorVehicle",
+        "confidence": { "min": 0.85 },
+        "bbox": { "x1": 400, "y1": 150, "x2": 580, "y2": 310, "tolerance_px": 15 }
+      },
+      {
+        "class": "Truck",
+        "confidence": { "min": 0.85 },
+        "bbox": { "x1": 50, "y1": 400, "x2": 200, "y2": 520, "tolerance_px": 15 }
+      }
+    ]
+  }
+}
+​```
+```
+
+---
+
+## Guidance Notes
+
+- Every row in the mapping table must have at least one quantifiable comparison — no row should say only "should work" or "returns result".
+- Use `exact` comparison for counts, status codes, and discrete values.
+- Use `numeric_tolerance` for floating-point values and spatial coordinates where minor variance is expected. +- Use `threshold_min`/`threshold_max` for performance metrics and confidence scores. +- Use `file_reference` when the expected output has more than ~3 fields or nested structures. +- Reference files must be committed alongside input data — they are part of the test specification. +- When the system has non-deterministic behavior (e.g., model inference variance across hardware), document the expected tolerance explicitly and justify it. diff --git a/.cursor/skills/ui-design/SKILL.md b/.cursor/skills/ui-design/SKILL.md new file mode 100644 index 0000000..afbd431 --- /dev/null +++ b/.cursor/skills/ui-design/SKILL.md @@ -0,0 +1,254 @@ +--- +name: ui-design +description: | + End-to-end UI design workflow: requirements gathering → design system synthesis → HTML+CSS mockup generation → visual verification → iterative refinement. + Zero external dependencies. Optional MCP enhancements (RenderLens, AccessLint). + Two modes: + - Full workflow: phases 0-8 for complex design tasks + - Quick mode: skip to code generation for simple requests + Command entry points: + - /design-audit — quality checks on existing mockup + - /design-polish — final refinement pass + - /design-critique — UX review with feedback + - /design-regen — regenerate with different direction + Trigger phrases: + - "design a UI", "create a mockup", "build a page" + - "make a landing page", "design a dashboard" + - "mockup", "design system", "UI design" +category: create +tags: [ui-design, mockup, html, css, tailwind, design-system, accessibility] +disable-model-invocation: true +--- + +# UI Design Skill + +End-to-end UI design workflow producing production-quality HTML+CSS mockups entirely within Cursor, with zero external tool dependencies. 
+ +## Core Principles + +- **Design intent over defaults**: never settle for generic AI output; every visual choice must trace to user requirements +- **Verify visually**: AI must see what it generates whenever possible (browser screenshots) +- **Tokens over hardcoded values**: use CSS custom properties with semantic naming, not raw hex +- **Restraint over decoration**: less is more; every visual element must earn its place +- **Ask, don't assume**: when design direction is ambiguous, STOP and ask the user +- **One screen at a time**: generate individual screens, not entire applications at once + +## Context Resolution + +Determine the operating mode based on invocation before any other logic runs. + +**Project mode** (default — `_docs/` structure exists): +- MOCKUPS_DIR: `_docs/02_document/ui_mockups/` + +**Standalone mode** (explicit input file provided, e.g. `/ui-design @some_brief.md`): +- INPUT_FILE: the provided file (treated as design brief) +- MOCKUPS_DIR: `_standalone/ui_mockups/` + +Create MOCKUPS_DIR if it does not exist. Announce the detected mode and resolved path to the user. + +## Output Directory + +All generated artifacts go to `MOCKUPS_DIR`: + +``` +MOCKUPS_DIR/ +├── DESIGN.md # Generated design system (three-layer tokens) +├── index.html # Main mockup (or named per page) +└── [page-name].html # Additional pages if multi-page +``` + +## Complexity Detection (Phase 0) + +Before starting the workflow, classify the request: + +**Quick mode** — skip to Phase 5 (Code Generation): +- Request is a single component or screen +- User provides enough style context in their message +- `MOCKUPS_DIR/DESIGN.md` already exists +- Signals: "just make a...", "quick mockup of...", single component name, less than 2 sentences + +**Full mode** — run phases 1-8: +- Multi-page request +- Brand-specific requirements +- "design system for...", complex layouts, dashboard/admin panel +- No existing DESIGN.md + +Announce the detected mode to the user. 
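
The Context Resolution and Complexity Detection steps above reduce to simple filesystem checks. A minimal shell sketch (directory names are the ones defined in this skill; the detection logic here is an illustrative assumption — the real skill also weighs the wording of the request, which is omitted):

```shell
#!/bin/sh
# Sketch of mode/context resolution as described above.
# Project mode when a _docs/ tree exists; standalone otherwise.
if [ -d "_docs" ]; then
  MODE="project"
  MOCKUPS_DIR="_docs/02_document/ui_mockups"
else
  MODE="standalone"
  MOCKUPS_DIR="_standalone/ui_mockups"
fi

# Quick mode if a design system already exists; full otherwise.
# (The skill also inspects the request text; not modeled here.)
if [ -f "$MOCKUPS_DIR/DESIGN.md" ]; then
  WORKFLOW="quick"
else
  WORKFLOW="full"
fi

# Create the output directory if it does not exist.
mkdir -p "$MOCKUPS_DIR"

echo "mode=$MODE workflow=$WORKFLOW mockups_dir=$MOCKUPS_DIR"
```

The same checks announce themselves via the final `echo`, matching the requirement to report the detected mode and resolved path to the user.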
+ +## Phase 1: Context Check + +1. Check for existing project documentation: PRD, design specs, README with design notes +2. Check for existing `MOCKUPS_DIR/DESIGN.md` +3. Check for existing mockups in `MOCKUPS_DIR/` +4. If DESIGN.md exists → announce "Using existing design system" → skip to Phase 5 +5. If project docs with design info exist → extract requirements from them, skip to Phase 3 + +## Phase 2: Requirements Gathering + +Use the AskQuestion tool for structured input. Adapt based on what Phase 1 found — only ask for what's missing. + +**Round 1 — Structural:** + +Ask using AskQuestion with these questions: +- **Page type**: landing, dashboard, form, settings, profile, admin panel, e-commerce, blog, documentation, other +- **Target audience**: developers, business users, consumers, internal team, general public +- **Platform**: web desktop-first, web mobile-first +- **Key sections**: header, hero, sidebar, main content, cards grid, data table, form, footer (allow multiple) + +**Round 2 — Design Intent:** + +Ask using AskQuestion with these questions: +- **Visual atmosphere**: Airy & spacious / Dense & data-rich / Warm & approachable / Sharp & technical / Luxurious & premium +- **Color mood**: Cool blues & grays / Warm earth tones / Bold & vibrant / Monochrome / Dark mode / Let AI choose based on atmosphere / Custom (specify brand colors) +- **Typography mood**: Geometric (modern, clean) / Humanist (friendly, readable) / Monospace (technical, code-like) / Serif (editorial, premium) + +Then ask in free-form: +- "Name an app or website whose look you admire" (optional, helps anchor style) +- "Any specific content, copy, or data to include?" + +## Phase 3: Direction Exploration + +Generate 2-3 text-based direction summaries. 
Each direction is 3-5 sentences describing:
+- Visual approach and mood
+- Color palette direction (specific hues, not just "blue")
+- Layout strategy (grid type, density, whitespace approach)
+- Typography choice (specific font suggestions, not just "sans-serif")
+
+Present to user: "Here are 2-3 possible directions. Which resonates? Or describe a blend."
+
+Wait for user to pick before proceeding.
+
+## Phase 4: Design System Synthesis
+
+Generate `MOCKUPS_DIR/DESIGN.md` using the template from `templates/design-system.md`.
+
+The generated DESIGN.md must include all 6 sections:
+1. Visual Atmosphere — descriptive mood (never "clean and modern")
+2. Color System — three-layer CSS custom properties (primitives → semantic → component)
+3. Typography — specific font family, weight hierarchy, size scale with rem values
+4. Spacing & Layout — base unit, spacing scale, grid, breakpoints
+5. Component Styling Defaults — buttons, cards, inputs, navigation with all states
+6. Interaction States — loading, error, empty, hover, focus, disabled patterns
+
+Read `references/design-vocabulary.md` for atmosphere descriptors and style vocabulary to use when writing the DESIGN.md.
+
+## Phase 5: Code Generation
+
+Construct the generation context by combining multiple sources:
+
+1. Read `MOCKUPS_DIR/DESIGN.md` for the design system
+2. Read `references/components.md` for component best practices relevant to the page type
+3. Read `references/anti-patterns.md` for explicit avoidance instructions
+
+Generate `MOCKUPS_DIR/[page-name].html` as a single file with:
+- `<script src="https://cdn.tailwindcss.com"></script>` for Tailwind
+- `