Refactor testing framework to replace integration tests with blackbox tests across various skills and documentation. Update related workflows, templates, and task specifications to align with the new blackbox testing approach. Remove obsolete integration test files and enhance clarity in task management and reporting structures.

Oleksandr Bezdieniezhnykh
2026-03-24 03:38:36 +02:00
parent ae3ad50b9e
commit e609586c7c
49 changed files with 2222 additions and 872 deletions
@@ -7,13 +7,13 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
| Step | Name | Sub-Skill | Internal SubSteps |
|------|-------------------------|---------------------------------|---------------------------------------|
| — | Document (pre-step) | document/SKILL.md | Steps 1-8 |
| 2b | Blackbox Test Spec | blackbox-test-spec/SKILL.md | Phase 1a-1b |
| 2b | Blackbox Test Spec | test-spec/SKILL.md | Phase 1a-1b |
| 2c | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
| 2d | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 2e | Refactor | refactor/SKILL.md | Phases 0-5 (6-phase method) |
| 2f | New Task | new-task/SKILL.md | Steps 1-8 (loop) |
| 2g | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 2h | Run Tests | (autopilot-managed) | Unit tests → Integration/blackbox tests |
| 2h | Run Tests | (autopilot-managed) | Unit tests → Blackbox tests |
| 2hb | Security Audit | security/SKILL.md | Phases 1-5 (optional) |
| 2i | Deploy | deploy/SKILL.md | Steps 1-7 |
@@ -49,20 +49,20 @@ Action: An existing codebase without documentation was detected. Present using C
---
**Step 2b — Blackbox Test Spec**
Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/integration_tests/traceability_matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
Action: Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`
Action: Read and execute `.cursor/skills/test-spec/SKILL.md`
This step applies when the codebase was documented via the `/document` skill. Test specifications must be produced before refactoring or further development.
---
**Step 2c — Decompose Tests**
Condition: `_docs/02_document/integration_tests/traceability_matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/integration_tests/` as input). The decompose skill will:
Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
1. Run Step 1t (test infrastructure bootstrap)
2. Run Step 3 (integration test task decomposition)
2. Run Step 3 (blackbox test task decomposition)
3. Run Step 4 (cross-verification against test coverage)
If `_docs/02_tasks/` already contains some task files, the decompose skill's resumability handles them.
@@ -117,7 +117,7 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au
Action: Run the full test suite to verify the implementation before deployment.
1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
3. **Report results**: present a summary of passed/failed/skipped tests
If all tests pass → auto-chain to Step 2hb (Security Audit).
+2 -2
@@ -11,7 +11,7 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
| 2 | Plan | plan/SKILL.md | Steps 1-6 + Final |
| 3 | Decompose | decompose/SKILL.md | Steps 1-4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Run Tests | (autopilot-managed) | Unit tests → Integration/blackbox tests |
| 5 | Run Tests | (autopilot-managed) | Unit tests → Blackbox tests |
| 5b | Security Audit | security/SKILL.md | Phases 1-5 (optional) |
| 6 | Deploy | deploy/SKILL.md | Steps 1-7 |
@@ -100,7 +100,7 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t
Action: Run the full test suite to verify the implementation before deployment.
1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
3. **Report results**: present a summary of passed/failed/skipped tests
If all tests pass → auto-chain to Step 5b (Security Audit).
-321
@@ -1,321 +0,0 @@
---
name: blackbox-test-spec
description: |
Black-box integration test specification skill. Analyzes input data completeness and produces
detailed E2E test scenarios (functional + non-functional) that treat the system as a black box.
3-phase workflow: input data completeness analysis, test scenario specification, test data validation gate.
Produces 5 artifacts under integration_tests/.
Trigger phrases:
- "blackbox test spec", "black box tests", "integration test spec"
- "test specification", "e2e test spec"
- "test scenarios", "black box scenarios"
category: build
tags: [testing, black-box, integration-tests, e2e, test-specification, qa]
disable-model-invocation: true
---
# Black-Box Test Scenario Specification
Analyze input data completeness and produce detailed black-box integration test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
## Core Principles
- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
## Context Resolution
Fixed paths — no mode detection needed:
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/integration_tests/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples |
| `_docs/01_solution/solution.md` | Finalized solution |
### Optional Files (used when available)
| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |
### Prerequisite Checks (BLOCKING)
1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `problem.md` exists and is non-empty — **STOP if missing**
5. `solution.md` exists and is non-empty — **STOP if missing**
6. Create TESTS_OUTPUT_DIR if it does not exist
7. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
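The blocking checks above amount to a gate function. A hedged sketch, assuming the workspace has been summarized as a relative-path → byte-size mapping; the function name and message strings are hypothetical, not part of the skill:

```python
# Files that must exist and be non-empty before the skill may run
REQUIRED = [
    "_docs/00_problem/problem.md",
    "_docs/00_problem/acceptance_criteria.md",
    "_docs/00_problem/restrictions.md",
    "_docs/01_solution/solution.md",
]

def blocking_problems(files: dict[str, int]) -> list[str]:
    """files maps relative path -> size in bytes; returns blocking issues."""
    problems = [f"missing or empty: {p}" for p in REQUIRED
                if files.get(p, 0) == 0]
    # input_data/ must contain at least one file of any kind
    if not any(p.startswith("_docs/00_problem/input_data/") for p in files):
        problems.append("_docs/00_problem/input_data/ has no files")
    return problems  # empty list means all gates pass
```

An empty return value corresponds to "all prerequisite checks passed"; any entry is a STOP condition.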
## Artifact Management
### Directory Structure
```
TESTS_OUTPUT_DIR/
├── environment.md
├── test_data.md
├── functional_tests.md
├── non_functional_tests.md
└── traceability_matrix.md
```
### Save Timing
| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test_data.md` |
| Phase 2 | Functional tests | `functional_tests.md` |
| Phase 2 | Non-functional tests | `non_functional_tests.md` |
| Phase 2 | Traceability matrix | `traceability_matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test_data.md` |
| Phase 3 | Updated functional tests (if tests removed) | `functional_tests.md` |
| Phase 3 | Updated non-functional tests (if tests removed) | `non_functional_tests.md` |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability_matrix.md` |
### Resumability
If TESTS_OUTPUT_DIR already contains files:
1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes.
## Workflow
### Phase 1: Input Data Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
6. Threshold: at least 70% coverage of the scenarios
7. If coverage is low, search the internet for supplementary data, assess its quality with the user, and, if the user agrees, add it to `input_data/`
8. Present coverage assessment to user
**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.
---
### Phase 2: Black-Box Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering functional and non-functional scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/integration-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/integration-test-data.md` as structure
3. Write functional test scenarios (positive + negative) using `.cursor/skills/plan/templates/integration-functional-tests.md` as structure
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `.cursor/skills/plan/templates/integration-non-functional-tests.md` as structure
5. Build traceability matrix using `.cursor/skills/plan/templates/integration-traceability-matrix.md` as structure
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test_data.md`
- `functional_tests.md`
- `non_functional_tests.md`
- `traceability_matrix.md`
**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
---
### Phase 3: Test Data Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
#### Step 1 — Build the test-data requirements checklist
Scan `functional_tests.md` and `non_functional_tests.md`. For every test scenario, extract:
| # | Test Scenario ID | Test Name | Required Data Description | Required Data Quality | Required Data Quantity | Data Provided? |
|---|-----------------|-----------|---------------------------|----------------------|----------------------|----------------|
Present this table to the user.
#### Step 2 — Ask user to provide test data
For each row where **Data Provided?** is **No**, ask the user:
> **Option A — Provide the data**: Supply the necessary test data files (with required quality and quantity as described in the table). Place them in `_docs/00_problem/input_data/` or indicate the location.
>
> **Option B — Skip this test**: If you cannot provide the data, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing data item.
#### Step 3 — Validate provided data
For each item where the user chose **Option A**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
4. If validation fails, report the specific issue and loop back to Step 2 for that item
#### Step 4 — Remove tests without data
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data.`
2. Remove the test scenario from `functional_tests.md` or `non_functional_tests.md`
3. Remove corresponding rows from `traceability_matrix.md`
4. Update `test_data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test_data.md`
- `functional_tests.md` (if tests removed)
- `non_functional_tests.md` (if tests removed)
- `traceability_matrix.md` (if tests removed)
#### Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
**Decision**:
- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 70%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.
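The coverage arithmetic behind this decision is simple; a sketch (function names are illustrative, the 70% threshold is the skill's):

```python
def coverage_percent(covered: int, total: int) -> float:
    """covered_items / total_items * 100, guarding against an empty spec."""
    return 100.0 * covered / total if total else 0.0

def phase3_decision(covered: int, total: int, threshold: float = 70.0) -> str:
    """PASSED when coverage meets the threshold, FAILED otherwise."""
    return "PASSED" if coverage_percent(covered, total) >= threshold else "FAILED"
```

Note the boundary: exactly 70% passes, since the rule is "coverage ≥ 70%".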
#### Phase 3 Completion
When coverage ≥ 70% and all remaining tests have validated data:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm all artifacts are saved and consistent
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
## Common Mistakes
- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code
- **Tests without data**: every test scenario MUST have concrete test data; a test spec without data is not executable and must be removed
## Trigger Conditions
When the user wants to:
- Specify black-box integration tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce E2E test scenarios from acceptance criteria
**Keywords**: "blackbox test spec", "black box tests", "integration test spec", "test specification", "e2e test spec", "test scenarios"
## Methodology Quick Reference
```
┌─────────────────────────────────────────────────────────────────┐
│ Black-Box Test Scenario Specification (3-Phase) │
├─────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING) │
│ → verify AC, restrictions, input_data, solution exist │
│ │
│ Phase 1: Input Data Completeness Analysis │
│ → assess input_data/ coverage vs AC scenarios (≥70%) │
│ [BLOCKING: user confirms input data coverage] │
│ │
│ Phase 2: Black-Box Test Scenario Specification │
│ → environment.md │
│ → test_data.md │
│ → functional_tests.md (positive + negative) │
│ → non_functional_tests.md (perf, resilience, security, limits)│
│ → traceability_matrix.md │
│ [BLOCKING: user confirms test coverage] │
│ │
│ Phase 3: Test Data Validation Gate (HARD GATE) │
│ → build test-data requirements checklist │
│ → ask user: provide data (Option A) or remove test (Option B) │
│ → validate provided data (quality + quantity) │
│ → remove tests without data, warn user │
│ → final coverage check (≥70% or FAIL + loop back) │
│ [BLOCKING: coverage ≥ 70% required to pass] │
├─────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately │
│ Ask don't assume · Spec don't code │
│ No test without data │
└─────────────────────────────────────────────────────────────────┘
```
+1 -1
@@ -46,7 +46,7 @@ For each task, verify implementation satisfies every acceptance criterion:
- Walk through each AC (Given/When/Then) and trace it in the code
- Check that unit tests cover each AC
- Check that integration tests exist where specified in the task spec
- Check that blackbox tests exist where specified in the task spec
- Flag any AC that is not demonstrably satisfied as a **Spec-Gap** finding (severity: High)
- Flag any scope creep (implementation beyond what the spec asked for) as a **Scope** finding (severity: Low)
+37 -34
@@ -2,7 +2,7 @@
name: decompose
description: |
Decompose planned components into atomic implementable tasks with bootstrap structure plan.
4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
4-step workflow: bootstrap structure plan, component task decomposition, blackbox test task decomposition, and cross-task verification.
Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
Trigger phrases:
- "decompose", "decompose features", "feature decomposition"
@@ -36,7 +36,7 @@ Determine the operating mode based on invocation before any other logic runs.
- DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (integration tests) + Step 4 (cross-verification)
- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (blackbox tests) + Step 4 (cross-verification)
**Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):
- DOCUMENT_DIR: `_docs/02_document/`
@@ -45,12 +45,12 @@ Determine the operating mode based on invocation before any other logic runs.
- Ask user for the parent Epic ID
- Runs Step 2 (that component only, appending to existing task numbering)
**Tests-only mode** (provided file/directory is within `integration_tests/`, or `DOCUMENT_DIR/integration_tests/` exists and input explicitly requests test decomposition):
**Tests-only mode** (provided file/directory is within `tests/`, or `DOCUMENT_DIR/tests/` exists and input explicitly requests test decomposition):
- DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/`
- TESTS_DIR: `DOCUMENT_DIR/integration_tests/`
- TESTS_DIR: `DOCUMENT_DIR/tests/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
- Runs Step 1t (test infrastructure bootstrap) + Step 3 (integration test decomposition) + Step 4 (cross-verification against test coverage)
- Runs Step 1t (test infrastructure bootstrap) + Step 3 (blackbox test decomposition) + Step 4 (cross-verification against test coverage)
- Skips Step 1 (project bootstrap) and Step 2 (component decomposition) — the codebase already exists
Announce the detected mode and resolved paths to the user before proceeding.
@@ -70,7 +70,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
| `DOCUMENT_DIR/architecture.md` | Architecture from plan skill |
| `DOCUMENT_DIR/system-flows.md` | System flows from plan skill |
| `DOCUMENT_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
| `DOCUMENT_DIR/integration_tests/` | Integration test specs from plan skill |
| `DOCUMENT_DIR/tests/` | Blackbox test specs from plan skill |
**Single component mode:**
@@ -84,10 +84,13 @@ Announce the detected mode and resolved paths to the user before proceeding.
| File | Purpose |
|------|---------|
| `TESTS_DIR/environment.md` | Test environment specification (Docker services, networks, volumes) |
| `TESTS_DIR/test_data.md` | Test data management (seed data, mocks, isolation) |
| `TESTS_DIR/functional_tests.md` | Functional test scenarios (positive + negative) |
| `TESTS_DIR/non_functional_tests.md` | Non-functional test scenarios (perf, resilience, security, limits) |
| `TESTS_DIR/traceability_matrix.md` | AC/restriction coverage mapping |
| `TESTS_DIR/test-data.md` | Test data management (seed data, mocks, isolation) |
| `TESTS_DIR/blackbox-tests.md` | Blackbox functional scenarios (positive + negative) |
| `TESTS_DIR/performance-tests.md` | Performance test scenarios |
| `TESTS_DIR/resilience-tests.md` | Resilience test scenarios |
| `TESTS_DIR/security-tests.md` | Security test scenarios |
| `TESTS_DIR/resource-limit-tests.md` | Resource limit test scenarios |
| `TESTS_DIR/traceability-matrix.md` | AC/restriction coverage mapping |
| `_docs/00_problem/problem.md` | Problem context |
| `_docs/00_problem/restrictions.md` | Constraints for test design |
| `_docs/00_problem/acceptance_criteria.md` | Acceptance criteria being verified |
@@ -103,7 +106,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
1. The provided component file exists and is non-empty — **STOP if missing**
**Tests-only mode:**
1. `TESTS_DIR/functional_tests.md` exists and is non-empty — **STOP if missing**
1. `TESTS_DIR/blackbox-tests.md` exists and is non-empty — **STOP if missing**
2. `TESTS_DIR/environment.md` exists — **STOP if missing**
3. Create TASKS_DIR if it does not exist
4. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
@@ -130,7 +133,7 @@ TASKS_DIR/
| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 4 | Cross-task verification complete | `_dependencies_table.md` |
### Resumability
@@ -153,7 +156,7 @@ At the start of execution, create a TodoWrite with all applicable steps. Update
**Goal**: Produce `01_test_infrastructure.md` — the first task describing the test project scaffold
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.
1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test_data.md`
1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test-data.md`
2. Read problem.md, restrictions.md, acceptance_criteria.md for domain context
3. Document the test infrastructure plan using `templates/test-infrastructure-task.md`
@@ -162,20 +165,20 @@ The test infrastructure bootstrap must include:
- Mock/stub service definitions for each external dependency
- `docker-compose.test.yml` structure from environment.md
- Test runner configuration (framework, plugins, fixtures)
- Test data fixture setup from test_data.md seed data sets
- Test data fixture setup from test-data.md seed data sets
- Test reporting configuration (format, output path)
- Data isolation strategy
**Self-verification**:
- [ ] Every external dependency from environment.md has a mock service defined
- [ ] Docker Compose structure covers all services from environment.md
- [ ] Test data fixtures cover all seed data sets from test_data.md
- [ ] Test data fixtures cover all seed data sets from test-data.md
- [ ] Test runner configuration matches the consumer app tech stack from environment.md
- [ ] Data isolation strategy is defined
**Save action**: Write `01_test_infrastructure.md` (temporary numeric name)
**Jira action**: Create a Jira ticket for this task under the "Integration Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
**Jira action**: Create a Jira ticket for this task under the "Blackbox Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
**Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
@@ -199,27 +202,27 @@ The bootstrap structure plan must include:
- Shared models, interfaces, and DTOs
- Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
- `docker-compose.yml` for local development (all components + database + dependencies)
- `docker-compose.test.yml` for integration test environment (black-box test runner)
- `docker-compose.test.yml` for blackbox test environment (blackbox test runner)
- `.dockerignore`
- CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
- Database migration setup and initial seed data scripts
- Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
- Environment variable documentation (`.env.example`)
- Test structure with unit and integration test locations
- Test structure with unit and blackbox test locations
**Self-verification**:
- [ ] All components have corresponding folders in the layout
- [ ] All inter-component interfaces have DTOs defined
- [ ] Dockerfile defined for each component
- [ ] `docker-compose.yml` covers all components and dependencies
- [ ] `docker-compose.test.yml` enables black-box integration testing
- [ ] `docker-compose.test.yml` enables blackbox testing
- [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
- [ ] Database migration setup included
- [ ] Health check endpoints specified for each service
- [ ] Structured logging configuration included
- [ ] `.env.example` with all required environment variables
- [ ] Environment strategy covers dev, staging, production
- [ ] Test structure includes unit and integration test locations
- [ ] Test structure includes unit and blackbox test locations
**Save action**: Write `01_initial_structure.md` (temporary numeric name)
@@ -265,33 +268,33 @@ For each component (or the single provided component):
---
### Step 3: Integration Test Task Decomposition (default and tests-only modes)
### Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
**Role**: Professional Quality Assurance Engineer
**Goal**: Decompose integration test specs into atomic, implementable task specs
**Goal**: Decompose blackbox test specs into atomic, implementable task specs
**Constraints**: Behavioral specs only — describe what, not how. No test code.
**Numbering**:
- In default mode: continue sequential numbering from where Step 2 left off.
- In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).
1. Read all test specs from `DOCUMENT_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
1. Read all test specs from `DOCUMENT_DIR/tests/` (`blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md`)
2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
3. Each task should reference the specific test scenarios it implements and the environment/test-data specs
4. Dependencies:
- In default mode: integration test tasks depend on the component implementation tasks they exercise
- In tests-only mode: integration test tasks depend on the test infrastructure bootstrap task (Step 1t)
- In default mode: blackbox test tasks depend on the component implementation tasks they exercise
- In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
5. Write each task spec using `templates/task.md`
6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
8. **Immediately after writing each task file**: create a Jira ticket under the "Integration Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
**Self-verification**:
- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
- [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
- [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task
- [ ] No task exceeds 5 complexity points
- [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic
- [ ] Every task has a Jira ticket linked to the "Blackbox Tests" epic
**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
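The write-then-rename convention in item 8 can be sketched as follows (helper names and the Jira ID format are illustrative; creating the ticket itself happens via the Jira MCP and is not shown):

```python
from pathlib import Path

def jira_name(numeric_name: str, jira_id: str) -> str:
    """Map a temporary numeric task filename to its final Jira-based name,
    e.g. 03_report_generator.md -> PROJ-42_report_generator.md."""
    prefix, sep, short_name = numeric_name.partition("_")
    if not (sep and prefix.isdigit()):
        raise ValueError(f"expected [##]_[short_name].md, got {numeric_name!r}")
    return f"{jira_id}_{short_name}"

def finalize_task_file(path: Path, jira_id: str) -> Path:
    """Rename the task file in place once its Jira ticket exists."""
    new_path = path.with_name(jira_name(path.name, jira_id))
    path.rename(new_path)
    return new_path
```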
@@ -306,7 +309,7 @@ For each component (or the single provided component):
1. Verify task dependencies across all tasks are consistent
2. Check no gaps:
- In default mode: every interface in architecture.md has tasks covering it
- In tests-only mode: every test scenario in `traceability_matrix.md` is covered by a task
- In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task
3. Check no overlaps: tasks don't duplicate work
4. Check no circular dependencies in the task graph
5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`
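The circular-dependency check in item 4 can be sketched as a depth-first search over the task graph (task IDs below are illustrative):

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of task IDs, or None.
    `deps` maps each task ID to the IDs of the tasks it depends on."""
    visiting, done = set(), set()
    stack = []

    def visit(task):
        visiting.add(task)
        stack.append(task)
        for dep in deps.get(task, ()):
            if dep in visiting:  # back edge: we are already inside this task
                return stack[stack.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        visiting.discard(task)
        done.add(task)
        stack.pop()
        return None

    for task in deps:
        if task not in done:
            cycle = visit(task)
            if cycle:
                return cycle
    return None
```

An empty result means the graph is safe to record in `_dependencies_table.md`; a non-empty result names the tasks to untangle.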
@@ -320,7 +323,7 @@ Default mode:
- [ ] `_dependencies_table.md` contains every task with correct dependencies
Tests-only mode:
- [ ] Every test scenario from traceability_matrix.md "Covered" entries has a corresponding task
- [ ] Every test scenario from traceability-matrix.md "Covered" entries has a corresponding task
- [ ] No circular dependencies in the task graph
- [ ] Test task dependencies reference the test infrastructure bootstrap
- [ ] `_dependencies_table.md` contains every task with correct dependencies
@@ -366,14 +369,14 @@ Tests-only mode:
│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │
│ [BLOCKING: user confirms structure] │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │
│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │
│ 4. Cross-Verification → _dependencies_table.md │
│ [BLOCKING: user confirms dependencies] │
│ │
│ TESTS-ONLY MODE: │
│ 1t. Test Infrastructure → [JIRA-ID]_test_infrastructure.md │
│ [BLOCKING: user confirms test scaffold] │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │
│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │
│ 4. Cross-Verification → _dependencies_table.md │
│ [BLOCKING: user confirms dependencies] │
│ │
@@ -49,7 +49,7 @@ project-root/
| Build | Compile/bundle the application | Every push |
| Lint / Static Analysis | Code quality and style checks | Every push |
| Unit Tests | Run unit test suite | Every push |
| Integration Tests | Run integration test suite | Every push |
| Blackbox Tests | Run blackbox test suite | Every push |
| Security Scan | SAST / dependency check | Every push |
| Deploy to Staging | Deploy to staging environment | Merge to staging branch |
@@ -64,7 +64,7 @@ Then [expected result]
|--------|-------------|-----------------|
| AC-1 | [test subject] | [expected result] |
## Integration Tests
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
@@ -9,10 +9,10 @@ Use this template for the test infrastructure bootstrap (Step 1t in tests-only m
**Task**: [JIRA-ID]_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the E2E test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Description**: Scaffold the blackbox test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: [3|5] points
**Dependencies**: None
**Component**: Integration Tests
**Component**: Blackbox Tests
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
@@ -124,6 +124,6 @@ Then a report file exists at the configured output path with correct columns
- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on test infrastructure decisions, not individual test implementations.
- Reference environment.md and test_data.md from the test specs — don't repeat everything.
- Reference test-environment.md and test-data.md from the test specs — don't repeat everything.
- Mock services must be deterministic: same input always produces same output.
- The Docker environment must be self-contained: `docker compose up` sufficient.
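The determinism requirement above can be made concrete: a mock endpoint sketch (names are illustrative) that derives its reply purely from the request, with no clocks and no RNG:

```python
import hashlib

def mock_response(request_body: str) -> dict:
    """Deterministic mock: the reply depends only on the request bytes,
    so the same input always produces the same output."""
    digest = hashlib.sha256(request_body.encode()).hexdigest()
    return {
        "id": digest[:12],   # stable pseudo-ID derived from the request
        "status": "ok",
        "echo": request_body,
    }
```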
@@ -20,7 +20,7 @@ Plan and document the full deployment lifecycle: check deployment status and env
## Core Principles
- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
- **Docker-first**: every component runs in a container; local dev, blackbox tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
@@ -157,7 +157,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
### Step 2: Containerization
**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and integration test environments
**Goal**: Define Docker configuration for every component, local development, and blackbox test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
@@ -176,7 +176,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env`)
6. Define `docker-compose.test.yml` for integration tests:
6. Define `docker-compose.test.yml` for blackbox tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
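Before the runner in `docker-compose.test.yml` starts, every service under test should report healthy; a polling sketch with an injectable probe (any health check fits: HTTP, TCP, CLI):

```python
import time

def wait_until_healthy(probe, timeout_s=60.0, interval_s=2.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` (returns True once the service is up) until it
    succeeds or `timeout_s` elapses. Clock and sleep are injectable
    so the helper itself is testable without real waiting."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False
```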
@@ -189,7 +189,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box integration testing
- [ ] docker-compose.test.yml enables black-box testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
@@ -212,7 +212,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage |
| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
@@ -458,7 +458,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts)
- **Hardcoding secrets**: never include real credentials in deployment documents or scripts
- **Ignoring integration test containerization**: the test environment must be containerized alongside the app
- **Ignoring blackbox test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: always pin base image versions
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
@@ -28,7 +28,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`.
### Test
- Unit tests: [framework and command]
- Integration tests: [framework and command, uses docker-compose.test.yml]
- Blackbox tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact
@@ -54,7 +54,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`.
- Automated rollback on health check failure
### Smoke Tests
- Subset of integration tests targeting staging environment
- Subset of blackbox tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]
@@ -48,7 +48,7 @@ networks:
[shared network]
```
## Docker Compose — Integration Tests
## Docker Compose — Blackbox Tests
```yaml
# docker-compose.test.yml structure
@@ -64,7 +64,7 @@ Then [expected result]
|--------|-------------|-----------------|
| AC-1 | [test subject] | [expected result] |
## Integration Tests
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
@@ -59,9 +59,9 @@ At the start of execution, create a TodoWrite with all steps (1 through 6 plus F
## Workflow
### Step 1: Integration Tests
### Step 1: Blackbox Tests
Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`.
Read and execute `.cursor/skills/test-spec/SKILL.md`.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.
@@ -111,7 +111,7 @@ Read and follow `steps/07_quality-checklist.md`.
- **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
- **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
- **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register
- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3)
- **Ignoring blackbox test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3)
## Escalation Rules
@@ -135,7 +135,7 @@ Read and follow `steps/07_quality-checklist.md`.
│ PREREQ: Data Gate (BLOCKING) │
│ → verify AC, restrictions, input_data, solution exist │
│ │
│ 1. Integration Tests → blackbox-test-spec/SKILL.md │
│ 1. Blackbox Tests → test-spec/SKILL.md                      │
│ [BLOCKING: user confirms test coverage] │
│ 2. Solution Analysis → architecture, data model, deployment │
│ [BLOCKING: user confirms architecture] │
@@ -6,12 +6,15 @@ All artifacts are written directly under DOCUMENT_DIR:
```
DOCUMENT_DIR/
├── integration_tests/
│ ├── environment.md
│ ├── test_data.md
│ ├── functional_tests.md
│ ├── non_functional_tests.md
│ └── traceability_matrix.md
├── tests/
│ ├── test-environment.md
│ ├── test-data.md
│ ├── blackbox-tests.md
│ ├── performance-tests.md
│ ├── resilience-tests.md
│ ├── security-tests.md
│ ├── resource-limit-tests.md
│ └── traceability-matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
@@ -47,11 +50,14 @@ DOCUMENT_DIR/
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 1 | Blackbox test environment spec | `tests/test-environment.md` |
| Step 1 | Blackbox test data spec | `tests/test-data.md` |
| Step 1 | Blackbox tests | `tests/blackbox-tests.md` |
| Step 1 | Blackbox performance tests | `tests/performance-tests.md` |
| Step 1 | Blackbox resilience tests | `tests/resilience-tests.md` |
| Step 1 | Blackbox security tests | `tests/security-tests.md` |
| Step 1 | Blackbox resource limit tests | `tests/resource-limit-tests.md` |
| Step 1 | Blackbox traceability matrix | `tests/traceability-matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
@@ -7,7 +7,7 @@
### Phase 2a: Architecture & Flows
1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
2. Incorporate findings, questions, and insights discovered during Step 1 (blackbox tests)
3. Research unknown or questionable topics via internet; ask user about ambiguities
4. Document architecture using `templates/architecture.md` as structure
5. Document system flows using `templates/system-flows.md` as structure
@@ -17,7 +17,7 @@
- [ ] System flows cover all main user/system interactions
- [ ] No contradictions with problem.md or restrictions.md
- [ ] Technology choices are justified
- [ ] Integration test findings are reflected in architecture decisions
- [ ] Blackbox test findings are reflected in architecture decisions
**Save action**: Write `architecture.md` and `system-flows.md`
@@ -5,7 +5,7 @@
**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
1. Identify components from the architecture; think about separation, reusability, and communication patterns
2. Use integration test scenarios from Step 1 to validate component boundaries
2. Use blackbox test scenarios from Step 1 to validate component boundaries
3. If additional components are needed (data preparation, shared helpers), create them
4. For each component, write a spec using `templates/component-spec.md` as structure
5. Generate diagrams:
@@ -19,7 +19,7 @@
- [ ] All inter-component interfaces are defined (who calls whom, with what)
- [ ] Component dependency graph has no circular dependencies
- [ ] All components from architecture.md are accounted for
- [ ] Every integration test scenario can be traced through component interactions
- [ ] Every blackbox test scenario can be traced through component interactions
**Save action**: Write:
- each component `components/[##]_[name]/description.md`
@@ -35,7 +35,7 @@ Do NOT create minimal epics with just a summary and short description. The Jira
**Self-verification**:
- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
- [ ] "Integration Tests" epic exists
- [ ] "Blackbox Tests" epic exists
- [ ] Every component maps to exactly one epic
- [ ] Dependency order is respected (no epic depends on a later one)
- [ ] Acceptance criteria are measurable
@@ -43,6 +43,6 @@ Do NOT create minimal epics with just a summary and short description. The Jira
- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
- [ ] Epic descriptions are self-contained — readable without opening other files
7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`.
7. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`.
**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
@@ -2,8 +2,8 @@
Before writing the final report, verify ALL of the following:
### Integration Tests
- [ ] Every acceptance criterion is covered in traceability_matrix.md
### Blackbox Tests
- [ ] Every acceptance criterion is covered in traceability-matrix.md
- [ ] Every restriction is verified by at least one test
- [ ] Positive and negative scenarios are balanced
- [ ] Docker environment is self-contained
@@ -14,7 +14,7 @@ Before writing the final report, verify ALL of the following:
- [ ] Covers all capabilities from solution.md
- [ ] Technology choices are justified
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions
- [ ] Blackbox test findings are reflected in architecture decisions
### Data Model
- [ ] Every entity from architecture.md is defined
@@ -35,7 +35,7 @@ Before writing the final report, verify ALL of the following:
- [ ] No circular dependencies
- [ ] All inter-component interfaces are defined and consistent
- [ ] No orphan components (unused by any flow)
- [ ] Every integration test scenario can be traced through component interactions
- [ ] Every blackbox test scenario can be traced through component interactions
### Risks
- [ ] All High/Critical risks have mitigations
@@ -49,7 +49,7 @@ Before writing the final report, verify ALL of the following:
### Epics
- [ ] "Bootstrap & Initial Structure" epic exists
- [ ] "Integration Tests" epic exists
- [ ] "Blackbox Tests" epic exists
- [ ] Every component maps to an epic
- [ ] Dependency order is correct
- [ ] Acceptance criteria are measurable
@@ -1,24 +1,24 @@
# E2E Functional Tests Template
# Blackbox Tests Template
Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`.
Save as `DOCUMENT_DIR/tests/blackbox-tests.md`.
---
```markdown
# E2E Functional Tests
# Blackbox Tests
## Positive Scenarios
### FT-P-01: [Scenario Name]
**Summary**: [One sentence: what end-to-end use case this validates]
**Summary**: [One sentence: what use case this validates through the system's public interfaces]
**Traces to**: AC-[ID], AC-[ID]
**Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.]
**Preconditions**:
- [System state required before test]
**Input data**: [reference to specific data set or file from test_data.md]
**Input data**: [reference to specific data set or file from test-data.md]
**Steps**:
@@ -71,8 +71,8 @@ Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`.
## Guidance Notes
- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
- Blackbox tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
- Positive scenarios validate the system does what it should.
- Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept.
- Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth."
- Input data references should point to specific entries in test_data.md.
- Input data references should point to specific entries in test-data.md.
@@ -80,7 +80,7 @@ Link to architecture.md and relevant component spec.]
### Definition of Done
- [ ] All in-scope capabilities implemented
- [ ] Automated tests pass (unit + integration + e2e)
- [ ] Automated tests pass (unit + blackbox)
- [ ] Minimum coverage threshold met (75%)
- [ ] Runbooks written (if applicable)
- [ ] Documentation updated
@@ -1,97 +0,0 @@
# E2E Non-Functional Tests Template
Save as `DOCUMENT_DIR/integration_tests/non_functional_tests.md`.
---
```markdown
# E2E Non-Functional Tests
## Performance Tests
### NFT-PERF-01: [Test Name]
**Summary**: [What performance characteristic this validates]
**Traces to**: AC-[ID]
**Metric**: [what is measured — latency, throughput, frame rate, etc.]
**Preconditions**:
- [System state, load profile, data volume]
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | [action] | [what to measure and how] |
**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
**Duration**: [how long the test runs]
---
## Resilience Tests
### NFT-RES-01: [Test Name]
**Summary**: [What failure/recovery scenario this validates]
**Traces to**: AC-[ID]
**Preconditions**:
- [System state before fault injection]
**Fault injection**:
- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | [inject fault] | [system behavior during fault] |
| 2 | [observe recovery] | [system behavior after recovery] |
**Pass criteria**: [recovery time, data integrity, continued operation]
---
## Security Tests
### NFT-SEC-01: [Test Name]
**Summary**: [What security property this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
**Pass criteria**: [specific security outcome]
---
## Resource Limit Tests
### NFT-RES-LIM-01: [Test Name]
**Summary**: [What resource constraint this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Preconditions**:
- [System running under specified constraints]
**Monitoring**:
- [What resources to monitor — memory, CPU, GPU, disk, temperature]
**Duration**: [how long to run]
**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
```
---
## Guidance Notes
- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
@@ -0,0 +1,35 @@
# Performance Tests Template
Save as `DOCUMENT_DIR/tests/performance-tests.md`.
---
```markdown
# Performance Tests
### NFT-PERF-01: [Test Name]
**Summary**: [What performance characteristic this validates]
**Traces to**: AC-[ID]
**Metric**: [what is measured — latency, throughput, frame rate, etc.]
**Preconditions**:
- [System state, load profile, data volume]
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | [action] | [what to measure and how] |
**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
**Duration**: [how long the test runs]
```
---
## Guidance Notes
- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
- Define clear pass/fail thresholds with specific metrics (p50, p95, p99 latency, throughput, etc.).
- Include warm-up preconditions to separate initialization cost from steady-state performance.
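A pass criterion like "p95 latency < 400ms" can be checked over the measured samples with a nearest-rank percentile (a minimal sketch; production suites would typically use their framework's statistics helpers):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) over measured samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def passes_latency_gate(samples, threshold_ms=400.0, p=95):
    """True when the chosen percentile stays under the threshold."""
    return percentile(samples, p) < threshold_ms
```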
@@ -0,0 +1,37 @@
# Resilience Tests Template
Save as `DOCUMENT_DIR/tests/resilience-tests.md`.
---
```markdown
# Resilience Tests
### NFT-RES-01: [Test Name]
**Summary**: [What failure/recovery scenario this validates]
**Traces to**: AC-[ID]
**Preconditions**:
- [System state before fault injection]
**Fault injection**:
- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | [inject fault] | [system behavior during fault] |
| 2 | [observe recovery] | [system behavior after recovery] |
**Pass criteria**: [recovery time, data integrity, continued operation]
```
---
## Guidance Notes
- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
- Include specific recovery time expectations and data integrity checks.
- Test both graceful degradation (partial failure) and full recovery scenarios.
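The "recovery time plus data integrity" pass criteria can be evaluated mechanically; a sketch (argument names are illustrative):

```python
def resilience_verdict(fault_t, recovered_t, max_recovery_s, data_intact):
    """Evaluate a resilience test run: recovery observed, within the
    time budget, and with data integrity preserved.
    Returns (passed, reason)."""
    if recovered_t is None:
        return False, "system never recovered"
    elapsed = recovered_t - fault_t
    if elapsed > max_recovery_s:
        return False, f"recovered in {elapsed:.1f}s, budget {max_recovery_s:.1f}s"
    if not data_intact:
        return False, "recovered but data integrity check failed"
    return True, f"recovered in {elapsed:.1f}s"
```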
@@ -0,0 +1,31 @@
# Resource Limit Tests Template
Save as `DOCUMENT_DIR/tests/resource-limit-tests.md`.
---
```markdown
# Resource Limit Tests
### NFT-RES-LIM-01: [Test Name]
**Summary**: [What resource constraint this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Preconditions**:
- [System running under specified constraints]
**Monitoring**:
- [What resources to monitor — memory, CPU, GPU, disk, temperature]
**Duration**: [how long to run]
**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
```
---
## Guidance Notes
- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
- Define specific numeric limits that can be programmatically checked.
- Include both the monitoring method and the threshold in the pass criteria.
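The "sustained, not burst" rule can be encoded directly: a sketch that checks a series of monitoring samples against a numeric limit and a minimum observation window (units below are assumed to be seconds and gigabytes):

```python
def within_limit(samples, limit, min_duration_s):
    """Check (timestamp_s, value) monitoring samples: every value stays
    under `limit` AND the run covers at least `min_duration_s`.
    A short burst of compliant samples is not enough to pass."""
    if len(samples) < 2:
        return False
    covered = samples[-1][0] - samples[0][0]
    return covered >= min_duration_s and all(v < limit for _, v in samples)
```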
@@ -0,0 +1,30 @@
# Security Tests Template
Save as `DOCUMENT_DIR/tests/security-tests.md`.
---
```markdown
# Security Tests
### NFT-SEC-01: [Test Name]
**Summary**: [What security property this validates]
**Traces to**: AC-[ID], RESTRICT-[ID]
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
**Pass criteria**: [specific security outcome]
```
---
## Guidance Notes
- Security tests at this level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
- Verify the system remains operational after security-related edge cases (no crash, no hang).
- Test authentication/authorization boundaries from the consumer's perspective.
@@ -1,11 +1,11 @@
# E2E Test Data Template
# Test Data Template
Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
Save as `DOCUMENT_DIR/tests/test-data.md`.
---
```markdown
# E2E Test Data Management
# Test Data Management
## Seed Data Sets
@@ -23,6 +23,12 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
|-----------------|----------------|-------------|-----------------|
| [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] |
## Expected Results Mapping
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| [test ID] | `input_data/[filename]` | [quantifiable expected output] | [exact / tolerance / pattern / threshold / file-diff] | [± value or N/A] | `input_data/expected_results/[filename]` or inline |
## External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
@@ -42,5 +48,8 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
- Every seed data set should be traceable to specific test scenarios.
- Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it.
- Every input data item MUST have a corresponding expected result in the Expected Results Mapping table.
- Expected results MUST be quantifiable: exact values, numeric tolerances, pattern matches, thresholds, or reference files. "Works correctly" is never acceptable.
- For complex expected outputs, provide machine-readable reference files (JSON, CSV) in `_docs/00_problem/input_data/expected_results/` and reference them in the mapping.
- External mocks must be deterministic — same input always produces same output.
- Data isolation must guarantee no test can affect another test's outcome.
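The Comparison Method column can be dispatched mechanically; a sketch covering four of the methods (file-diff would load both files and then fall back to one of these):

```python
import math
import re

def compare(actual, expected, method, tolerance=None):
    """Check an actual result against the Expected Results Mapping entry."""
    if method == "exact":
        return actual == expected
    if method == "tolerance":
        return math.isclose(actual, expected, abs_tol=tolerance)
    if method == "pattern":
        return re.fullmatch(expected, str(actual)) is not None
    if method == "threshold":
        return actual <= expected
    raise ValueError(f"unknown comparison method: {method}")
```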
@@ -1,16 +1,16 @@
# E2E Test Environment Template
# Test Environment Template
Save as `DOCUMENT_DIR/integration_tests/environment.md`.
Save as `DOCUMENT_DIR/tests/test-environment.md`.
---
```markdown
# E2E Test Environment
# Test Environment
## Overview
**System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.]
**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces only, validating complete use cases without access to internals.
## Docker Environment
@@ -17,7 +17,7 @@ Use this template for each component's test spec. Save as `components/[##]_[name
---
## Integration Tests
## Blackbox Tests
### IT-01: [Test Name]
@@ -169,4 +169,4 @@ Use this template for each component's test spec. Save as `components/[##]_[name
- If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
- Performance test targets should come from the NFR section in `architecture.md`.
- Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
- Not every component needs all 4 test types. A stateless utility component may only need blackbox tests.
@@ -1,11 +1,11 @@
# E2E Traceability Matrix Template
# Traceability Matrix Template
Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
Save as `DOCUMENT_DIR/tests/traceability-matrix.md`.
---
```markdown
# E2E Traceability Matrix
# Traceability Matrix
## Acceptance Criteria Coverage
@@ -34,7 +34,7 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
| [AC/Restriction ID] | [why it cannot be tested at the blackbox level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
```
---
@@ -44,4 +44,4 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
- Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason.
- Every restriction must appear in the matrix.
- NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware").
- Coverage percentage should be at least 75% for acceptance criteria at the E2E level.
- Coverage percentage should be at least 75% for acceptance criteria at the blackbox test level.
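The 75% gate can be computed straight from the matrix; a sketch over a mapping of AC ID to covering test IDs (an empty list means NOT COVERED):

```python
def coverage_pct(matrix):
    """Percent of acceptance criteria with at least one covering test."""
    if not matrix:
        return 0.0
    covered = sum(1 for tests in matrix.values() if tests)
    return 100.0 * covered / len(matrix)
```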
@@ -155,7 +155,7 @@ Store in PROBLEM_DIR.
| Metric Category | What to Capture |
|----------------|-----------------|
| **Coverage** | Overall, unit, integration, critical paths |
| **Coverage** | Overall, unit, blackbox, critical paths |
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
| **Code Smells** | Total, critical, major |
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
@@ -279,11 +279,11 @@ Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
Coverage requirements (must meet before refactoring):
- Minimum overall coverage: 75%
- Critical path coverage: 90%
- All public APIs must have integration tests
- All public APIs must have blackbox tests
- All error handling paths must be tested
For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
- Integration tests: summary, current behavior, input data, expected result, max expected time
- Blackbox tests: summary, current behavior, input data, expected result, max expected time
- Acceptance tests: summary, preconditions, steps with expected results
- Coverage analysis: current %, target %, uncovered critical paths
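The pre-refactoring coverage gate above can be expressed as a single check that names every violated requirement (thresholds are the ones stated in this section):

```python
def refactor_gate(overall_pct, critical_pct, failing_tests):
    """Return the list of violated requirements; an empty list means
    it is safe to start refactoring."""
    problems = []
    if overall_pct < 75.0:
        problems.append(f"overall coverage {overall_pct:.0f}% < 75%")
    if critical_pct < 90.0:
        problems.append(f"critical-path coverage {critical_pct:.0f}% < 90%")
    if failing_tests:
        problems.append(f"{failing_tests} test(s) failing on current codebase")
    return problems
```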
@@ -297,7 +297,7 @@ For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_
**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths)
- [ ] All tests pass on current codebase
- [ ] All public APIs have integration tests
- [ ] All public APIs have blackbox tests
- [ ] Test data fixtures are configured
**Save action**: Write test specs; implemented tests go into the project's test folder
@@ -332,7 +332,7 @@ Write `REFACTOR_DIR/coupling_analysis.md`:
For each change in the decoupling strategy:
1. Implement the change
2. Run integration tests
2. Run blackbox tests
3. Fix any failures
4. Commit with descriptive message
@@ -0,0 +1,411 @@
---
name: test-spec
description: |
Test specification skill. Analyzes input data and expected results completeness,
then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits)
that treat the system as a black box. Every test pairs input data with quantifiable expected results
so tests can verify correctness, not just execution.
3-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate.
Produces 8 artifacts under tests/.
Trigger phrases:
- "test spec", "test specification", "test scenarios"
- "blackbox test spec", "black box tests", "blackbox tests"
- "performance tests", "resilience tests", "security tests"
category: build
tags: [testing, black-box, blackbox-tests, test-specification, qa]
disable-model-invocation: true
---
# Test Scenario Specification
Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
## Core Principles
- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code
- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed
## Context Resolution
Fixed paths — no mode detection needed:
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/tests/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files |
| `_docs/01_solution/solution.md` | Finalized solution |
### Expected Results Specification
Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict.
Expected results live inside `_docs/00_problem/input_data/` in one or both of:
1. **Mapping file** (`input_data/expected_results.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md`
2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file
```
input_data/
├── expected_results.md ← required: input→expected result mapping
├── expected_results/ ← optional: complex reference files
│ ├── image_01_detections.json
│ └── batch_A_results.json
├── image_01.jpg
├── empty_scene.jpg
└── data_parameters.md
```
**Quantifiability requirements** (see template for full format and examples):
- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`)
- Structured data: exact JSON/CSV values, or a reference file in `expected_results/`
- Counts: exact counts (e.g., "3 detections", "0 errors")
- Text/patterns: exact string or regex pattern to match
- Timing: threshold (e.g., "response ≤ 500ms")
- Error cases: expected error code, message pattern, or HTTP status
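The quantifiability rules above can be sketched as a small checker. This is an illustrative heuristic, not part of the skill itself; the vague-phrase list and the detection patterns are assumptions chosen for the example:

```python
import re

# Illustrative heuristic: decide whether an expected-result string is
# quantifiable per the rules above. Patterns are examples, not exhaustive.
VAGUE = {"works correctly", "returns result", "should work", "good results"}

def is_quantifiable(expected: str) -> bool:
    text = expected.strip().lower()
    if any(phrase in text for phrase in VAGUE):
        return False
    checks = [
        r"[±≥≤<>]",                        # tolerance or threshold symbol
        r"\b\d+(\.\d+)?\s*(ms|px|%)?\b",   # concrete numeric value
        r"regex|pattern|matches",          # pattern comparison
        r"expected_results/",              # reference-file comparison
    ]
    return any(re.search(p, text) for p in checks)

print(is_quantifiable("confidence ≥ 0.85"))   # True
print(is_quantifiable("works correctly"))     # False
```

A checker like this can only flag obviously vague entries; the Phase 1 and Phase 3 reviews remain the authoritative gate.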
### Optional Files (used when available)
| File | Purpose |
|------|---------|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
| `DOCUMENT_DIR/components/` | Component specs for interface identification |
### Prerequisite Checks (BLOCKING)
1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
2. `restrictions.md` exists and is non-empty — **STOP if missing**
3. `input_data/` exists and contains at least one file — **STOP if missing**
4. `input_data/expected_results.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. Please create `_docs/00_problem/input_data/expected_results.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."*
5. `problem.md` exists and is non-empty — **STOP if missing**
6. `solution.md` exists and is non-empty — **STOP if missing**
7. Create TESTS_OUTPUT_DIR if it does not exist
8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
```
TESTS_OUTPUT_DIR/
├── environment.md
├── test-data.md
├── blackbox-tests.md
├── performance-tests.md
├── resilience-tests.md
├── security-tests.md
├── resource-limit-tests.md
└── traceability-matrix.md
```
### Save Timing
| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
| Phase 2 | Environment spec | `environment.md` |
| Phase 2 | Test data spec | `test-data.md` |
| Phase 2 | Blackbox tests | `blackbox-tests.md` |
| Phase 2 | Performance tests | `performance-tests.md` |
| Phase 2 | Resilience tests | `resilience-tests.md` |
| Phase 2 | Security tests | `security-tests.md` |
| Phase 2 | Resource limit tests | `resource-limit-tests.md` |
| Phase 2 | Traceability matrix | `traceability-matrix.md` |
| Phase 3 | Updated test data spec (if data added) | `test-data.md` |
| Phase 3 | Updated test files (if tests removed) | respective test file |
| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` |
### Resumability
If TESTS_OUTPUT_DIR already contains files:
1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes.
## Workflow
### Phase 1: Input Data Completeness Analysis
**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet
1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md`, `restrictions.md`
3. Read testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Read `input_data/expected_results.md` and any referenced files in `input_data/expected_results/`
6. Analyze `input_data/` contents against:
- Coverage of acceptance criteria scenarios
- Coverage of restriction edge cases
- Coverage of testing strategy requirements
7. Analyze `input_data/expected_results.md` completeness:
- Every input data item has a corresponding expected result row in the mapping
- Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result")
- Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template
- Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid
8. Present input-to-expected-result pairing assessment:
| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) |
|------------|--------------------------|---------------|----------------|
| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] |
9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result
10. If coverage is low, search the internet for supplementary data, assess its quality with the user, and if the user agrees, add it to `input_data/` and update `input_data/expected_results.md`
11. If expected results are missing or not quantifiable, ask user to provide them before proceeding
**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient.
---
### Phase 2: Test Scenario Specification
**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure
2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure
3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure
4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure
5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure
6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure
7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure
8. Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure
**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results.md`
- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md`
- [ ] Positive and negative scenarios are balanced
- [ ] Consumer app has no direct access to system internals
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] Traceability matrix has no uncovered AC or restrictions
**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test-data.md`
- `blackbox-tests.md`
- `performance-tests.md`
- `resilience-tests.md`
- `security-tests.md`
- `resource-limit-tests.md`
- `traceability-matrix.md`
**BLOCKING**: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed.
Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
---
### Phase 3: Test Data Validation Gate (HARD GATE)
**Role**: Professional Quality Assurance Engineer
**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
**Constraints**: This phase is MANDATORY and cannot be skipped.
#### Step 1 — Build the test-data and expected-result requirements checklist
Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. For every test scenario, extract:
| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |
Present this table to the user.
#### Step 2 — Ask user to provide missing test data AND expected results
For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user:
> **Option A — Provide the missing items**: Supply what is missing:
> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location.
> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance.
>
> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:
> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
> - "HTTP 200 with JSON body matching `expected_response_01.json`"
> - "Processing time < 500ms"
> - "0 false positives in the output set"
>
> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification.
**BLOCKING**: Wait for the user's response for every missing item.
#### Step 3 — Validate provided data and expected results
For each item where the user chose **Option A**:
**Input data validation**:
1. Verify the data file(s) exist at the indicated location
2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
**Expected result validation**:
4. Verify the expected result exists in `input_data/expected_results.md` or as a referenced file in `input_data/expected_results/`
5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of:
- Exact values (counts, strings, status codes)
- Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`)
- Pattern matches (regex, substring, JSON schema)
- Thresholds (e.g., `< 500ms`, `≤ 5% error rate`)
- Reference file for structural comparison (JSON diff, CSV diff)
6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to
If any validation fails, report the specific issue and loop back to Step 2 for that item.
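The comparison methods enumerated above can be dispatched mechanically. The method names below follow the skill's comparison-method vocabulary; the implementation is a sketch (`json_diff` and `file_reference` are omitted because they require a reference-file loader):

```python
import re

def compare(method: str, actual, expected, tol=None) -> bool:
    # Dispatch on the comparison method named in the expected-results mapping.
    if method == "exact":
        return actual == expected
    if method == "numeric_tolerance":
        return abs(actual - expected) <= tol
    if method == "range":
        lo, hi = expected
        return lo <= actual <= hi
    if method == "threshold_min":
        return actual >= expected
    if method == "threshold_max":
        return actual <= expected
    if method == "regex":
        return re.search(expected, actual) is not None
    if method == "substring":
        return expected in actual
    if method == "set_contains":
        return set(expected) <= set(actual)
    raise ValueError(f"unknown comparison method: {method}")

print(compare("numeric_tolerance", 118, 120, tol=10))  # True: off by 2px, tolerance 10px
print(compare("threshold_max", 450, 500))              # True: 450ms under the 500ms limit
```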
#### Step 4 — Remove tests without data or expected results
For each item where the user chose **Option B**:
1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.`
2. Remove the test scenario from the respective test file
3. Remove corresponding rows from `traceability-matrix.md`
4. Update `test-data.md` to reflect the removal
**Save action**: Write updated files under TESTS_OUTPUT_DIR:
- `test-data.md`
- Affected test files (if tests removed)
- `traceability-matrix.md` (if tests removed)
#### Step 5 — Final coverage check
After all removals, recalculate coverage:
1. Count remaining test scenarios that trace to acceptance criteria
2. Count total acceptance criteria + restrictions
3. Calculate coverage percentage: `covered_items / total_items * 100`
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| **Coverage %** | **?%** |
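The recalculation above reduces to a few lines. A minimal sketch, with placeholder AC/restriction IDs:

```python
# Phase 3 coverage recalculation: covered_items / total_items * 100.
# IDs below are placeholders for the real AC and restriction identifiers.
def coverage_percent(total_items: list[str], covered: set[str]) -> float:
    if not total_items:
        return 0.0
    return len([i for i in total_items if i in covered]) / len(total_items) * 100

items = ["AC-1", "AC-2", "AC-3", "R-1", "R-2"]   # all AC + restrictions
covered = {"AC-1", "AC-2", "R-1", "R-2"}         # traced by remaining tests
pct = coverage_percent(items, covered)
print(f"Coverage: {pct:.0f}%")                   # Coverage: 80%
print("PASSED" if pct >= 70 else "FAILED")       # PASSED
```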
**Decision**:
- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
- **Coverage < 70%** → Phase 3 **FAILED**. Report:
> ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
>
> | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
> |---|---|---|
>
> **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
**BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.
#### Phase 3 Completion
When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results:
1. Present the final coverage report
2. List all removed tests (if any) with reasons
3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
4. Confirm all artifacts are saved and consistent
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Missing input_data/expected_results.md | **STOP** — ask user to provide expected results mapping using the template |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
## Common Mistakes
- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like
- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code
- **Tests without data**: every test scenario MUST have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed
## Trigger Conditions
When the user wants to:
- Specify blackbox tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce test scenarios from acceptance criteria
**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios"
## Methodology Quick Reference
```
┌──────────────────────────────────────────────────────────────────────┐
│ Test Scenario Specification (3-Phase) │
├──────────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING) │
│ → verify AC, restrictions, input_data (incl. expected_results.md) │
│ │
│ Phase 1: Input Data & Expected Results Completeness Analysis │
│ → assess input_data/ coverage vs AC scenarios (≥70%) │
│ → verify every input has a quantifiable expected result │
│ → present input→expected-result pairing assessment │
│ [BLOCKING: user confirms input data + expected results coverage] │
│ │
│ Phase 2: Test Scenario Specification │
│ → environment.md │
│ → test-data.md (with expected results mapping) │
│ → blackbox-tests.md (positive + negative) │
│ → performance-tests.md │
│ → resilience-tests.md │
│ → security-tests.md │
│ → resource-limit-tests.md │
│ → traceability-matrix.md │
│ [BLOCKING: user confirms test coverage] │
│ │
│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │
│ → build test-data + expected-result requirements checklist │
│ → ask user: provide data+result (A) or remove test (B) │
│ → validate input data (quality + quantity) │
│ → validate expected results (quantifiable + comparison method) │
│ → remove tests without data or expected result, warn user │
│ → final coverage check (≥70% or FAIL + loop back) │
│ [BLOCKING: coverage ≥ 70% required to pass] │
├──────────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately │
│ Ask don't assume · Spec don't code │
│ No test without data · No test without expected result │
└──────────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,135 @@
# Expected Results Template
Save as `_docs/00_problem/input_data/expected_results.md`.
For complex expected outputs, create `_docs/00_problem/input_data/expected_results/` and place reference files there.
Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`).
---
```markdown
# Expected Results
Maps every input data item to its quantifiable expected result.
Tests use this mapping to compare actual system output against known-correct answers.
## Result Format Legend
| Result Type | When to Use | Example |
|-------------|-------------|---------|
| Exact value | Output must match precisely | `status_code: 200`, `detection_count: 3` |
| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05`, `bbox_x: 120 ± 10px` |
| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.85` |
| Pattern match | Output must match a string/regex pattern | `error_message contains "invalid format"` |
| File reference | Complex output compared against a reference file | `match expected_results/case_01.json` |
| Schema match | Output structure must conform to a schema | `response matches DetectionResultSchema` |
| Set/count | Output must contain specific items or counts | `classes ⊇ {"car", "person"}`, `detections.length == 5` |
## Comparison Methods
| Method | Description | Tolerance Syntax |
|--------|-------------|-----------------|
| `exact` | Actual == Expected | N/A |
| `numeric_tolerance` | abs(actual - expected) ≤ tolerance | `± <value>` or `± <percent>%` |
| `range` | min ≤ actual ≤ max | `[min, max]` |
| `threshold_min` | actual ≥ threshold | `≥ <value>` |
| `threshold_max` | actual ≤ threshold | `≤ <value>` |
| `regex` | actual matches regex pattern | regex string |
| `substring` | actual contains substring | substring |
| `json_diff` | structural comparison against reference JSON | diff tolerance per field |
| `set_contains` | actual output set contains expected items | subset notation |
| `file_reference` | compare against reference file in expected_results/ | file path |
## Input → Expected Result Mapping
### [Scenario Group Name, e.g. "Single Image Detection"]
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `[file or parameters]` | [what this input represents] | [quantifiable expected output] | [method from table above] | [± value, range, or N/A] | [path in expected_results/ or N/A] |
#### Example — Object Detection
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `image_01.jpg` | Aerial photo, 3 vehicles visible | `detection_count: 3`, classes: `["ArmorVehicle", "ArmorVehicle", "Truck"]` | exact (count), set_contains (classes) | N/A | N/A |
| 2 | `image_01.jpg` | Same image, bbox positions | bboxes: `[(120,80,340,290), (400,150,580,310), (50,400,200,520)]` | numeric_tolerance | ± 15px per coordinate | `expected_results/image_01_detections.json` |
| 3 | `image_01.jpg` | Same image, confidence scores | confidences: `[0.94, 0.88, 0.91]` | threshold_min | each ≥ 0.85 | N/A |
| 4 | `empty_scene.jpg` | Aerial photo, no objects | `detection_count: 0`, empty detections array | exact | N/A | N/A |
| 5 | `corrupted.dat` | Invalid file format | HTTP 400, body contains `"error"` key | exact (status), substring (body) | N/A | N/A |
#### Example — Performance
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `standard_image.jpg` | 1920x1080 single image | Response time | threshold_max | ≤ 2000ms | N/A |
| 2 | `large_image.jpg` | 8000x6000 tiled image | Response time | threshold_max | ≤ 10000ms | N/A |
#### Example — Error Handling
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `POST /detect` with no file | Missing required input | HTTP 422, message matches `"file.*required"` | exact (status), regex (message) | N/A | N/A |
| 2 | `POST /detect` with `probability_threshold: 5.0` | Out-of-range config | HTTP 422 or clamped to valid range | exact (status) or range [0.0, 1.0] | N/A | N/A |
## Expected Result Reference Files
When the expected output is too complex for an inline table cell (e.g., full JSON response with nested objects), place a reference file in `_docs/00_problem/input_data/expected_results/`.
### File Naming Convention
`<input_name>_<result_type>.<format>`
Examples:
- `image_01_detections.json`
- `batch_A_results.csv`
- `video_01_annotations.json`
### Reference File Requirements
- Must be machine-readable (JSON, CSV, YAML — not prose)
- Must contain only the expected output structure and values
- Must include tolerance annotations where applicable (as metadata fields or comments)
- Must be valid and parseable by standard libraries
### Reference File Example (JSON)
File: `expected_results/image_01_detections.json`
```json
{
"input": "image_01.jpg",
"expected": {
"detection_count": 3,
"detections": [
{
"class": "ArmorVehicle",
"confidence": { "min": 0.85 },
"bbox": { "x1": 120, "y1": 80, "x2": 340, "y2": 290, "tolerance_px": 15 }
},
{
"class": "ArmorVehicle",
"confidence": { "min": 0.85 },
"bbox": { "x1": 400, "y1": 150, "x2": 580, "y2": 310, "tolerance_px": 15 }
},
{
"class": "Truck",
"confidence": { "min": 0.85 },
"bbox": { "x1": 50, "y1": 400, "x2": 200, "y2": 520, "tolerance_px": 15 }
}
]
}
}
```
```
---
## Guidance Notes
- Every row in the mapping table must have at least one quantifiable comparison — no row should say only "should work" or "returns result".
- Use `exact` comparison for counts, status codes, and discrete values.
- Use `numeric_tolerance` for floating-point values and spatial coordinates where minor variance is expected.
- Use `threshold_min`/`threshold_max` for performance metrics and confidence scores.
- Use `file_reference` when the expected output has more than ~3 fields or nested structures.
- Reference files must be committed alongside input data — they are part of the test specification.
- When the system has non-deterministic behavior (e.g., model inference variance across hardware), document the expected tolerance explicitly and justify it.
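A comparator for a reference file of this shape might look like the following sketch. The field names (`detection_count`, `detections`, `tolerance_px`) mirror the JSON example above; the shape of the actual system output is an assumption and would need adapting to the real schema:

```python
import json

# Hypothetical comparator for a detections reference file shaped like
# expected_results/image_01_detections.json. Returns a list of failure
# messages; an empty list means the test passes.
def check_detections(actual: dict, reference_path: str) -> list[str]:
    with open(reference_path) as f:
        expected = json.load(f)["expected"]
    failures = []
    if len(actual["detections"]) != expected["detection_count"]:
        failures.append("detection_count mismatch")
        return failures
    for i, (a, e) in enumerate(zip(actual["detections"], expected["detections"])):
        if a["class"] != e["class"]:
            failures.append(f"detection {i}: class {a['class']} != {e['class']}")
        if a["confidence"] < e["confidence"]["min"]:
            failures.append(f"detection {i}: confidence below {e['confidence']['min']}")
        tol = e["bbox"]["tolerance_px"]
        for k in ("x1", "y1", "x2", "y2"):
            if abs(a["bbox"][k] - e["bbox"][k]) > tol:
                failures.append(f"detection {i}: bbox {k} off by more than {tol}px")
    return failures
```

Collecting failures rather than asserting on the first mismatch lets the test report all tolerance violations for an image in one run.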
@@ -0,0 +1,254 @@
---
name: ui-design
description: |
End-to-end UI design workflow: requirements gathering → design system synthesis → HTML+CSS mockup generation → visual verification → iterative refinement.
Zero external dependencies. Optional MCP enhancements (RenderLens, AccessLint).
Two modes:
- Full workflow: phases 0-8 for complex design tasks
- Quick mode: skip to code generation for simple requests
Command entry points:
- /design-audit — quality checks on existing mockup
- /design-polish — final refinement pass
- /design-critique — UX review with feedback
- /design-regen — regenerate with different direction
Trigger phrases:
- "design a UI", "create a mockup", "build a page"
- "make a landing page", "design a dashboard"
- "mockup", "design system", "UI design"
category: create
tags: [ui-design, mockup, html, css, tailwind, design-system, accessibility]
disable-model-invocation: true
---
# UI Design Skill
End-to-end UI design workflow producing production-quality HTML+CSS mockups entirely within Cursor, with zero external tool dependencies.
## Core Principles
- **Design intent over defaults**: never settle for generic AI output; every visual choice must trace to user requirements
- **Verify visually**: AI must see what it generates whenever possible (browser screenshots)
- **Tokens over hardcoded values**: use CSS custom properties with semantic naming, not raw hex
- **Restraint over decoration**: less is more; every visual element must earn its place
- **Ask, don't assume**: when design direction is ambiguous, STOP and ask the user
- **One screen at a time**: generate individual screens, not entire applications at once
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Project mode** (default — `_docs/` structure exists):
- MOCKUPS_DIR: `_docs/02_document/ui_mockups/`
**Standalone mode** (explicit input file provided, e.g. `/ui-design @some_brief.md`):
- INPUT_FILE: the provided file (treated as design brief)
- MOCKUPS_DIR: `_standalone/ui_mockups/`
Create MOCKUPS_DIR if it does not exist. Announce the detected mode and resolved path to the user.
## Output Directory
All generated artifacts go to `MOCKUPS_DIR`:
```
MOCKUPS_DIR/
├── DESIGN.md # Generated design system (three-layer tokens)
├── index.html # Main mockup (or named per page)
└── [page-name].html # Additional pages if multi-page
```
## Complexity Detection (Phase 0)
Before starting the workflow, classify the request:
**Quick mode** — skip to Phase 5 (Code Generation):
- Request is a single component or screen
- User provides enough style context in their message
- `MOCKUPS_DIR/DESIGN.md` already exists
- Signals: "just make a...", "quick mockup of...", single component name, fewer than 2 sentences
**Full mode** — run phases 1-8:
- Multi-page request
- Brand-specific requirements
- "design system for...", complex layouts, dashboard/admin panel
- No existing DESIGN.md
Announce the detected mode to the user.
## Phase 1: Context Check
1. Check for existing project documentation: PRD, design specs, README with design notes
2. Check for existing `MOCKUPS_DIR/DESIGN.md`
3. Check for existing mockups in `MOCKUPS_DIR/`
4. If DESIGN.md exists → announce "Using existing design system" → skip to Phase 5
5. If project docs with design info exist → extract requirements from them, skip to Phase 3
## Phase 2: Requirements Gathering
Use the AskQuestion tool for structured input. Adapt based on what Phase 1 found — only ask for what's missing.
**Round 1 — Structural:**
Ask using AskQuestion with these questions:
- **Page type**: landing, dashboard, form, settings, profile, admin panel, e-commerce, blog, documentation, other
- **Target audience**: developers, business users, consumers, internal team, general public
- **Platform**: web desktop-first, web mobile-first
- **Key sections**: header, hero, sidebar, main content, cards grid, data table, form, footer (allow multiple)
**Round 2 — Design Intent:**
Ask using AskQuestion with these questions:
- **Visual atmosphere**: Airy & spacious / Dense & data-rich / Warm & approachable / Sharp & technical / Luxurious & premium
- **Color mood**: Cool blues & grays / Warm earth tones / Bold & vibrant / Monochrome / Dark mode / Let AI choose based on atmosphere / Custom (specify brand colors)
- **Typography mood**: Geometric (modern, clean) / Humanist (friendly, readable) / Monospace (technical, code-like) / Serif (editorial, premium)
Then ask in free-form:
- "Name an app or website whose look you admire" (optional, helps anchor style)
- "Any specific content, copy, or data to include?"
## Phase 3: Direction Exploration
Generate 2-3 text-based direction summaries. Each direction is 3-5 sentences describing:
- Visual approach and mood
- Color palette direction (specific hues, not just "blue")
- Layout strategy (grid type, density, whitespace approach)
- Typography choice (specific font suggestions, not just "sans-serif")
Present to user: "Here are 2-3 possible directions. Which resonates? Or describe a blend."
Wait for user to pick before proceeding.
## Phase 4: Design System Synthesis
Generate `MOCKUPS_DIR/DESIGN.md` using the template from `templates/design-system.md`.
The generated DESIGN.md must include all 6 sections:
1. Visual Atmosphere — descriptive mood (never "clean and modern")
2. Color System — three-layer CSS custom properties (primitives → semantic → component)
3. Typography — specific font family, weight hierarchy, size scale with rem values
4. Spacing & Layout — base unit, spacing scale, grid, breakpoints
5. Component Styling Defaults — buttons, cards, inputs, navigation with all states
6. Interaction States — loading, error, empty, hover, focus, disabled patterns
Read `references/design-vocabulary.md` for atmosphere descriptors and style vocabulary to use when writing the DESIGN.md.
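A minimal sketch of the three-layer token structure from Section 2 (token names here are illustrative, not prescribed by the template):

```css
:root {
  /* Layer 1 — primitives: raw values, no meaning attached */
  --blue-600: #2563eb;
  --gray-50: #f8fafc;
  --gray-900: #0f172a;

  /* Layer 2 — semantic: roles that reference primitives */
  --color-bg: var(--gray-50);
  --color-text: var(--gray-900);
  --color-accent: var(--blue-600);

  /* Layer 3 — component: specific slots that reference semantics */
  --button-primary-bg: var(--color-accent);
  --card-border: 1px solid var(--gray-50);
}
```

Components reference only layer 3 (or layer 2) names, so retheming means changing primitives in one place.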
## Phase 5: Code Generation
Construct the generation by combining context from multiple sources:
1. Read `MOCKUPS_DIR/DESIGN.md` for the design system
2. Read `references/components.md` for component best practices relevant to the page type
3. Read `references/anti-patterns.md` for explicit avoidance instructions
Generate `MOCKUPS_DIR/[page-name].html` as a single file with:
- `<script src="https://cdn.tailwindcss.com"></script>` for Tailwind
- `<style>` block with all CSS custom properties from DESIGN.md
- Tailwind config override in `<script>` to map tokens to Tailwind theme
- Semantic HTML (nav, main, section, article, footer)
- Mobile-first responsive design
- All interactive elements with hover, focus, active states
- At least one loading skeleton example
- Proper heading hierarchy (single h1)
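One way to wire the first three items together with the Tailwind Play CDN — the token names are examples; use the ones actually defined in DESIGN.md:

```html
<script src="https://cdn.tailwindcss.com"></script>
<style>
  :root {
    --color-accent: #2563eb;
    --color-bg: #f8fafc;
  }
</style>
<script>
  // Map CSS custom properties into the Tailwind theme so utilities
  // like bg-accent and text-accent resolve to DESIGN.md tokens.
  tailwind.config = {
    theme: {
      extend: {
        colors: {
          accent: "var(--color-accent)",
          bg: "var(--color-bg)",
        },
      },
    },
  };
</script>
```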
**Anti-AI-Slop guard clauses** (MANDATORY — read `references/anti-patterns.md` for full list):
- Do NOT use Inter or Roboto unless user explicitly requested them
- Do NOT default to purple/indigo accent color
- Do NOT create "card soup" — vary layout patterns
- Do NOT make all buttons equal weight
- Do NOT over-decorate
- Use the actual tokens from DESIGN.md, not hardcoded values
For quick mode without DESIGN.md: use a sensible default design system matching the request context. Still follow all anti-slop rules.
## Phase 6: Visual Verification
Tiered verification — use the best available tool:
**Layer 1 — Structural Check** (always runs):
Read `references/quality-checklist.md` and verify against the structural checklist.
**Layer 2 — Visual Check** (when browser tool is available):
1. Open the generated HTML file using the browser tool
2. Take screenshots at desktop (1440px) width
3. Examine the screenshot for: spacing consistency, alignment, color rendering, typography hierarchy, overall visual balance
4. Compare against DESIGN.md's intended atmosphere
5. Flag issues: cramped areas, orphan text, broken layouts, invisible elements
**Layer 3 — Compliance Check** (when MCP tools are available):
- If AccessLint MCP is configured: audit HTML for WCAG violations, auto-fix flagged issues
- If RenderLens MCP is configured: render + audit (Lighthouse + WCAG scores) + diff
Auto-fix any issues found. Re-verify after fixes.
## Phase 7: User Review
1. Open mockup in browser for the user:
- Primary: use Cursor browser tool (AI can see and discuss the same view)
- Fallback: use OS-appropriate command (`open` on macOS, `xdg-open` on Linux, `start` on Windows)
2. Present assessment summary: structural check results, visual observations, compliance scores if available
3. Ask: "How does this look? What would you like me to change?"
## Phase 8: Iteration
1. Parse user feedback into specific changes
2. Apply targeted edits via StrReplace (not full regeneration unless user requests a fundamentally different direction)
3. Re-run visual verification (Phase 6)
4. Present changes to user
5. Repeat until user approves
## Command Entry Points
These commands bypass the full workflow for targeted operations on existing mockups:
### /design-audit
Run quality checks on an existing mockup in `MOCKUPS_DIR/`.
1. Read the HTML file
2. Run structural checklist from `references/quality-checklist.md`
3. If browser tool available: take screenshot and visual check
4. If AccessLint MCP available: WCAG audit
5. Report findings with severity levels
### /design-polish
Final refinement pass on an existing mockup.
1. Read the HTML file and DESIGN.md
2. Check token usage (no hardcoded values that should be tokens)
3. Verify all interaction states are present
4. Refine spacing consistency, typography hierarchy
5. Apply micro-improvements (subtle shadows, transitions, hover states)
### /design-critique
UX review with specific feedback.
1. Read the HTML file
2. Evaluate: information hierarchy, call-to-action clarity, cognitive load, navigation flow
3. Check against anti-patterns from `references/anti-patterns.md`
4. Provide a structured critique with specific improvement suggestions
### /design-regen
Regenerate mockup with a different design direction.
1. Keep the existing page structure and content
2. Ask user what direction to change (atmosphere, colors, layout, typography)
3. Update DESIGN.md tokens accordingly
4. Regenerate the HTML with the new design system
## Optional MCP Enhancements
When configured, these MCP servers enhance the workflow:
| MCP Server | Phase | What It Adds |
|------------|-------|-------------|
| RenderLens | 6 | HTML→screenshot, Lighthouse audit, pixel-level diff |
| AccessLint | 6 | WCAG violation detection + auto-fix (99.5% fix rate) |
| Playwright | 6 | Screenshot at multiple viewports, visual regression |
The skill works fully without any MCP servers. MCPs are enhancements, not requirements.
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear design direction | **ASK user** — present direction options |
| Conflicting requirements (e.g., "minimal but feature-rich") | **ASK user** which to prioritize |
| User asks for a framework-specific output (React, Vue) | **WARN**: this skill generates HTML+CSS mockups; suggest adapting after approval |
| Generated mockup looks wrong in visual verification | Auto-fix if possible; **ASK user** if the issue is subjective |
| User requests multi-page site | Generate one page at a time; maintain DESIGN.md consistency across pages |
| Accessibility audit fails | Auto-fix violations; **WARN user** about remaining manual-check items |
# Anti-Patterns — AI Slop Prevention
Read this file before generating any HTML/CSS. These are explicit instructions for what NOT to do.
## Typography Anti-Patterns
- **Do NOT default to Inter or Roboto.** These are the #1 signal of AI-generated UI. Choose a font that matches the atmosphere from `design-vocabulary.md`. Only use Inter/Roboto if the user explicitly requests them.
- **Do NOT use the same font weight everywhere.** Establish a clear weight hierarchy: 600-700 for headings, 400 for body, 500 for UI elements.
- **Do NOT set body text smaller than 14px (0.875rem).** Prefer 16px (1rem) for body.
- **Do NOT skip heading levels.** Go h1 → h2 → h3, never h1 → h3.
- **Do NOT use placeholder-only form fields.** Labels above inputs are mandatory; placeholders are hints only.
## Color Anti-Patterns
- **Do NOT default to purple or indigo accent colors.** Purple/indigo is the second-biggest AI-slop signal. Use the accent color from DESIGN.md tokens.
- **Do NOT use more than 1 strong accent color** in the same view. Secondary accents should be muted or derived from the primary.
- **Do NOT use gray text on colored backgrounds** without checking contrast. WCAG AA requires 4.5:1 for normal text, 3:1 for large text.
- **Do NOT use rainbow color coding** for categories. Limit to 5-6 carefully chosen, distinguishable colors.
- **Do NOT apply background gradients to text** (gradient text is fragile and often unreadable).
## Layout Anti-Patterns
- **Do NOT create "card soup"** — rows of identical cards with no visual break. Vary layout patterns: full-width sections, split layouts, featured items, asymmetric grids.
- **Do NOT center everything.** Left-align body text. Center only headings, short captions, and CTAs.
- **Do NOT use fixed pixel widths** for layout. Use relative units (%, fr, auto, minmax).
- **Do NOT nest excessive containers.** Avoid "div soup" — use semantic elements (nav, main, section, article, aside, footer).
- **Do NOT ignore mobile.** Design mobile-first; every component must work at 375px width.
## Component Anti-Patterns
- **Do NOT make all buttons equal weight.** Establish clear hierarchy: one primary (filled), secondary (outline), ghost (text-only) per visible area.
- **Do NOT use spinners for content with known layout.** Use skeleton loaders that match the shape of the content.
- **Do NOT put a modal inside a modal.** If you need nested interaction, use a slide-over or expand the current modal.
- **Do NOT disable buttons without explanation.** Every disabled button needs a title attribute or adjacent text explaining why.
- **Do NOT use "Click here" as link text.** Links should describe the destination: "View documentation", "Download report".
- **Do NOT show hamburger menus on desktop.** Hamburgers are for mobile only; use full navigation on desktop.
- **Do NOT use equal-weight buttons in a pair.** One must be visually primary, the other secondary.
## Interaction Anti-Patterns
- **Do NOT skip hover states on interactive elements.** Every clickable element needs a visible hover change.
- **Do NOT skip focus states.** Keyboard users need visible focus indicators on every interactive element.
- **Do NOT omit loading states.** If data loads asynchronously, show a skeleton or progress indicator.
- **Do NOT omit empty states.** When a list or section has no data, show an illustration + explanation + action CTA.
- **Do NOT omit error states.** Form validation errors need inline messages below the field with an icon.
- **Do NOT use bare alert() for messages.** Use toast notifications or inline banners.
## Decoration Anti-Patterns
- **Do NOT over-decorate.** Restraint over decoration. Every visual element must earn its place.
- **Do NOT apply shadows AND borders AND background fills simultaneously** on the same element. Pick one or two.
- **Do NOT use generic stock-photo placeholder images.** Use SVG illustrations, solid color blocks with icons, or real content.
- **Do NOT use decorative backgrounds** that reduce text readability.
- **Do NOT animate everything.** Use motion sparingly and purposefully: transitions for state changes (200-300ms), not decorative animation.
## Spacing Anti-Patterns
- **Do NOT use inconsistent spacing.** Stick to the spacing scale from DESIGN.md (multiples of 4px or 8px base unit).
- **Do NOT use zero padding inside containers.** Minimum 12-16px padding for any content container.
- **Do NOT crowd elements.** When in doubt, add more whitespace, not less.
- **Do NOT use different spacing systems** in different parts of the same page. One scale for the whole page.
## Accessibility Anti-Patterns
- **Do NOT rely on color alone** to convey information. Add icons, text, or patterns.
- **Do NOT use thin font weights (100-300) for body text.** Minimum 400 for readability.
- **Do NOT create custom controls** without proper ARIA attributes. Prefer native HTML elements.
- **Do NOT trap keyboard focus** outside of modals. Only modals should have focus traps.
- **Do NOT auto-play media** without user consent and a visible stop/mute control.
# Component Reference
Use this reference when generating UI mockups. Each component includes best practices, required states, and accessibility requirements.
## Navigation
### Top Navigation Bar
- Fixed or sticky at top; z-index above content
- Logo/brand left, primary nav center or right, actions (search, profile, CTA) far right
- Active state: underline, background highlight, or bold — pick one, be consistent
- Mobile: collapse to hamburger menu at `md` breakpoint; never show hamburger on desktop
- Height: 56-72px; padding inline 16-24px
- Aliases: navbar, header nav, app bar, top bar
### Sidebar Navigation
- Width: 240-280px expanded, 64-72px collapsed
- Sections with labels; icons + text for each item
- Active item: background fill + accent color text/icon
- Collapse/expand toggle; responsive: overlay on mobile
- Scroll independently from main content if taller than viewport
- Aliases: side nav, drawer, rail
### Breadcrumbs
- Show hierarchy path; separator: `/` or `>`
- Current page is plain text (not a link); parent pages are links
- Truncate with ellipsis if more than 4-5 levels
- Aliases: path indicator, navigation trail
### Tabs
- Use for switching between related content views within the same context
- Active tab: border-bottom accent or filled background
- Never nest tabs inside tabs
- Scrollable when too many to fit; show scroll indicators
- Aliases: tab bar, segmented control, view switcher
### Pagination
- Show current page, first, last, and 2-3 surrounding pages
- Previous/Next buttons always visible; disabled at boundaries
- Show total count when available: "Showing 1-20 of 342"
- Aliases: pager, page navigation
## Content Display
### Card
- Border-radius: 8-12px; subtle shadow or border (not both unless intentional)
- Padding: 16-24px; consistent within the same card grid
- Content order: image/visual → title → description → metadata → actions
- Hover: subtle shadow lift or border-color change (not both)
- Never stack more than 3 cards vertically without visual break
- Aliases: tile, panel, content block
### Data Table
- Header row: sticky, slightly bolder background, sort indicators
- Row hover: subtle background change
- Striped rows optional; alternate between base and surface colors
- Cell padding: 12-16px vertical, 16px horizontal
- Truncate long text with ellipsis + tooltip on hover
- Responsive: horizontal scroll with frozen first column, or stack to card layout on mobile
- Include empty state when no data
- Aliases: grid, spreadsheet, list view
### List
- Consistent item height or padding
- Dividers between items: subtle border or spacing (not both)
- Interactive lists: hover state on entire row
- Leading element (icon/avatar) + content (title + subtitle) + trailing element (action/badge)
- Aliases: item list, feed, timeline
### Stat/Metric Card
- Large number/value prominently displayed
- Label above or below the value; comparison/trend indicator optional
- Color-code trends: green up, red down, gray neutral
- Aliases: KPI card, metric tile, stat block
### Avatar
- Circular; sizes: 24/32/40/48/64px
- Fallback: initials on colored background when no image
- Status indicator: small circle at bottom-right (green=online, gray=offline)
- Group: overlap with z-index stacking; show "+N" for overflow
- Aliases: profile picture, user icon
### Badge/Tag
- Small, pill-shaped or rounded-rectangle
- Color indicates category or status; limit to 5-6 distinct colors
- Text: short (1-3 words); truncate if longer
- Removable variant: include x button
- Aliases: chip, label, status indicator
### Hero Section
- Full-width; height 400-600px or viewport-relative
- Strong headline (h1) + supporting text + primary CTA
- Background: gradient, image with overlay, or solid color — not all three
- Text must have sufficient contrast over any background
- Aliases: banner, jumbotron, splash
### Empty State
- Illustration or icon (not a generic placeholder)
- Explanatory text: what this area will contain
- Primary action CTA: "Create your first...", "Add...", "Import..."
- Never show just blank space
- Aliases: zero state, no data, blank slate
### Skeleton Loader
- Match the shape and layout of the content being loaded
- Animate with subtle pulse or shimmer (left-to-right gradient)
- Show for predictable content; use progress bar for uploads/processes
- Never use spinning loaders for content that has a known layout
- Aliases: placeholder, loading state, content loader
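A shimmer skeleton can be sketched as a left-to-right gradient sweep over neutral blocks (class names are illustrative):

```css
.skeleton {
  background: linear-gradient(90deg, #e5e7eb 25%, #f3f4f6 50%, #e5e7eb 75%);
  background-size: 200% 100%;
  animation: shimmer 1.5s ease-in-out infinite;
  border-radius: 6px;
}
@keyframes shimmer {
  from { background-position: 200% 0; }
  to   { background-position: -200% 0; }
}
/* Size skeleton blocks to match the real content shape: */
.skeleton-title { height: 1.25rem; width: 60%; }
.skeleton-line  { height: 0.875rem; width: 100%; margin-top: 0.5rem; }
```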
## Forms & Input
### Text Input
- Height: 40-48px; padding inline 12-16px
- Label above the input (not placeholder-only); placeholder as hint only
- States: default, hover, focus (accent ring), error (red border + message), disabled (reduced opacity)
- Error message below the field with icon; don't use red placeholder
- Aliases: text field, input box, form field
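The state set above can be sketched in CSS; `--color-accent` and `--color-error` are assumed DESIGN.md tokens:

```css
.input {
  height: 44px;
  padding-inline: 14px;
  border: 1px solid #d4d4d8;
  border-radius: 8px;
}
.input:hover { border-color: #a1a1aa; }
.input:focus {
  outline: none;
  border-color: var(--color-accent);
  /* soft accent ring; box-shadow avoids layout shift */
  box-shadow: 0 0 0 3px color-mix(in srgb, var(--color-accent) 25%, transparent);
}
.input[aria-invalid="true"] { border-color: var(--color-error); }
.input:disabled { opacity: 0.5; cursor: not-allowed; }
```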
### Textarea
- Minimum height: 80-120px; resizable vertically
- Character count when there's a limit
- Same states as text input
- Aliases: multiline input, text area, comment box
### Select/Dropdown
- Match text input height and styling
- Chevron indicator on the right
- Options list: max height with scroll; selected item checkmark
- Search/filter for lists longer than 10 items
- Aliases: combo box, picker, dropdown menu
### Checkbox
- Size: 16-20px; rounded corners (2-4px)
- Label to the right; clickable area includes the label
- States: unchecked, checked (accent fill + white check), indeterminate (dash), disabled
- Group: vertical stack with 8-12px gap
- Aliases: check box, toggle option, multi-select
### Radio Button
- Size: 16-20px; circular
- Same interaction patterns as checkbox but single-select
- Group: vertical stack; minimum 2 options
- Aliases: radio, option button, single-select
### Toggle/Switch
- Width: 40-52px; height: 20-28px; thumb is circular
- Off: gray track; On: accent color track
- Label to the left or right; describe the "on" state
- Never use for actions that require a submit; toggles are instant
- Aliases: switch, on/off toggle
### File Upload
- Drop zone with dashed border; icon + "Drag & drop or click to upload"
- Show file type restrictions and size limit
- Progress indicator during upload
- File list after upload: name, size, remove button
- Aliases: file picker, upload area, attachment
### Form Layout
- Single column for most forms; two columns only for related short fields (first/last name, city/state)
- Group related fields with section headings
- Required field indicator: asterisk after label
- Submit button: right-aligned or full-width; clearly primary
- Inline validation: show errors on blur, not on every keystroke
## Actions
### Button
- Primary: filled accent color, white text; one per visible area
- Secondary: outline or subtle background; supports primary action
- Ghost/tertiary: text-only with hover background
- Sizes: sm (32px), md (40px), lg (48px); padding inline 16-24px
- States: default, hover (darken/lighten 10%), active (darken 15%), focus (ring), disabled (opacity 0.5 + not-allowed cursor)
- Disabled buttons must have a title attribute explaining why
- Icon-only buttons: need aria-label; minimum 40px touch target
- Aliases: action, CTA, submit
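The hierarchy and state rules above, sketched as CSS (assuming `--color-accent` and `--color-text` tokens from DESIGN.md):

```css
.btn {
  height: 40px;
  padding-inline: 20px;
  border-radius: 8px;
  font-weight: 500;
}
.btn-primary { background: var(--color-accent); color: #fff; border: none; }
.btn-primary:hover  { filter: brightness(0.9); }  /* darken ~10% */
.btn-primary:active { filter: brightness(0.85); } /* darken ~15% */
.btn-secondary {
  background: transparent;
  color: var(--color-accent);
  border: 1px solid var(--color-accent);
}
.btn-ghost { background: transparent; border: none; color: var(--color-text); }
.btn-ghost:hover { background: rgba(0, 0, 0, 0.05); }
.btn:focus-visible { outline: 2px solid var(--color-accent); outline-offset: 2px; }
.btn:disabled { opacity: 0.5; cursor: not-allowed; }
```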
### Icon Button
- Circular or rounded-square; minimum 40px for touch targets
- Tooltip on hover showing the action name
- Visually lighter than text buttons
- Aliases: toolbar button, action icon
### Dropdown Menu
- Trigger: button or icon button
- Menu: elevated surface (shadow), rounded corners
- Items: 36-44px height; icon + label; hover background
- Dividers between groups; section labels for grouped items
- Keyboard navigable: arrow keys, enter to select, escape to close
- Aliases: context menu, action menu, overflow menu
### Floating Action Button (FAB)
- Circular, 56px; elevated with shadow
- One per screen maximum; bottom-right placement
- Primary creation action only
- Extended variant: pill-shape with icon + label
- Aliases: FAB, add button, create button
## Feedback
### Toast/Notification
- Position: top-right or bottom-right; stack vertically
- Auto-dismiss: 4-6 seconds for info; persist for errors until dismissed
- Types: success (green), error (red), warning (amber), info (blue)
- Content: icon + message + optional action link + close button
- Maximum 3 visible at once; queue the rest
- Aliases: snackbar, alert toast, flash message
### Alert/Banner
- Full-width within its container; not floating
- Types: info, success, warning, error with corresponding colors
- Icon left, message center, dismiss button right
- Persistent until user dismisses or condition changes
- Aliases: notice, inline alert, status banner
### Modal/Dialog
- Centered; overlay dims background (opacity 0.5 black)
- Max width: 480-640px for standard, 800px for complex
- Header (title + close button) + body + footer (actions)
- Actions: right-aligned; primary right, secondary left
- Close on overlay click and Escape key
- Never put a modal inside a modal
- Focus trap: tab cycles within modal while open
- Aliases: popup, dialog box, lightbox
### Tooltip
- Appears on hover after 300-500ms delay; disappears on mouse leave
- Position: above element by default; flip if near viewport edge
- Max width: 200-280px; short text only
- Arrow/caret pointing to trigger element
- Aliases: hint, info popup, hover text
### Progress Indicator
- Linear bar: for known duration/percentage; show percentage text
- Skeleton: for content loading with known layout
- Spinner: only for indeterminate short waits (< 3 seconds) where layout is unknown
- Step indicator: for multi-step flows; show completed/current/upcoming
- Aliases: loading bar, progress bar, stepper
## Layout
### Page Shell
- Max content width: 1200-1440px; centered with auto margins
- Sidebar + main content pattern: sidebar fixed, main scrolls
- Header/footer outside max-width for full-bleed effect
- Consistent padding: 16px mobile, 24px tablet, 32px desktop
### Grid
- CSS Grid or Flexbox; 12-column system or auto-fit with minmax
- Gap: 16-24px between items
- Responsive: 1 column mobile, 2 columns tablet, 3-4 columns desktop
- Never rely on fixed pixel widths; use fr units or percentages
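An auto-fit grid satisfying these rules — one column on narrow screens, more as space allows, no fixed pixel widths:

```css
.card-grid {
  display: grid;
  /* each card at least 260px, sharing leftover space equally */
  grid-template-columns: repeat(auto-fit, minmax(260px, 1fr));
  gap: 20px;
}
```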
### Section Divider
- Use spacing (48-96px margin) as primary divider; use lines sparingly
- If using lines: subtle (1px, border color); full-width or indented
- Alternate section backgrounds (base/surface) for clear separation without lines
### Responsive Breakpoints
- sm: 640px (large phone landscape)
- md: 768px (tablet)
- lg: 1024px (small laptop)
- xl: 1280px (desktop)
- Design mobile-first: base styles are mobile, layer up with breakpoints
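A mobile-first sketch using this scale: base styles target phones, and breakpoints only layer changes upward:

```css
.layout { padding: 16px; }  /* base = mobile */

@media (min-width: 768px) {  /* md: tablet */
  .layout { padding: 24px; }
}
@media (min-width: 1280px) { /* xl: desktop */
  .layout { padding: 32px; }
}
```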
## Specialized
### Pricing Table
- 2-4 tiers side by side; highlight recommended tier
- Feature comparison with checkmarks; group features by category
- CTA button per tier; recommended tier has primary button, others secondary
- Monthly/annual toggle if applicable
- Aliases: pricing cards, plan comparison
### Testimonial
- Quote text (large, italic or with quotation marks)
- Attribution: avatar + name + title/company
- Layout: single featured or carousel/grid of multiple
- Aliases: review, customer quote, social proof
### Footer
- Full-width; darker background than body
- Column layout: links grouped by category; 3-5 columns
- Bottom row: copyright, legal links, social icons
- Responsive: columns stack on mobile
- Aliases: site footer, bottom navigation
### Search
- Input with search icon; expand on focus or always visible
- Results: dropdown with highlighted matching text
- Recent searches and suggestions
- Keyboard shortcut hint (Cmd+K / Ctrl+K)
- Aliases: search bar, omnibar, search field
### Date Picker
- Input that opens a calendar dropdown
- Navigate months with arrows; today highlighted
- Range selection: two calendars side by side
- Presets: "Today", "Last 7 days", "This month"
- Aliases: calendar picker, date selector
### Chart/Graph Placeholder
- Container with appropriate aspect ratio (16:9 for line/bar, 1:1 for pie)
- Include chart title, legend, and axis labels in the mockup
- Use representative fake data; label as "Sample Data"
- Tooltip placeholder on hover
- Aliases: data visualization, graph, analytics chart
# Design Vocabulary
Use this reference when writing DESIGN.md files and constructing generation prompts. Replace vague descriptors with specific, actionable terms.
## Atmosphere Descriptors
Use these instead of "clean and modern":
| Atmosphere | Characteristics | Font Direction | Color Direction | Spacing |
|------------|----------------|---------------|-----------------|---------|
| **Airy & Spacious** | Generous whitespace, light backgrounds, floating elements, subtle shadows | Thin/light weights, generous letter-spacing | Soft pastels, whites, muted accents | Large margins, open padding |
| **Dense & Data-Rich** | Compact spacing, information-heavy, efficient use of space | Medium weights, tighter line-heights, smaller sizes | Neutral grays, high-contrast data colors | Tight but consistent padding |
| **Warm & Approachable** | Rounded corners, friendly illustrations, organic shapes | Rounded/humanist typefaces, comfortable sizes | Earth tones, warm neutrals, amber/coral accents | Medium spacing, generous touch targets |
| **Sharp & Technical** | Crisp edges, precise alignment, monospace elements, dark themes | Geometric or monospace, precise sizing | Cool grays, electric blues/greens, dark backgrounds | Grid-strict, mathematical spacing |
| **Luxurious & Premium** | Generous space, refined details, serif accents, subtle animations | Serif or elegant sans-serif, generous sizing | Deep darks, gold/champagne accents, rich jewel tones | Expansive whitespace, dramatic padding |
| **Playful & Creative** | Asymmetric layouts, bold colors, hand-drawn elements, motion | Display fonts, variable weights, expressive sizing | Bright saturated colors, unexpected combinations | Dynamic, deliberately uneven |
| **Corporate & Enterprise** | Structured grids, predictable patterns, dense but organized | System fonts or conservative sans-serif | Brand blues/grays, accent for status indicators | Systematic, spec-driven |
| **Editorial & Content** | Typography-forward, reading-focused, long-form layout | Serif for body text, sans for UI elements | Near-monochrome, sparse accent color | Generous line-height, wide columns |
## Style-Specific Vocabulary
### When user says... → Use these terms in DESIGN.md
| Vague Input | Professional Translation |
|-------------|------------------------|
| "clean" | Restrained palette, generous whitespace, consistent alignment grid |
| "modern" | Current design patterns (2024-2026), subtle depth, micro-interactions |
| "minimal" | Single accent color, maximum negative space, typography-driven hierarchy |
| "professional" | Structured grid, conservative palette, system fonts, clear navigation |
| "fun" | Saturated palette, rounded elements, playful illustrations, motion |
| "elegant" | Serif typography, muted palette, generous spacing, refined details |
| "techy" | Dark theme, monospace accents, neon highlights, sharp corners |
| "bold" | High contrast, large type, strong color blocks, dramatic layout |
| "friendly" | Rounded corners (12-16px), humanist fonts, warm colors, illustrations |
| "corporate" | Blue-gray palette, structured grid, conventional layout, data tables |
## Color Mood Palettes
### Cool Blues & Grays
- Background: #f8fafc → #f1f5f9
- Surface: #ffffff
- Text: #0f172a → #475569
- Accent: #2563eb (blue-600)
- Pairs well with: Airy, Sharp, Corporate atmospheres
### Warm Earth Tones
- Background: #faf8f5 → #f5f0eb
- Surface: #ffffff
- Text: #292524 → #78716c
- Accent: #c2410c (orange-700) or #b45309 (amber-700)
- Pairs well with: Warm, Editorial atmospheres
### Bold & Vibrant
- Background: #fafafa → #f5f5f5
- Surface: #ffffff
- Text: #171717 → #525252
- Accent: #dc2626 (red-600) or #7c3aed (violet-600) or #059669 (emerald-600)
- Pairs well with: Playful, Creative atmospheres
### Monochrome
- Background: #fafafa → #f5f5f5
- Surface: #ffffff
- Text: #171717 → #737373
- Accent: #171717 (black) with #e5e5e5 borders
- Pairs well with: Minimal, Luxurious, Editorial atmospheres
### Dark Mode
- Background: #09090b → #18181b
- Surface: #27272a → #3f3f46
- Text: #fafafa → #a1a1aa
- Accent: #3b82f6 (blue-500) or #22d3ee (cyan-400)
- Pairs well with: Sharp, Technical, Dense atmospheres
## Typography Mood Mapping
### Geometric (Modern, Clean)
Fonts: DM Sans, Plus Jakarta Sans, Outfit, General Sans, Satoshi
- Characteristics: even stroke weight, circular letter forms, precise geometry
- Best for: SaaS, tech products, dashboards, landing pages
### Humanist (Friendly, Readable)
Fonts: Source Sans 3, Nunito, Lato, Open Sans, Noto Sans
- Characteristics: organic curves, varying stroke, warm feel
- Best for: consumer apps, health/wellness, education, community platforms
### Monospace (Technical, Code-Like)
Fonts: JetBrains Mono, Fira Code, IBM Plex Mono, Space Mono
- Characteristics: fixed-width, technical aesthetic, raw precision
- Best for: developer tools, terminals, data displays, documentation
### Serif (Editorial, Premium)
Fonts: Playfair Display, Lora, Merriweather, Crimson Pro, Libre Baskerville
- Characteristics: traditional elegance, reading comfort, authority
- Best for: blogs, magazines, luxury brands, portfolio sites
### Display (Expressive, Bold)
Fonts: Cabinet Grotesk, Clash Display, Archivo Black, Space Grotesk
- Characteristics: high impact, personality-driven, attention-grabbing
- Best for: hero sections, headlines, creative portfolios, marketing pages
- Use for headings only; pair with a readable body font
## Shape & Depth Vocabulary
### Border Radius Scale
| Term | Value | Use for |
|------|-------|---------|
| Sharp | 0-2px | Technical, enterprise, data-heavy |
| Subtle | 4-6px | Professional, balanced |
| Rounded | 8-12px | Friendly, modern SaaS |
| Pill | 16-24px or full | Playful, badges, tags |
| Circle | 50% | Avatars, icon buttons |
### Shadow Scale
| Term | Value | Use for |
|------|-------|---------|
| None | none | Flat design, minimal |
| Whisper | 0 1px 2px rgba(0,0,0,0.05) | Subtle elevation, cards |
| Soft | 0 4px 6px rgba(0,0,0,0.07) | Standard cards, dropdowns |
| Medium | 0 10px 15px rgba(0,0,0,0.1) | Elevated elements, modals |
| Strong | 0 20px 25px rgba(0,0,0,0.15) | Floating elements, popovers |
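The shadow scale above can be captured as tokens so components never hardcode shadow values (token and class names are illustrative):

```css
:root {
  --shadow-whisper: 0 1px 2px rgba(0, 0, 0, 0.05);
  --shadow-soft:    0 4px 6px rgba(0, 0, 0, 0.07);
  --shadow-medium:  0 10px 15px rgba(0, 0, 0, 0.1);
  --shadow-strong:  0 20px 25px rgba(0, 0, 0, 0.15);
}
.card     { box-shadow: var(--shadow-whisper); }
.dropdown { box-shadow: var(--shadow-soft); }
.modal    { box-shadow: var(--shadow-medium); }
```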
### Surface Hierarchy
1. **Background** — deepest layer, covers viewport
2. **Surface** — content containers (cards, panels) sitting on background
3. **Elevated** — elements above surface (modals, dropdowns, tooltips)
4. **Overlay** — dimming layer between surface and elevated elements
## Layout Pattern Names
| Pattern | Description | Best for |
|---------|-------------|----------|
| **Holy grail** | Header + sidebar + main + footer | Admin dashboards, apps |
| **Magazine** | Multi-column with varied widths | Content sites, blogs |
| **Single column** | Centered narrow content | Landing pages, articles, forms |
| **Split screen** | Two equal or 60/40 halves | Comparison pages, sign-up flows |
| **Card grid** | Uniform grid of cards | Product listings, portfolios |
| **Asymmetric** | Deliberately unequal columns | Creative, editorial layouts |
| **Full bleed** | Edge-to-edge sections, no max-width | Marketing pages, portfolios |
| **Dashboard** | Stat cards + charts + tables in grid | Analytics, admin panels |
@@ -0,0 +1,109 @@
# Quality Checklist
Run through this checklist after generating or modifying a mockup. It has three layers; run every layer that applies.
## Layer 1: Structural Check (Always Run)
### Semantic HTML
- [ ] Uses `nav`, `main`, `section`, `article`, `aside`, `footer` — not just `div`
- [ ] Single `h1` per page
- [ ] Heading hierarchy follows h1 → h2 → h3 without skipping levels
- [ ] Lists use `ul`/`ol`/`li`, not styled `div`s
- [ ] Interactive elements are `button` or `a`, not clickable `div`s
### Design Tokens
- [ ] CSS custom properties defined in `<style>` block
- [ ] Colors in HTML reference tokens (e.g., `var(--color-accent)`) not raw hex
- [ ] Spacing follows the defined scale, not arbitrary pixel values
- [ ] Font family matches DESIGN.md, not browser default or Inter/Roboto
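As a sketch of what "reference tokens, not raw hex" means in practice (the selector name is hypothetical):

```css
/* Avoid: raw hex scattered through component rules */
.cta { background: #2563eb; }

/* Prefer: the component reads a semantic token,
   so a palette change is a one-line edit in :root */
.cta { background: var(--color-accent); }
.cta:hover { background: var(--color-accent-hover); }
```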
### Responsive Design
- [ ] Mobile-first: base styles work at 375px
- [ ] Content readable without horizontal scroll at all breakpoints
- [ ] Navigation adapts: full nav on desktop, collapsed on mobile
- [ ] Images/media have max-width: 100%
- [ ] Touch targets minimum 44px on mobile
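A minimal mobile-first pattern satisfying these checks might look like the following; the 768px breakpoint and class names are assumptions for illustration:

```css
/* Base styles target 375px-wide mobile; wider screens opt in */
.nav-links { display: none; }   /* collapsed nav on mobile */
.nav-toggle { display: block; }
img, video { max-width: 100%; height: auto; }

@media (min-width: 768px) {
  .nav-links { display: flex; } /* full nav from tablet up */
  .nav-toggle { display: none; }
}
```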
### Interaction States
- [ ] All buttons have hover, focus, active states
- [ ] All links have hover and focus states
- [ ] At least one loading state example (skeleton loader preferred)
- [ ] At least one empty state with illustration + CTA
- [ ] Disabled elements have visual indicator + explanation (title attribute)
- [ ] Form inputs have focus ring using accent color
### Component Quality
- [ ] Button hierarchy: one primary per visible area, secondary and ghost variants present
- [ ] Forms: labels above inputs, not placeholder-only
- [ ] Error states: inline message below field with icon
- [ ] No hamburger menu on desktop
- [ ] No modal inside modal
- [ ] No "Click here" links
### Code Quality
- [ ] Valid HTML (no unclosed tags, no duplicate IDs)
- [ ] Tailwind classes are valid (no made-up utilities)
- [ ] No inline styles that duplicate token values
- [ ] File is self-contained (single HTML file, no external dependencies except Tailwind CDN)
- [ ] Total file size under 50KB
## Layer 2: Visual Check (When Browser Tool Available)
Take a screenshot and examine:
### Spacing & Alignment
- [ ] Consistent margins between sections
- [ ] Elements within the same row are vertically aligned
- [ ] Padding within cards/containers is consistent
- [ ] No orphan text (single word on its own line in headings)
- [ ] Grid alignment: elements on the same row have matching heights or intentional variation
### Typography
- [ ] Heading sizes create clear hierarchy (visible difference between h1, h2, h3)
- [ ] Body text is comfortable reading size (not tiny)
- [ ] Font rendering looks correct (font loaded or appropriate fallback)
- [ ] Line length: body text 50-75 characters per line
### Color & Contrast
- [ ] Primary accent is visible but not overwhelming
- [ ] Text is readable over all backgrounds
- [ ] No elements blend into their backgrounds
- [ ] Status colors (success/error/warning) are distinguishable
### Overall Composition
- [ ] Visual weight is balanced (not all content on one side)
- [ ] Clear focal point on the page (hero, headline, or primary CTA)
- [ ] Appropriate whitespace: not cramped, not excessively empty
- [ ] Consistent visual language throughout the page
### Atmosphere Match
- [ ] Overall feel matches the DESIGN.md atmosphere description
- [ ] Not generic "AI generated" look
- [ ] Color palette is cohesive (no unexpected color outliers)
- [ ] Typography choice matches the intended mood
## Layer 3: Compliance Check (When MCP Tools Available)
### AccessLint MCP
- [ ] Run `audit_html` on the generated file
- [ ] Fix all violations with fixability "fixable" or "potentially_fixable"
- [ ] Document any remaining violations that require manual judgment
- [ ] Re-run `diff_html` to confirm fixes resolved violations
### RenderLens MCP
- [ ] Render at 1440px and 375px widths
- [ ] Lighthouse accessibility score ≥ 80
- [ ] Lighthouse performance score ≥ 70
- [ ] Lighthouse best practices score ≥ 80
- [ ] If iterating: run diff between previous and current version
## Severity Classification
When reporting issues found during the checklist:
| Severity | Criteria | Action |
|----------|----------|--------|
| **Critical** | Broken layout, invisible content, no mobile support | Fix immediately before showing to user |
| **High** | Missing interaction states, accessibility violations, token misuse | Fix before showing to user |
| **Medium** | Minor spacing inconsistency, non-ideal font weight, slight alignment issue | Note in assessment, fix if easy |
| **Low** | Style preference, minor polish opportunity | Note in assessment, fix during /design-polish |
@@ -0,0 +1,199 @@
# Design System: [Project Name]
## 1. Visual Atmosphere
[Describe the mood, density, and aesthetic philosophy in 2-3 sentences. Be specific — never use "clean and modern". Reference the atmosphere type from design-vocabulary.md. Example: "A spacious, light-filled interface with generous whitespace that feels calm and unhurried. Elements float on a near-white canvas with subtle shadows providing depth. The overall impression is sophisticated simplicity — premium without being cold."]
## 2. Color System
### Primitives
```css
:root {
--white: #ffffff;
--black: #000000;
--gray-50: #______;
--gray-100: #______;
--gray-200: #______;
--gray-300: #______;
--gray-400: #______;
--gray-500: #______;
--gray-600: #______;
--gray-700: #______;
--gray-800: #______;
--gray-900: #______;
--gray-950: #______;
--accent-50: #______;
--accent-100: #______;
--accent-200: #______;
--accent-300: #______;
--accent-400: #______;
--accent-500: #______;
--accent-600: #______;
--accent-700: #______;
--accent-800: #______;
--accent-900: #______;
--red-500: #______;
--red-600: #______;
--green-500: #______;
--green-600: #______;
--amber-500: #______;
--amber-600: #______;
}
```
### Semantic Tokens
```css
:root {
--color-bg-primary: var(--gray-50);
--color-bg-secondary: var(--gray-100);
--color-bg-surface: var(--white);
--color-bg-inverse: var(--gray-900);
--color-text-primary: var(--gray-900);
--color-text-secondary: var(--gray-500);
--color-text-tertiary: var(--gray-400);
--color-text-inverse: var(--white);
--color-text-link: var(--accent-600);
--color-accent: var(--accent-600);
--color-accent-hover: var(--accent-700);
--color-accent-light: var(--accent-50);
--color-border: var(--gray-200);
--color-border-strong: var(--gray-300);
--color-divider: var(--gray-100);
--color-error: var(--red-600);
--color-error-light: var(--red-500);
--color-success: var(--green-600);
--color-success-light: var(--green-500);
--color-warning: var(--amber-600);
--color-warning-light: var(--amber-500);
}
```
### Component Tokens
```css
:root {
--button-primary-bg: var(--color-accent);
--button-primary-text: var(--color-text-inverse);
--button-primary-hover: var(--color-accent-hover);
--button-secondary-bg: transparent;
--button-secondary-border: var(--color-border-strong);
--button-secondary-text: var(--color-text-primary);
--card-bg: var(--color-bg-surface);
--card-border: var(--color-border);
--card-shadow: 0 1px 3px rgba(0, 0, 0, 0.08);
--input-bg: var(--color-bg-surface);
--input-border: var(--color-border);
--input-border-focus: var(--color-accent);
--input-text: var(--color-text-primary);
--input-placeholder: var(--color-text-tertiary);
--nav-bg: var(--color-bg-surface);
--nav-active-bg: var(--color-accent-light);
--nav-active-text: var(--color-accent);
}
```
## 3. Typography
- **Font family**: [Specific font name], [fallback], system-ui, sans-serif
- **Font source**: Google Fonts link or system font
| Level | Element | Size | Weight | Line Height | Letter Spacing |
|-------|---------|------|--------|-------------|----------------|
| Display | Hero headlines | 3rem (48px) | 700 | 1.1 | -0.02em |
| H1 | Page title | 2.25rem (36px) | 700 | 1.2 | -0.01em |
| H2 | Section title | 1.5rem (24px) | 600 | 1.3 | 0 |
| H3 | Subsection | 1.25rem (20px) | 600 | 1.4 | 0 |
| H4 | Card/group title | 1.125rem (18px) | 600 | 1.4 | 0 |
| Body | Default text | 1rem (16px) | 400 | 1.5 | 0 |
| Small | Captions, meta | 0.875rem (14px) | 400 | 1.5 | 0.01em |
| XS | Labels, badges | 0.75rem (12px) | 500 | 1.4 | 0.02em |
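The type scale above translates directly into rules; a partial sketch:

```css
h1 { font-size: 2.25rem; font-weight: 700; line-height: 1.2; letter-spacing: -0.01em; }
h2 { font-size: 1.5rem; font-weight: 600; line-height: 1.3; }
body { font-size: 1rem; font-weight: 400; line-height: 1.5; }
small { font-size: 0.875rem; line-height: 1.5; letter-spacing: 0.01em; }
```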
## 4. Spacing & Layout
- **Base unit**: 4px (0.25rem)
- **Spacing scale**: 1 (4px), 2 (8px), 3 (12px), 4 (16px), 5 (20px), 6 (24px), 8 (32px), 10 (40px), 12 (48px), 16 (64px), 20 (80px), 24 (96px)
- **Content max-width**: [1200px / 1280px / 1440px]
- **Grid**: [12-column / auto-fit] with [16px / 24px] gap
| Breakpoint | Name | Min Width | Columns | Padding |
|------------|------|-----------|---------|---------|
| Mobile | sm | 0 | 1 | 16px |
| Tablet | md | 768px | 2 | 24px |
| Laptop | lg | 1024px | 3-4 | 32px |
| Desktop | xl | 1280px | 4+ | 32px |
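The breakpoint table maps to media queries such as these (column counts and padding taken from the table; the container class name is illustrative):

```css
/* Mobile first: single column, 16px padding */
.container { display: grid; grid-template-columns: 1fr; padding: 0 16px; }

@media (min-width: 768px)  { .container { grid-template-columns: repeat(2, 1fr); padding: 0 24px; } }
@media (min-width: 1024px) { .container { grid-template-columns: repeat(3, 1fr); padding: 0 32px; } }
@media (min-width: 1280px) { .container { grid-template-columns: repeat(4, 1fr); padding: 0 32px; } }
```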
## 5. Component Styling Defaults
### Buttons
- Border radius: [6px / 8px / full]
- Padding: 10px 20px (md), 8px 16px (sm), 12px 24px (lg)
- Font weight: 500
- Transition: background-color 150ms ease, box-shadow 150ms ease
- Focus: 2px ring with 2px offset using `--color-accent`
- Disabled: opacity 0.5, cursor not-allowed
### Cards
- Border radius: [8px / 12px]
- Border: 1px solid var(--card-border)
- Shadow: var(--card-shadow)
- Padding: 20-24px
- Hover (if interactive): shadow increase or border-color change
### Inputs
- Height: 40px (md), 36px (sm), 48px (lg)
- Border radius: 6px
- Border: 1px solid var(--input-border)
- Padding: 0 12px
- Focus: border-color var(--input-border-focus) + 2px ring
- Error: border-color var(--color-error) + error message below
### Navigation
- Item height: 40px
- Active: background var(--nav-active-bg), text var(--nav-active-text)
- Hover: background var(--color-bg-secondary)
- Transition: background-color 150ms ease
## 6. Interaction States (MANDATORY)
### Loading
- Use skeleton loaders matching content shape
- Pulse animation: opacity 0.4 → 1.0, duration 1.5s, ease-in-out
- Background: var(--color-bg-secondary)
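A skeleton loader matching these values might be sketched as:

```css
.skeleton {
  background: var(--color-bg-secondary);
  border-radius: 6px;
  animation: skeleton-pulse 1.5s ease-in-out infinite;
}

/* opacity 0.4 → 1.0 and back, so the loop is seamless */
@keyframes skeleton-pulse {
  0%, 100% { opacity: 0.4; }
  50%      { opacity: 1; }
}
```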
### Error
- Inline message below the element
- Icon (circle-exclamation) + red text using var(--color-error)
- Border change on the input/container to var(--color-error)
### Empty
- Centered illustration or icon (64-96px)
- Heading: "No [items] yet" or similar
- Descriptive text: one sentence explaining what will appear
- Primary CTA button: "Create first...", "Add...", "Import..."
### Hover
- Interactive elements: subtle background shift or underline
- Cards: shadow increase or border-color change
- Transition: 150ms ease
### Focus
- Visible ring: 2px solid var(--color-accent), 2px offset
- Applied to all interactive elements (buttons, inputs, links, tabs)
- Never remove outline without providing alternative focus indicator
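One common implementation of this focus treatment uses `:focus-visible`, so the ring appears for keyboard navigation without flashing on mouse clicks — a sketch, not the only valid approach:

```css
button:focus-visible,
a:focus-visible,
input:focus-visible {
  outline: 2px solid var(--color-accent);
  outline-offset: 2px;
}
```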
### Disabled
- Opacity: 0.5
- Cursor: not-allowed
- Title attribute explaining why the element is disabled