diff --git a/.cursor/rules/human-input-sound.mdc b/.cursor/rules/human-attention-sound.mdc
similarity index 52%
rename from .cursor/rules/human-input-sound.mdc
rename to .cursor/rules/human-attention-sound.mdc
index e7e3aa3..1fbb3fc 100644
--- a/.cursor/rules/human-input-sound.mdc
+++ b/.cursor/rules/human-attention-sound.mdc
@@ -1,10 +1,12 @@
 ---
-description: "Play a notification sound whenever the AI agent needs human input, confirmation, or approval"
+description: "Play a notification sound when the AI agent needs human input or when AI generation is finished"
 alwaysApply: true
 ---
 
-# Sound Notification on Human Input
+# Sound Notification for Human Attention
 
-Whenever you are about to ask the user a question, request confirmation, present options for a decision, or otherwise pause and wait for human input, you MUST first run the appropriate shell command for the current OS:
+Play a notification sound whenever human attention is needed. This includes waiting for input AND completing generation.
+
+## Commands by OS
 - **macOS**: `afplay /System/Library/Sounds/Glass.aiff &`
 - **Linux**: `paplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null || aplay /usr/share/sounds/freedesktop/stereo/bell.oga 2>/dev/null || echo -e '\a' &`
@@ -12,13 +14,15 @@ Whenever you are about to ask the user a question, present
 
 Detect the OS from the user's system info or by running `uname -s` if unknown.
 
-This applies to:
+## When to play the sound
+
 - Asking clarifying questions
 - Presenting choices (e.g. via AskQuestion tool)
 - Requesting approval for destructive actions
 - Reporting that you are blocked and need guidance
 - Any situation where the conversation will stall without user response
+- **When AI generation is complete** — play the sound as the very last action before ending your turn, so the user knows the response is ready
 
-Do NOT play the sound when:
-- You are providing a final answer that doesn't require a response
-- You are in the middle of executing a multi-step task and just providing a status update
+## When NOT to play the sound
+
+- In the middle of executing a multi-step task and just providing a status update (more tool calls will follow)
diff --git a/.cursor/skills/autopilot/flows/existing-code.md b/.cursor/skills/autopilot/flows/existing-code.md
index 1da7e83..2207644 100644
--- a/.cursor/skills/autopilot/flows/existing-code.md
+++ b/.cursor/skills/autopilot/flows/existing-code.md
@@ -7,13 +7,13 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
 | Step | Name                    | Sub-Skill                       | Internal SubSteps                     |
 |------|-------------------------|---------------------------------|---------------------------------------|
 | —    | Document (pre-step)     | document/SKILL.md               | Steps 1–8                             |
-| 2b   | Blackbox Test Spec      | blackbox-test-spec/SKILL.md     | Phase 1a–1b                           |
+| 2b   | Blackbox Test Spec      | test-spec/SKILL.md              | Phase 1a–1b                           |
 | 2c   | Decompose Tests         | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4             |
 | 2d   | Implement Tests         | implement/SKILL.md              | (batch-driven, no fixed sub-steps)    |
 | 2e   | Refactor                | refactor/SKILL.md               | Phases 0–5 (6-phase method)           |
 | 2f   | New Task                | new-task/SKILL.md               | Steps 1–8 (loop)                      |
 | 2g   | Implement               | implement/SKILL.md              | (batch-driven, no fixed sub-steps)    |
-| 2h   | Run Tests               | (autopilot-managed)             | Unit tests → Integration/blackbox tests |
+| 2h   | Run Tests               | (autopilot-managed)             | Unit tests → Blackbox tests           |
 | 2hb  | Security Audit          | security/SKILL.md               | Phase 1–5 (optional)                  |
 | 2i   | Deploy                  | deploy/SKILL.md                 | Steps 1–7                             |
@@ -49,20 +49,20 @@ Action: An existing codebase without documentation was detected. Present using C
 ---
 
 **Step 2b — Blackbox Test Spec**
 
-Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/integration_tests/traceability_matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
+Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/tests/traceability-matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for "Document" entry)
 
-Action: Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`
+Action: Read and execute `.cursor/skills/test-spec/SKILL.md`
 
 This step applies when the codebase was documented via the `/document` skill. Test specifications must be produced before refactoring or further development.
 
 ---
 
 **Step 2c — Decompose Tests**
 
-Condition: `_docs/02_document/integration_tests/traceability_matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)
 
-Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/integration_tests/` as input). The decompose skill will:
+Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
 1. Run Step 1t (test infrastructure bootstrap)
-2. Run Step 3 (integration test task decomposition)
+2. Run Step 3 (blackbox test task decomposition)
 3. Run Step 4 (cross-verification against test coverage)
 
 If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.
@@ -117,7 +117,7 @@ Condition: the autopilot state shows Step 2g (Implement) is completed AND the au
 Action: Run the full test suite to verify the implementation before deployment.
 
 1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
+2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests
 
 If all tests pass → auto-chain to Step 2hb (Security Audit).
diff --git a/.cursor/skills/autopilot/flows/greenfield.md b/.cursor/skills/autopilot/flows/greenfield.md
index 807a0af..37c6523 100644
--- a/.cursor/skills/autopilot/flows/greenfield.md
+++ b/.cursor/skills/autopilot/flows/greenfield.md
@@ -11,7 +11,7 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
 | 2    | Plan           | plan/SKILL.md       | Step 1–6 + Final                        |
 | 3    | Decompose      | decompose/SKILL.md  | Step 1–4                                |
 | 4    | Implement      | implement/SKILL.md  | (batch-driven, no fixed sub-steps)      |
-| 5    | Run Tests      | (autopilot-managed) | Unit tests → Integration/blackbox tests |
+| 5    | Run Tests      | (autopilot-managed) | Unit tests → Blackbox tests             |
 | 5b   | Security Audit | security/SKILL.md   | Phase 1–5 (optional)                    |
 | 6    | Deploy         | deploy/SKILL.md     | Step 1–7                                |
@@ -100,7 +100,7 @@ Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND t
 Action: Run the full test suite to verify the implementation before deployment.
 
 1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
-2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
+2. **Blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the blackbox test suite
 3. **Report results**: present a summary of passed/failed/skipped tests
 
 If all tests pass → auto-chain to Step 5b (Security Audit).
diff --git a/.cursor/skills/blackbox-test-spec/SKILL.md b/.cursor/skills/blackbox-test-spec/SKILL.md
deleted file mode 100644
index 7ddf953..0000000
--- a/.cursor/skills/blackbox-test-spec/SKILL.md
+++ /dev/null
@@ -1,321 +0,0 @@
----
-name: blackbox-test-spec
-description: |
-  Black-box integration test specification skill. Analyzes input data completeness and produces
-  detailed E2E test scenarios (functional + non-functional) that treat the system as a black box.
-  3-phase workflow: input data completeness analysis, test scenario specification, test data validation gate.
-  Produces 5 artifacts under integration_tests/.
-  Trigger phrases:
-  - "blackbox test spec", "black box tests", "integration test spec"
-  - "test specification", "e2e test spec"
-  - "test scenarios", "black box scenarios"
-category: build
-tags: [testing, black-box, integration-tests, e2e, test-specification, qa]
-disable-model-invocation: true
----
-
-# Black-Box Test Scenario Specification
-
-Analyze input data completeness and produce detailed black-box integration test specifications. Tests describe what the system should do given specific inputs — they never reference internals.
-
-## Core Principles
-
-- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
-- **Traceability**: every test traces to at least one acceptance criterion or restriction
-- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
-- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
-- **Spec, don't code**: this workflow produces test specifications, never test implementation code
-- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed
-
-## Context Resolution
-
-Fixed paths — no mode detection needed:
-
-- PROBLEM_DIR: `_docs/00_problem/`
-- SOLUTION_DIR: `_docs/01_solution/`
-- DOCUMENT_DIR: `_docs/02_document/`
-- TESTS_OUTPUT_DIR: `_docs/02_document/integration_tests/`
-
-Announce the resolved paths to the user before proceeding.
-
-## Input Specification
-
-### Required Files
-
-| File | Purpose |
-|------|---------|
-| `_docs/00_problem/problem.md` | Problem description and context |
-| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
-| `_docs/00_problem/restrictions.md` | Constraints and limitations |
-| `_docs/00_problem/input_data/` | Reference data examples |
-| `_docs/01_solution/solution.md` | Finalized solution |
-
-### Optional Files (used when available)
-
-| File | Purpose |
-|------|---------|
-| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
-| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
-| `DOCUMENT_DIR/components/` | Component specs for interface identification |
-
-### Prerequisite Checks (BLOCKING)
-
-1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
-2. `restrictions.md` exists and is non-empty — **STOP if missing**
-3. `input_data/` exists and contains at least one file — **STOP if missing**
-4. `problem.md` exists and is non-empty — **STOP if missing**
-5. `solution.md` exists and is non-empty — **STOP if missing**
-6. Create TESTS_OUTPUT_DIR if it does not exist
-7. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
-
-## Artifact Management
-
-### Directory Structure
-
-```
-TESTS_OUTPUT_DIR/
-├── environment.md
-├── test_data.md
-├── functional_tests.md
-├── non_functional_tests.md
-└── traceability_matrix.md
-```
-
-### Save Timing
-
-| Phase | Save immediately after | Filename |
-|-------|------------------------|----------|
-| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — |
-| Phase 2 | Environment spec | `environment.md` |
-| Phase 2 | Test data spec | `test_data.md` |
-| Phase 2 | Functional tests | `functional_tests.md` |
-| Phase 2 | Non-functional tests | `non_functional_tests.md` |
-| Phase 2 | Traceability matrix | `traceability_matrix.md` |
-| Phase 3 | Updated test data spec (if data added) | `test_data.md` |
-| Phase 3 | Updated functional tests (if tests removed) | `functional_tests.md` |
-| Phase 3 | Updated non-functional tests (if tests removed) | `non_functional_tests.md` |
-| Phase 3 | Updated traceability matrix (if tests removed) | `traceability_matrix.md` |
-
-### Resumability
-
-If TESTS_OUTPUT_DIR already contains files:
-
-1. List existing files and match them to the save timing table above
-2. Identify which phase/artifacts are complete
-3. Resume from the next incomplete artifact
-4. Inform the user which artifacts are being skipped
-
-## Progress Tracking
-
-At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes.
-
-## Workflow
-
-### Phase 1: Input Data Completeness Analysis
-
-**Role**: Professional Quality Assurance Engineer
-**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
-**Constraints**: Analysis only — no test specs yet
-
-1. Read `_docs/01_solution/solution.md`
-2. Read `acceptance_criteria.md`, `restrictions.md`
-3. Read testing strategy from solution.md (if present)
-4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
-5. Analyze `input_data/` contents against:
-   - Coverage of acceptance criteria scenarios
-   - Coverage of restriction edge cases
-   - Coverage of testing strategy requirements
-6. Threshold: at least 70% coverage of the scenarios
-7. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/`
-8. Present coverage assessment to user
-
-**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.
-
----
-
-### Phase 2: Black-Box Test Scenario Specification
-
-**Role**: Professional Quality Assurance Engineer
-**Goal**: Produce detailed black-box test specifications covering functional and non-functional scenarios
-**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
-
-Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
-
-1. Define test environment using `.cursor/skills/plan/templates/integration-environment.md` as structure
-2. Define test data management using `.cursor/skills/plan/templates/integration-test-data.md` as structure
-3. Write functional test scenarios (positive + negative) using `.cursor/skills/plan/templates/integration-functional-tests.md` as structure
-4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `.cursor/skills/plan/templates/integration-non-functional-tests.md` as structure
-5. Build traceability matrix using `.cursor/skills/plan/templates/integration-traceability-matrix.md` as structure
-
-**Self-verification**:
-- [ ] Every acceptance criterion is covered by at least one test scenario
-- [ ] Every restriction is verified by at least one test scenario
-- [ ] Positive and negative scenarios are balanced
-- [ ] Consumer app has no direct access to system internals
-- [ ] Docker environment is self-contained (`docker compose up` sufficient)
-- [ ] External dependencies have mock/stub services defined
-- [ ] Traceability matrix has no uncovered AC or restrictions
-
-**Save action**: Write all files under TESTS_OUTPUT_DIR:
-- `environment.md`
-- `test_data.md`
-- `functional_tests.md`
-- `non_functional_tests.md`
-- `traceability_matrix.md`
-
-**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.
-
-Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).
-
----
-
-### Phase 3: Test Data Validation Gate (HARD GATE)
-
-**Role**: Professional Quality Assurance Engineer
-**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%.
-**Constraints**: This phase is MANDATORY and cannot be skipped.
-
-#### Step 1 — Build the test-data requirements checklist
-
-Scan `functional_tests.md` and `non_functional_tests.md`. For every test scenario, extract:
-
-| # | Test Scenario ID | Test Name | Required Data Description | Required Data Quality | Required Data Quantity | Data Provided? |
-|---|-----------------|-----------|---------------------------|----------------------|----------------------|----------------|
-
-Present this table to the user.
-
-#### Step 2 — Ask user to provide test data
-
-For each row where **Data Provided?** is **No**, ask the user:
-
-> **Option A — Provide the data**: Supply the necessary test data files (with required quality and quantity as described in the table). Place them in `_docs/00_problem/input_data/` or indicate the location.
->
-> **Option B — Skip this test**: If you cannot provide the data, this test scenario will be **removed** from the specification.
-
-**BLOCKING**: Wait for the user's response for every missing data item.
-
-#### Step 3 — Validate provided data
-
-For each item where the user chose **Option A**:
-
-1. Verify the data file(s) exist at the indicated location
-2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
-3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
-4. If validation fails, report the specific issue and loop back to Step 2 for that item
-
-#### Step 4 — Remove tests without data
-
-For each item where the user chose **Option B**:
-
-1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data.`
-2. Remove the test scenario from `functional_tests.md` or `non_functional_tests.md`
-3. Remove corresponding rows from `traceability_matrix.md`
-4. Update `test_data.md` to reflect the removal
-
-**Save action**: Write updated files under TESTS_OUTPUT_DIR:
-- `test_data.md`
-- `functional_tests.md` (if tests removed)
-- `non_functional_tests.md` (if tests removed)
-- `traceability_matrix.md` (if tests removed)
-
-#### Step 5 — Final coverage check
-
-After all removals, recalculate coverage:
-
-1. Count remaining test scenarios that trace to acceptance criteria
-2. Count total acceptance criteria + restrictions
-3. Calculate coverage percentage: `covered_items / total_items * 100`
-
-| Metric | Value |
-|--------|-------|
-| Total AC + Restrictions | ? |
-| Covered by remaining tests | ? |
-| **Coverage %** | **?%** |
-
-**Decision**:
-
-- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user.
-- **Coverage < 70%** → Phase 3 **FAILED**. Report:
-  > ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:
-  >
-  > | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
-  > |---|---|---|
-  >
-  > **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.
-
-  **BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%.
-
-#### Phase 3 Completion
-
-When coverage ≥ 70% and all remaining tests have validated data:
-
-1. Present the final coverage report
-2. List all removed tests (if any) with reasons
-3. Confirm all artifacts are saved and consistent
-
----
-
-## Escalation Rules
-
-| Situation | Action |
-|-----------|--------|
-| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
-| Ambiguous requirements | ASK user |
-| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate |
-| Test scenario conflicts with restrictions | ASK user to clarify intent |
-| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
-| Test data not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
-| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
-
-## Common Mistakes
-
-- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
-- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values
-- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
-- **Untraceable tests**: every test should trace to at least one AC or restriction
-- **Writing test code**: this skill produces specifications, never implementation code
-- **Tests without data**: every test scenario MUST have concrete test data; a test spec without data is not executable and must be removed
-
-## Trigger Conditions
-
-When the user wants to:
-- Specify black-box integration tests before implementation or refactoring
-- Analyze input data completeness for test coverage
-- Produce E2E test scenarios from acceptance criteria
-
-**Keywords**: "blackbox test spec", "black box tests", "integration test spec", "test specification", "e2e test spec", "test scenarios"
-
-## Methodology Quick Reference
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│ Black-Box Test Scenario Specification (3-Phase)                 │
-├─────────────────────────────────────────────────────────────────┤
-│ PREREQ: Data Gate (BLOCKING)                                    │
-│   → verify AC, restrictions, input_data, solution exist         │
-│                                                                 │
-│ Phase 1: Input Data Completeness Analysis                       │
-│   → assess input_data/ coverage vs AC scenarios (≥70%)          │
-│   [BLOCKING: user confirms input data coverage]                 │
-│                                                                 │
-│ Phase 2: Black-Box Test Scenario Specification                  │
-│   → environment.md                                              │
-│   → test_data.md                                                │
-│   → functional_tests.md (positive + negative)                   │
-│   → non_functional_tests.md (perf, resilience, security, limits)│
-│   → traceability_matrix.md                                      │
-│   [BLOCKING: user confirms test coverage]                       │
-│                                                                 │
-│ Phase 3: Test Data Validation Gate (HARD GATE)                  │
-│   → build test-data requirements checklist                      │
-│   → ask user: provide data (Option A) or remove test (Option B) │
-│   → validate provided data (quality + quantity)                 │
-│   → remove tests without data, warn user                        │
-│   → final coverage check (≥70% or FAIL + loop back)             │
-│   [BLOCKING: coverage ≥ 70% required to pass]                   │
-├─────────────────────────────────────────────────────────────────┤
-│ Principles: Black-box only · Traceability · Save immediately    │
-│             Ask don't assume · Spec don't code                  │
-│             No test without data                                │
-└─────────────────────────────────────────────────────────────────┘
-```
diff --git a/.cursor/skills/code-review/SKILL.md b/.cursor/skills/code-review/SKILL.md
index 1c5bd4f..44c190c 100644
--- a/.cursor/skills/code-review/SKILL.md
+++ b/.cursor/skills/code-review/SKILL.md
@@ -46,7 +46,7 @@ For each task, verify implementation satisfies every acceptance criterion:
 
 - Walk through each AC (Given/When/Then) and trace it in the code
 - Check that unit tests cover each AC
-- Check that integration tests exist where specified in the task spec
+- Check that blackbox tests exist where specified in the task spec
 - Flag any AC that is not demonstrably satisfied as a **Spec-Gap** finding (severity: High)
 - Flag any scope creep (implementation beyond what the spec asked for) as a **Scope** finding (severity: Low)
diff --git a/.cursor/skills/decompose/SKILL.md b/.cursor/skills/decompose/SKILL.md
index 3837814..ac1cb2c 100644
--- a/.cursor/skills/decompose/SKILL.md
+++ b/.cursor/skills/decompose/SKILL.md
@@ -2,7 +2,7 @@
 name: decompose
 description: |
   Decompose planned components into atomic implementable tasks with bootstrap structure plan.
-  4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
+  4-step workflow: bootstrap structure plan, component task decomposition, blackbox test task decomposition, and cross-task verification.
   Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
   Trigger phrases:
   - "decompose", "decompose features", "feature decomposition"
@@ -36,7 +36,7 @@ Determine the operating mode based on invocation before any other logic runs.
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
-- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (integration tests) + Step 4 (cross-verification)
+- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (blackbox tests) + Step 4 (cross-verification)
 
 **Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):
 - DOCUMENT_DIR: `_docs/02_document/`
@@ -45,12 +45,12 @@
 - Ask user for the parent Epic ID
 - Runs Step 2 (that component only, appending to existing task numbering)
 
-**Tests-only mode** (provided file/directory is within `integration_tests/`, or `DOCUMENT_DIR/integration_tests/` exists and input explicitly requests test decomposition):
+**Tests-only mode** (provided file/directory is within `tests/`, or `DOCUMENT_DIR/tests/` exists and input explicitly requests test decomposition):
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
-- TESTS_DIR: `DOCUMENT_DIR/integration_tests/`
+- TESTS_DIR: `DOCUMENT_DIR/tests/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
-- Runs Step 1t (test infrastructure bootstrap) + Step 3 (integration test decomposition) + Step 4 (cross-verification against test coverage)
+- Runs Step 1t (test infrastructure bootstrap) + Step 3 (blackbox test decomposition) + Step 4 (cross-verification against test coverage)
 - Skips Step 1 (project bootstrap) and Step 2 (component decomposition) — the codebase already exists
 
 Announce the detected mode and resolved paths to the user before proceeding.
@@ -70,7 +70,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
 | `DOCUMENT_DIR/architecture.md` | Architecture from plan skill |
 | `DOCUMENT_DIR/system-flows.md` | System flows from plan skill |
 | `DOCUMENT_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
-| `DOCUMENT_DIR/integration_tests/` | Integration test specs from plan skill |
+| `DOCUMENT_DIR/tests/` | Blackbox test specs from plan skill |
 
 **Single component mode:**
 
@@ -84,10 +84,13 @@
 | File | Purpose |
 |------|---------|
 | `TESTS_DIR/environment.md` | Test environment specification (Docker services, networks, volumes) |
-| `TESTS_DIR/test_data.md` | Test data management (seed data, mocks, isolation) |
-| `TESTS_DIR/functional_tests.md` | Functional test scenarios (positive + negative) |
-| `TESTS_DIR/non_functional_tests.md` | Non-functional test scenarios (perf, resilience, security, limits) |
-| `TESTS_DIR/traceability_matrix.md` | AC/restriction coverage mapping |
+| `TESTS_DIR/test-data.md` | Test data management (seed data, mocks, isolation) |
+| `TESTS_DIR/blackbox-tests.md` | Blackbox functional scenarios (positive + negative) |
+| `TESTS_DIR/performance-tests.md` | Performance test scenarios |
+| `TESTS_DIR/resilience-tests.md` | Resilience test scenarios |
+| `TESTS_DIR/security-tests.md` | Security test scenarios |
+| `TESTS_DIR/resource-limit-tests.md` | Resource limit test scenarios |
+| `TESTS_DIR/traceability-matrix.md` | AC/restriction coverage mapping |
 | `_docs/00_problem/problem.md` | Problem context |
 | `_docs/00_problem/restrictions.md` | Constraints for test design |
 | `_docs/00_problem/acceptance_criteria.md` | Acceptance criteria being verified |
@@ -103,7 +106,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
 1. The provided component file exists and is non-empty — **STOP if missing**
 
 **Tests-only mode:**
-1. `TESTS_DIR/functional_tests.md` exists and is non-empty — **STOP if missing**
+1. `TESTS_DIR/blackbox-tests.md` exists and is non-empty — **STOP if missing**
 2. `TESTS_DIR/environment.md` exists — **STOP if missing**
 3. Create TASKS_DIR if it does not exist
 4. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
@@ -130,7 +133,7 @@ TASKS_DIR/
 | Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
 | Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
 | Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
-| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
+| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
 | Step 4 | Cross-task verification complete | `_dependencies_table.md` |
 
 ### Resumability
@@ -153,7 +156,7 @@ At the start of execution, create a TodoWrite with all applicable steps. Update
 **Goal**: Produce `01_test_infrastructure.md` — the first task describing the test project scaffold
 **Constraints**: This is a plan document, not code. The `/implement` skill executes it.
 
-1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test_data.md`
+1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test-data.md`
 2. Read problem.md, restrictions.md, acceptance_criteria.md for domain context
 3. Document the test infrastructure plan using `templates/test-infrastructure-task.md`
@@ -162,20 +165,20 @@ The test infrastructure bootstrap must include:
 - Mock/stub service definitions for each external dependency
 - `docker-compose.test.yml` structure from environment.md
 - Test runner configuration (framework, plugins, fixtures)
-- Test data fixture setup from test_data.md seed data sets
+- Test data fixture setup from test-data.md seed data sets
 - Test reporting configuration (format, output path)
 - Data isolation strategy
 
 **Self-verification**:
 - [ ] Every external dependency from environment.md has a mock service defined
 - [ ] Docker Compose structure covers all services from environment.md
-- [ ] Test data fixtures cover all seed data sets from test_data.md
+- [ ] Test data fixtures cover all seed data sets from test-data.md
 - [ ] Test runner configuration matches the consumer app tech stack from environment.md
 - [ ] Data isolation strategy is defined
 
 **Save action**: Write `01_test_infrastructure.md` (temporary numeric name)
 
-**Jira action**: Create a Jira ticket for this task under the "Integration Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
+**Jira action**: Create a Jira ticket for this task under the "Blackbox Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
 
 **Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
@@ -199,27 +202,27 @@ The bootstrap structure plan must include:
 - Shared models, interfaces, and DTOs
 - Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
 - `docker-compose.yml` for local development (all components + database + dependencies)
-- `docker-compose.test.yml` for integration test environment (black-box test runner)
+- `docker-compose.test.yml` for blackbox test environment (blackbox test runner)
 - `.dockerignore`
 - CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
 - Database migration setup and initial seed data scripts
 - Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
 - Environment variable documentation (`.env.example`)
-- Test structure with unit and integration test locations
+- Test structure with unit and blackbox test locations
 
 **Self-verification**:
 - [ ] All components have corresponding folders in the layout
 - [ ] All inter-component interfaces have DTOs defined
 - [ ] Dockerfile defined for each component
 - [ ] `docker-compose.yml` covers all components and dependencies
-- [ ] `docker-compose.test.yml` enables black-box integration testing
+- [ ] `docker-compose.test.yml` enables blackbox testing
 - [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
 - [ ] Database migration setup included
 - [ ] Health check endpoints specified for each service
 - [ ] Structured logging configuration included
 - [ ] `.env.example` with all required environment variables
 - [ ] Environment strategy covers dev, staging, production
-- [ ] Test structure includes unit and integration test locations
+- [ ] Test structure includes unit and blackbox test locations
 
 **Save action**: Write `01_initial_structure.md` (temporary numeric name)
@@ -265,33 +268,33 @@ For each component (or the single provided component):
 
 ---
 
-### Step 3: Integration Test Task Decomposition (default and tests-only modes)
+### Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
 
 **Role**: Professional Quality Assurance Engineer
-**Goal**: Decompose integration test specs into atomic, implementable task specs
+**Goal**: Decompose blackbox test specs into atomic, implementable task specs
 **Constraints**: Behavioral specs only — describe what, not how. No test code.
 
 **Numbering**:
 - In default mode: continue sequential numbering from where Step 2 left off.
 - In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).
 
-1. Read all test specs from `DOCUMENT_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
+1. Read all test specs from `DOCUMENT_DIR/tests/` (`blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md`)
 2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
-3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
+3. Each task should reference the specific test scenarios it implements and the environment/test-data specs
 4. Dependencies:
-   - In default mode: integration test tasks depend on the component implementation tasks they exercise
-   - In tests-only mode: integration test tasks depend on the test infrastructure bootstrap task (Step 1t)
+   - In default mode: blackbox test tasks depend on the component implementation tasks they exercise
+   - In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
 5. Write each task spec using `templates/task.md`
 6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
 7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
-8. **Immediately after writing each task file**: create a Jira ticket under the "Integration Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
+8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
 
 **Self-verification**:
-- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
-- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
+- [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
+- [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task
 - [ ] No task exceeds 5 complexity points
 - [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
-- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic
+- [ ] Every task has a Jira ticket linked to the "Blackbox Tests" epic
 
 **Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
@@ -306,7 +309,7 @@
 1. Verify task dependencies across all tasks are consistent
 2. Check no gaps:
    - In default mode: every interface in architecture.md has tasks covering it
-   - In tests-only mode: every test scenario in `traceability_matrix.md` is covered by a task
+   - In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task
 3. Check no overlaps: tasks don't duplicate work
 4. Check no circular dependencies in the task graph
 5.
Produce `_dependencies_table.md` using `templates/dependencies-table.md` @@ -320,7 +323,7 @@ Default mode: - [ ] `_dependencies_table.md` contains every task with correct dependencies Tests-only mode: -- [ ] Every test scenario from traceability_matrix.md "Covered" entries has a corresponding task +- [ ] Every test scenario from traceability-matrix.md "Covered" entries has a corresponding task - [ ] No circular dependencies in the task graph - [ ] Test task dependencies reference the test infrastructure bootstrap - [ ] `_dependencies_table.md` contains every task with correct dependencies @@ -366,14 +369,14 @@ Tests-only mode: │ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md │ │ [BLOCKING: user confirms structure] │ │ 2. Component Tasks → [JIRA-ID]_[short_name].md each │ -│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │ +│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │ │ 4. Cross-Verification → _dependencies_table.md │ │ [BLOCKING: user confirms dependencies] │ │ │ │ TESTS-ONLY MODE: │ │ 1t. Test Infrastructure → [JIRA-ID]_test_infrastructure.md │ │ [BLOCKING: user confirms test scaffold] │ -│ 3. Integration Tests → [JIRA-ID]_[short_name].md each │ +│ 3. Blackbox Tests → [JIRA-ID]_[short_name].md each │ │ 4. 
Cross-Verification → _dependencies_table.md │ │ [BLOCKING: user confirms dependencies] │ │ │ diff --git a/.cursor/skills/decompose/templates/initial-structure-task.md b/.cursor/skills/decompose/templates/initial-structure-task.md index 9642f65..371e5e0 100644 --- a/.cursor/skills/decompose/templates/initial-structure-task.md +++ b/.cursor/skills/decompose/templates/initial-structure-task.md @@ -49,7 +49,7 @@ project-root/ | Build | Compile/bundle the application | Every push | | Lint / Static Analysis | Code quality and style checks | Every push | | Unit Tests | Run unit test suite | Every push | -| Integration Tests | Run integration test suite | Every push | +| Blackbox Tests | Run blackbox test suite | Every push | | Security Scan | SAST / dependency check | Every push | | Deploy to Staging | Deploy to staging environment | Merge to staging branch | diff --git a/.cursor/skills/decompose/templates/task.md b/.cursor/skills/decompose/templates/task.md index d8547a9..f36ea38 100644 --- a/.cursor/skills/decompose/templates/task.md +++ b/.cursor/skills/decompose/templates/task.md @@ -64,7 +64,7 @@ Then [expected result] |--------|-------------|-----------------| | AC-1 | [test subject] | [expected result] | -## Integration Tests +## Blackbox Tests | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References | |--------|------------------------|-------------|-------------------|----------------| diff --git a/.cursor/skills/decompose/templates/test-infrastructure-task.md b/.cursor/skills/decompose/templates/test-infrastructure-task.md index 49b18ca..a07cb42 100644 --- a/.cursor/skills/decompose/templates/test-infrastructure-task.md +++ b/.cursor/skills/decompose/templates/test-infrastructure-task.md @@ -9,10 +9,10 @@ Use this template for the test infrastructure bootstrap (Step 1t in tests-only m **Task**: [JIRA-ID]_test_infrastructure **Name**: Test Infrastructure -**Description**: Scaffold the E2E test project — test runner, mock services, 
Docker test environment, test data fixtures, reporting
+**Description**: Scaffold the blackbox test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: [3|5] points
**Dependencies**: None
-**Component**: Integration Tests
+**Component**: Blackbox Tests
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]
@@ -124,6 +124,6 @@ Then a report file exists at the configured output path with correct columns
- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on test infrastructure decisions, not individual test implementations.
-- Reference environment.md and test_data.md from the test specs — don't repeat everything.
+- Reference test-environment.md and test-data.md from the test specs — don't repeat everything.
- Mock services must be deterministic: same input always produces same output.
- The Docker environment must be self-contained: `docker compose up` sufficient.
diff --git a/.cursor/skills/deploy/SKILL.md b/.cursor/skills/deploy/SKILL.md
index 022520f..d3bc3e6 100644
--- a/.cursor/skills/deploy/SKILL.md
+++ b/.cursor/skills/deploy/SKILL.md
@@ -20,7 +20,7 @@ Plan and document the full deployment lifecycle: check deployment status and env
## Core Principles
-- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
+- **Docker-first**: every component runs in a container; local dev, blackbox tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
@@ -157,7 +157,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7).
Upda
### Step 2: Containerization
**Role**: DevOps / Platform engineer
-**Goal**: Define Docker configuration for every component, local development, and integration test environments
+**Goal**: Define Docker configuration for every component, local development, and blackbox test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
@@ -176,7 +176,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env`)
-6. Define `docker-compose.test.yml` for integration tests:
+6. Define `docker-compose.test.yml` for blackbox tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
@@ -189,7 +189,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
-- [ ] docker-compose.test.yml enables black-box integration testing
+- [ ] docker-compose.test.yml enables blackbox testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
@@ -212,7 +212,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7).
Upda | Stage | Trigger | Steps | Quality Gate | |-------|---------|-------|-------------| | **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors | -| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage | +| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage | | **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs | | **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds | | **Push** | After build | Push to container registry | Push succeeds | @@ -458,7 +458,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda - **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts) - **Hardcoding secrets**: never include real credentials in deployment documents or scripts -- **Ignoring integration test containerization**: the test environment must be containerized alongside the app +- **Ignoring blackbox test containerization**: the test environment must be containerized alongside the app - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation - **Using `:latest` tags**: always pin base image versions - **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions diff --git a/.cursor/skills/deploy/templates/ci_cd_pipeline.md b/.cursor/skills/deploy/templates/ci_cd_pipeline.md index 57b8b41..16102e3 100644 --- a/.cursor/skills/deploy/templates/ci_cd_pipeline.md +++ b/.cursor/skills/deploy/templates/ci_cd_pipeline.md @@ -28,7 +28,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`. 
### Test - Unit tests: [framework and command] -- Integration tests: [framework and command, uses docker-compose.test.yml] +- Blackbox tests: [framework and command, uses docker-compose.test.yml] - Coverage threshold: 75% overall, 90% critical paths - Coverage report published as pipeline artifact @@ -54,7 +54,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`. - Automated rollback on health check failure ### Smoke Tests -- Subset of integration tests targeting staging environment +- Subset of blackbox tests targeting staging environment - Validates critical user flows - Timeout: [maximum duration] diff --git a/.cursor/skills/deploy/templates/containerization.md b/.cursor/skills/deploy/templates/containerization.md index d1025be..d6c7073 100644 --- a/.cursor/skills/deploy/templates/containerization.md +++ b/.cursor/skills/deploy/templates/containerization.md @@ -48,7 +48,7 @@ networks: [shared network] ``` -## Docker Compose — Integration Tests +## Docker Compose — Blackbox Tests ```yaml # docker-compose.test.yml structure diff --git a/.cursor/skills/new-task/templates/task.md b/.cursor/skills/new-task/templates/task.md index d8547a9..f36ea38 100644 --- a/.cursor/skills/new-task/templates/task.md +++ b/.cursor/skills/new-task/templates/task.md @@ -64,7 +64,7 @@ Then [expected result] |--------|-------------|-----------------| | AC-1 | [test subject] | [expected result] | -## Integration Tests +## Blackbox Tests | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References | |--------|------------------------|-------------|-------------------|----------------| diff --git a/.cursor/skills/plan/SKILL.md b/.cursor/skills/plan/SKILL.md index ef4b3a1..b1cc48d 100644 --- a/.cursor/skills/plan/SKILL.md +++ b/.cursor/skills/plan/SKILL.md @@ -59,9 +59,9 @@ At the start of execution, create a TodoWrite with all steps (1 through 6 plus F ## Workflow -### Step 1: Integration Tests +### Step 1: Blackbox Tests -Read and execute 
`.cursor/skills/blackbox-test-spec/SKILL.md`. +Read and execute `.cursor/skills/test-spec/SKILL.md`. Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3. @@ -111,7 +111,7 @@ Read and follow `steps/07_quality-checklist.md`. - **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input - **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output - **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register -- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3) +- **Ignoring blackbox test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3) ## Escalation Rules @@ -135,7 +135,7 @@ Read and follow `steps/07_quality-checklist.md`. │ PREREQ: Data Gate (BLOCKING) │ │ → verify AC, restrictions, input_data, solution exist │ │ │ -│ 1. Integration Tests → blackbox-test-spec/SKILL.md │ +│ 1. Blackbox Tests → test-spec/SKILL.md │ │ [BLOCKING: user confirms test coverage] │ │ 2. 
Solution Analysis → architecture, data model, deployment │ │ [BLOCKING: user confirms architecture] │ diff --git a/.cursor/skills/plan/steps/01_artifact-management.md b/.cursor/skills/plan/steps/01_artifact-management.md index 7e09a42..1a5a9cf 100644 --- a/.cursor/skills/plan/steps/01_artifact-management.md +++ b/.cursor/skills/plan/steps/01_artifact-management.md @@ -6,12 +6,15 @@ All artifacts are written directly under DOCUMENT_DIR: ``` DOCUMENT_DIR/ -├── integration_tests/ -│ ├── environment.md -│ ├── test_data.md -│ ├── functional_tests.md -│ ├── non_functional_tests.md -│ └── traceability_matrix.md +├── tests/ +│ ├── test-environment.md +│ ├── test-data.md +│ ├── blackbox-tests.md +│ ├── performance-tests.md +│ ├── resilience-tests.md +│ ├── security-tests.md +│ ├── resource-limit-tests.md +│ └── traceability-matrix.md ├── architecture.md ├── system-flows.md ├── data_model.md @@ -47,11 +50,14 @@ DOCUMENT_DIR/ | Step | Save immediately after | Filename | |------|------------------------|----------| -| Step 1 | Integration test environment spec | `integration_tests/environment.md` | -| Step 1 | Integration test data spec | `integration_tests/test_data.md` | -| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` | -| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` | -| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` | +| Step 1 | Blackbox test environment spec | `tests/test-environment.md` | +| Step 1 | Blackbox test data spec | `tests/test-data.md` | +| Step 1 | Blackbox tests | `tests/blackbox-tests.md` | +| Step 1 | Blackbox performance tests | `tests/performance-tests.md` | +| Step 1 | Blackbox resilience tests | `tests/resilience-tests.md` | +| Step 1 | Blackbox security tests | `tests/security-tests.md` | +| Step 1 | Blackbox resource limit tests | `tests/resource-limit-tests.md` | +| Step 1 | Blackbox traceability matrix | 
`tests/traceability-matrix.md` | | Step 2 | Architecture analysis complete | `architecture.md` | | Step 2 | System flows documented | `system-flows.md` | | Step 2 | Data model documented | `data_model.md` | diff --git a/.cursor/skills/plan/steps/02_solution-analysis.md b/.cursor/skills/plan/steps/02_solution-analysis.md index 74f1554..701f409 100644 --- a/.cursor/skills/plan/steps/02_solution-analysis.md +++ b/.cursor/skills/plan/steps/02_solution-analysis.md @@ -7,7 +7,7 @@ ### Phase 2a: Architecture & Flows 1. Read all input files thoroughly -2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests) +2. Incorporate findings, questions, and insights discovered during Step 1 (blackbox tests) 3. Research unknown or questionable topics via internet; ask user about ambiguities 4. Document architecture using `templates/architecture.md` as structure 5. Document system flows using `templates/system-flows.md` as structure @@ -17,7 +17,7 @@ - [ ] System flows cover all main user/system interactions - [ ] No contradictions with problem.md or restrictions.md - [ ] Technology choices are justified -- [ ] Integration test findings are reflected in architecture decisions +- [ ] Blackbox test findings are reflected in architecture decisions **Save action**: Write `architecture.md` and `system-flows.md` diff --git a/.cursor/skills/plan/steps/03_component-decomposition.md b/.cursor/skills/plan/steps/03_component-decomposition.md index daadd3c..c026e65 100644 --- a/.cursor/skills/plan/steps/03_component-decomposition.md +++ b/.cursor/skills/plan/steps/03_component-decomposition.md @@ -5,7 +5,7 @@ **Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly. 1. Identify components from the architecture; think about separation, reusability, and communication patterns -2. Use integration test scenarios from Step 1 to validate component boundaries +2. Use blackbox test scenarios from Step 1 to validate component boundaries 3. 
If additional components are needed (data preparation, shared helpers), create them 4. For each component, write a spec using `templates/component-spec.md` as structure 5. Generate diagrams: @@ -19,7 +19,7 @@ - [ ] All inter-component interfaces are defined (who calls whom, with what) - [ ] Component dependency graph has no circular dependencies - [ ] All components from architecture.md are accounted for -- [ ] Every integration test scenario can be traced through component interactions +- [ ] Every blackbox test scenario can be traced through component interactions **Save action**: Write: - each component `components/[##]_[name]/description.md` diff --git a/.cursor/skills/plan/steps/06_jira-epics.md b/.cursor/skills/plan/steps/06_jira-epics.md index 3195684..b9a1ecd 100644 --- a/.cursor/skills/plan/steps/06_jira-epics.md +++ b/.cursor/skills/plan/steps/06_jira-epics.md @@ -35,7 +35,7 @@ Do NOT create minimal epics with just a summary and short description. The Jira **Self-verification**: - [ ] "Bootstrap & Initial Structure" epic exists and is first in order -- [ ] "Integration Tests" epic exists +- [ ] "Blackbox Tests" epic exists - [ ] Every component maps to exactly one epic - [ ] Dependency order is respected (no epic depends on a later one) - [ ] Acceptance criteria are measurable @@ -43,6 +43,6 @@ Do NOT create minimal epics with just a summary and short description. The Jira - [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs - [ ] Epic descriptions are self-contained — readable without opening other files -7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`. +7. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`. 
**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs. diff --git a/.cursor/skills/plan/steps/07_quality-checklist.md b/.cursor/skills/plan/steps/07_quality-checklist.md index 0eff978..f883e88 100644 --- a/.cursor/skills/plan/steps/07_quality-checklist.md +++ b/.cursor/skills/plan/steps/07_quality-checklist.md @@ -2,8 +2,8 @@ Before writing the final report, verify ALL of the following: -### Integration Tests -- [ ] Every acceptance criterion is covered in traceability_matrix.md +### Blackbox Tests +- [ ] Every acceptance criterion is covered in traceability-matrix.md - [ ] Every restriction is verified by at least one test - [ ] Positive and negative scenarios are balanced - [ ] Docker environment is self-contained @@ -14,7 +14,7 @@ Before writing the final report, verify ALL of the following: - [ ] Covers all capabilities from solution.md - [ ] Technology choices are justified - [ ] Deployment model is defined -- [ ] Integration test findings are reflected in architecture decisions +- [ ] Blackbox test findings are reflected in architecture decisions ### Data Model - [ ] Every entity from architecture.md is defined @@ -35,7 +35,7 @@ Before writing the final report, verify ALL of the following: - [ ] No circular dependencies - [ ] All inter-component interfaces are defined and consistent - [ ] No orphan components (unused by any flow) -- [ ] Every integration test scenario can be traced through component interactions +- [ ] Every blackbox test scenario can be traced through component interactions ### Risks - [ ] All High/Critical risks have mitigations @@ -49,7 +49,7 @@ Before writing the final report, verify ALL of the following: ### Epics - [ ] "Bootstrap & Initial Structure" epic exists -- [ ] "Integration Tests" epic exists +- [ ] "Blackbox Tests" epic exists - [ ] Every component maps to an epic - [ ] Dependency order is correct - [ ] Acceptance criteria are measurable diff --git 
a/.cursor/skills/plan/templates/integration-functional-tests.md b/.cursor/skills/plan/templates/blackbox-tests.md similarity index 83% rename from .cursor/skills/plan/templates/integration-functional-tests.md rename to .cursor/skills/plan/templates/blackbox-tests.md index e57f7d4..d522698 100644 --- a/.cursor/skills/plan/templates/integration-functional-tests.md +++ b/.cursor/skills/plan/templates/blackbox-tests.md @@ -1,24 +1,24 @@ -# E2E Functional Tests Template +# Blackbox Tests Template -Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`. +Save as `DOCUMENT_DIR/tests/blackbox-tests.md`. --- ```markdown -# E2E Functional Tests +# Blackbox Tests ## Positive Scenarios ### FT-P-01: [Scenario Name] -**Summary**: [One sentence: what end-to-end use case this validates] +**Summary**: [One sentence: what black-box use case this validates] **Traces to**: AC-[ID], AC-[ID] **Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.] **Preconditions**: - [System state required before test] -**Input data**: [reference to specific data set or file from test_data.md] +**Input data**: [reference to specific data set or file from test-data.md] **Steps**: @@ -71,8 +71,8 @@ Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`. ## Guidance Notes -- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification. +- Blackbox tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification. - Positive scenarios validate the system does what it should. - Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept. - Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth." -- Input data references should point to specific entries in test_data.md. 
+- Input data references should point to specific entries in test-data.md. diff --git a/.cursor/skills/plan/templates/epic-spec.md b/.cursor/skills/plan/templates/epic-spec.md index 872e99e..3157a84 100644 --- a/.cursor/skills/plan/templates/epic-spec.md +++ b/.cursor/skills/plan/templates/epic-spec.md @@ -80,7 +80,7 @@ Link to architecture.md and relevant component spec.] ### Definition of Done - [ ] All in-scope capabilities implemented -- [ ] Automated tests pass (unit + integration + e2e) +- [ ] Automated tests pass (unit + blackbox) - [ ] Minimum coverage threshold met (75%) - [ ] Runbooks written (if applicable) - [ ] Documentation updated diff --git a/.cursor/skills/plan/templates/integration-non-functional-tests.md b/.cursor/skills/plan/templates/integration-non-functional-tests.md deleted file mode 100644 index 6bf4c54..0000000 --- a/.cursor/skills/plan/templates/integration-non-functional-tests.md +++ /dev/null @@ -1,97 +0,0 @@ -# E2E Non-Functional Tests Template - -Save as `DOCUMENT_DIR/integration_tests/non_functional_tests.md`. - ---- - -```markdown -# E2E Non-Functional Tests - -## Performance Tests - -### NFT-PERF-01: [Test Name] - -**Summary**: [What performance characteristic this validates] -**Traces to**: AC-[ID] -**Metric**: [what is measured — latency, throughput, frame rate, etc.] - -**Preconditions**: -- [System state, load profile, data volume] - -**Steps**: - -| Step | Consumer Action | Measurement | -|------|----------------|-------------| -| 1 | [action] | [what to measure and how] | - -**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms] -**Duration**: [how long the test runs] - ---- - -## Resilience Tests - -### NFT-RES-01: [Test Name] - -**Summary**: [What failure/recovery scenario this validates] -**Traces to**: AC-[ID] - -**Preconditions**: -- [System state before fault injection] - -**Fault injection**: -- [What fault is introduced — process kill, network partition, invalid input sequence, etc.] 
- -**Steps**: - -| Step | Action | Expected Behavior | -|------|--------|------------------| -| 1 | [inject fault] | [system behavior during fault] | -| 2 | [observe recovery] | [system behavior after recovery] | - -**Pass criteria**: [recovery time, data integrity, continued operation] - ---- - -## Security Tests - -### NFT-SEC-01: [Test Name] - -**Summary**: [What security property this validates] -**Traces to**: AC-[ID], RESTRICT-[ID] - -**Steps**: - -| Step | Consumer Action | Expected Response | -|------|----------------|------------------| -| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] | - -**Pass criteria**: [specific security outcome] - ---- - -## Resource Limit Tests - -### NFT-RES-LIM-01: [Test Name] - -**Summary**: [What resource constraint this validates] -**Traces to**: AC-[ID], RESTRICT-[ID] - -**Preconditions**: -- [System running under specified constraints] - -**Monitoring**: -- [What resources to monitor — memory, CPU, GPU, disk, temperature] - -**Duration**: [how long to run] -**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout] -``` - ---- - -## Guidance Notes - -- Performance tests should run long enough to capture steady-state behavior, not just cold-start. -- Resilience tests must define both the fault and the expected recovery — not just "system should recover." -- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities. -- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance. diff --git a/.cursor/skills/plan/templates/performance-tests.md b/.cursor/skills/plan/templates/performance-tests.md new file mode 100644 index 0000000..dfbcd14 --- /dev/null +++ b/.cursor/skills/plan/templates/performance-tests.md @@ -0,0 +1,35 @@ +# Performance Tests Template + +Save as `DOCUMENT_DIR/tests/performance-tests.md`. 
+ +--- + +```markdown +# Performance Tests + +### NFT-PERF-01: [Test Name] + +**Summary**: [What performance characteristic this validates] +**Traces to**: AC-[ID] +**Metric**: [what is measured — latency, throughput, frame rate, etc.] + +**Preconditions**: +- [System state, load profile, data volume] + +**Steps**: + +| Step | Consumer Action | Measurement | +|------|----------------|-------------| +| 1 | [action] | [what to measure and how] | + +**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms] +**Duration**: [how long the test runs] +``` + +--- + +## Guidance Notes + +- Performance tests should run long enough to capture steady-state behavior, not just cold-start. +- Define clear pass/fail thresholds with specific metrics (p50, p95, p99 latency, throughput, etc.). +- Include warm-up preconditions to separate initialization cost from steady-state performance. diff --git a/.cursor/skills/plan/templates/resilience-tests.md b/.cursor/skills/plan/templates/resilience-tests.md new file mode 100644 index 0000000..72890ae --- /dev/null +++ b/.cursor/skills/plan/templates/resilience-tests.md @@ -0,0 +1,37 @@ +# Resilience Tests Template + +Save as `DOCUMENT_DIR/tests/resilience-tests.md`. + +--- + +```markdown +# Resilience Tests + +### NFT-RES-01: [Test Name] + +**Summary**: [What failure/recovery scenario this validates] +**Traces to**: AC-[ID] + +**Preconditions**: +- [System state before fault injection] + +**Fault injection**: +- [What fault is introduced — process kill, network partition, invalid input sequence, etc.] + +**Steps**: + +| Step | Action | Expected Behavior | +|------|--------|------------------| +| 1 | [inject fault] | [system behavior during fault] | +| 2 | [observe recovery] | [system behavior after recovery] | + +**Pass criteria**: [recovery time, data integrity, continued operation] +``` + +--- + +## Guidance Notes + +- Resilience tests must define both the fault and the expected recovery — not just "system should recover." 
+- Include specific recovery time expectations and data integrity checks. +- Test both graceful degradation (partial failure) and full recovery scenarios. diff --git a/.cursor/skills/plan/templates/resource-limit-tests.md b/.cursor/skills/plan/templates/resource-limit-tests.md new file mode 100644 index 0000000..53779e3 --- /dev/null +++ b/.cursor/skills/plan/templates/resource-limit-tests.md @@ -0,0 +1,31 @@ +# Resource Limit Tests Template + +Save as `DOCUMENT_DIR/tests/resource-limit-tests.md`. + +--- + +```markdown +# Resource Limit Tests + +### NFT-RES-LIM-01: [Test Name] + +**Summary**: [What resource constraint this validates] +**Traces to**: AC-[ID], RESTRICT-[ID] + +**Preconditions**: +- [System running under specified constraints] + +**Monitoring**: +- [What resources to monitor — memory, CPU, GPU, disk, temperature] + +**Duration**: [how long to run] +**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout] +``` + +--- + +## Guidance Notes + +- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance. +- Define specific numeric limits that can be programmatically checked. +- Include both the monitoring method and the threshold in the pass criteria. diff --git a/.cursor/skills/plan/templates/security-tests.md b/.cursor/skills/plan/templates/security-tests.md new file mode 100644 index 0000000..b243404 --- /dev/null +++ b/.cursor/skills/plan/templates/security-tests.md @@ -0,0 +1,30 @@ +# Security Tests Template + +Save as `DOCUMENT_DIR/tests/security-tests.md`. + +--- + +```markdown +# Security Tests + +### NFT-SEC-01: [Test Name] + +**Summary**: [What security property this validates] +**Traces to**: AC-[ID], RESTRICT-[ID] + +**Steps**: + +| Step | Consumer Action | Expected Response | +|------|----------------|------------------| +| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] 
| + +**Pass criteria**: [specific security outcome] +``` + +--- + +## Guidance Notes + +- Security tests at blackbox level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities. +- Verify the system remains operational after security-related edge cases (no crash, no hang). +- Test authentication/authorization boundaries from the consumer's perspective. diff --git a/.cursor/skills/plan/templates/integration-test-data.md b/.cursor/skills/plan/templates/test-data.md similarity index 62% rename from .cursor/skills/plan/templates/integration-test-data.md rename to .cursor/skills/plan/templates/test-data.md index 1ee4afe..0cee7fa 100644 --- a/.cursor/skills/plan/templates/integration-test-data.md +++ b/.cursor/skills/plan/templates/test-data.md @@ -1,11 +1,11 @@ -# E2E Test Data Template +# Test Data Template -Save as `DOCUMENT_DIR/integration_tests/test_data.md`. +Save as `DOCUMENT_DIR/tests/test-data.md`. --- ```markdown -# E2E Test Data Management +# Test Data Management ## Seed Data Sets @@ -23,6 +23,12 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`. |-----------------|----------------|-------------|-----------------| | [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] | +## Expected Results Mapping + +| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source | +|-----------------|------------|-----------------|-------------------|-----------|----------------------| +| [test ID] | `input_data/[filename]` | [quantifiable expected output] | [exact / tolerance / pattern / threshold / file-diff] | [± value or N/A] | `input_data/expected_results/[filename]` or inline | + ## External Dependency Mocks | External Service | Mock/Stub | How Provided | Behavior | @@ -42,5 +48,8 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`. - Every seed data set should be traceable to specific test scenarios. 
- Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it. +- Every input data item MUST have a corresponding expected result in the Expected Results Mapping table. +- Expected results MUST be quantifiable: exact values, numeric tolerances, pattern matches, thresholds, or reference files. "Works correctly" is never acceptable. +- For complex expected outputs, provide machine-readable reference files (JSON, CSV) in `_docs/00_problem/input_data/expected_results/` and reference them in the mapping. - External mocks must be deterministic — same input always produces same output. - Data isolation must guarantee no test can affect another test's outcome. diff --git a/.cursor/skills/plan/templates/integration-environment.md b/.cursor/skills/plan/templates/test-environment.md similarity index 92% rename from .cursor/skills/plan/templates/integration-environment.md rename to .cursor/skills/plan/templates/test-environment.md index 9382dfa..b5d74fa 100644 --- a/.cursor/skills/plan/templates/integration-environment.md +++ b/.cursor/skills/plan/templates/test-environment.md @@ -1,16 +1,16 @@ -# E2E Test Environment Template +# Test Environment Template -Save as `DOCUMENT_DIR/integration_tests/environment.md`. +Save as `DOCUMENT_DIR/tests/environment.md`. --- ```markdown -# E2E Test Environment +# Test Environment ## Overview **System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.] -**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals. +**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating black-box use cases without access to internals. 
## Docker Environment diff --git a/.cursor/skills/plan/templates/test-spec.md b/.cursor/skills/plan/templates/test-spec.md index 2b6ee44..5b7b83e 100644 --- a/.cursor/skills/plan/templates/test-spec.md +++ b/.cursor/skills/plan/templates/test-spec.md @@ -17,7 +17,7 @@ Use this template for each component's test spec. Save as `components/[##]_[name --- -## Integration Tests +## Blackbox Tests ### IT-01: [Test Name] @@ -169,4 +169,4 @@ Use this template for each component's test spec. Save as `components/[##]_[name - If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2"). - Performance test targets should come from the NFR section in `architecture.md`. - Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component. -- Not every component needs all 4 test types. A stateless utility component may only need integration tests. +- Not every component needs all 4 test types. A stateless utility component may only need blackbox tests. diff --git a/.cursor/skills/plan/templates/integration-traceability-matrix.md b/.cursor/skills/plan/templates/traceability-matrix.md similarity index 82% rename from .cursor/skills/plan/templates/integration-traceability-matrix.md rename to .cursor/skills/plan/templates/traceability-matrix.md index 0d63d81..e0192ac 100644 --- a/.cursor/skills/plan/templates/integration-traceability-matrix.md +++ b/.cursor/skills/plan/templates/traceability-matrix.md @@ -1,11 +1,11 @@ -# E2E Traceability Matrix Template +# Traceability Matrix Template -Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`. +Save as `DOCUMENT_DIR/tests/traceability-matrix.md`. --- ```markdown -# E2E Traceability Matrix +# Traceability Matrix ## Acceptance Criteria Coverage @@ -34,7 +34,7 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`. 
| Item | Reason Not Covered | Risk | Mitigation | |------|-------------------|------|-----------| -| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] | +| [AC/Restriction ID] | [why it cannot be tested at blackbox level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] | ``` --- @@ -44,4 +44,4 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`. - Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason. - Every restriction must appear in the matrix. - NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware"). -- Coverage percentage should be at least 75% for acceptance criteria at the E2E level. +- Coverage percentage should be at least 75% for acceptance criteria at the blackbox test level. diff --git a/.cursor/skills/refactor/SKILL.md b/.cursor/skills/refactor/SKILL.md index e2124ff..1099328 100644 --- a/.cursor/skills/refactor/SKILL.md +++ b/.cursor/skills/refactor/SKILL.md @@ -155,7 +155,7 @@ Store in PROBLEM_DIR. 
| Metric Category | What to Capture | |----------------|-----------------| -| **Coverage** | Overall, unit, integration, critical paths | +| **Coverage** | Overall, unit, blackbox, critical paths | | **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio | | **Code Smells** | Total, critical, major | | **Performance** | Response times (P50/P95/P99), CPU/memory, throughput | @@ -279,11 +279,11 @@ Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`: Coverage requirements (must meet before refactoring): - Minimum overall coverage: 75% - Critical path coverage: 90% -- All public APIs must have integration tests +- All public APIs must have blackbox tests - All error handling paths must be tested For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`: -- Integration tests: summary, current behavior, input data, expected result, max expected time +- Blackbox tests: summary, current behavior, input data, expected result, max expected time - Acceptance tests: summary, preconditions, steps with expected results - Coverage analysis: current %, target %, uncovered critical paths @@ -297,7 +297,7 @@ For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_ **Self-verification**: - [ ] Coverage requirements met (75% overall, 90% critical paths) - [ ] All tests pass on current codebase -- [ ] All public APIs have integration tests +- [ ] All public APIs have blackbox tests - [ ] Test data fixtures are configured **Save action**: Write test specs; implemented tests go into the project's test folder @@ -332,7 +332,7 @@ Write `REFACTOR_DIR/coupling_analysis.md`: For each change in the decoupling strategy: 1. Implement the change -2. Run integration tests +2. Run blackbox tests 3. Fix any failures 4. 
Commit with descriptive message diff --git a/.cursor/skills/test-spec/SKILL.md b/.cursor/skills/test-spec/SKILL.md new file mode 100644 index 0000000..3c0892f --- /dev/null +++ b/.cursor/skills/test-spec/SKILL.md @@ -0,0 +1,411 @@ +--- +name: test-spec +description: | + Test specification skill. Analyzes input data and expected results completeness, + then produces detailed test scenarios (blackbox, performance, resilience, security, resource limits) + that treat the system as a black box. Every test pairs input data with quantifiable expected results + so tests can verify correctness, not just execution. + 3-phase workflow: input data + expected results analysis, test scenario specification, data + results validation gate. + Produces 8 artifacts under tests/. + Trigger phrases: + - "test spec", "test specification", "test scenarios" + - "blackbox test spec", "black box tests", "blackbox tests" + - "performance tests", "resilience tests", "security tests" +category: build +tags: [testing, black-box, blackbox-tests, test-specification, qa] +disable-model-invocation: true +--- + +# Test Scenario Specification + +Analyze input data completeness and produce detailed black-box test specifications. Tests describe what the system should do given specific inputs — they never reference internals. 
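To make the black-box boundary concrete, here is a minimal sketch of the style of test this skill specifies. It is illustrative only: the `/detect` endpoint, the stub function, and the expected values are hypothetical (loosely borrowed from the object-detection examples later in this document). The point is that the test drives the system solely through a public interface and asserts only on observable output.

```python
def call_detect_api(image_path):
    """Stand-in for an HTTP call to the system's public /detect endpoint.

    A real consumer app would issue an actual request; the stub keeps this
    sketch self-contained. Note what is absent: the test never imports the
    system's modules, queries its database, or inspects internal state.
    """
    return {"status": 200, "detection_count": 3}


def test_detect_returns_expected_count():
    response = call_detect_api("input_data/image_01.jpg")
    # Black-box assertions: observable response fields only.
    assert response["status"] == 200
    assert response["detection_count"] == 3


test_detect_returns_expected_count()
```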
+ +## Core Principles + +- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details +- **Traceability**: every test traces to at least one acceptance criterion or restriction +- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work +- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding +- **Spec, don't code**: this workflow produces test specifications, never test implementation code +- **No test without data**: every test scenario MUST have concrete test data; tests without data are removed +- **No test without expected result**: every test scenario MUST pair input data with a quantifiable expected result; a test that cannot compare actual output against a known-correct answer is not verifiable and must be removed + +## Context Resolution + +Fixed paths — no mode detection needed: + +- PROBLEM_DIR: `_docs/00_problem/` +- SOLUTION_DIR: `_docs/01_solution/` +- DOCUMENT_DIR: `_docs/02_document/` +- TESTS_OUTPUT_DIR: `_docs/02_document/tests/` + +Announce the resolved paths to the user before proceeding. + +## Input Specification + +### Required Files + +| File | Purpose | +|------|---------| +| `_docs/00_problem/problem.md` | Problem description and context | +| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria | +| `_docs/00_problem/restrictions.md` | Constraints and limitations | +| `_docs/00_problem/input_data/` | Reference data examples, expected results, and optional reference files | +| `_docs/01_solution/solution.md` | Finalized solution | + +### Expected Results Specification + +Every input data item MUST have a corresponding expected result that defines what the system should produce. Expected results MUST be **quantifiable** — the test must be able to programmatically compare actual system output against the expected result and produce a pass/fail verdict. 
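As an illustration (not one of the required artifacts), "quantifiable" means a harness could compute the verdict with a function like the sketch below. The helper itself is hypothetical; the method names mirror the comparison methods defined in the expected-results template.

```python
import re


def verdict(actual, expected, method, tolerance=None):
    """Return True (pass) or False (fail) for one expected-result row."""
    if method == "exact":
        return actual == expected
    if method == "numeric_tolerance":
        return abs(actual - expected) <= tolerance
    if method == "threshold_min":
        return actual >= expected
    if method == "threshold_max":
        return actual <= expected
    if method == "regex":
        return re.search(expected, str(actual)) is not None
    raise ValueError(f"unknown comparison method: {method}")


# confidence >= 0.85, position within 10px, latency <= 500ms:
assert verdict(0.92, 0.85, "threshold_min")
assert verdict(124, 120, "numeric_tolerance", tolerance=10)
assert verdict(430, 500, "threshold_max")
```

A result that cannot be routed through some such comparison is, by definition, not quantifiable and must be reworked or removed.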
+ +Expected results live inside `_docs/00_problem/input_data/` in one or both of: + +1. **Mapping file** (`input_data/expected_results.md`): a table pairing each input with its quantifiable expected output, using the format defined in `.cursor/skills/test-spec/templates/expected-results.md` + +2. **Reference files folder** (`input_data/expected_results/`): machine-readable files (JSON, CSV, etc.) containing full expected outputs for complex cases, referenced from the mapping file + +``` +input_data/ +├── expected_results.md ← required: input→expected result mapping +├── expected_results/ ← optional: complex reference files +│ ├── image_01_detections.json +│ └── batch_A_results.json +├── image_01.jpg +├── empty_scene.jpg +└── data_parameters.md +``` + +**Quantifiability requirements** (see template for full format and examples): +- Numeric values: exact value or value ± tolerance (e.g., `confidence ≥ 0.85`, `position ± 10px`) +- Structured data: exact JSON/CSV values, or a reference file in `expected_results/` +- Counts: exact counts (e.g., "3 detections", "0 errors") +- Text/patterns: exact string or regex pattern to match +- Timing: threshold (e.g., "response ≤ 500ms") +- Error cases: expected error code, message pattern, or HTTP status + +### Optional Files (used when available) + +| File | Purpose | +|------|---------| +| `DOCUMENT_DIR/architecture.md` | System architecture for environment design | +| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage | +| `DOCUMENT_DIR/components/` | Component specs for interface identification | + +### Prerequisite Checks (BLOCKING) + +1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing** +2. `restrictions.md` exists and is non-empty — **STOP if missing** +3. `input_data/` exists and contains at least one file — **STOP if missing** +4. `input_data/expected_results.md` exists and is non-empty — **STOP if missing**. Prompt the user: *"Expected results mapping is required. 
Please create `_docs/00_problem/input_data/expected_results.md` pairing each input with its quantifiable expected output. Use `.cursor/skills/test-spec/templates/expected-results.md` as the format reference."* +5. `problem.md` exists and is non-empty — **STOP if missing** +6. `solution.md` exists and is non-empty — **STOP if missing** +7. Create TESTS_OUTPUT_DIR if it does not exist +8. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?** + +## Artifact Management + +### Directory Structure + +``` +TESTS_OUTPUT_DIR/ +├── environment.md +├── test-data.md +├── blackbox-tests.md +├── performance-tests.md +├── resilience-tests.md +├── security-tests.md +├── resource-limit-tests.md +└── traceability-matrix.md +``` + +### Save Timing + +| Phase | Save immediately after | Filename | +|-------|------------------------|----------| +| Phase 1 | Input data analysis (no file — findings feed Phase 2) | — | +| Phase 2 | Environment spec | `environment.md` | +| Phase 2 | Test data spec | `test-data.md` | +| Phase 2 | Blackbox tests | `blackbox-tests.md` | +| Phase 2 | Performance tests | `performance-tests.md` | +| Phase 2 | Resilience tests | `resilience-tests.md` | +| Phase 2 | Security tests | `security-tests.md` | +| Phase 2 | Resource limit tests | `resource-limit-tests.md` | +| Phase 2 | Traceability matrix | `traceability-matrix.md` | +| Phase 3 | Updated test data spec (if data added) | `test-data.md` | +| Phase 3 | Updated test files (if tests removed) | respective test file | +| Phase 3 | Updated traceability matrix (if tests removed) | `traceability-matrix.md` | + +### Resumability + +If TESTS_OUTPUT_DIR already contains files: + +1. List existing files and match them to the save timing table above +2. Identify which phase/artifacts are complete +3. Resume from the next incomplete artifact +4. 
Inform the user which artifacts are being skipped + +## Progress Tracking + +At the start of execution, create a TodoWrite with all three phases. Update status as each phase completes. + +## Workflow + +### Phase 1: Input Data Completeness Analysis + +**Role**: Professional Quality Assurance Engineer +**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios +**Constraints**: Analysis only — no test specs yet + +1. Read `_docs/01_solution/solution.md` +2. Read `acceptance_criteria.md`, `restrictions.md` +3. Read testing strategy from solution.md (if present) +4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows +5. Read `input_data/expected_results.md` and any referenced files in `input_data/expected_results/` +6. Analyze `input_data/` contents against: + - Coverage of acceptance criteria scenarios + - Coverage of restriction edge cases + - Coverage of testing strategy requirements +7. Analyze `input_data/expected_results.md` completeness: + - Every input data item has a corresponding expected result row in the mapping + - Expected results are quantifiable (contain numeric thresholds, exact values, patterns, or file references — not vague descriptions like "works correctly" or "returns result") + - Expected results specify a comparison method (exact match, tolerance range, pattern match, threshold) per the template + - Reference files in `input_data/expected_results/` that are cited in the mapping actually exist and are valid +8. Present input-to-expected-result pairing assessment: + +| Input Data | Expected Result Provided? | Quantifiable? | Issue (if any) | +|------------|--------------------------|---------------|----------------| +| [file/data] | Yes/No | Yes/No | [missing, vague, no tolerance, etc.] | + +9. Threshold: at least 70% coverage of scenarios AND every covered scenario has a quantifiable expected result +10. 
If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/` and update `input_data/expected_results.md` +11. If expected results are missing or not quantifiable, ask user to provide them before proceeding + +**BLOCKING**: Do NOT proceed until user confirms both input data coverage AND expected results completeness are sufficient. + +--- + +### Phase 2: Test Scenario Specification + +**Role**: Professional Quality Assurance Engineer +**Goal**: Produce detailed black-box test specifications covering blackbox, performance, resilience, security, and resource limit scenarios +**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built. + +Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios: + +1. Define test environment using `.cursor/skills/plan/templates/test-environment.md` as structure +2. Define test data management using `.cursor/skills/plan/templates/test-data.md` as structure +3. Write blackbox test scenarios (positive + negative) using `.cursor/skills/plan/templates/blackbox-tests.md` as structure +4. Write performance test scenarios using `.cursor/skills/plan/templates/performance-tests.md` as structure +5. Write resilience test scenarios using `.cursor/skills/plan/templates/resilience-tests.md` as structure +6. Write security test scenarios using `.cursor/skills/plan/templates/security-tests.md` as structure +7. Write resource limit test scenarios using `.cursor/skills/plan/templates/resource-limit-tests.md` as structure +8. 
Build traceability matrix using `.cursor/skills/plan/templates/traceability-matrix.md` as structure + +**Self-verification**: +- [ ] Every acceptance criterion is covered by at least one test scenario +- [ ] Every restriction is verified by at least one test scenario +- [ ] Every test scenario has a quantifiable expected result from `input_data/expected_results.md` +- [ ] Expected results use comparison methods from `.cursor/skills/test-spec/templates/expected-results.md` +- [ ] Positive and negative scenarios are balanced +- [ ] Consumer app has no direct access to system internals +- [ ] Docker environment is self-contained (`docker compose up` sufficient) +- [ ] External dependencies have mock/stub services defined +- [ ] Traceability matrix has no uncovered AC or restrictions + +**Save action**: Write all files under TESTS_OUTPUT_DIR: +- `environment.md` +- `test-data.md` +- `blackbox-tests.md` +- `performance-tests.md` +- `resilience-tests.md` +- `security-tests.md` +- `resource-limit-tests.md` +- `traceability-matrix.md` + +**BLOCKING**: Present test coverage summary (from traceability-matrix.md) to user. Do NOT proceed until confirmed. + +Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.). + +--- + +### Phase 3: Test Data Validation Gate (HARD GATE) + +**Role**: Professional Quality Assurance Engineer +**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 70%. +**Constraints**: This phase is MANDATORY and cannot be skipped. + +#### Step 1 — Build the test-data and expected-result requirements checklist + +Scan `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, and `resource-limit-tests.md`. 
For every test scenario, extract: + +| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? | +|---|-----------------|-----------|---------------------|-------------------------|---------------------|-------------------|----------------|--------------------------| +| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] | + +Present this table to the user. + +#### Step 2 — Ask user to provide missing test data AND expected results + +For each row where **Input Provided?** is **No** OR **Expected Result Provided?** is **No**, ask the user: + +> **Option A — Provide the missing items**: Supply what is missing: +> - **Missing input data**: Place test data files in `_docs/00_problem/input_data/` or indicate the location. +> - **Missing expected result**: Provide the quantifiable expected result for this input. Update `_docs/00_problem/input_data/expected_results.md` with a row mapping the input to its expected output. If the expected result is complex, provide a reference file in `_docs/00_problem/input_data/expected_results/`. Use `.cursor/skills/test-spec/templates/expected-results.md` for format guidance. +> +> Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples: +> - "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px" +> - "HTTP 200 with JSON body matching `expected_response_01.json`" +> - "Processing time < 500ms" +> - "0 false positives in the output set" +> +> **Option B — Skip this test**: If you cannot provide the data or expected result, this test scenario will be **removed** from the specification. + +**BLOCKING**: Wait for the user's response for every missing item. 
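The bounding-box example above ("± 10px") shows why Option A insists on explicit tolerances: once a tolerance is stated, the check is purely mechanical. A hedged sketch, with hypothetical function names, of how such a check could look:

```python
def bbox_within_tolerance(actual, expected, tol_px):
    """True if every coordinate of `actual` is within tol_px of `expected`."""
    return all(abs(a - e) <= tol_px for a, e in zip(actual, expected))


def check_detections(actual_bboxes, expected_bboxes, tol_px=10):
    """Exact count match, plus per-coordinate tolerance on each box."""
    if len(actual_bboxes) != len(expected_bboxes):
        return False
    return all(bbox_within_tolerance(a, e, tol_px)
               for a, e in zip(actual_bboxes, expected_bboxes))


# Within tolerance on every coordinate -> pass; one coordinate off by 80px -> fail.
assert check_detections([(122, 78, 338, 292)], [(120, 80, 340, 290)])
assert not check_detections([(200, 80, 340, 290)], [(120, 80, 340, 290)])
```

Without the "± 10px" annotation, neither verdict above could be computed, and the test would have to be removed under Option B.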
+ +#### Step 3 — Validate provided data and expected results + +For each item where the user chose **Option A**: + +**Input data validation**: +1. Verify the data file(s) exist at the indicated location +2. Verify **quality**: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges) +3. Verify **quantity**: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants) + +**Expected result validation**: +4. Verify the expected result exists in `input_data/expected_results.md` or as a referenced file in `input_data/expected_results/` +5. Verify **quantifiability**: the expected result can be evaluated programmatically — it must contain at least one of: + - Exact values (counts, strings, status codes) + - Numeric values with tolerance (e.g., `± 10px`, `≥ 0.85`) + - Pattern matches (regex, substring, JSON schema) + - Thresholds (e.g., `< 500ms`, `≤ 5% error rate`) + - Reference file for structural comparison (JSON diff, CSV diff) +6. Verify **completeness**: the expected result covers all outputs the test checks (not just one field when the test validates multiple) +7. Verify **consistency**: the expected result is consistent with the acceptance criteria it traces to + +If any validation fails, report the specific issue and loop back to Step 2 for that item. + +#### Step 4 — Remove tests without data or expected results + +For each item where the user chose **Option B**: + +1. Warn the user: `⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.` +2. Remove the test scenario from the respective test file +3. Remove corresponding rows from `traceability-matrix.md` +4. 
Update `test-data.md` to reflect the removal + +**Save action**: Write updated files under TESTS_OUTPUT_DIR: +- `test-data.md` +- Affected test files (if tests removed) +- `traceability-matrix.md` (if tests removed) + +#### Step 5 — Final coverage check + +After all removals, recalculate coverage: + +1. Count remaining test scenarios that trace to acceptance criteria +2. Count total acceptance criteria + restrictions +3. Calculate coverage percentage: `covered_items / total_items * 100` + +| Metric | Value | +|--------|-------| +| Total AC + Restrictions | ? | +| Covered by remaining tests | ? | +| **Coverage %** | **?%** | + +**Decision**: + +- **Coverage ≥ 70%** → Phase 3 **PASSED**. Present final summary to user. +- **Coverage < 70%** → Phase 3 **FAILED**. Report: + > ❌ Test coverage dropped to **X%** (minimum 70% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions: + > + > | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed | + > |---|---|---| + > + > **Action required**: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply. + + **BLOCKING**: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 70%. + +#### Phase 3 Completion + +When coverage ≥ 70% and all remaining tests have validated data AND quantifiable expected results: + +1. Present the final coverage report +2. List all removed tests (if any) with reasons +3. Confirm every remaining test has: input data + quantifiable expected result + comparison method +4. 
Confirm all artifacts are saved and consistent + +--- + +## Escalation Rules + +| Situation | Action | +|-----------|--------| +| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed | +| Missing input_data/expected_results.md | **STOP** — ask user to provide expected results mapping using the template | +| Ambiguous requirements | ASK user | +| Input data coverage below 70% (Phase 1) | Search internet for supplementary data, ASK user to validate | +| Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding | +| Test scenario conflicts with restrictions | ASK user to clarify intent | +| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md | +| Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test | +| Final coverage below 70% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec | + +## Common Mistakes + +- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test +- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific measurable values +- **Missing expected results**: input data without a paired expected result is useless — the test cannot determine pass/fail without knowing what "correct" looks like +- **Non-quantifiable expected results**: "should return good results" is not verifiable; expected results must have exact values, tolerances, thresholds, or pattern matches that code can evaluate +- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests +- **Untraceable tests**: every test should trace to at least one AC or restriction +- **Writing test code**: this skill produces specifications, never implementation code +- **Tests without data**: every test scenario MUST 
have concrete test data AND a quantifiable expected result; a test spec without either is not executable and must be removed + +## Trigger Conditions + +When the user wants to: +- Specify blackbox tests before implementation or refactoring +- Analyze input data completeness for test coverage +- Produce test scenarios from acceptance criteria + +**Keywords**: "test spec", "test specification", "blackbox test spec", "black box tests", "blackbox tests", "test scenarios" + +## Methodology Quick Reference + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ Test Scenario Specification (3-Phase) │ +├──────────────────────────────────────────────────────────────────────┤ +│ PREREQ: Data Gate (BLOCKING) │ +│ → verify AC, restrictions, input_data (incl. expected_results.md) │ +│ │ +│ Phase 1: Input Data & Expected Results Completeness Analysis │ +│ → assess input_data/ coverage vs AC scenarios (≥70%) │ +│ → verify every input has a quantifiable expected result │ +│ → present input→expected-result pairing assessment │ +│ [BLOCKING: user confirms input data + expected results coverage] │ +│ │ +│ Phase 2: Test Scenario Specification │ +│ → environment.md │ +│ → test-data.md (with expected results mapping) │ +│ → blackbox-tests.md (positive + negative) │ +│ → performance-tests.md │ +│ → resilience-tests.md │ +│ → security-tests.md │ +│ → resource-limit-tests.md │ +│ → traceability-matrix.md │ +│ [BLOCKING: user confirms test coverage] │ +│ │ +│ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE) │ +│ → build test-data + expected-result requirements checklist │ +│ → ask user: provide data+result (A) or remove test (B) │ +│ → validate input data (quality + quantity) │ +│ → validate expected results (quantifiable + comparison method) │ +│ → remove tests without data or expected result, warn user │ +│ → final coverage check (≥70% or FAIL + loop back) │ +│ [BLOCKING: coverage ≥ 70% required to pass] │ 
+├──────────────────────────────────────────────────────────────────────┤ +│ Principles: Black-box only · Traceability · Save immediately │ +│ Ask don't assume · Spec don't code │ +│ No test without data · No test without expected result │ +└──────────────────────────────────────────────────────────────────────┘ +``` diff --git a/.cursor/skills/test-spec/templates/expected-results.md b/.cursor/skills/test-spec/templates/expected-results.md new file mode 100644 index 0000000..0700733 --- /dev/null +++ b/.cursor/skills/test-spec/templates/expected-results.md @@ -0,0 +1,135 @@ +# Expected Results Template + +Save as `_docs/00_problem/input_data/expected_results.md`. +For complex expected outputs, create `_docs/00_problem/input_data/expected_results/` and place reference files there. +Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`). + +--- + +```markdown +# Expected Results + +Maps every input data item to its quantifiable expected result. +Tests use this mapping to compare actual system output against known-correct answers. 
+ +## Result Format Legend + +| Result Type | When to Use | Example | +|-------------|-------------|---------| +| Exact value | Output must match precisely | `status_code: 200`, `detection_count: 3` | +| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05`, `bbox_x: 120 ± 10px` | +| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.85` | +| Pattern match | Output must match a string/regex pattern | `error_message contains "invalid format"` | +| File reference | Complex output compared against a reference file | `match expected_results/case_01.json` | +| Schema match | Output structure must conform to a schema | `response matches DetectionResultSchema` | +| Set/count | Output must contain specific items or counts | `classes ⊇ {"car", "person"}`, `detections.length == 5` | + +## Comparison Methods + +| Method | Description | Tolerance Syntax | +|--------|-------------|-----------------| +| `exact` | Actual == Expected | N/A | +| `numeric_tolerance` | abs(actual - expected) ≤ tolerance | `± <value>` or `± <percent>%` | +| `range` | min ≤ actual ≤ max | `[min, max]` | +| `threshold_min` | actual ≥ threshold | `≥ <value>` | +| `threshold_max` | actual ≤ threshold | `≤ <value>` | +| `regex` | actual matches regex pattern | regex string | +| `substring` | actual contains substring | substring | +| `json_diff` | structural comparison against reference JSON | diff tolerance per field | +| `set_contains` | actual output set contains expected items | subset notation | +| `file_reference` | compare against reference file in expected_results/ | file path | + +## Input → Expected Result Mapping + +### [Scenario Group Name, e.g.
"Single Image Detection"] + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | +|---|-------|-------------------|-----------------|------------|-----------|---------------| +| 1 | `[file or parameters]` | [what this input represents] | [quantifiable expected output] | [method from table above] | [± value, range, or N/A] | [path in expected_results/ or N/A] | + +#### Example — Object Detection + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | +|---|-------|-------------------|-----------------|------------|-----------|---------------| +| 1 | `image_01.jpg` | Aerial photo, 3 vehicles visible | `detection_count: 3`, classes: `["ArmorVehicle", "ArmorVehicle", "Truck"]` | exact (count), set_contains (classes) | N/A | N/A | +| 2 | `image_01.jpg` | Same image, bbox positions | bboxes: `[(120,80,340,290), (400,150,580,310), (50,400,200,520)]` | numeric_tolerance | ± 15px per coordinate | `expected_results/image_01_detections.json` | +| 3 | `image_01.jpg` | Same image, confidence scores | confidences: `[0.94, 0.88, 0.91]` | threshold_min | each ≥ 0.85 | N/A | +| 4 | `empty_scene.jpg` | Aerial photo, no objects | `detection_count: 0`, empty detections array | exact | N/A | N/A | +| 5 | `corrupted.dat` | Invalid file format | HTTP 400, body contains `"error"` key | exact (status), substring (body) | N/A | N/A | + +#### Example — Performance + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | +|---|-------|-------------------|-----------------|------------|-----------|---------------| +| 1 | `standard_image.jpg` | 1920x1080 single image | Response time | threshold_max | ≤ 2000ms | N/A | +| 2 | `large_image.jpg` | 8000x6000 tiled image | Response time | threshold_max | ≤ 10000ms | N/A | + +#### Example — Error Handling + +| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | 
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 1 | `POST /detect` with no file | Missing required input | HTTP 422, message matches `"file.*required"` | exact (status), regex (message) | N/A | N/A |
+| 2 | `POST /detect` with `probability_threshold: 5.0` | Out-of-range config | HTTP 422 or clamped to valid range | exact (status) or range [0.0, 1.0] | N/A | N/A |
+
+## Expected Result Reference Files
+
+When the expected output is too complex for an inline table cell (e.g., full JSON response with nested objects), place a reference file in `_docs/00_problem/input_data/expected_results/`.
+
+### File Naming Convention
+
+`[input_name]_[result_description].[ext]`
+
+Examples:
+- `image_01_detections.json`
+- `batch_A_results.csv`
+- `video_01_annotations.json`
+
+### Reference File Requirements
+
+- Must be machine-readable (JSON, CSV, YAML — not prose)
+- Must contain only the expected output structure and values
+- Must include tolerance annotations where applicable (as metadata fields or comments)
+- Must be valid and parseable by standard libraries
+
+### Reference File Example (JSON)
+
+File: `expected_results/image_01_detections.json`
+
+​```json
+{
+  "input": "image_01.jpg",
+  "expected": {
+    "detection_count": 3,
+    "detections": [
+      {
+        "class": "ArmorVehicle",
+        "confidence": { "min": 0.85 },
+        "bbox": { "x1": 120, "y1": 80, "x2": 340, "y2": 290, "tolerance_px": 15 }
+      },
+      {
+        "class": "ArmorVehicle",
+        "confidence": { "min": 0.85 },
+        "bbox": { "x1": 400, "y1": 150, "x2": 580, "y2": 310, "tolerance_px": 15 }
+      },
+      {
+        "class": "Truck",
+        "confidence": { "min": 0.85 },
+        "bbox": { "x1": 50, "y1": 400, "x2": 200, "y2": 520, "tolerance_px": 15 }
+      }
+    ]
+  }
+}
+​```
+```
+
+---
+
+## Guidance Notes
+
+- Every row in the mapping table must have at least one quantifiable comparison — no row should say only "should work" or "returns result".
+- Use `exact` comparison for counts, status codes, and discrete values.
+- Use `numeric_tolerance` for floating-point values and spatial coordinates where minor variance is expected. +- Use `threshold_min`/`threshold_max` for performance metrics and confidence scores. +- Use `file_reference` when the expected output has more than ~3 fields or nested structures. +- Reference files must be committed alongside input data — they are part of the test specification. +- When the system has non-deterministic behavior (e.g., model inference variance across hardware), document the expected tolerance explicitly and justify it. diff --git a/.cursor/skills/ui-design/SKILL.md b/.cursor/skills/ui-design/SKILL.md new file mode 100644 index 0000000..afbd431 --- /dev/null +++ b/.cursor/skills/ui-design/SKILL.md @@ -0,0 +1,254 @@ +--- +name: ui-design +description: | + End-to-end UI design workflow: requirements gathering → design system synthesis → HTML+CSS mockup generation → visual verification → iterative refinement. + Zero external dependencies. Optional MCP enhancements (RenderLens, AccessLint). + Two modes: + - Full workflow: phases 0-8 for complex design tasks + - Quick mode: skip to code generation for simple requests + Command entry points: + - /design-audit — quality checks on existing mockup + - /design-polish — final refinement pass + - /design-critique — UX review with feedback + - /design-regen — regenerate with different direction + Trigger phrases: + - "design a UI", "create a mockup", "build a page" + - "make a landing page", "design a dashboard" + - "mockup", "design system", "UI design" +category: create +tags: [ui-design, mockup, html, css, tailwind, design-system, accessibility] +disable-model-invocation: true +--- + +# UI Design Skill + +End-to-end UI design workflow producing production-quality HTML+CSS mockups entirely within Cursor, with zero external tool dependencies. 
+ +## Core Principles + +- **Design intent over defaults**: never settle for generic AI output; every visual choice must trace to user requirements +- **Verify visually**: AI must see what it generates whenever possible (browser screenshots) +- **Tokens over hardcoded values**: use CSS custom properties with semantic naming, not raw hex +- **Restraint over decoration**: less is more; every visual element must earn its place +- **Ask, don't assume**: when design direction is ambiguous, STOP and ask the user +- **One screen at a time**: generate individual screens, not entire applications at once + +## Context Resolution + +Determine the operating mode based on invocation before any other logic runs. + +**Project mode** (default — `_docs/` structure exists): +- MOCKUPS_DIR: `_docs/02_document/ui_mockups/` + +**Standalone mode** (explicit input file provided, e.g. `/ui-design @some_brief.md`): +- INPUT_FILE: the provided file (treated as design brief) +- MOCKUPS_DIR: `_standalone/ui_mockups/` + +Create MOCKUPS_DIR if it does not exist. Announce the detected mode and resolved path to the user. + +## Output Directory + +All generated artifacts go to `MOCKUPS_DIR`: + +``` +MOCKUPS_DIR/ +├── DESIGN.md # Generated design system (three-layer tokens) +├── index.html # Main mockup (or named per page) +└── [page-name].html # Additional pages if multi-page +``` + +## Complexity Detection (Phase 0) + +Before starting the workflow, classify the request: + +**Quick mode** — skip to Phase 5 (Code Generation): +- Request is a single component or screen +- User provides enough style context in their message +- `MOCKUPS_DIR/DESIGN.md` already exists +- Signals: "just make a...", "quick mockup of...", single component name, less than 2 sentences + +**Full mode** — run phases 1-8: +- Multi-page request +- Brand-specific requirements +- "design system for...", complex layouts, dashboard/admin panel +- No existing DESIGN.md + +Announce the detected mode to the user. 
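
The Context Resolution and Complexity Detection steps above reduce to simple filesystem checks. A minimal shell sketch (directory names are the ones defined in this skill; the detection logic here is an illustrative assumption — the real skill also weighs the wording of the request, which is omitted):

```shell
#!/bin/sh
# Sketch of mode/context resolution as described above.
# Project mode when a _docs/ tree exists; standalone otherwise.
if [ -d "_docs" ]; then
  MODE="project"
  MOCKUPS_DIR="_docs/02_document/ui_mockups"
else
  MODE="standalone"
  MOCKUPS_DIR="_standalone/ui_mockups"
fi

# Quick mode if a design system already exists; full otherwise.
# (The skill also inspects the request text; not modeled here.)
if [ -f "$MOCKUPS_DIR/DESIGN.md" ]; then
  WORKFLOW="quick"
else
  WORKFLOW="full"
fi

# Create the output directory if it does not exist.
mkdir -p "$MOCKUPS_DIR"

echo "mode=$MODE workflow=$WORKFLOW mockups_dir=$MOCKUPS_DIR"
```

The same checks announce themselves via the final `echo`, matching the requirement to report the detected mode and resolved path to the user.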
+ +## Phase 1: Context Check + +1. Check for existing project documentation: PRD, design specs, README with design notes +2. Check for existing `MOCKUPS_DIR/DESIGN.md` +3. Check for existing mockups in `MOCKUPS_DIR/` +4. If DESIGN.md exists → announce "Using existing design system" → skip to Phase 5 +5. If project docs with design info exist → extract requirements from them, skip to Phase 3 + +## Phase 2: Requirements Gathering + +Use the AskQuestion tool for structured input. Adapt based on what Phase 1 found — only ask for what's missing. + +**Round 1 — Structural:** + +Ask using AskQuestion with these questions: +- **Page type**: landing, dashboard, form, settings, profile, admin panel, e-commerce, blog, documentation, other +- **Target audience**: developers, business users, consumers, internal team, general public +- **Platform**: web desktop-first, web mobile-first +- **Key sections**: header, hero, sidebar, main content, cards grid, data table, form, footer (allow multiple) + +**Round 2 — Design Intent:** + +Ask using AskQuestion with these questions: +- **Visual atmosphere**: Airy & spacious / Dense & data-rich / Warm & approachable / Sharp & technical / Luxurious & premium +- **Color mood**: Cool blues & grays / Warm earth tones / Bold & vibrant / Monochrome / Dark mode / Let AI choose based on atmosphere / Custom (specify brand colors) +- **Typography mood**: Geometric (modern, clean) / Humanist (friendly, readable) / Monospace (technical, code-like) / Serif (editorial, premium) + +Then ask in free-form: +- "Name an app or website whose look you admire" (optional, helps anchor style) +- "Any specific content, copy, or data to include?" + +## Phase 3: Direction Exploration + +Generate 2-3 text-based direction summaries. 
Each direction is 3-5 sentences describing:
+- Visual approach and mood
+- Color palette direction (specific hues, not just "blue")
+- Layout strategy (grid type, density, whitespace approach)
+- Typography choice (specific font suggestions, not just "sans-serif")
+
+Present to user: "Here are 2-3 possible directions. Which resonates? Or describe a blend."
+
+Wait for user to pick before proceeding.
+
+## Phase 4: Design System Synthesis
+
+Generate `MOCKUPS_DIR/DESIGN.md` using the template from `templates/design-system.md`.
+
+The generated DESIGN.md must include all 6 sections:
+1. Visual Atmosphere — descriptive mood (never "clean and modern")
+2. Color System — three-layer CSS custom properties (primitives → semantic → component)
+3. Typography — specific font family, weight hierarchy, size scale with rem values
+4. Spacing & Layout — base unit, spacing scale, grid, breakpoints
+5. Component Styling Defaults — buttons, cards, inputs, navigation with all states
+6. Interaction States — loading, error, empty, hover, focus, disabled patterns
+
+Read `references/design-vocabulary.md` for atmosphere descriptors and style vocabulary to use when writing the DESIGN.md.
+
+## Phase 5: Code Generation
+
+Construct the generation context by combining multiple sources:
+
+1. Read `MOCKUPS_DIR/DESIGN.md` for the design system
+2. Read `references/components.md` for component best practices relevant to the page type
+3. Read `references/anti-patterns.md` for explicit avoidance instructions
+
+Generate `MOCKUPS_DIR/[page-name].html` as a single file with:
+- `<script src="https://cdn.tailwindcss.com"></script>` for Tailwind
+- `