Refactor testing framework to replace integration tests with blackbox tests across various skills and documentation. Update related workflows, templates, and task specifications to align with the new blackbox testing approach. Remove obsolete integration test files and enhance clarity in task management and reporting structures.

2026-06-22 06:01:07 +00:00 · 2026-03-24 03:38:36 +02:00
parent ae3ad50b9e
commit e609586c7c
49 changed files with 2222 additions and 872 deletions
@@ -59,9 +59,9 @@ At the start of execution, create a TodoWrite with all steps (1 through 6 plus F

 ## Workflow

-### Step 1: Integration Tests
+### Step 1: Blackbox Tests

-Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`.
+Read and execute `.cursor/skills/test-spec/SKILL.md`.

 Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.

@@ -111,7 +111,7 @@ Read and follow `steps/07_quality-checklist.md`.
 - **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
 - **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
 - **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register
-- **Ignoring integration test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3)
+- **Ignoring blackbox test findings**: insights from Step 1 must feed into architecture (Step 2) and component decomposition (Step 3)

 ## Escalation Rules

@@ -135,7 +135,7 @@ Read and follow `steps/07_quality-checklist.md`.
 │ PREREQ: Data Gate (BLOCKING)                                    │
 │   → verify AC, restrictions, input_data, solution exist         │
 │                                                                │
-│ 1. Integration Tests   → blackbox-test-spec/SKILL.md            │
+│ 1. Blackbox Tests      → test-spec/SKILL.md                     │
 │    [BLOCKING: user confirms test coverage]                     │
 │ 2. Solution Analysis   → architecture, data model, deployment   │
 │    [BLOCKING: user confirms architecture]                      │
@@ -6,12 +6,15 @@ All artifacts are written directly under DOCUMENT_DIR:

 ```
 DOCUMENT_DIR/
-├── integration_tests/
-│   ├── environment.md
-│   ├── test_data.md
-│   ├── functional_tests.md
-│   ├── non_functional_tests.md
-│   └── traceability_matrix.md
+├── tests/
+│   ├── test-environment.md
+│   ├── test-data.md
+│   ├── blackbox-tests.md
+│   ├── performance-tests.md
+│   ├── resilience-tests.md
+│   ├── security-tests.md
+│   ├── resource-limit-tests.md
+│   └── traceability-matrix.md
 ├── architecture.md
 ├── system-flows.md
 ├── data_model.md
@@ -47,11 +50,14 @@ DOCUMENT_DIR/

 | Step | Save immediately after | Filename |
 |------|------------------------|----------|
-| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
-| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
-| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
-| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
-| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
+| Step 1 | Blackbox test environment spec | `tests/test-environment.md` |
+| Step 1 | Blackbox test data spec | `tests/test-data.md` |
+| Step 1 | Blackbox tests | `tests/blackbox-tests.md` |
+| Step 1 | Blackbox performance tests | `tests/performance-tests.md` |
+| Step 1 | Blackbox resilience tests | `tests/resilience-tests.md` |
+| Step 1 | Blackbox security tests | `tests/security-tests.md` |
+| Step 1 | Blackbox resource limit tests | `tests/resource-limit-tests.md` |
+| Step 1 | Blackbox traceability matrix | `tests/traceability-matrix.md` |
 | Step 2 | Architecture analysis complete | `architecture.md` |
 | Step 2 | System flows documented | `system-flows.md` |
 | Step 2 | Data model documented | `data_model.md` |
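The Step 1 rows in the save table above can be checked mechanically before Step 2 starts. The sketch below is only an illustration: the function name `missing_step1_artifacts` is hypothetical and not part of any skill, and it assumes `DOCUMENT_DIR` is a plain directory on disk.

```python
from pathlib import Path

# Filenames copied from the Step 1 rows of the save table.
STEP1_ARTIFACTS = [
    "test-environment.md",
    "test-data.md",
    "blackbox-tests.md",
    "performance-tests.md",
    "resilience-tests.md",
    "security-tests.md",
    "resource-limit-tests.md",
    "traceability-matrix.md",
]


def missing_step1_artifacts(document_dir):
    """Return the Step 1 files that are absent from DOCUMENT_DIR/tests/.

    Hypothetical helper for illustration; the skills do not define it.
    """
    tests_dir = Path(document_dir) / "tests"
    return [name for name in STEP1_ARTIFACTS if not (tests_dir / name).is_file()]
```

An empty return value means every Step 1 artifact has been saved.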
@@ -7,7 +7,7 @@
 ### Phase 2a: Architecture & Flows

 1. Read all input files thoroughly
-2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
+2. Incorporate findings, questions, and insights discovered during Step 1 (blackbox tests)
 3. Research unknown or questionable topics via internet; ask user about ambiguities
 4. Document architecture using `templates/architecture.md` as structure
 5. Document system flows using `templates/system-flows.md` as structure
@@ -17,7 +17,7 @@
 - [ ] System flows cover all main user/system interactions
 - [ ] No contradictions with problem.md or restrictions.md
 - [ ] Technology choices are justified
-- [ ] Integration test findings are reflected in architecture decisions
+- [ ] Blackbox test findings are reflected in architecture decisions

 **Save action**: Write `architecture.md` and `system-flows.md`

@@ -5,7 +5,7 @@
 **Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.

 1. Identify components from the architecture; think about separation, reusability, and communication patterns
-2. Use integration test scenarios from Step 1 to validate component boundaries
+2. Use blackbox test scenarios from Step 1 to validate component boundaries
 3. If additional components are needed (data preparation, shared helpers), create them
 4. For each component, write a spec using `templates/component-spec.md` as structure
 5. Generate diagrams:
@@ -19,7 +19,7 @@
 - [ ] All inter-component interfaces are defined (who calls whom, with what)
 - [ ] Component dependency graph has no circular dependencies
 - [ ] All components from architecture.md are accounted for
-- [ ] Every integration test scenario can be traced through component interactions
+- [ ] Every blackbox test scenario can be traced through component interactions

 **Save action**: Write:
 - each component `components/[##]_[name]/description.md`
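The "no circular dependencies" item in the self-verification list above lends itself to an automated check. A minimal sketch, assuming the component graph has been transcribed by hand into a dictionary that maps each component to the components it calls; the function name is hypothetical. It relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def find_dependency_cycle(dependencies):
    """Return one cycle as a list of component names, or None if acyclic.

    `dependencies` maps each component to the components it depends on.
    Hypothetical helper for illustration only.
    """
    try:
        # Consuming static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # args[1] holds the detected cycle; its first and last node are equal.
        return list(err.args[1])
    return None
```

A return value of `None` satisfies the checklist item; any list names the components that form the loop.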
@@ -35,7 +35,7 @@ Do NOT create minimal epics with just a summary and short description. The Jira

 **Self-verification**:
 - [ ] "Bootstrap & Initial Structure" epic exists and is first in order
-- [ ] "Integration Tests" epic exists
+- [ ] "Blackbox Tests" epic exists
 - [ ] Every component maps to exactly one epic
 - [ ] Dependency order is respected (no epic depends on a later one)
 - [ ] Acceptance criteria are measurable
@@ -43,6 +43,6 @@ Do NOT create minimal epics with just a summary and short description. The Jira
 - [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
 - [ ] Epic descriptions are self-contained — readable without opening other files

-7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`.
+7. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`.

 **Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
@@ -2,8 +2,8 @@

 Before writing the final report, verify ALL of the following:

-### Integration Tests
-- [ ] Every acceptance criterion is covered in traceability_matrix.md
+### Blackbox Tests
+- [ ] Every acceptance criterion is covered in traceability-matrix.md
 - [ ] Every restriction is verified by at least one test
 - [ ] Positive and negative scenarios are balanced
 - [ ] Docker environment is self-contained
@@ -14,7 +14,7 @@ Before writing the final report, verify ALL of the following:
 - [ ] Covers all capabilities from solution.md
 - [ ] Technology choices are justified
 - [ ] Deployment model is defined
-- [ ] Integration test findings are reflected in architecture decisions
+- [ ] Blackbox test findings are reflected in architecture decisions

 ### Data Model
 - [ ] Every entity from architecture.md is defined
@@ -35,7 +35,7 @@ Before writing the final report, verify ALL of the following:
 - [ ] No circular dependencies
 - [ ] All inter-component interfaces are defined and consistent
 - [ ] No orphan components (unused by any flow)
-- [ ] Every integration test scenario can be traced through component interactions
+- [ ] Every blackbox test scenario can be traced through component interactions

 ### Risks
 - [ ] All High/Critical risks have mitigations
@@ -49,7 +49,7 @@ Before writing the final report, verify ALL of the following:

 ### Epics
 - [ ] "Bootstrap & Initial Structure" epic exists
-- [ ] "Integration Tests" epic exists
+- [ ] "Blackbox Tests" epic exists
 - [ ] Every component maps to an epic
 - [ ] Dependency order is correct
 - [ ] Acceptance criteria are measurable
@@ -1,24 +1,24 @@
-# E2E Functional Tests Template
+# Blackbox Tests Template

-Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`.
+Save as `DOCUMENT_DIR/tests/blackbox-tests.md`.

 ---

 ```markdown
-# E2E Functional Tests
+# Blackbox Tests

 ## Positive Scenarios

 ### FT-P-01: [Scenario Name]

-**Summary**: [One sentence: what end-to-end use case this validates]
+**Summary**: [One sentence: what black-box use case this validates]
 **Traces to**: AC-[ID], AC-[ID]
 **Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.]

 **Preconditions**:
 - [System state required before test]

-**Input data**: [reference to specific data set or file from test_data.md]
+**Input data**: [reference to specific data set or file from test-data.md]

 **Steps**:

@@ -71,8 +71,8 @@ Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`.

 ## Guidance Notes

-- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
+- Blackbox tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
 - Positive scenarios validate the system does what it should.
 - Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept.
 - Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth."
-- Input data references should point to specific entries in test_data.md.
+- Input data references should point to specific entries in test-data.md.
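The guidance above asks for measurable outcomes such as "returns position within 50m of ground truth". One way such an assertion might look is sketched below. It assumes positions are latitude/longitude pairs in degrees and uses the haversine great-circle distance; both are assumptions, since the template does not say how positions are represented. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def position_within(actual, expected, tolerance_m=50.0):
    """True when `actual` lies within `tolerance_m` of `expected`.

    Both arguments are (lat, lon) tuples; the 50 m default mirrors the
    example in the guidance note and is not a project requirement.
    """
    return haversine_m(*actual, *expected) <= tolerance_m
```

The point of the sketch is that the expected outcome becomes a single boolean with an explicit tolerance, rather than a judgment call.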
@@ -80,7 +80,7 @@ Link to architecture.md and relevant component spec.]
 ### Definition of Done

 - [ ] All in-scope capabilities implemented
-- [ ] Automated tests pass (unit + integration + e2e)
+- [ ] Automated tests pass (unit + blackbox)
 - [ ] Minimum coverage threshold met (75%)
 - [ ] Runbooks written (if applicable)
 - [ ] Documentation updated
@@ -1,97 +0,0 @@
-# E2E Non-Functional Tests Template
-
-Save as `DOCUMENT_DIR/integration_tests/non_functional_tests.md`.
-
----
-
-```markdown
-# E2E Non-Functional Tests
-
-## Performance Tests
-
-### NFT-PERF-01: [Test Name]
-
-**Summary**: [What performance characteristic this validates]
-**Traces to**: AC-[ID]
-**Metric**: [what is measured — latency, throughput, frame rate, etc.]
-
-**Preconditions**:
-- [System state, load profile, data volume]
-
-**Steps**:
-
-| Step | Consumer Action | Measurement |
-|------|----------------|-------------|
-| 1 | [action] | [what to measure and how] |
-
-**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
-**Duration**: [how long the test runs]
-
----
-
-## Resilience Tests
-
-### NFT-RES-01: [Test Name]
-
-**Summary**: [What failure/recovery scenario this validates]
-**Traces to**: AC-[ID]
-
-**Preconditions**:
-- [System state before fault injection]
-
-**Fault injection**:
-- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
-
-**Steps**:
-
-| Step | Action | Expected Behavior |
-|------|--------|------------------|
-| 1 | [inject fault] | [system behavior during fault] |
-| 2 | [observe recovery] | [system behavior after recovery] |
-
-**Pass criteria**: [recovery time, data integrity, continued operation]
-
----
-
-## Security Tests
-
-### NFT-SEC-01: [Test Name]
-
-**Summary**: [What security property this validates]
-**Traces to**: AC-[ID], RESTRICT-[ID]
-
-**Steps**:
-
-| Step | Consumer Action | Expected Response |
-|------|----------------|------------------|
-| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
-
-**Pass criteria**: [specific security outcome]
-
----
-
-## Resource Limit Tests
-
-### NFT-RES-LIM-01: [Test Name]
-
-**Summary**: [What resource constraint this validates]
-**Traces to**: AC-[ID], RESTRICT-[ID]
-
-**Preconditions**:
-- [System running under specified constraints]
-
-**Monitoring**:
-- [What resources to monitor — memory, CPU, GPU, disk, temperature]
-
-**Duration**: [how long to run]
-**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
-```
-
----
-
-## Guidance Notes
-
-- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
-- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
-- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
-- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
@@ -0,0 +1,35 @@
+# Performance Tests Template
+
+Save as `DOCUMENT_DIR/tests/performance-tests.md`.
+
+---
+
+```markdown
+# Performance Tests
+
+### NFT-PERF-01: [Test Name]
+
+**Summary**: [What performance characteristic this validates]
+**Traces to**: AC-[ID]
+**Metric**: [what is measured — latency, throughput, frame rate, etc.]
+
+**Preconditions**:
+- [System state, load profile, data volume]
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | [action] | [what to measure and how] |
+
+**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
+**Duration**: [how long the test runs]
+```
+
+---
+
+## Guidance Notes
+
+- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
+- Define clear pass/fail thresholds with specific metrics (p50, p95, p99 latency, throughput, etc.).
+- Include warm-up preconditions to separate initialization cost from steady-state performance.
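The pass criterion in the template ("p95 latency < 400ms") and the warm-up note above can be combined into one check. This is a sketch under stated assumptions: latencies are collected as a list of milliseconds in request order, the first `warmup` samples are discarded, and percentiles use the nearest-rank definition (other definitions interpolate and give slightly different values). Function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p95_passes(latencies_ms, threshold_ms=400.0, warmup=0):
    """True when steady-state p95 latency is strictly below the threshold.

    `warmup` is the number of leading samples to drop so that
    initialization cost does not distort the steady-state figure.
    """
    steady = latencies_ms[warmup:]
    return percentile(steady, 95) < threshold_ms
```

Dropping the warm-up samples explicitly, rather than hoping they average out, keeps the threshold meaningful for short runs.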
@@ -0,0 +1,37 @@
+# Resilience Tests Template
+
+Save as `DOCUMENT_DIR/tests/resilience-tests.md`.
+
+---
+
+```markdown
+# Resilience Tests
+
+### NFT-RES-01: [Test Name]
+
+**Summary**: [What failure/recovery scenario this validates]
+**Traces to**: AC-[ID]
+
+**Preconditions**:
+- [System state before fault injection]
+
+**Fault injection**:
+- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | [inject fault] | [system behavior during fault] |
+| 2 | [observe recovery] | [system behavior after recovery] |
+
+**Pass criteria**: [recovery time, data integrity, continued operation]
+```
+
+---
+
+## Guidance Notes
+
+- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
+- Include specific recovery time expectations and data integrity checks.
+- Test both graceful degradation (partial failure) and full recovery scenarios.
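A recovery-time expectation like the one called for above needs a way to measure recovery from the outside. The following sketch polls a caller-supplied health probe after fault injection. The probe, the polling interval, and the function name are all assumptions made for illustration; the clock and sleep functions are injectable so the helper can be exercised without real waiting.

```python
import time


def measure_recovery_seconds(is_healthy, timeout_s, poll_interval_s=0.5,
                             clock=time.monotonic, sleep=time.sleep):
    """Poll `is_healthy()` until it returns True; return elapsed seconds.

    Returns None when the system has not recovered within `timeout_s`.
    Hypothetical helper: `is_healthy` is any black-box probe, for example
    a call to a public status endpoint.
    """
    start = clock()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

The returned value can then be compared against the recovery time stated in the pass criteria.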
@@ -0,0 +1,31 @@
+# Resource Limit Tests Template
+
+Save as `DOCUMENT_DIR/tests/resource-limit-tests.md`.
+
+---
+
+```markdown
+# Resource Limit Tests
+
+### NFT-RES-LIM-01: [Test Name]
+
+**Summary**: [What resource constraint this validates]
+**Traces to**: AC-[ID], RESTRICT-[ID]
+
+**Preconditions**:
+- [System running under specified constraints]
+
+**Monitoring**:
+- [What resources to monitor — memory, CPU, GPU, disk, temperature]
+
+**Duration**: [how long to run]
+**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
+```
+
+---
+
+## Guidance Notes
+
+- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
+- Define specific numeric limits that can be programmatically checked.
+- Include both the monitoring method and the threshold in the pass criteria.
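To make "numeric limits that can be programmatically checked" concrete, the sketch below evaluates a series of monitoring samples against a limit and a minimum monitoring duration. The sample format (timestamp in seconds, measured value) and the function name are assumptions; how the samples are gathered is left to the test environment.

```python
def check_resource_limit(samples, limit, min_duration_s):
    """Evaluate (timestamp_s, value) samples, given in time order.

    Returns (passed, reason). Passing requires that monitoring spanned at
    least `min_duration_s` and that every value stayed strictly below
    `limit`, mirroring criteria such as "memory < 8GB throughout".
    Hypothetical helper for illustration only.
    """
    if not samples:
        return False, "no samples collected"
    observed_s = samples[-1][0] - samples[0][0]
    if observed_s < min_duration_s:
        return False, f"monitored {observed_s:.0f}s, need {min_duration_s:.0f}s"
    peak = max(value for _, value in samples)
    if peak >= limit:
        return False, f"peak {peak} reached limit {limit}"
    return True, f"peak {peak} stayed under limit {limit}"
```

Checking the duration first means a short burst cannot pass by accident.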
@@ -0,0 +1,30 @@
+# Security Tests Template
+
+Save as `DOCUMENT_DIR/tests/security-tests.md`.
+
+---
+
+```markdown
+# Security Tests
+
+### NFT-SEC-01: [Test Name]
+
+**Summary**: [What security property this validates]
+**Traces to**: AC-[ID], RESTRICT-[ID]
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
+
+**Pass criteria**: [specific security outcome]
+```
+
+---
+
+## Guidance Notes
+
+- Security tests at the blackbox level focus on external attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
+- Verify the system remains operational after security-related edge cases (no crash, no hang).
+- Test authentication/authorization boundaries from the consumer's perspective.
@@ -1,11 +1,11 @@
-# E2E Test Data Template
+# Test Data Template

-Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
+Save as `DOCUMENT_DIR/tests/test-data.md`.

 ---

 ```markdown
-# E2E Test Data Management
+# Test Data Management

 ## Seed Data Sets

@@ -23,6 +23,12 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
 |-----------------|----------------|-------------|-----------------|
 | [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] |

+## Expected Results Mapping
+
+| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
+|-----------------|------------|-----------------|-------------------|-----------|----------------------|
+| [test ID] | `input_data/[filename]` | [quantifiable expected output] | [exact / tolerance / pattern / threshold / file-diff] | [± value or N/A] | `input_data/expected_results/[filename]` or inline |
+
 ## External Dependency Mocks

 | External Service | Mock/Stub | How Provided | Behavior |
@@ -42,5 +48,8 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`.

 - Every seed data set should be traceable to specific test scenarios.
 - Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it.
+- Every input data item MUST have a corresponding expected result in the Expected Results Mapping table.
+- Expected results MUST be quantifiable: exact values, numeric tolerances, pattern matches, thresholds, or reference files. "Works correctly" is never acceptable.
+- For complex expected outputs, provide machine-readable reference files (JSON, CSV) in `_docs/00_problem/input_data/expected_results/` and reference them in the mapping.
 - External mocks must be deterministic — same input always produces same output.
 - Data isolation must guarantee no test can affect another test's outcome.
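The Comparison Method column in the Expected Results Mapping lists five methods: exact, tolerance, pattern, threshold, and file-diff. A test harness could dispatch on that column roughly as sketched here. The semantics are assumptions: "pattern" is treated as a full regular-expression match, "threshold" as an upper bound, and "file-diff" as a byte-for-byte comparison of two paths. The function name is hypothetical.

```python
import re
from pathlib import Path


def matches_expected(actual, expected, method, tolerance=None):
    """Compare `actual` with `expected` using a named comparison method.

    Assumed semantics (not defined by the template):
      exact      equality
      tolerance  |actual - expected| <= tolerance
      pattern    `expected` is a regex that must match all of `actual`
      threshold  actual <= expected (upper bound)
      file-diff  both arguments are paths; contents must be identical
    """
    if method == "exact":
        return actual == expected
    if method == "tolerance":
        if tolerance is None:
            raise ValueError("tolerance method requires a tolerance value")
        return abs(actual - expected) <= tolerance
    if method == "pattern":
        return re.fullmatch(expected, actual) is not None
    if method == "threshold":
        return actual <= expected
    if method == "file-diff":
        return Path(actual).read_bytes() == Path(expected).read_bytes()
    raise ValueError(f"unknown comparison method: {method}")
```

Rejecting unknown methods outright keeps a typo in the mapping table from silently passing a test.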
@@ -1,16 +1,16 @@
-# E2E Test Environment Template
+# Test Environment Template

-Save as `DOCUMENT_DIR/integration_tests/environment.md`.
+Save as `DOCUMENT_DIR/tests/test-environment.md`.

 ---

 ```markdown
-# E2E Test Environment
+# Test Environment

 ## Overview

 **System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.]
-**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
+**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating black-box use cases without access to internals.

 ## Docker Environment

@@ -17,7 +17,7 @@ Use this template for each component's test spec. Save as `components/[##]_[name

 ---

-## Integration Tests
+## Blackbox Tests

 ### IT-01: [Test Name]

@@ -169,4 +169,4 @@ Use this template for each component's test spec. Save as `components/[##]_[name
 - If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
 - Performance test targets should come from the NFR section in `architecture.md`.
 - Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
-- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
+- Not every component needs all 4 test types. A stateless utility component may only need blackbox tests.
@@ -1,11 +1,11 @@
-# E2E Traceability Matrix Template
+# Traceability Matrix Template

-Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
+Save as `DOCUMENT_DIR/tests/traceability-matrix.md`.

 ---

 ```markdown
-# E2E Traceability Matrix
+# Traceability Matrix

 ## Acceptance Criteria Coverage

@@ -34,7 +34,7 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.

 | Item | Reason Not Covered | Risk | Mitigation |
 |------|-------------------|------|-----------|
-| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
+| [AC/Restriction ID] | [why it cannot be tested at blackbox level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
 ```

 ---
@@ -44,4 +44,4 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
 - Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason.
 - Every restriction must appear in the matrix.
 - NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware").
-- Coverage percentage should be at least 75% for acceptance criteria at the E2E level.
+- Coverage percentage should be at least 75% for acceptance criteria at the blackbox test level.
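The 75% coverage guideline above reduces to a small calculation once the matrix is available as data. A minimal sketch, assuming the Acceptance Criteria Coverage table has been parsed into a dictionary from criterion ID to the list of test IDs covering it, with an empty list meaning NOT COVERED. The function name and the sample IDs in the usage note are hypothetical.

```python
def ac_coverage_percent(matrix):
    """Percentage of acceptance criteria covered by at least one test.

    `matrix` maps each acceptance-criterion ID to a list of test IDs.
    An empty list marks the criterion as NOT COVERED.
    """
    if not matrix:
        return 0.0
    covered = sum(1 for tests in matrix.values() if tests)
    return 100.0 * covered / len(matrix)
```

For example, a matrix with `AC-01`, `AC-02`, and `AC-03` covered and `AC-04` uncovered yields 75.0, which just meets the guideline.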