Refactor testing framework to replace integration tests with blackbox tests across various skills and documentation. Update related workflows, templates, and task specifications to align with the new blackbox testing approach. Remove obsolete integration test files and enhance clarity in task management and reporting structures.

2026-06-22 13:21:07 +00:00 · 2026-03-24 03:38:36 +02:00
parent ae3ad50b9e
commit e609586c7c
49 changed files with 2222 additions and 872 deletions
@@ -1,24 +1,24 @@
-# E2E Functional Tests Template
+# Blackbox Tests Template

-Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`.
+Save as `DOCUMENT_DIR/tests/blackbox-tests.md`.

 ---

 ```markdown
-# E2E Functional Tests
+# Blackbox Tests

 ## Positive Scenarios

 ### FT-P-01: [Scenario Name]

-**Summary**: [One sentence: what end-to-end use case this validates]
+**Summary**: [One sentence: what black-box use case this validates]
 **Traces to**: AC-[ID], AC-[ID]
 **Category**: [which AC category — e.g., Position Accuracy, Image Processing, etc.]

 **Preconditions**:
 - [System state required before test]

-**Input data**: [reference to specific data set or file from test_data.md]
+**Input data**: [reference to specific data set or file from test-data.md]

 **Steps**:

@@ -71,8 +71,8 @@ Save as `DOCUMENT_DIR/integration_tests/functional_tests.md`.

 ## Guidance Notes

-- Functional tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
+- Blackbox tests should typically trace to at least one acceptance criterion or restriction. Tests without a trace are allowed but should have a clear justification.
 - Positive scenarios validate the system does what it should.
 - Negative scenarios validate the system rejects or handles gracefully what it shouldn't accept.
 - Expected outcomes must be specific and measurable — not "works correctly" but "returns position within 50m of ground truth."
-- Input data references should point to specific entries in test_data.md.
+- Input data references should point to specific entries in test-data.md.
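
The guidance above asks for measurable outcomes such as "returns position within 50m of ground truth." A minimal sketch of such a check follows, assuming positions are latitude/longitude pairs in degrees; the helper names `haversine_m` and `within_tolerance` and the spherical-Earth radius are illustrative assumptions, not part of the template.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(actual, expected, max_error_m=50.0):
    """True when the actual (lat, lon) lies within max_error_m of the expected (lat, lon)."""
    return haversine_m(*actual, *expected) <= max_error_m
```

A test step could then assert `within_tolerance(reported_position, ground_truth)` instead of a vague "works correctly" outcome.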
@@ -80,7 +80,7 @@ Link to architecture.md and relevant component spec.]
 ### Definition of Done

 - [ ] All in-scope capabilities implemented
-- [ ] Automated tests pass (unit + integration + e2e)
+- [ ] Automated tests pass (unit + blackbox)
 - [ ] Minimum coverage threshold met (75%)
 - [ ] Runbooks written (if applicable)
 - [ ] Documentation updated
@@ -1,97 +0,0 @@
-# E2E Non-Functional Tests Template
-
-Save as `DOCUMENT_DIR/integration_tests/non_functional_tests.md`.
-
----
-
-```markdown
-# E2E Non-Functional Tests
-
-## Performance Tests
-
-### NFT-PERF-01: [Test Name]
-
-**Summary**: [What performance characteristic this validates]
-**Traces to**: AC-[ID]
-**Metric**: [what is measured — latency, throughput, frame rate, etc.]
-
-**Preconditions**:
-- [System state, load profile, data volume]
-
-**Steps**:
-
-| Step | Consumer Action | Measurement |
-|------|----------------|-------------|
-| 1 | [action] | [what to measure and how] |
-
-**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
-**Duration**: [how long the test runs]
-
----
-
-## Resilience Tests
-
-### NFT-RES-01: [Test Name]
-
-**Summary**: [What failure/recovery scenario this validates]
-**Traces to**: AC-[ID]
-
-**Preconditions**:
-- [System state before fault injection]
-
-**Fault injection**:
-- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
-
-**Steps**:
-
-| Step | Action | Expected Behavior |
-|------|--------|------------------|
-| 1 | [inject fault] | [system behavior during fault] |
-| 2 | [observe recovery] | [system behavior after recovery] |
-
-**Pass criteria**: [recovery time, data integrity, continued operation]
-
----
-
-## Security Tests
-
-### NFT-SEC-01: [Test Name]
-
-**Summary**: [What security property this validates]
-**Traces to**: AC-[ID], RESTRICT-[ID]
-
-**Steps**:
-
-| Step | Consumer Action | Expected Response |
-|------|----------------|------------------|
-| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
-
-**Pass criteria**: [specific security outcome]
-
----
-
-## Resource Limit Tests
-
-### NFT-RES-LIM-01: [Test Name]
-
-**Summary**: [What resource constraint this validates]
-**Traces to**: AC-[ID], RESTRICT-[ID]
-
-**Preconditions**:
-- [System running under specified constraints]
-
-**Monitoring**:
-- [What resources to monitor — memory, CPU, GPU, disk, temperature]
-
-**Duration**: [how long to run]
-**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
-```
-
----
-
-## Guidance Notes
-
-- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
-- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
-- Security tests at E2E level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
-- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
@@ -0,0 +1,35 @@
+# Performance Tests Template
+
+Save as `DOCUMENT_DIR/tests/performance-tests.md`.
+
+---
+
+```markdown
+# Performance Tests
+
+### NFT-PERF-01: [Test Name]
+
+**Summary**: [What performance characteristic this validates]
+**Traces to**: AC-[ID]
+**Metric**: [what is measured — latency, throughput, frame rate, etc.]
+
+**Preconditions**:
+- [System state, load profile, data volume]
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | [action] | [what to measure and how] |
+
+**Pass criteria**: [specific threshold — e.g., p95 latency < 400ms]
+**Duration**: [how long the test runs]
+```
+
+---
+
+## Guidance Notes
+
+- Performance tests should run long enough to capture steady-state behavior, not just cold-start.
+- Define clear pass/fail thresholds with specific metrics (p50, p95, p99 latency, throughput, etc.).
+- Include warm-up preconditions to separate initialization cost from steady-state performance.
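
One way the percentile thresholds and warm-up separation described above might be evaluated is sketched here. It assumes latencies were already collected as a list of milliseconds; `latency_percentiles` and `passes` are hypothetical helper names, and the inclusive quantile method is one choice among several.

```python
import statistics


def latency_percentiles(samples_ms, warmup=0):
    """Return p50/p95/p99 of the samples after discarding the first `warmup` entries."""
    steady = samples_ms[warmup:]  # drop cold-start measurements
    if len(steady) < 2:
        raise ValueError("need at least two steady-state samples")
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(steady, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def passes(samples_ms, threshold_ms, warmup=0, metric="p95"):
    """True when the chosen percentile is strictly below the threshold."""
    return latency_percentiles(samples_ms, warmup)[metric] < threshold_ms
```

For the template's example criterion, a harness could call `passes(samples, threshold_ms=400, warmup=20)`, where the warm-up count of 20 is purely illustrative.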
@@ -0,0 +1,37 @@
+# Resilience Tests Template
+
+Save as `DOCUMENT_DIR/tests/resilience-tests.md`.
+
+---
+
+```markdown
+# Resilience Tests
+
+### NFT-RES-01: [Test Name]
+
+**Summary**: [What failure/recovery scenario this validates]
+**Traces to**: AC-[ID]
+
+**Preconditions**:
+- [System state before fault injection]
+
+**Fault injection**:
+- [What fault is introduced — process kill, network partition, invalid input sequence, etc.]
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | [inject fault] | [system behavior during fault] |
+| 2 | [observe recovery] | [system behavior after recovery] |
+
+**Pass criteria**: [recovery time, data integrity, continued operation]
+```
+
+---
+
+## Guidance Notes
+
+- Resilience tests must define both the fault and the expected recovery — not just "system should recover."
+- Include specific recovery time expectations and data integrity checks.
+- Test both graceful degradation (partial failure) and full recovery scenarios.
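
A recovery-time expectation like the one called for above can be measured by polling a health probe after the fault is injected. The sketch below assumes the test harness supplies an `is_healthy` callable (hypothetical); the clock and sleep functions are injectable only so the helper can be exercised without real waiting.

```python
import time


def measure_recovery(is_healthy, timeout_s, poll_interval_s=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_healthy() until it returns True.

    Returns the elapsed seconds at the first healthy probe, or None if the
    system is still unhealthy once timeout_s has elapsed.
    """
    start = clock()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

A pass criterion such as "recovers within 30 s" then becomes a check that the returned value is not `None` and does not exceed 30; the resolution of the measurement is bounded by `poll_interval_s`.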
@@ -0,0 +1,31 @@
+# Resource Limit Tests Template
+
+Save as `DOCUMENT_DIR/tests/resource-limit-tests.md`.
+
+---
+
+```markdown
+# Resource Limit Tests
+
+### NFT-RES-LIM-01: [Test Name]
+
+**Summary**: [What resource constraint this validates]
+**Traces to**: AC-[ID], RESTRICT-[ID]
+
+**Preconditions**:
+- [System running under specified constraints]
+
+**Monitoring**:
+- [What resources to monitor — memory, CPU, GPU, disk, temperature]
+
+**Duration**: [how long to run]
+**Pass criteria**: [resource stays within limit — e.g., memory < 8GB throughout]
+```
+
+---
+
+## Guidance Notes
+
+- Resource limit tests must specify monitoring duration — short bursts don't prove sustained compliance.
+- Define specific numeric limits that can be programmatically checked.
+- Include both the monitoring method and the threshold in the pass criteria.
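
To make a limit such as "memory < 8GB throughout" programmatically checkable, the monitoring output can be reduced to timestamped samples and evaluated in one pass. This is a sketch under the assumption that samples arrive as `(timestamp_s, value)` tuples in time order; `check_limit` is a hypothetical helper, and how the samples are collected is left to the harness.

```python
def check_limit(samples, limit, min_duration_s):
    """Check that every sample stays strictly below `limit` over a long enough window.

    samples: list of (timestamp_s, value) tuples in time order.
    Returns (passed, reason).
    """
    if not samples:
        return False, "no samples collected"
    covered = samples[-1][0] - samples[0][0]
    if covered < min_duration_s:
        # Short bursts do not prove sustained compliance.
        return False, f"monitored {covered:.0f}s, need {min_duration_s:.0f}s"
    peak_t, peak = max(samples, key=lambda s: s[1])
    if peak >= limit:
        return False, f"peak {peak} at t={peak_t} reached limit {limit}"
    return True, f"peak {peak} stayed below {limit}"
```

Returning the reason alongside the verdict keeps both the monitoring window and the threshold visible in the test report.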
@@ -0,0 +1,30 @@
+# Security Tests Template
+
+Save as `DOCUMENT_DIR/tests/security-tests.md`.
+
+---
+
+```markdown
+# Security Tests
+
+### NFT-SEC-01: [Test Name]
+
+**Summary**: [What security property this validates]
+**Traces to**: AC-[ID], RESTRICT-[ID]
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | [attempt unauthorized access / injection / etc.] | [rejection / no data leak / etc.] |
+
+**Pass criteria**: [specific security outcome]
+```
+
+---
+
+## Guidance Notes
+
+- Security tests at blackbox level focus on black-box attacks (unauthorized API calls, malformed input), not code-level vulnerabilities.
+- Verify the system remains operational after security-related edge cases (no crash, no hang).
+- Test authentication/authorization boundaries from the consumer's perspective.
@@ -1,11 +1,11 @@
-# E2E Test Data Template
+# Test Data Template

-Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
+Save as `DOCUMENT_DIR/tests/test-data.md`.

 ---

 ```markdown
-# E2E Test Data Management
+# Test Data Management

 ## Seed Data Sets

@@ -23,6 +23,12 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`.
 |-----------------|----------------|-------------|-----------------|
 | [filename] | `_docs/00_problem/input_data/[filename]` | [what it contains] | [test IDs that use this data] |

+## Expected Results Mapping
+
+| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
+|-----------------|------------|-----------------|-------------------|-----------|----------------------|
+| [test ID] | `input_data/[filename]` | [quantifiable expected output] | [exact / tolerance / pattern / threshold / file-diff] | [± value or N/A] | `input_data/expected_results/[filename]` or inline |
+
 ## External Dependency Mocks

 | External Service | Mock/Stub | How Provided | Behavior |
@@ -42,5 +48,8 @@ Save as `DOCUMENT_DIR/integration_tests/test_data.md`.

 - Every seed data set should be traceable to specific test scenarios.
 - Input data from `_docs/00_problem/input_data/` should be mapped to test scenarios that use it.
+- Every input data item MUST have a corresponding expected result in the Expected Results Mapping table.
+- Expected results MUST be quantifiable: exact values, numeric tolerances, pattern matches, thresholds, or reference files. "Works correctly" is never acceptable.
+- For complex expected outputs, provide machine-readable reference files (JSON, CSV) in `_docs/00_problem/input_data/expected_results/` and reference them in the mapping.
 - External mocks must be deterministic — same input always produces same output.
 - Data isolation must guarantee no test can affect another test's outcome.
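
The Comparison Method column in the Expected Results Mapping lists exact, tolerance, pattern, threshold, and file-diff. A minimal dispatcher for those methods might look like the sketch below. `matches_expected` is a hypothetical name, "threshold" is assumed here to mean an upper bound, and "file-diff" is assumed to mean byte-for-byte equality of two files; the template does not define either.

```python
import re
from pathlib import Path


def matches_expected(actual, expected, method, tolerance=None):
    """Compare an actual result with its expected value using the named method."""
    if method == "exact":
        return actual == expected
    if method == "tolerance":
        # Numeric comparison within +/- tolerance.
        return abs(actual - expected) <= tolerance
    if method == "pattern":
        # `expected` is a regular expression that must match the whole string.
        return re.fullmatch(expected, actual) is not None
    if method == "threshold":
        # Assumption: the expected value is an upper bound.
        return actual <= expected
    if method == "file-diff":
        # Assumption: both arguments are file paths compared byte for byte.
        return Path(actual).read_bytes() == Path(expected).read_bytes()
    raise ValueError(f"unknown comparison method: {method}")
```

Raising on an unknown method keeps a typo in the mapping table from silently passing a test.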
@@ -1,16 +1,16 @@
-# E2E Test Environment Template
+# Test Environment Template

-Save as `DOCUMENT_DIR/integration_tests/environment.md`.
+Save as `DOCUMENT_DIR/tests/environment.md`.

 ---

 ```markdown
-# E2E Test Environment
+# Test Environment

 ## Overview

 **System under test**: [main system name and entry points — API URLs, message queues, serial ports, etc.]
-**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
+**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating black-box use cases without access to internals.

 ## Docker Environment

@@ -17,7 +17,7 @@ Use this template for each component's test spec. Save as `components/[##]_[name

 ---

-## Integration Tests
+## Blackbox Tests

 ### IT-01: [Test Name]

@@ -169,4 +169,4 @@ Use this template for each component's test spec. Save as `components/[##]_[name
 - If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
 - Performance test targets should come from the NFR section in `architecture.md`.
 - Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
-- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
+- Not every component needs all 4 test types. A stateless utility component may only need blackbox tests.
@@ -1,11 +1,11 @@
-# E2E Traceability Matrix Template
+# Traceability Matrix Template

-Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
+Save as `DOCUMENT_DIR/tests/traceability-matrix.md`.

 ---

 ```markdown
-# E2E Traceability Matrix
+# Traceability Matrix

 ## Acceptance Criteria Coverage

@@ -34,7 +34,7 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.

 | Item | Reason Not Covered | Risk | Mitigation |
 |------|-------------------|------|-----------|
-| [AC/Restriction ID] | [why it cannot be tested at E2E level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
+| [AC/Restriction ID] | [why it cannot be tested at blackbox level] | [what could go wrong] | [how risk is addressed — e.g., covered by component tests in Step 5] |
 ```

 ---
@@ -44,4 +44,4 @@ Save as `DOCUMENT_DIR/integration_tests/traceability_matrix.md`.
 - Every acceptance criterion must appear in the matrix — either covered or explicitly marked as not covered with a reason.
 - Every restriction must appear in the matrix.
 - NOT COVERED items must have a reason and a mitigation strategy (e.g., "covered at component test level" or "requires real hardware").
-- Coverage percentage should be at least 75% for acceptance criteria at the E2E level.
+- Coverage percentage should be at least 75% for acceptance criteria at the blackbox test level.
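
The 75% coverage rule can be checked mechanically once the matrix is available as data. The sketch below assumes the matrix has been parsed into a mapping from acceptance-criterion ID to the list of test IDs that cover it, with an empty list meaning NOT COVERED; `ac_coverage` and `uncovered` are hypothetical helpers, and the parsing step is out of scope.

```python
def ac_coverage(matrix):
    """Percentage of acceptance criteria that have at least one covering test."""
    if not matrix:
        raise ValueError("traceability matrix is empty")
    covered = sum(1 for tests in matrix.values() if tests)
    return 100.0 * covered / len(matrix)


def uncovered(matrix):
    """Sorted IDs of criteria with no covering test; each needs a reason and mitigation."""
    return sorted(ac for ac, tests in matrix.items() if not tests)
```

A review gate could then require `ac_coverage(matrix) >= 75.0` and list `uncovered(matrix)` so every NOT COVERED item gets its reason and mitigation filled in.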