Files
loader/.cursor/skills/test-spec/templates/expected-results.md
T
Oleksandr Bezdieniezhnykh b0a03d36d6 Add .cursor AI autodevelopment harness (agents, skills, rules)
Made-with: Cursor
2026-03-26 01:06:55 +02:00

136 lines
6.6 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Expected Results Template
Save as `_docs/00_problem/input_data/expected_results/results_report.md`.
For complex expected outputs, place reference CSV files alongside it in `_docs/00_problem/input_data/expected_results/`.
Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`).
---
```markdown
# Expected Results
Maps every input data item to its quantifiable expected result.
Tests use this mapping to compare actual system output against known-correct answers.
## Result Format Legend
| Result Type | When to Use | Example |
|-------------|-------------|---------|
| Exact value | Output must match precisely | `status_code: 200`, `detection_count: 3` |
| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05`, `bbox_x: 120 ± 10px` |
| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.85` |
| Pattern match | Output must match a string/regex pattern | `error_message contains "invalid format"` |
| File reference | Complex output compared against a reference file | `match expected_results/case_01.json` |
| Schema match | Output structure must conform to a schema | `response matches DetectionResultSchema` |
| Set/count | Output must contain specific items or counts | `classes ⊇ {"car", "person"}`, `detections.length == 5` |
## Comparison Methods
| Method | Description | Tolerance Syntax |
|--------|-------------|-----------------|
| `exact` | Actual == Expected | N/A |
| `numeric_tolerance` | abs(actual - expected) ≤ tolerance | `± <value>` or `± <percent>%` |
| `range` | min ≤ actual ≤ max | `[min, max]` |
| `threshold_min` | actual ≥ threshold | `≥ <value>` |
| `threshold_max` | actual ≤ threshold | `≤ <value>` |
| `regex` | actual matches regex pattern | regex string |
| `substring` | actual contains substring | substring |
| `json_diff` | structural comparison against reference JSON | diff tolerance per field |
| `set_contains` | actual output set contains expected items | subset notation |
| `file_reference` | compare against reference file in expected_results/ | file path |
## Input → Expected Result Mapping
### [Scenario Group Name, e.g. "Single Image Detection"]
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `[file or parameters]` | [what this input represents] | [quantifiable expected output] | [method from table above] | [± value, range, or N/A] | [path in expected_results/ or N/A] |
#### Example — Object Detection
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `image_01.jpg` | Aerial photo, 3 vehicles visible | `detection_count: 3`, classes: `["ArmorVehicle", "ArmorVehicle", "Truck"]` | exact (count), set_contains (classes) | N/A | N/A |
| 2 | `image_01.jpg` | Same image, bbox positions | bboxes: `[(120,80,340,290), (400,150,580,310), (50,400,200,520)]` | numeric_tolerance | ± 15px per coordinate | `expected_results/image_01_detections.json` |
| 3 | `image_01.jpg` | Same image, confidence scores | confidences: `[0.94, 0.88, 0.91]` | threshold_min | each ≥ 0.85 | N/A |
| 4 | `empty_scene.jpg` | Aerial photo, no objects | `detection_count: 0`, empty detections array | exact | N/A | N/A |
| 5 | `corrupted.dat` | Invalid file format | HTTP 400, body contains `"error"` key | exact (status), substring (body) | N/A | N/A |
#### Example — Performance
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `standard_image.jpg` | 1920x1080 single image | Response time | threshold_max | ≤ 2000ms | N/A |
| 2 | `large_image.jpg` | 8000x6000 tiled image | Response time | threshold_max | ≤ 10000ms | N/A |
#### Example — Error Handling
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `POST /detect` with no file | Missing required input | HTTP 422, message matches `"file.*required"` | exact (status), regex (message) | N/A | N/A |
| 2 | `POST /detect` with `probability_threshold: 5.0` | Out-of-range config | HTTP 422 or clamped to valid range | exact (status) or range [0.0, 1.0] | N/A | N/A |
## Expected Result Reference Files
When the expected output is too complex for an inline table cell (e.g., full JSON response with nested objects), place a reference file in `_docs/00_problem/input_data/expected_results/`.
### File Naming Convention
`<input_name>_expected.<format>`
Examples:
- `image_01_detections.json`
- `batch_A_results.csv`
- `video_01_annotations.json`
### Reference File Requirements
- Must be machine-readable (JSON, CSV, YAML — not prose)
- Must contain only the expected output structure and values
- Must include tolerance annotations where applicable (as metadata fields or comments)
- Must be valid and parseable by standard libraries
### Reference File Example (JSON)
File: `expected_results/image_01_detections.json`
```json
{
"input": "image_01.jpg",
"expected": {
"detection_count": 3,
"detections": [
{
"class": "ArmorVehicle",
"confidence": { "min": 0.85 },
"bbox": { "x1": 120, "y1": 80, "x2": 340, "y2": 290, "tolerance_px": 15 }
},
{
"class": "ArmorVehicle",
"confidence": { "min": 0.85 },
"bbox": { "x1": 400, "y1": 150, "x2": 580, "y2": 310, "tolerance_px": 15 }
},
{
"class": "Truck",
"confidence": { "min": 0.85 },
"bbox": { "x1": 50, "y1": 400, "x2": 200, "y2": 520, "tolerance_px": 15 }
}
]
}
}
```
```
---
## Guidance Notes
- Every row in the mapping table must have at least one quantifiable comparison — no row should say only "should work" or "returns result".
- Use `exact` comparison for counts, status codes, and discrete values.
- Use `numeric_tolerance` for floating-point values and spatial coordinates where minor variance is expected.
- Use `threshold_min`/`threshold_max` for performance metrics and confidence scores.
- Use `file_reference` when the expected output has more than ~3 fields or nested structures.
- Reference files must be committed alongside input data — they are part of the test specification.
- When the system has non-deterministic behavior (e.g., model inference variance across hardware), document the expected tolerance explicitly and justify it.