loader/.cursor/skills/test-spec/templates/expected-results.md
Oleksandr Bezdieniezhnykh, "Add .cursor AI autodevelopment harness (agents, skills, rules)", 2026-03-26 01:06:55 +02:00

Expected Results Template

Save as `_docs/00_problem/input_data/expected_results/results_report.md`. For complex expected outputs, place reference CSV files alongside it in `_docs/00_problem/input_data/expected_results/`. Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`).


# Expected Results

Maps every input data item to its quantifiable expected result.
Tests use this mapping to compare actual system output against known-correct answers.

## Result Format Legend

| Result Type | When to Use | Example |
|-------------|-------------|---------|
| Exact value | Output must match precisely | `status_code: 200`, `detection_count: 3` |
| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05`, `bbox_x: 120 ± 10px` |
| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.85` |
| Pattern match | Output must match a string/regex pattern | `error_message contains "invalid format"` |
| File reference | Complex output compared against a reference file | `match expected_results/case_01.json` |
| Schema match | Output structure must conform to a schema | `response matches DetectionResultSchema` |
| Set/count | Output must contain specific items or counts | `classes ⊇ {"car", "person"}`, `detections.length == 5` |

## Comparison Methods

| Method | Description | Tolerance Syntax |
|--------|-------------|-----------------|
| `exact` | Actual == Expected | N/A |
| `numeric_tolerance` | abs(actual - expected) ≤ tolerance | `± <value>` or `± <percent>%` |
| `range` | min ≤ actual ≤ max | `[min, max]` |
| `threshold_min` | actual ≥ threshold | `≥ <value>` |
| `threshold_max` | actual ≤ threshold | `≤ <value>` |
| `regex` | actual matches regex pattern | regex string |
| `substring` | actual contains substring | substring |
| `json_diff` | structural comparison against reference JSON | diff tolerance per field |
| `set_contains` | actual output set contains expected items | subset notation |
| `file_reference` | compare against reference file in expected_results/ | file path |
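
The comparison methods above can be sketched as plain predicates. This is a minimal illustration only, assuming actual and expected values have already been parsed out of the table; the function names are not part of the template.

```python
# Illustrative predicates for the comparison methods table (names are
# assumptions, not a prescribed API).

def exact(actual, expected):
    # Actual == Expected, for counts, status codes, discrete values.
    return actual == expected

def numeric_tolerance(actual, expected, tol):
    # abs(actual - expected) <= tolerance, for "± <value>" syntax.
    return abs(actual - expected) <= tol

def in_range(actual, lo, hi):
    # min <= actual <= max, for "[min, max]" syntax.
    return lo <= actual <= hi

def threshold_min(actual, threshold):
    # actual >= threshold, for ">= <value>" syntax.
    return actual >= threshold

def threshold_max(actual, threshold):
    # actual <= threshold, for "<= <value>" syntax.
    return actual <= threshold

def substring(actual, needle):
    # Actual string contains the expected substring.
    return needle in actual

def set_contains(actual_items, expected_subset):
    # Actual output set contains all expected items (subset notation).
    return set(expected_subset).issubset(actual_items)
```

A percent tolerance (`± <percent>%`) would be the same `numeric_tolerance` call with `tol = expected * percent / 100`.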

## Input → Expected Result Mapping

### [Scenario Group Name, e.g. "Single Image Detection"]

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `[file or parameters]` | [what this input represents] | [quantifiable expected output] | [method from table above] | [± value, range, or N/A] | [path in expected_results/ or N/A] |

#### Example — Object Detection

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `image_01.jpg` | Aerial photo, 3 vehicles visible | `detection_count: 3`, classes: `["ArmorVehicle", "ArmorVehicle", "Truck"]` | exact (count), set_contains (classes) | N/A | N/A |
| 2 | `image_01.jpg` | Same image, bbox positions | bboxes: `[(120,80,340,290), (400,150,580,310), (50,400,200,520)]` | numeric_tolerance | ± 15px per coordinate | `expected_results/image_01_detections.json` |
| 3 | `image_01.jpg` | Same image, confidence scores | confidences: `[0.94, 0.88, 0.91]` | threshold_min | each ≥ 0.85 | N/A |
| 4 | `empty_scene.jpg` | Aerial photo, no objects | `detection_count: 0`, empty detections array | exact | N/A | N/A |
| 5 | `corrupted.dat` | Invalid file format | HTTP 400, body contains `"error"` key | exact (status), substring (body) | N/A | N/A |
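
Rows 2 and 3 above can be checked with short helpers like the following. This is a sketch only: the constants mirror the table's ± 15px tolerance and 0.85 confidence floor, and the helper names are illustrative, not part of the template.

```python
# Expected values taken from rows 2-3 of the example table.
EXPECTED_BBOXES = [(120, 80, 340, 290), (400, 150, 580, 310), (50, 400, 200, 520)]
TOLERANCE_PX = 15
MIN_CONFIDENCE = 0.85

def bboxes_match(actual_bboxes, expected_bboxes, tol_px):
    # Row 2: numeric_tolerance applied per coordinate, counts must agree.
    if len(actual_bboxes) != len(expected_bboxes):
        return False
    return all(
        abs(a - e) <= tol_px
        for actual, expected in zip(actual_bboxes, expected_bboxes)
        for a, e in zip(actual, expected)
    )

def confidences_pass(confidences, threshold):
    # Row 3: threshold_min applied to each score.
    return all(c >= threshold for c in confidences)
```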

#### Example — Performance

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `standard_image.jpg` | 1920x1080 single image | Response time | threshold_max | ≤ 2000ms | N/A |
| 2 | `large_image.jpg` | 8000x6000 tiled image | Response time | threshold_max | ≤ 10000ms | N/A |

#### Example — Error Handling

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `POST /detect` with no file | Missing required input | HTTP 422, message matches `"file.*required"` | exact (status), regex (message) | N/A | N/A |
| 2 | `POST /detect` with `probability_threshold: 5.0` | Out-of-range config | HTTP 422 or clamped to valid range | exact (status) or range [0.0, 1.0] | N/A | N/A |

## Expected Result Reference Files

When the expected output is too complex for an inline table cell (e.g., full JSON response with nested objects), place a reference file in `_docs/00_problem/input_data/expected_results/`.

### File Naming Convention

`<input_name>_expected.<format>`

Examples:
- `image_01_detections.json`
- `batch_A_results.csv`
- `video_01_annotations.json`

### Reference File Requirements

- Must be machine-readable (JSON, CSV, YAML — not prose)
- Must contain only the expected output structure and values
- Must include tolerance annotations where applicable (as metadata fields or comments)
- Must be valid and parseable by standard libraries

### Reference File Example (JSON)

File: `expected_results/image_01_detections.json`

```json
{
  "input": "image_01.jpg",
  "expected": {
    "detection_count": 3,
    "detections": [
      {
        "class": "ArmorVehicle",
        "confidence": { "min": 0.85 },
        "bbox": { "x1": 120, "y1": 80, "x2": 340, "y2": 290, "tolerance_px": 15 }
      },
      {
        "class": "ArmorVehicle",
        "confidence": { "min": 0.85 },
        "bbox": { "x1": 400, "y1": 150, "x2": 580, "y2": 310, "tolerance_px": 15 }
      },
      {
        "class": "Truck",
        "confidence": { "min": 0.85 },
        "bbox": { "x1": 50, "y1": 400, "x2": 200, "y2": 520, "tolerance_px": 15 }
      }
    ]
  }
}
```
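
A test consuming this reference file might look like the sketch below. The comparison logic follows the file's structure exactly; the shape of `actual` (field names like `detection_count`, `detections`, `bbox`) is an assumption about the system's output, not something the template prescribes.

```python
import json

def check_against_reference(actual, expected):
    # Compare an actual detection result against the "expected" object
    # from a reference file like image_01_detections.json.
    if actual["detection_count"] != expected["detection_count"]:
        return False
    for act, exp in zip(actual["detections"], expected["detections"]):
        if act["class"] != exp["class"]:
            return False
        if act["confidence"] < exp["confidence"]["min"]:
            return False
        tol = exp["bbox"]["tolerance_px"]
        for key in ("x1", "y1", "x2", "y2"):
            if abs(act["bbox"][key] - exp["bbox"][key]) > tol:
                return False
    return True

# Usage (path assumed relative to the test spec directory):
#   with open("expected_results/image_01_detections.json") as f:
#       expected = json.load(f)["expected"]
#   assert check_against_reference(actual, expected)
```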

## Guidance Notes

- Every row in the mapping table must have at least one quantifiable comparison; no row should say only "should work" or "returns result".
- Use `exact` comparison for counts, status codes, and discrete values.
- Use `numeric_tolerance` for floating-point values and spatial coordinates where minor variance is expected.
- Use `threshold_min`/`threshold_max` for performance metrics and confidence scores.
- Use `file_reference` when the expected output has more than ~3 fields or nested structures.
- Reference files must be committed alongside input data; they are part of the test specification.
- When the system has non-deterministic behavior (e.g., model inference variance across hardware), document the expected tolerance explicitly and justify it.