# Expected Results Template Save as `_docs/00_problem/input_data/expected_results/results_report.md`. For complex expected outputs, place reference CSV files alongside it in `_docs/00_problem/input_data/expected_results/`. Referenced by the test-spec skill (`.cursor/skills/test-spec/SKILL.md`). --- ```markdown # Expected Results Maps every input data item to its quantifiable expected result. Tests use this mapping to compare actual system output against known-correct answers. ## Result Format Legend | Result Type | When to Use | Example | |-------------|-------------|---------| | Exact value | Output must match precisely | `status_code: 200`, `detection_count: 3` | | Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05`, `bbox_x: 120 ± 10px` | | Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.85` | | Pattern match | Output must match a string/regex pattern | `error_message contains "invalid format"` | | File reference | Complex output compared against a reference file | `match expected_results/case_01.json` | | Schema match | Output structure must conform to a schema | `response matches DetectionResultSchema` | | Set/count | Output must contain specific items or counts | `classes ⊇ {"car", "person"}`, `detections.length == 5` | ## Comparison Methods | Method | Description | Tolerance Syntax | |--------|-------------|-----------------| | `exact` | Actual == Expected | N/A | | `numeric_tolerance` | abs(actual - expected) ≤ tolerance | `± ` or `± %` | | `range` | min ≤ actual ≤ max | `[min, max]` | | `threshold_min` | actual ≥ threshold | `≥ ` | | `threshold_max` | actual ≤ threshold | `≤ ` | | `regex` | actual matches regex pattern | regex string | | `substring` | actual contains substring | substring | | `json_diff` | structural comparison against reference JSON | diff tolerance per field | | `set_contains` | actual output set contains expected items | subset notation | | `file_reference` | compare against reference file in expected_results/ | file path | ## Input → Expected Result Mapping ### [Scenario Group Name, e.g. "Single Image Detection"] | # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | |---|-------|-------------------|-----------------|------------|-----------|---------------| | 1 | `[file or parameters]` | [what this input represents] | [quantifiable expected output] | [method from table above] | [± value, range, or N/A] | [path in expected_results/ or N/A] | #### Example — Object Detection | # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | |---|-------|-------------------|-----------------|------------|-----------|---------------| | 1 | `image_01.jpg` | Aerial photo, 3 vehicles visible | `detection_count: 3`, classes: `["ArmorVehicle", "ArmorVehicle", "Truck"]` | exact (count), set_contains (classes) | N/A | N/A | | 2 | `image_01.jpg` | Same image, bbox positions | bboxes: `[(120,80,340,290), (400,150,580,310), (50,400,200,520)]` | numeric_tolerance | ± 15px per coordinate | `expected_results/image_01_detections.json` | | 3 | `image_01.jpg` | Same image, confidence scores | confidences: `[0.94, 0.88, 0.91]` | threshold_min | each ≥ 0.85 | N/A | | 4 | `empty_scene.jpg` | Aerial photo, no objects | `detection_count: 0`, empty detections array | exact | N/A | N/A | | 5 | `corrupted.dat` | Invalid file format | HTTP 400, body contains `"error"` key | exact (status), substring (body) | N/A | N/A | #### Example — Performance | # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | |---|-------|-------------------|-----------------|------------|-----------|---------------| | 1 | `standard_image.jpg` | 1920x1080 single image | Response time | threshold_max | ≤ 2000ms | N/A | | 2 | `large_image.jpg` | 8000x6000 tiled image | Response time | threshold_max | ≤ 10000ms | N/A | #### Example — Error Handling | # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File | |---|-------|-------------------|-----------------|------------|-----------|---------------| | 1 | `POST /detect` with no file | Missing required input | HTTP 422, message matches `"file.*required"` | exact (status), regex (message) | N/A | N/A | | 2 | `POST /detect` with `probability_threshold: 5.0` | Out-of-range config | HTTP 422 or clamped to valid range | exact (status) or range [0.0, 1.0] | N/A | N/A | ## Expected Result Reference Files When the expected output is too complex for an inline table cell (e.g., full JSON response with nested objects), place a reference file in `_docs/00_problem/input_data/expected_results/`. ### File Naming Convention `_expected.` Examples: - `image_01_detections.json` - `batch_A_results.csv` - `video_01_annotations.json` ### Reference File Requirements - Must be machine-readable (JSON, CSV, YAML — not prose) - Must contain only the expected output structure and values - Must include tolerance annotations where applicable (as metadata fields or comments) - Must be valid and parseable by standard libraries ### Reference File Example (JSON) File: `expected_results/image_01_detections.json` ```json { "input": "image_01.jpg", "expected": { "detection_count": 3, "detections": [ { "class": "ArmorVehicle", "confidence": { "min": 0.85 }, "bbox": { "x1": 120, "y1": 80, "x2": 340, "y2": 290, "tolerance_px": 15 } }, { "class": "ArmorVehicle", "confidence": { "min": 0.85 }, "bbox": { "x1": 400, "y1": 150, "x2": 580, "y2": 310, "tolerance_px": 15 } }, { "class": "Truck", "confidence": { "min": 0.85 }, "bbox": { "x1": 50, "y1": 400, "x2": 200, "y2": 520, "tolerance_px": 15 } } ] } } ``` ``` --- ## Guidance Notes - Every row in the mapping table must have at least one quantifiable comparison — no row should say only "should work" or "returns result". - Use `exact` comparison for counts, status codes, and discrete values. - Use `numeric_tolerance` for floating-point values and spatial coordinates where minor variance is expected. - Use `threshold_min`/`threshold_max` for performance metrics and confidence scores. - Use `file_reference` when the expected output has more than ~3 fields or nested structures. - Reference files must be committed alongside input data — they are part of the test specification. - When the system has non-deterministic behavior (e.g., model inference variance across hardware), document the expected tolerance explicitly and justify it.