# Expected Results
Maps every input data item to its quantifiable expected result.
Tests use this mapping to compare actual system output against known-correct answers.
## Coordinate System
All bounding box coordinates are **normalized to 0.0–1.0** relative to the full image/frame dimensions, matching the API response format:
| Field | Meaning |
|-------|---------|
| `center_x` | Horizontal center of bounding box (0.0 = left edge, 1.0 = right edge) |
| `center_y` | Vertical center of bounding box (0.0 = top edge, 1.0 = bottom edge) |
| `width` | Bounding box width as fraction of image width |
| `height` | Bounding box height as fraction of image height |
| `label` | Class name from `classes.json` (e.g., `ArmorVehicle`, `Car`, `Person`) |
| `confidence_min` | Minimum acceptable confidence for this detection (threshold comparison, `≥`) |
For videos, one additional field applies:
| Field | Meaning |
|-------|---------|
| `time_sec` | Timestamp in seconds from video start when this detection is visible |
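To make the normalized coordinates concrete, here is a minimal sketch converting a normalized center/size box into pixel corners. The function name and example dimensions are illustrative, not part of the API:

```python
def to_pixel_box(center_x, center_y, width, height, img_w, img_h):
    """Convert normalized center/size values to pixel corner coordinates.

    Returns (x_min, y_min, x_max, y_max) in pixels of the full image/frame.
    """
    x_min = (center_x - width / 2) * img_w
    y_min = (center_y - height / 2) * img_h
    x_max = (center_x + width / 2) * img_w
    y_max = (center_y + height / 2) * img_h
    return (x_min, y_min, x_max, y_max)

# A detection at center (0.45, 0.32) on the 1280x720 test image
box = to_pixel_box(0.45, 0.32, 0.08, 0.12, 1280, 720)
```

Because coordinates are normalized to the full image, the same expected row is valid regardless of whether the service tiled the image internally.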
## Global Tolerances
| Parameter | Tolerance | Comparison Method |
|-----------|-----------|-------------------|
| Bounding box coordinates (center_x, center_y, width, height) | ± 0.05 | `numeric_tolerance` |
| Detection count | ± 2 | `numeric_tolerance` |
| Confidence | ≥ `confidence_min` value per row | `threshold_min` |
| Label | exact match | `exact` |
| Video time_sec | ± 1.0s | `numeric_tolerance` |
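The three comparison methods in the table can be read as simple predicates. A minimal sketch, illustrative only; the real test harness may implement these differently:

```python
def numeric_tolerance(actual, expected, tol):
    """True when |actual - expected| <= tol."""
    return abs(actual - expected) <= tol

def threshold_min(actual, minimum):
    """True when actual meets or exceeds the minimum (>= comparison)."""
    return actual >= minimum

def exact(actual, expected):
    """True only on an exact match (labels are case-sensitive)."""
    return actual == expected

# Applying the global tolerances from the table above:
assert numeric_tolerance(0.47, 0.45, 0.05)   # center_x within +/-0.05
assert numeric_tolerance(11, 12, 2)          # detection count within +/-2
assert threshold_min(0.31, 0.25)             # confidence >= confidence_min
assert exact("Car", "Car")                   # label must match exactly
```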
## Input → Expected Result Mapping
### Images
| # | Input File | Description | Expected Result File | Expected Detection Count | Notes |
|---|------------|-------------|---------------------|-------------------------|-------|
| 1 | `image_small.jpg` | 1280×720 aerial, contains detectable objects | `image_small_expected.csv` | ? | Primary test image for single-frame detection |
| 2 | `image_large.JPG` | 6252×4168 aerial, triggers GSD-based tiling | `image_large_expected.csv` | ? | Coordinates normalized to full image (not tile) |
| 3 | `image_dense01.jpg` | 1280×720 dense scene, many clustered objects | `image_dense01_expected.csv` | ? | Used for dedup and max-detection-cap tests |
| 4 | `image_dense02.jpg` | 1920×1080 dense scene variant | `image_dense02_expected.csv` | ? | Borderline tiling, dedup variant |
| 5 | `image_different_types.jpg` | 900×1600, varied object classes | `image_different_types_expected.csv` | ? | Must contain multiple distinct class labels |
| 6 | `image_empty_scene.jpg` | 1920×1080, no detectable objects | `image_empty_scene_expected.csv` | 0 | CSV has headers only — zero detections expected |
### Videos
| # | Input File | Description | Expected Result File | Notes |
|---|------------|-------------|---------------------|-------|
| 7 | `video_short01.mp4` | Standard test video | `video_short01_expected.csv` | Primary async/SSE/video test. List key-frame detections. |
| 8 | `video_short02.mp4` | Video variant | `video_short02_expected.csv` | Used for resilience and concurrent tests |
| 9 | `video_long03.mp4` | Long video (288MB), generates >100 SSE events | `video_long03_expected.csv` | SSE overflow test. Only key-frame samples needed. |
## How to Fill
### Images
1. Run the model on each image (or use the detection service)
2. Record every detection the model returns
3. Fill one row per detection in the CSV:
```csv
center_x,center_y,width,height,label,confidence_min
0.45,0.32,0.08,0.12,Car,0.25
0.71,0.55,0.06,0.09,Person,0.25
```
4. For `image_empty_scene_expected.csv` — leave only the header row (0 detections)
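The steps above can be sketched as a small loader/matcher. `load_expected` and `matches` are hypothetical helper names, and the shape of the actual-detection dict is an assumption about the API response:

```python
import csv

def load_expected(path):
    """Load an expected-results CSV into a list of dicts with numeric fields parsed."""
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for key in ("center_x", "center_y", "width", "height", "confidence_min"):
                row[key] = float(row[key])
            rows.append(row)
    return rows

def matches(actual, expected, tol=0.05):
    """One actual detection satisfies one expected row, per the global tolerances."""
    return (
        actual["label"] == expected["label"]
        and actual["confidence"] >= expected["confidence_min"]
        and all(
            abs(actual[k] - expected[k]) <= tol
            for k in ("center_x", "center_y", "width", "height")
        )
    )
```

For `image_empty_scene_expected.csv`, `load_expected` simply returns an empty list, so any non-zero detection count (beyond the ±2 count tolerance) fails the test.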
### Videos
1. Run the model on the video (or use the detection service with `frame_period_recognition: 1`)
2. For key frames where detections appear, record the timestamp and detections
3. Fill one row per detection per timestamp:
```csv
time_sec,center_x,center_y,width,height,label,confidence_min
2.0,0.45,0.32,0.08,0.12,Car,0.25
2.0,0.71,0.55,0.06,0.09,Person,0.25
4.0,0.46,0.33,0.08,0.12,Car,0.25
```
4. You don't need every single frame — sample at key moments (e.g., every 2–4 seconds) to validate detection presence and approximate positions
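Matching a sampled expected row against actual video detections could look like the sketch below, applying the ±1.0 s time tolerance alongside the global box tolerances. The function name and detection dict shape are assumptions:

```python
def match_video_row(actual_detections, expected_row, time_tol=1.0, box_tol=0.05):
    """True if any actual detection satisfies the expected row.

    actual_detections: list of dicts with time_sec, center_x, center_y,
    width, height, label, and confidence keys.
    """
    for det in actual_detections:
        # Timestamp must fall within +/-time_tol of the expected key frame.
        if abs(det["time_sec"] - expected_row["time_sec"]) > time_tol:
            continue
        if det["label"] != expected_row["label"]:
            continue
        if det["confidence"] < expected_row["confidence_min"]:
            continue
        # All four box coordinates must be within the global tolerance.
        if all(
            abs(det[k] - expected_row[k]) <= box_tol
            for k in ("center_x", "center_y", "width", "height")
        ):
            return True
    return False
```

Because expected rows are sampled rather than exhaustive, the check is "every expected row has at least one matching actual detection", not the reverse.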
## Non-Detection Expected Results
The following test scenarios have expected results that are not per-file detections. These are defined inline in the test specs and do not need CSV files:
| Scenario | Expected Result | Comparison | Defined In |
|----------|----------------|------------|------------|
| Empty image (FT-N-01) | HTTP 400, `"Image is empty"` | exact | `blackbox-tests.md` |
| Corrupt image (FT-N-02) | HTTP 400 or 422 | exact | `blackbox-tests.md` |
| Engine unavailable (FT-N-03) | HTTP 503 or 422, not 500 | exact | `blackbox-tests.md` |
| Duplicate media_id (FT-N-04) | HTTP 409 | exact | `blackbox-tests.md` |
| Missing classes.json (FT-N-05) | Service fails or empty detections | exact | `blackbox-tests.md` |
| Health pre-init (FT-P-01) | `aiAvailability: "None"` | exact | `blackbox-tests.md` |
| Health post-init (FT-P-02) | `aiAvailability` not "None"/"Downloading" | exact | `blackbox-tests.md` |
| Async start (FT-P-08) | `{"status": "started"}`, response < 1s | exact + threshold_max | `blackbox-tests.md` |
| SSE completion (FT-P-09) | Final event: `mediaStatus: "AIProcessed"` , `percent: 100` | exact | `blackbox-tests.md` |
| Max detections (NFT-RES-LIM-03) | `len(detections) ≤ 300` | threshold_max | `resource-limit-tests.md` |
| Single image latency (NFT-PERF-01) | p95 < 5000ms (ONNX CPU) | threshold_max | `performance-tests.md` |
| Log file naming (NFT-RES-LIM-04) | `log_inference_YYYYMMDD.txt` exists | regex | `resource-limit-tests.md` |
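The NFT-RES-LIM-04 filename check is the one regex comparison in the table. A minimal sketch of what that regex could look like (illustrative only, not the harness's actual pattern):

```python
import re

# log_inference_YYYYMMDD.txt, e.g. log_inference_20240315.txt
LOG_NAME = re.compile(r"^log_inference_\d{8}\.txt$")

assert LOG_NAME.match("log_inference_20240315.txt")
assert not LOG_NAME.match("log_inference_2024-03-15.txt")  # dashes not allowed
```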