autopilot/_docs/00_problem/input_data/README.md

# Input Data

Runtime inputs the autopilot consumes when flying, plus reference fixtures + expected-output assertions for tests. **All fixtures live inside this workspace** (`fixtures/`) — never reach into sibling repos at `../` for inputs. The autopilot repo is self-sufficient.

## Layout

| Path | Owns |
|---|---|
| `data_parameters.md` | Description of runtime input shapes (camera, telemetry, gRPC, mission JSON, operator commands, VLM IPC) + the categories of reference data tests need + Tier-1/Tier-2 class catalogue. |
| `services.md` | Per-external-service test-mock requirements: what shape of mock/fixture each of the 7 external systems needs and the acquisition status of each. |
| `fixtures/README.md` | File-by-file manifest of every fixture in this directory: SHA-256, size, upstream provenance, which `expected_results/results_report.md` rows consume it. |
| `fixtures/images/` | Real aerial frames (5 images, ~9 MB total) — Tier-1 inputs for detection-quality assertions (L1, D2, D6). |
| `fixtures/videos/` | Real reconnaissance video (1 clip, 12 MB) for frame-rate floor + sequence tests (T3). |
| `fixtures/movement/` | Wide-area movement-detection visual reference clips (4 clips, ~23 MB total). **No paired `gimbal.csv` / `telemetry.csv`** — ego-motion compensation (M1–M4) cannot run against these alone. |
| `fixtures/semantic/` | Concealed-position semantic reference frames (4 PNGs, ~11 MB total) + `data_parameters.md` describing the new YOLO primitive classes the examples motivate. **Starter set only**, not a graded eval set. |
| `fixtures/schemas/` | Detection-result contract schemas (JSON + JSON-schema) for D6. |
| `fixtures/sql/` | Database init script — reference only; not directly asserted by an autopilot AC. |
| `expected_results/results_report.md` | The input → quantifiable-expected-output mapping consumed by `/test-spec` Phase 1. Every row keys off an AC in `../acceptance_criteria.md`; deferred rows carry a structured `<DEFERRED: <shape>; ref <pointer>>` tag. |

## Why fixtures are local

The autopilot repo MUST be self-sufficient — a developer with only the autopilot clone (no parent suite checked out) MUST be able to run the test specifications. Cross-repo `../` paths are forbidden in `results_report.md` and in any test runner script. When a sibling repo (`../detections/`, `../e2e/`, `../missions/`, etc.) is the upstream source of a fixture, we **copy** it in and SHA-pin it in `fixtures/README.md` so upstream drift is detectable.

## Suite-level coupling that still matters

Even though fixtures are local, the underlying contracts the fixtures embody come from suite-level decisions. When those decisions change, the fixtures here go stale:

- **Tier-1 detection model / classes** — when `../detections` ships a new model the `expected_detections.json` baseline goes stale; D1, D2, D6 rows in `results_report.md` must be re-recorded.
- **`mission-schema`** — shared between autopilot and the `missions` repo. Schema changes break the mission JSON contract; the mock fixtures for Mp1–Mp5 (when authored) must re-pin.
- **Detection classes catalogue** — class IDs 0..18 are governed at the suite level. Autopilot's normalised-box output uses the same IDs. The 5 new Tier-1 classes documented in `data_parameters.md → "Class catalogue"` must land in the suite catalogue before D1 can be measured.

Today these couplings are tracked manually. The `monorepo-e2e` skill at the suite root will eventually own the drift detection.

## Fixture gaps and the project policy on `/test-spec` Phase 3

`/test-spec` Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results. Today's coverage is well below that gate (see `expected_results/results_report.md → "Coverage Status"`). **Project policy as of 2026-05-19**: rather than block the autodev flow at the gate, each deferred row is registered with a structured `<DEFERRED: <shape>; ref <pointer>>` tag in `results_report.md`, pointing at the per-service acquisition path in `services.md` or at an open architecture question (Q-tag). Deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement remains a hard release blocker.

Summary of open gaps (authoritative list lives in `services.md` and `fixtures/README.md`):

1. **Paired `gimbal.csv` + `telemetry.csv` for the 4 movement clips** — highest priority (blocks M1–M4 + tightens L6/L7). **User-confirmed unavailable today (2026-05-19).**
2. Annotated multi-season eval set (concealed positions + footpaths).
3. Mock `missions` API exchanges + mock `/mapobjects` round-trip.
4. Mock Ground Station session traces.
5. ArduPilot SITL traces.
6. Operator-command envelopes (blocked on Q9).
7. VLM I/O pairs.
8. GPS / NTP drift scripts.

Closing each gap is its own workstream tracked in Jira; the autodev flow does not block on them.

## Adding new fixtures

1. Drop the file under `fixtures/<images|videos|movement|semantic|schemas|sql|gimbal|telemetry|mavlink|vlm|operator|mapobjects>/<descriptive-name>.<ext>` — create the subdirectory if it does not exist.
2. Compute SHA-256 (`shasum -a 256 <file>`).
3. Add a row to the matching subsection in `fixtures/README.md` (file path, size, SHA, upstream provenance, which `results_report.md` rows consume it).
4. Replace the matching `<DEFERRED: ...>` placeholder(s) in `expected_results/results_report.md` with the local path `fixtures/<...>`.
5. If the fixture replaces a service mock, also update `services.md → "Coverage summary by service"` to reflect the new acquisition status.
6. If the fixture is binary and large (> 50 MB) consider gitignoring it + adding an acquisition script per the e2e pattern; for everything in the current set, direct commit is fine.