[AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy Qt/C++ to a Rust workspace.

- Remove the legacy Qt/C++ tree (ai_controller, drone_controller, misc/camera, python_scaffold, root Dockerfile, autopilot.pro, legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria, security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model, glossary, decision-rationale, deployment, 13 component descriptions, tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686: one bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed, ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 10:51:11 +00:00 · 2026-05-19 11:02:01 +03:00
parent f7d6cb4a3a
commit bc40ea7300
235 changed files with 12585 additions and 15097 deletions
@@ -0,0 +1,93 @@
+# Acceptance Criteria
+
+Measurable, design-independent success criteria. Implementation choices (specific models, libraries, components, algorithms) belong in `_docs/01_solution/` and `_docs/02_document/`, NOT here. (Audited against `.cursor/rules/artifact-srp.mdc`.)
+
+Every criterion below is observable through the system's external behaviour and can be evaluated by a black-box test.
+
+## Latency
+
+- Primitive (Tier 1) object detection — per-frame end-to-end on the deployed compute device: **≤100 ms** at 1280 px input.
+- Semantic confirmation (Tier 2) over a single ROI: **≤200 ms**.
+- Deep semantic confirmation (Tier 3 / VLM, when enabled): **≤5 s** per ROI.
+- Camera zoom transition (medium → high): **≤2 s** wall-clock, including the physical zoom traversal.
+- Decision-to-movement latency (internal scan-control decision → camera physically moving): **≤500 ms**.
+- Movement candidate enqueue: **≤1 s** during the wide-area sweep; **≤1.5 s** during the zoomed-in inspection (accommodating gimbal slew).
+- Zoom-out → zoom-in transition (POI detected → ROI fully zoomed): **≤2 s** wall-clock.
+- Operator command → action: **≤500 ms** from operator click to outbound command (modem RTT excluded).
+
+## Throughput / Rate
+
+- POI rate surfaced to the operator: **≤5 POIs / minute** (hard cap; frozen 2026-05-06).
+- Position telemetry rate: **≥1 Hz**, target **10 Hz**.
+- Sustained camera frame-rate floor: **≥10 fps**. Below this, zoom-in transitions MUST be suppressed and overall health MUST surface yellow.
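
A black-box check of the POI cap needs only the times at which POIs reached the operator. The sketch below is a test oracle, not system code. It assumes the cap applies to every sliding 60 s window, because the criterion does not say whether "per minute" means a sliding or a fixed minute. The function names are hypothetical.

```python
from bisect import bisect_left


def max_in_any_window(timestamps_s, window_s=60.0):
    """Largest number of events inside any half-open window [t, t + window_s)."""
    ts = sorted(timestamps_s)
    best = 0
    for i, start in enumerate(ts):
        # Index of the first event at or after start + window_s.
        end = bisect_left(ts, start + window_s)
        best = max(best, end - i)
    return best


def poi_cap_honoured(timestamps_s, cap=5, window_s=60.0):
    """True if no sliding window ever surfaced more than `cap` POIs."""
    return max_in_any_window(timestamps_s, window_s) <= cap
```

A fixed-minute reading would bucket timestamps by `int(t // 60)` instead. The sliding reading is the stricter of the two, because every fixed bucket is also one of the sliding windows.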
+
+## Detection Quality
+
+(Behaviour as observed at the system boundary. Model identity, training data, and label catalogue live in `_docs/02_document/architecture.md` and the `../ai-training` repo.)
+
+- New target classes (black entrances, branch piles, footpaths, roads, trees, tree blocks): per-class **precision ≥80%** AND **recall ≥80%**.
+- Existing-class regression: per-class precision and recall MUST NOT drop by more than **2 percentage points** below the documented baseline.
+- Concealed-position recall (initial gate, accepting high false-positive rate): **≥60%**.
+- Concealed-position precision (initial gate, operators filter): **≥20%**.
+- Footpath recall: **≥70%**.
+
+## Movement Detection Behaviour
+
+- Small moving point/cluster candidates that are not yet classifiable MUST be detected during the wide-area sweep and enqueued for zoomed inspection within **≤1 s**.
+- Movement detection MUST continue during the zoomed-in inspection (a moving target that appears inside a held POI must not be lost), with enqueue within **≤1.5 s**.
+- Stable objects (trees, houses, roads, terrain) MUST NOT be treated as moving solely because the camera platform itself moves.
+- A configurable per-zoom-band false-positive budget MUST be honoured (the system must not flood the operator with false candidates by ignoring its own threshold).
+
+## Scan & Camera Control Behaviour
+
+- The wide-area sweep MUST cover the planned route with a left-right gimbal pattern at wide or light/medium zoom.
+- Transition from sweep to detailed inspection MUST complete within **≤2 s** of POI detection (including physical zoom).
+- During detailed inspection the system MUST keep the target locked while the airframe flies, pan to keep features visible, hold endpoints up to **2 s** for deep analysis, and return to the sweep after analysis or a configurable per-POI timeout (default **5 s/POI**).
+- After operator confirmation, target-follow mode MUST keep the target within the **centre 25%** of the frame while visible.
+- Gimbal commands MUST achieve **≤500 ms** decision-to-movement latency with visibly smooth transitions.
+- The POI queue MUST be ordered by confidence × proximity to current camera × age factor (relative ranking, not absolute formula).
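
The "centre 25%" wording needs an interpretation before it can be tested. The oracle below assumes the tracked target's bbox centre is normalised to [0, 1] and must stay inside a central box that spans `band` of each frame axis. With `band=0.25`, the criterion is read per axis. With `band=0.5`, it is read as 25% of the frame area. Both readings and the function names are assumptions.

```python
def in_centre_band(cx, cy, band=0.25):
    """True if the normalised point (cx, cy) lies inside the central box.

    `band` is the fraction of each axis the central box spans, so the allowed
    offset from the frame centre (0.5, 0.5) is band / 2 on each axis.
    """
    half = band / 2.0
    return abs(cx - 0.5) <= half and abs(cy - 0.5) <= half


def follow_ratio(centres, band=0.25):
    """Fraction of frames (target visible) whose target centre stayed in the box."""
    if not centres:
        return 1.0  # vacuously satisfied when the target was never visible
    inside = sum(in_centre_band(cx, cy, band) for cx, cy in centres)
    return inside / len(centres)
```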
+
+## Operator Workflow
+
+- The decision window surfaced to the operator MUST scale linearly with confidence: **40% confidence → 30 s; 100% confidence → 120 s**. Below 40% confidence, the POI MUST NOT be surfaced at all.
+- Operator-decline MUST result in a persistent ignored-item entry for the matching `(MGRS cell, class group)` so the same target is not re-surfaced.
+- Timeout (no operator response within the window) MUST NOT create an ignored-item entry (forget, do not blacklist).
+- A new detection whose `(MGRS cell, class group)` matches an existing ignored-item MUST NOT be surfaced.
+- Operator confirmation MUST result in (a) a middle waypoint inserted into the mission and (b) a transition to target-follow mode.
+- A replayed or unsigned operator command MUST be rejected with a logged security warning; system state MUST NOT change.
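
The decision-window and surfacing rules reduce to a small oracle that a black-box test can compare against. This sketch makes three assumptions: confidence is reported in [0, 1], the ignored-item store is keyed exactly by `(MGRS cell, class group)`, and the function names, outcome strings and sample cell/class values are hypothetical.

```python
def decision_window_s(confidence):
    """Decision window in seconds, or None when the POI must not be surfaced.

    Linear between the two anchor points of the criterion:
    0.40 -> 30 s and 1.00 -> 120 s (1.5 s per percentage point).
    """
    if confidence < 0.40:
        return None  # below the surfacing floor: never shown to the operator
    confidence = min(confidence, 1.0)
    return 30.0 + (confidence - 0.40) * (120.0 - 30.0) / (1.00 - 0.40)


def record_outcome(outcome, mgrs_cell, class_group, ignored):
    """Only an explicit decline blacklists the target; a timeout is forgotten."""
    if outcome == "decline":
        ignored.add((mgrs_cell, class_group))


def should_surface(confidence, mgrs_cell, class_group, ignored):
    """Surface only above the floor and only if (cell, class group) was not declined."""
    return (decision_window_s(confidence) is not None
            and (mgrs_cell, class_group) not in ignored)
```

For example, a 70% POI gets a 75 s window: 30 s plus 1.5 s for each of the 30 points above the 40% floor.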
+
+## Reliability & Safety
+
+- Pre-flight self-test MUST pass (every dependency healthy OR explicit operator acknowledgement of a known degraded state) before takeoff is permitted.
+- Loss of operator/Ground-Station radio link MUST trigger a known mission-safe outcome within a deterministic, configurable grace window (default **30 s grace → RTL**).
+- Loss of airframe command link MUST surface health red immediately and defer to the airframe autopilot's own failsafe.
+- Battery at or below the configured **RTL floor** (e.g. 25%) MUST trigger RTL automatically; battery at or below the **hard floor** (e.g. 15%) MUST trigger land-now. Only an authenticated operator command may override.
+- MAVLink command exhaustion (the bounded retry with exponential backoff reaches its maximum retry count without success) MUST flip the airframe-link health to red.
+- Wall-clock drift greater than **200 ms** versus GPS or NTP source MUST surface health yellow.
+- Geofence INCLUSION and EXCLUSION violations MUST both result in waypoint refusal + RTL.
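
The battery, link-loss and clock-drift rules each map observable inputs to an expected outcome, so a test can assert against an oracle like the one below. The floors and the grace window use the example or default values from the criteria. The following are assumptions: the 30 s boundary is treated as inclusive, and the action strings and function names are made up for illustration.

```python
def expected_battery_action(battery_pct, rtl_floor=25.0, hard_floor=15.0,
                            authenticated_override=False):
    """Expected failsafe for one battery reading (floors default to the examples)."""
    if authenticated_override:
        return "none"  # only an authenticated operator command may override
    if battery_pct <= hard_floor:
        return "land_now"
    if battery_pct <= rtl_floor:
        return "rtl"
    return "none"


def expected_gs_link_action(seconds_since_loss, grace_s=30.0):
    """Ground Station link loss: no action inside the grace window, then RTL."""
    return "rtl" if seconds_since_loss >= grace_s else "none"


def expected_clock_health(drift_ms, limit_ms=200.0):
    """Drift beyond the limit in either direction surfaces health yellow.

    "ok" only means drift itself does not degrade health; other checks may.
    """
    return "yellow" if abs(drift_ms) > limit_ms else "ok"
```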
+
+## Resources & Data
+
+- Combined RSS on the deployed compute device for everything autopilot owns onboard (excluding Tier 1) MUST stay at or below **6 GB**.
+- Tier 1 per-frame latency MUST NOT increase by more than **5 ms** when autopilot's own onboard workload is running concurrently.
+
+## Map Reconciliation (with the central area-level map)
+
+- Pre-flight map pull for a **30 km × 30 km** mission area: **≤30 s** wall-clock. Cache-fallback on timeout is acceptable only with explicit operator acknowledgement.
+- Post-flight pass diff push for a **60-minute** mission: **≤2 min** wall-clock. Failure MUST persist the pending diff to durable on-device storage with bounded retry.
+
+## Acceptance Gates (project-level)
+
+- A hardware/replay benchmark suite MUST pass before product implementation begins. Specifically: every latency criterion above MUST be measured on the deployed compute device, not on a developer workstation.
+- Per-season dataset coverage MUST be demonstrated before MVP sign-off (winter, spring, summer, autumn).
+- MAVLink command surface MUST pass SITL conformance against ArduPilot.
+
+## Q-tagged criteria (depend on open architecture decisions)
+
+These criteria are real and measurable; their tolerance ranges may sharpen once the linked open question resolves. The questions are tracked in `_docs/02_document/architecture.md §8`.
+
+- Movement detection false-positive rate at zoomed-in inspection — depends on **Q14** (classical-CV adequacy vs learned-CV fallback).
+- MapObjects conflict resolution behaviour — depends on **Q8** (append-only log + projection rules).
+- Operator-command authentication conformance — depends on **Q9** (signing scheme).
+- Airframe MAVLink-2 message signing — depends on **Q6**.
+- Per-season flight-test gates — depends on **Q13**.
@@ -0,0 +1,58 @@
+# Input Data
+
+Runtime inputs the autopilot consumes when flying, plus reference fixtures + expected-output assertions for tests. **All fixtures live inside this workspace** (`fixtures/`) — never reach into sibling repos at `../` for inputs. The autopilot repo is self-sufficient.
+
+## Layout
+
+| Path | Owns |
+|---|---|
+| `data_parameters.md` | Description of runtime input shapes (camera, telemetry, gRPC, mission JSON, operator commands, VLM IPC) + the categories of reference data tests need + Tier-1/Tier-2 class catalogue. |
+| `services.md` | Per-external-service test-mock requirements: what shape of mock/fixture each of the 7 external systems needs and the acquisition status of each. |
+| `fixtures/README.md` | File-by-file manifest of every fixture in this directory: SHA-256, size, upstream provenance, which `expected_results/results_report.md` rows consume it. |
+| `fixtures/images/` | Real aerial frames (5 images, ~9 MB total) used as Tier-1 inputs for the latency and detection-quality assertions (L1, D2, D6). |
+| `fixtures/videos/` | Real reconnaissance video (1 clip, 12 MB) for frame-rate floor + sequence tests (T3). |
+| `fixtures/movement/` | Wide-area movement-detection visual reference clips (4 clips, ~23 MB total). **No paired `gimbal.csv` / `telemetry.csv`** — ego-motion compensation (M1–M4) cannot run against these alone. |
+| `fixtures/semantic/` | Concealed-position semantic reference frames (4 PNGs, ~11 MB total) + `data_parameters.md` describing the new YOLO primitive classes the examples motivate. **Starter set only**, not a graded eval set. |
+| `fixtures/schemas/` | Detection-result contract schemas (JSON + JSON-schema) for D6. |
+| `fixtures/sql/` | Database init script — reference only; not directly asserted by an autopilot AC. |
+| `expected_results/results_report.md` | The input → quantifiable-expected-output mapping consumed by `/test-spec` Phase 1. Every row keys off an AC in `../acceptance_criteria.md`; deferred rows carry a structured `<DEFERRED: <shape>; ref <pointer>>` tag. |
+
+## Why fixtures are local
+
+The autopilot repo MUST be self-sufficient — a developer with only the autopilot clone (no parent suite checked out) MUST be able to run the test specifications. Cross-repo `../` paths are forbidden in `results_report.md` and in any test runner script. When a sibling repo (`../detections/`, `../e2e/`, `../missions/`, etc.) is the upstream source of a fixture, we **copy** it in and SHA-pin it in `fixtures/README.md` so upstream drift is detectable.
+
+## Suite-level coupling that still matters
+
+Even though fixtures are local, the underlying contracts the fixtures embody come from suite-level decisions. When those decisions change, the fixtures here go stale:
+
+- **Tier-1 detection model / classes** — when `../detections` ships a new model the `expected_detections.json` baseline goes stale; D1, D2, D6 rows in `results_report.md` must be re-recorded.
+- **`mission-schema`** — shared between autopilot and the `missions` repo. Schema changes break the mission JSON contract; the mock fixtures for Mp1–Mp5 (when authored) must re-pin.
+- **Detection classes catalogue** — class IDs 0..18 are governed at the suite level. Autopilot's normalised-box output uses the same IDs. The 5 new Tier-1 classes documented in `data_parameters.md → "Class catalogue"` must land in the suite catalogue before D1 can be measured.
+
+Today these couplings are tracked manually. The `monorepo-e2e` skill at the suite root will eventually own the drift detection.
+
+## Fixture gaps and the project policy on `/test-spec` Phase 3
+
+`/test-spec` Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results. Today's coverage is well below that gate (see `expected_results/results_report.md → "Coverage Status"`). **Project policy as of 2026-05-19**: rather than block the autodev flow at the gate, each deferred row is registered with a structured `<DEFERRED: <shape>; ref <pointer>>` tag in `results_report.md`, pointing at the per-service acquisition path in `services.md` or at an open architecture question (Q-tag). Deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement remains a hard release blocker.
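
The gate itself is a ratio over `results_report.md` rows. The sketch assumes a row counts as covered only when neither its Input cell nor its Reference File cell carries a `<DEFERRED: ...>` tag. The exact counting rule used by `/test-spec` Phase 3 is not stated here. The four sample rows are illustrative excerpts, not a coverage measurement.

```python
import re

DEFERRED = re.compile(r"<DEFERRED:[^<>]*>")
GATE = 0.75  # hard Phase 3 coverage gate


def coverage(rows):
    """Fraction of rows whose Input and Reference File cells carry no DEFERRED tag.

    `rows` maps a row id (e.g. "L1") to its (input_cell, reference_cell) strings.
    """
    if not rows:
        return 0.0
    covered = sum(
        1 for inp, ref in rows.values()
        if not DEFERRED.search(inp) and not DEFERRED.search(ref)
    )
    return covered / len(rows)


# Illustrative excerpt of four rows (not the full report).
ROWS = {
    "L1": ("`fixtures/images/4d6e1830d211ad50.jpg`", "N/A"),
    "L3": ("`<DEFERRED: bounded ROI crop matching the deep-analysis input contract; ref services.md §7>`", "N/A"),
    "D2": ("`fixtures/images/54f6459dbddb93d8.jpg`", "`<DEFERRED: expected_results/existing_classes_baseline.json>`"),
    "T3": ("`fixtures/videos/94d42580bd1ad6ff.mp4`", "N/A"),
}
```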
+
+Summary of open gaps (authoritative list lives in `services.md` and `fixtures/README.md`):
+
+1. **Paired `gimbal.csv` + `telemetry.csv` for the 4 movement clips** — highest priority (blocks M1–M4 + tightens L6/L7). **User-confirmed unavailable today (2026-05-19).**
+2. Annotated multi-season eval set (concealed positions + footpaths).
+3. Mock `missions` API exchanges + mock `/mapobjects` round-trip.
+4. Mock Ground Station session traces.
+5. ArduPilot SITL traces.
+6. Operator-command envelopes (blocked on Q9).
+7. VLM I/O pairs.
+8. GPS / NTP drift scripts.
+
+Closing each gap is its own workstream tracked in Jira; the autodev flow does not block on them.
+
+## Adding new fixtures
+
+1. Drop the file under `fixtures/<images|videos|movement|semantic|schemas|sql|gimbal|telemetry|mavlink|vlm|operator|mapobjects>/<descriptive-name>.<ext>` — create the subdirectory if it does not exist.
+2. Compute SHA-256 (`shasum -a 256 <file>`).
+3. Add a row to the matching subsection in `fixtures/README.md` (file path, size, SHA, upstream provenance, which `results_report.md` rows consume it).
+4. Replace the matching `<DEFERRED: ...>` placeholder(s) in `expected_results/results_report.md` with the local path `fixtures/<...>`.
+5. If the fixture replaces a service mock, also update `services.md → "Coverage summary by service"` to reflect the new acquisition status.
+6. If the fixture is binary and large (> 50 MB), consider gitignoring it and adding an acquisition script per the e2e pattern; for everything in the current set, direct commit is fine.
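
Steps 2 and 3 can be scripted so that the manifest row and the SHA pin cannot drift apart. Below is a minimal sketch that uses only the Python standard library. The row layout follows the column list in step 3. The size formatting and the function names are assumptions. `pin_matches` shows how a later run could detect upstream drift against a pinned SHA.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_row(path, provenance, consumers):
    """Render one Markdown row: path | size | SHA-256 | provenance | consuming rows."""
    p = Path(path)
    size = p.stat().st_size
    return (f"| `{p.as_posix()}` | {size} B | `{sha256_file(p)}` "
            f"| {provenance} | {', '.join(consumers)} |")


def pin_matches(path, pinned_sha):
    """True if the file on disk still matches the pinned SHA-256 (case-insensitive)."""
    return sha256_file(path) == pinned_sha.lower()
```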
@@ -0,0 +1,101 @@
+# Input Data Parameters
+
+Describes the **categories of input data** the system consumes at runtime, and the **categories of reference data** tests need. Internal component names, programming languages, IPC mechanisms, schema class names, and specific model choices are design and live in `_docs/02_document/architecture.md` — they do not belong in this file (per `.cursor/rules/artifact-srp.mdc`).
+
+Local fixtures live in `fixtures/`; see `fixtures/README.md` for the manifest. External-service test-mock requirements live in `services.md`; the per-row binding to AC criteria lives in `expected_results/results_report.md`.
+
+## Runtime inputs (what the system consumes when flying)
+
+| Input | Source | Format | Cadence | Notes |
+|---|---|---|---|---|
+| Camera frames | ViewPro A40 (or alternative ViewPro Z40K) | H.264 / H.265 over RTSP, 1080p (1920×1080) | 30 / 60 fps | Frame timestamps are mandatory. |
+| Primitive (Tier 1) detection responses | `../detections` service over a bi-directional streaming RPC contract | Bounding boxes with class id, confidence, normalised coordinates | Per frame | Same boxes feed Tier-2 ROI selection and the operator overlay. |
+| UAV telemetry | Airframe via MAVLink v2 (UDP or serial) | MAVLink messages: position, attitude, velocity, battery, link health, GPS fix | ≥1 Hz (10 Hz target) | Source-of-truth for ego-motion compensation. |
+| Gimbal feedback | ViewPro A40 vendor protocol over UDP | Yaw / pitch / zoom angle telemetry | per-tick | Source-of-truth for camera-pose compensation. |
+| Mission JSON | `missions` service via HTTPS REST | Shared `mission-schema` JSON | Once at mission start + middle-waypoint updates | Validated against the shared schema. |
+| Area-level map state | `missions` service extension `/missions/{id}/mapobjects` (GET) | Map-object records keyed by spatial cell | Once at mission start | Hydrates the system's local copy of the area map; cache-fallback on timeout. |
+| Operator commands | Ground Station via modem (return path of the outbound telemetry stream) | Authenticated + signed + replay-protected command envelope (scheme open per Q9) | Event-driven | confirm / decline / target-follow start / target-follow release / abort. |
+| Deep-analysis responses (optional) | Local-onboard model accessed via local IPC | Structured assessment schema (validated) | Per zoomed-in endpoint hold (when deep-analysis is enabled) | Schema-violation fails closed. |
+
+## Class catalogue (Tier-1 + Tier-2)
+
+Detection-quality acceptance criteria (`acceptance_criteria.md → Detection Quality`) are evaluated against a class catalogue that combines pre-existing suite-level classes with new autopilot-driven additions. Class IDs are governed at the suite level (`../detections` owns the catalogue); autopilot only consumes the IDs.
+
+### New Tier-1 (YOLO primitive) classes — to be added to the suite catalogue
+
+| # | Class name | Annotation hint | Motivated by |
+|---|---|---|---|
+| 1 | Black entrances | Bounding box; various sizes (small hideout openings to dugout entrances) | Concealed-position detection (D3, D4) |
+| 2 | Branch piles | Bounding box | Concealment material around hideouts (D3, D4) |
+| 3 | Footpaths | **Polyline / segmentation preferred over bbox** for linear features | Footpath recall gate (D5) |
+| 4 | Roads | Polyline / segmentation | Distinguishing roads from footpaths in the same scene |
+| 5 | Trees / tree blocks | Bounding box; tree-block annotation may use larger box for clusters | Concealment-context anchor; reduces false positives around tree-rows in movement detection (M1) |
+
+### Tier-2 semantic attributes: composed by Tier-2 semantic analysis, NOT added to the YOLO catalogue
+
+| # | Attribute | Composed from | Used by |
+|---|---|---|---|
+| 1 | Footpath freshness (fresh / stale) | Footpath bbox + texture/edge analysis + seasonal context | Decision-window scoring, D5 partial coverage |
+| 2 | Concealed-structure inference | Black-entrance + branch-piles + footpath-approach proximity | POI surfacing for D3/D4 (the structure itself is composed, not directly labelled) |
+| 3 | Open clearing connected to path | Cleared-terrain texture + footpath endpoint | FPV-launch-point flagging |
+
+### Existing classes (already in the suite catalogue)
+
+The existing-class baseline (P=0.816, R=0.852, as recorded in row D2 of `expected_results/results_report.md`) covers the suite's pre-autopilot class set (vehicles, military equipment, etc.). Autopilot must not degrade these; see D2.
+
+### Reference for IDs
+
+The 19-id catalogue (0..18) is owned by `../detections`. Autopilot's normalised-box output uses the same IDs. When `../detections` ships a new model or renumbers IDs, the `expected_detections.json` baseline goes stale and D1, D2, D6 rows must be re-recorded.
+
+## Reference data needed for testing
+
+### Local fixtures already on disk
+
+See `fixtures/README.md` for the SHA-pinned manifest. Categorised summary:
+
+| Local fixture category | Files | Purpose | Bound to AC rows |
+|---|---|---|---|
+| `fixtures/images/*.jpg` | 5 aerial frames | Tier-1 detection contract; existing-class regression; normalised-box conformance | L1, D2, D6 |
+| `fixtures/videos/94d42580bd1ad6ff.mp4` | 1 reconnaissance clip | Frame-rate floor scenario; also reserved for future movement-sequence tests | T3 |
+| `fixtures/schemas/expected_detections.{json,schema.json}` | 2 schema files | Detection-result contract shape reference | D6 |
+| `fixtures/sql/init.sql` | 1 SQL file | Suite-e2e DB seed reference | (suite-only; no autopilot AC) |
+| `fixtures/movement/video0[1-4].mp4` | 4 wide-area clips | Visual reference for movement-detection scenarios — **no paired telemetry CSVs**, ego-motion assertions unfalsifiable until those land | M1–M4 (visual reference only) |
+| `fixtures/semantic/semantic0[1-4].png` | 4 reference frames | Visual reference for concealed-position semantic targets — **starter set only, not a graded eval set** | D3, D4, D5 (starter only) |
+
+### Reference shapes still needed but not yet on disk
+
+The per-service mock catalogue is in `services.md` (authoritative). Summary of categories tests need:
+
+| Reference shape | Why it's needed | See |
+|---|---|---|
+| Frame sequences with synchronised `gimbal.csv` + `telemetry.csv` | Ego-motion compensation at zoom-out AND zoomed-in inspection | `services.md §6 Gimbal telemetry CSV` |
+| Concealed-position image set across all four seasons (annotated) | Concealed-position recall ≥60% and precision ≥20% | `services.md §5 Camera frame sequences` |
+| Footpath sequences (fresh, stale, all four seasons, polyline-annotated) | Footpath recall ≥70% | `services.md §5` |
+| New-class evaluation set (5 new classes above) | New-class per-class P/R ≥80% without degrading existing-class performance | `services.md §1 Tier-1 detection replay` (plus annotation campaign owned by `../ai-training` repo) |
+| Mock Tier-1 streaming-RPC replays | Detection-consumer isolation tests | `services.md §1` |
+| Mock Ground Station session traces | Lost-link failsafe ladder + operator-link reconnect | `services.md §3` |
+| MAVLink SITL traces | MAVLink conformance + waypoint insertion + geofence enforcement | `services.md §4` |
+| Mock central area-map service responses | Pre-flight pull / post-flight push round-trip; conflict cases (Q8) | `services.md §2` |
+| Operator-command envelopes | Signature + replay-protection tests (once Q9 resolves) | `services.md §8` |
+| VLM I/O pairs | Bounded ROI inputs + structured assessment outputs + schema-violation cases | `services.md §7` |
+| GPS / NTP drift scenarios | Wall-clock drift health-yellow gate | `services.md §9` |
+
+## Data volume targets
+
+- Training data: hundreds to thousands of annotated images/sequences total.
+- Seasonal coverage: winter (snow), spring (mud), summer (vegetation), autumn (mixed leaf + partial snow).
+- Available assembly effort: 1.5 months at 5 hours/day.
+- Movement detection requires **frame sequences** (not still images only) with synchronised camera + gimbal + UAV telemetry.
+- Footpaths require polyline or segmentation annotation rather than bounding boxes (see "Class catalogue" above).
+
+## Gaps that block `/test-spec` downstream
+
+`/test-spec` Phase 1 will pass on prerequisite existence (`expected_results/results_report.md` is non-empty). Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results.
+
+**Current coverage state** (re-computed 2026-05-19 after fixture restoration):
+
+- Rows bound to real local fixtures: L1, D2, D6, T3 (4 rows; L2 also crops its ROI from the L1 frame). These are also the rows whose fixtures were restored on 2026-05-19 from sibling repos.
+- Rows bound to **starter-only** fixtures (insufficient on their own): D3, D4, D5 (semantic PNGs), M1–M4 (movement videos without CSV).
+- Rows still deferred for fixture acquisition: see `fixtures/README.md → "Gaps still pending fixture acquisition"` and `services.md` for the authoritative list.
+
+**Project policy on the Phase 3 gate**: rather than block `/test-spec` at the 75% gate, the autodev flow registers each deferred row with a structured `<DEFERRED: <shape>; ref <pointer>>` tag in `expected_results/results_report.md`. Test-spec authoring proceeds; deferred rows become release-gate items, not development-gate items. The project-level gate in `acceptance_criteria.md` ("MUST pass before product implementation begins") still applies to the hardware/replay benchmark, which remains a hard release blocker and is not deferred.
@@ -0,0 +1,153 @@
+# Expected Results
+
+Maps every quantifiable acceptance criterion from `_docs/00_problem/acceptance_criteria.md` to an input fixture + a measurable expected result. Consumed by `/test-spec` Phase 1.
+
+Per `.cursor/rules/artifact-srp.mdc`, this file uses **role / observable-behaviour language**, not internal component slugs. The system's externally observable behaviour is what's tested. Implementation names (component slugs, libraries, model names) live in `_docs/02_document/`.
+
+**Fixture sourcing**: all fixtures live in `fixtures/` (sibling-repo `../` paths are forbidden). Where no fixture exists yet, the `Input` cell carries a structured `<DEFERRED: <shape>; ref services.md §N>` tag. Phase 3 has a hard 75% coverage gate — the autodev flow registers deferred rows as release-gate items rather than blocking on the gate; see `data_parameters.md → "Gaps that block /test-spec downstream"`.
+
+**Comparison vocabulary**: see `.cursor/skills/test-spec/templates/expected-results.md` for canonical methods (`exact`, `numeric_tolerance`, `threshold_min`, `threshold_max`, `range`, `regex`, `substring`, `set_contains`, `json_diff`, `file_reference`).
+
+**Deferred-tag legend**: `<DEFERRED: <shape>; ref <pointer>>` where `<pointer>` is a section in `../services.md` (per-service mock requirements), an open architecture question (e.g. `Q9`), or `inline-authorable` (no external dependency — just not yet written).
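
Because the tag grammar is fixed, tooling can recover the shape and the pointer mechanically. Below is a minimal parser sketch. Its assumptions come from the rows in this file:

- a shape may itself contain `; `;
- the pointer is the last segment when that segment starts with `ref ` or is `inline-authorable`;
- Reference File tags such as `<DEFERRED: expected_results/footpaths.json>` carry no pointer at all.

The names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

TAG = re.compile(r"<DEFERRED:\s*([^<>]*)>")


class Deferred(NamedTuple):
    shape: str
    pointer: Optional[str]  # services.md section, Q-tag, "inline-authorable", or None


def parse_deferred(cell):
    """Extract every DEFERRED tag from one table cell."""
    tags = []
    for body in TAG.findall(cell):
        # Split on the LAST "; " so shapes that contain "; " stay intact.
        head, sep, tail = body.rpartition("; ")
        if sep and tail.startswith("ref "):
            tags.append(Deferred(head.strip(), tail[len("ref "):].strip()))
        elif sep and tail.strip() == "inline-authorable":
            tags.append(Deferred(head.strip(), "inline-authorable"))
        else:
            tags.append(Deferred(body.strip(), None))
    return tags
```

The legend's own placeholder `<DEFERRED: <shape>; ref <pointer>>` is deliberately not matched, because its body contains angle brackets.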
+
+---
+
+## Latency
+
+Source ACs: `acceptance_criteria.md → Latency`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| L1 | `fixtures/images/4d6e1830d211ad50.jpg` | Single 1280 px aerial frame consumed through the Tier-1 contract; measure end-to-end | per-frame end-to-end latency | threshold_max | ≤ 100 ms | N/A |
+| L2 | derived ROI ~640×640 from `fixtures/images/4d6e1830d211ad50.jpg` (inline-cropped by the test runner) | Tier-2 semantic confirmation over a single ROI | per-ROI latency | threshold_max | ≤ 200 ms | N/A |
+| L3 | `<DEFERRED: bounded ROI crop matching the deep-analysis input contract; ref services.md §7>` | Tier-3 deep-analysis (when enabled) local-IPC call | per-ROI call latency | threshold_max | ≤ 5000 ms | N/A |
+| L4 | `<DEFERRED: SITL or hardware-in-loop ViewPro A40 zoom command (medium→high); ref services.md §5>` | A40 physical zoom transition | wall-clock transition duration | threshold_max | ≤ 2000 ms | N/A |
+| L5 | `<DEFERRED: scripted scan decision event followed by camera physical motion; ref services.md §3, §5>` | Decision-to-movement latency end-to-end | wall-clock decision→motion duration | threshold_max | ≤ 500 ms | N/A |
+| L6 | `fixtures/movement/video01.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv; ref services.md §6>` | Movement candidate enqueue at the wide-area sweep | detection→enqueue duration | threshold_max | ≤ 1000 ms | N/A |
+| L7 | `fixtures/movement/video02.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band; ref services.md §6>` | Movement candidate enqueue during zoomed inspection | detection→enqueue duration | threshold_max | ≤ 1500 ms | N/A |
+| L8 | `<DEFERRED: full sweep → zoomed-inspection transition (POI detected → ROI fully zoomed); ref services.md §3, §5>` | Scan-mode transition including physical zoom | wall-clock transition | threshold_max | ≤ 2000 ms | N/A |
+| L9 | `<DEFERRED: scripted operator-click → outbound command emitted by the system (modem RTT excluded); ref services.md §3>` | Operator command → action latency | wall-clock click→outbound | threshold_max | ≤ 500 ms | N/A |
+
+## Throughput / Rate
+
+Source ACs: `acceptance_criteria.md → Throughput / Rate`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| T1 | `<DEFERRED: long synthetic POI feed sustained above the cap (e.g. 20 POIs/min); inline-authorable>` | Cap enforcement on POIs surfaced to operator | POI rate surfaced | threshold_max | ≤ 5 / min | N/A |
+| T2 | `<DEFERRED: airframe MAVLink telemetry replay over a 60 s window; ref services.md §4>` | Position telemetry consumed from the airframe link | reported position rate | threshold_min | ≥ 1 Hz (10 Hz is a target, not a gate) | N/A |
+| T3 | `fixtures/videos/94d42580bd1ad6ff.mp4` replayed with throttled-decode + frame-drop injection to drop below 10 fps for ≥5 s | Frame-rate floor trigger | zoom-in transitions suppressed AND overall health surfaces yellow | exact (suppression bool) + exact (health = yellow) | N/A | N/A |
+
+## Detection Quality
+
+Source ACs: `acceptance_criteria.md → Detection Quality`. Evaluation runs against the Tier-1 detection pipeline that the system consumes; autopilot's role is correct consumption + re-emission of the normalised-box contract. Class catalogue (5 new Tier-1 classes + 3 Tier-2 attributes) is defined in `../data_parameters.md → "Class catalogue"`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| D1 | `<DEFERRED: new-class eval set across all four seasons (black entrances, branch piles, footpaths, roads, trees, tree blocks); ref services.md §1, annotation campaign in ../ai-training>` | Per-class precision/recall for added classes | per-class precision ≥ 0.80 AND recall ≥ 0.80 | threshold_min (both) | N/A | `<DEFERRED: expected_results/new_classes_pr.json>` |
+| D2 | `fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg` (5 frames) | Existing-class regression: must not degrade vs documented baseline P=0.816, R=0.852 | per-class precision + recall delta vs baseline | threshold_min | delta ≥ −0.02 absolute (improvements pass) | `<DEFERRED: expected_results/existing_classes_baseline.json — to be recorded against the pinned ../detections model>` |
+| D3 | `fixtures/semantic/semantic0[1-4].png` (4 starter frames — 1 winter, 3 unmarked season) + `<DEFERRED: full multi-season annotated concealed-position set; ref services.md §5>` | Concealed-position recall (initial gate, accepting high FP) | recall | threshold_min | ≥ 0.60 | `<DEFERRED: expected_results/concealed_positions.json>` |
+| D4 | Same as D3 | Concealed-position precision (operators filter) | precision | threshold_min | ≥ 0.20 | same as D3 |
+| D5 | `fixtures/semantic/semantic0[1-4].png` (starter set; all 4 frames feature footpaths leading to concealment) + `<DEFERRED: footpath sequences (fresh + stale, all four seasons), polyline-annotated; ref services.md §5>` | Footpath recall | recall | threshold_min | ≥ 0.70 | `<DEFERRED: expected_results/footpaths.json>` |
+| D6 | `fixtures/images/4d6e1830d211ad50.jpg` | Single-frame Tier-1 contract: the system must consume the bbox stream and re-emit the normalised-box format | output box stream conforms to the suite-level class catalogue (ids 0..18) + normalised coordinates ∈ [0,1] | file_reference (schema) + range | each coord ∈ [0,1] | `fixtures/schemas/expected_detections.schema.json` |
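
Rows D2 and D6 reduce to simple per-item checks once the model output is available. The sketch assumes a hypothetical flat box record with `class_id`, `x`, `y`, `w` and `h` keys. The authoritative shape is `fixtures/schemas/expected_detections.schema.json`, which is not reproduced here. The sketch treats D2 as a one-sided bound: a class may improve freely but must not drop by more than 0.02. The class name `truck` is illustrative only.

```python
def box_conforms(box, num_classes=19):
    """D6: class id in the suite catalogue (0..18) and every coordinate in [0, 1]."""
    cid = box["class_id"]
    if not isinstance(cid, int) or not 0 <= cid < num_classes:
        return False
    return all(0.0 <= box[k] <= 1.0 for k in ("x", "y", "w", "h"))


def regression_ok(current, baseline, max_drop=0.02):
    """D2: per-class metric may improve freely but must not drop more than max_drop.

    Call once with precision dicts and once with recall dicts.
    """
    return all(current[c] - baseline[c] >= -max_drop for c in baseline)
```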
+
+## Movement Detection Behaviour
+
+Source ACs: `acceptance_criteria.md → Movement Detection Behaviour`. Latency aspects (L6, L7) live under Latency.
+
+**Note**: M1–M4 each have a visual-reference video on disk but NO paired `gimbal.csv` / `telemetry.csv`. Ego-motion compensation cannot be verified against these videos — the visual binding is provided so a smoke harness can run, but the assertions in this section require the deferred CSVs to be meaningful. User confirmed 2026-05-19: paired CSVs do not exist today.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| M1 | `fixtures/movement/video01.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv; scene must contain 1 stable tree row + 1 moving vehicle; ref services.md §6>` | Ego-motion compensation: stable objects rejected | system emits exactly 1 movement candidate (the vehicle); does NOT emit a candidate for the tree row | exact | candidate set == {vehicle}; tree row absent | N/A |
+| M2 | `fixtures/movement/video02.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band; 1 small mover; ref services.md §6>` | Movement detection continues during zoomed-in hold | system enqueues 1 candidate while the camera is in the zoomed-in hold; current ROI is not preempted unless the candidate's priority exceeds it | exact | 1 candidate enqueued; ROI preempt decision matches priority rule | N/A |
+| M3 | `fixtures/movement/video03.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv simulating per-zoom-band threshold edge (cluster persistence one frame below threshold); ref services.md §6>` | Per-zoom-band threshold honoured (no false candidate) | no candidate emitted | exact | count == 0 | N/A |
+| M4 | `fixtures/movement/video04.mp4` (visual reference) + `<DEFERRED: zoom-out + zoomed-in benchmark suite measuring false-positive rate at each band; ref services.md §6, Q14>` | Movement zoomed-in benchmark gate (Q14 fallback trigger) | false-positive rate per zoom band | threshold_max | ≤ per-zoom-band budget (configurable; default ≤ 0.5 / minute at zoomed-in) | `<DEFERRED: expected_results/movement_benchmark_caps.json>` |
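
The M4 gate reduces to a per-band rate check. A minimal sketch, assuming the harness can label each emitted candidate as a false positive, tag it with the zoom band active at emission time, and report how many seconds the camera dwelt in each band. The band names and the budget table are hypothetical; only the 0.5/minute zoomed-in default comes from the row above.

```python
from collections import Counter

# Hypothetical per-band budgets (false positives per minute); only the
# zoomed-in default of 0.5/min is taken from the M4 row.
DEFAULT_BUDGETS = {"zoomed_in": 0.5}


def fp_rate_per_minute(fp_bands: list[str], dwell_s: dict[str, float]) -> dict[str, float]:
    """False positives per minute of dwell, for every band with non-zero dwell time."""
    counts = Counter(fp_bands)
    return {band: counts[band] / (secs / 60.0) for band, secs in dwell_s.items() if secs > 0}


def bands_over_budget(rates: dict[str, float], budgets: dict[str, float]) -> list[str]:
    """Bands whose measured rate exceeds their configured budget (threshold_max)."""
    return sorted(band for band, rate in rates.items() if band in budgets and rate > budgets[band])
```

Normalising by dwell time rather than mission length keeps bands comparable when the camera spends most of a sortie zoomed out.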
+
+## Scan & Camera Control Behaviour
+
+Source ACs: `acceptance_criteria.md → Scan and Camera Control`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| S1 | `<DEFERRED: scripted mission with planned route + simulated POI detected mid-sweep; ref services.md §3, §4>` | Sweep → zoomed-inspection transition within 2 s (L8) AND POI properly enqueued | transition completes; ROI matches POI bbox; queue length increments | exact (multiple) | N/A | N/A |
+| S2 | `<DEFERRED: zoomed-inspection hold scenario with footpath polyline overlapping the ROI; ref services.md §5, §6>` | Camera lock + pan along footpath while airframe flies | camera commands keep the footpath in the centre 50% of frame for the duration of the hold | numeric_tolerance | centre offset ≤ 25% per frame | N/A |
+| S3 | `<DEFERRED: operator-confirmed target + 60 s follow window; ref services.md §3>` | Target-follow centre-window | target inside centre 25% of frame while visible | threshold_max | per frame, \|dx\| and \|dy\| ≤ 0.125 × frame_size on each axis | N/A |
+| S4 | `<DEFERRED: queue with 3 POIs at varied confidence × proximity scores; inline-authorable>` | POI queue ordering | system pops POIs in descending order of `confidence × proximity × age_factor` (relative order matches) | exact (order) | N/A | N/A |
+| S5 | `<DEFERRED: hold endpoint with deep-analysis enabled — assessment returns within 2 s; ref services.md §7>` | Zoomed-in hold timeout default 5 s/POI; deep-analysis hold capped at 2 s | hold ends at min(5 s, deep_analysis_complete) | exact | N/A | N/A |
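
S3 and S4 reduce to two small assertion helpers. A minimal sketch, assuming the harness exposes the target's pixel centre and, for each queued POI, `confidence`, `proximity`, and `age_factor` as plain numbers. The spec does not define how `proximity` or `age_factor` are normalised, so the `Poi` type and its field semantics are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Poi:
    poi_id: str
    confidence: float
    proximity: float   # assumed pre-normalised to [0, 1], higher = closer
    age_factor: float  # assumed pre-normalised to [0, 1]


def expected_pop_order(queue: list[Poi]) -> list[str]:
    """S4: POI ids in descending order of confidence x proximity x age_factor."""
    ranked = sorted(queue, key=lambda p: p.confidence * p.proximity * p.age_factor, reverse=True)
    return [p.poi_id for p in ranked]


def in_centre_window(cx: float, cy: float, width: int, height: int, frac: float = 0.125) -> bool:
    """S3: target centre within frac x frame size of the frame centre, on each axis."""
    dx = abs(cx - width / 2)
    dy = abs(cy - height / 2)
    return dx <= frac * width and dy <= frac * height
```

Because S4 asserts relative order only, the test compares the popped sequence against `expected_pop_order` rather than checking absolute scores.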
+
+## Operator Workflow
+
+Source ACs: `acceptance_criteria.md → Operator Workflow`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| O1 | `<DEFERRED: synthetic POI at confidence = 0.40; inline-authorable>` | Confidence-scaled decision window lower bound | window duration | exact | 30 s | N/A |
+| O2 | `<DEFERRED: synthetic POI at confidence = 1.00; inline-authorable>` | Confidence-scaled decision window upper bound | window duration | exact | 120 s | N/A |
+| O3 | `<DEFERRED: synthetic POI at confidence = 0.70; inline-authorable>` | Linear interpolation (40% → 30 s, 100% → 120 s) | window duration ≈ 30 + (0.70-0.40)/(1.00-0.40) × (120-30) = 75 s | numeric_tolerance | ± 0.5 s | N/A |
+| O4 | `<DEFERRED: synthetic POI at confidence = 0.39; inline-authorable>` | Below-threshold suppression | POI NOT surfaced to operator | exact | count surfaced == 0 | N/A |
+| O5 | `<DEFERRED: surfaced POI followed by operator decline event; inline-authorable>` | Decline → ignored-item entry persisted | ignored-item appended with `(MGRS, class_group)` matching the declined POI | exact (count delta +1) + schema_match | N/A | N/A |
+| O6 | `<DEFERRED: new detection whose (MGRS, class_group) matches an existing ignored-item; inline-authorable>` | Ignored-item suppression | POI NOT surfaced | exact | count surfaced == 0 | N/A |
+| O7 | `<DEFERRED: surfaced POI + no operator response for longer than the decision window; inline-authorable>` | Timeout = forget (NOT blacklisted) | POI removed from queue; no ignored-item written | exact (queue −1) + exact (ignored-item count unchanged) | N/A | N/A |
+| O8 | `<DEFERRED: operator confirm command — valid + signed + within sequence; ref services.md §3, §8 (Q9)>` | Confirm → middle waypoint inserted; mode transitions to target-follow | mission update POSTed; scan-mode reports target-follow | exact (HTTP 200) + exact (mode) | N/A | N/A |
+| O9 | `<DEFERRED: replayed operator command — same envelope a second time; ref services.md §8 (blocked on Q9)>` | Replay protection | command rejected; security WARN logged; no state change | exact (state unchanged) + substring (log contains "replay") | N/A | N/A |
+| O10 | `<DEFERRED: malformed / unsigned operator command; ref services.md §8 (blocked on Q9)>` | Signature validation | command rejected; security WARN logged | exact (state unchanged) + substring (log contains "invalid") | N/A | N/A |
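
O1–O4 and O6 pin down a piecewise-linear mapping from confidence to decision window plus a suppression rule. A minimal sketch, assuming confidence is a float in [0, 1]; clamping values above 1.00 and the string form of the `(MGRS, class_group)` key are assumptions the spec does not state.

```python
from typing import Optional

MIN_CONF, MAX_CONF = 0.40, 1.00  # O4 surfacing threshold and O2 upper bound
MIN_WINDOW_S, MAX_WINDOW_S = 30.0, 120.0


def decision_window_s(confidence: float) -> Optional[float]:
    """Decision window in seconds, or None when the POI is not surfaced (O4)."""
    if confidence < MIN_CONF:
        return None
    c = min(confidence, MAX_CONF)  # assumption: clamp anything above 1.00
    frac = (c - MIN_CONF) / (MAX_CONF - MIN_CONF)
    return MIN_WINDOW_S + frac * (MAX_WINDOW_S - MIN_WINDOW_S)


def should_surface(confidence: float, mgrs: str, class_group: str,
                   ignored: set[tuple[str, str]]) -> bool:
    """O4 + O6: surface only above threshold and when (MGRS, class_group) is not ignored."""
    return decision_window_s(confidence) is not None and (mgrs, class_group) not in ignored
```

Floating-point arithmetic makes O3 land a hair under 75 s, which is why that row carries a ± 0.5 s tolerance rather than an exact match.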
+
+## Reliability & Safety
+
+Source ACs: `acceptance_criteria.md → Reliability & Safety` + lost-link failsafe ladder.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| R1 | `<DEFERRED: BIT scenario — every dependency healthy; inline-authorable>` | Pre-flight self-test passes | health endpoint returns all green; takeoff permitted | exact (state) + exact (health.all == "green") | N/A | N/A |
+| R2 | `<DEFERRED: BIT scenario — Tier-1 detection unreachable; inline-authorable>` | BIT fails the takeoff gate | takeoff NOT permitted; detection dependency reports red | exact (takeoff inhibited) | N/A | N/A |
+| R3 | `<DEFERRED: BIT scenario — persistent-store ≥95% full; inline-authorable>` | Storage floor BIT failure | takeoff NOT permitted; storage dependency reports red | exact (takeoff inhibited) | N/A | N/A |
+| R4 | `<DEFERRED: in-flight operator/Ground-Station modem-link loss + 30 s elapsed; ref services.md §3, §4>` | Lost-link failsafe ladder (default 30 s grace → RTL) | system issues RTL 30 s after link loss; operator-link dependency reports red | exact (RTL command at 30 s ± 1 s) | ± 1 s | N/A |
+| R5 | `<DEFERRED: mid-flight battery sample at RTL-floor (e.g. 25%); ref services.md §4>` | RTL trigger | system issues RTL; health → yellow | exact (RTL command) + exact (health == yellow) | N/A | N/A |
+| R6 | `<DEFERRED: mid-flight battery sample at hard-floor (e.g. 15%); ref services.md §4>` | Land-now trigger (only operator-overridable) | system issues land-now | exact (land_now command) | N/A | N/A |
+| R7 | `<DEFERRED: airframe-link command with simulated bounded retry/backoff; peer stays unresponsive until max retries are exhausted; ref services.md §4>` | Watchdog flips health red on retry exhaustion | airframe-link dependency reports red once the configured maximum retry count is exhausted | exact (health == red) | N/A | N/A |
+| R8 | `<DEFERRED: wall-clock drift > 200 ms simulation (GPS lock present, NTP disabled); ref services.md §9>` | Drift alarm | time-source dependency reports yellow; `clock_source` + `last_sync_at` reflect the drift | exact (health == yellow) | N/A | N/A |
+| R9 | `<DEFERRED: geofence EXCLUSION polygon crossed by simulated waypoint; ref services.md §4>` | Symmetric geofence enforcement | waypoint refused; RTL triggered | exact (waypoint rejected) + exact (RTL) | N/A | N/A |
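
The R4–R6 ladder can be written as a pure function of link-loss duration and battery level, which makes those rows easy to drive from a table. A minimal sketch: the 30 s grace comes from R4, while the 25% and 15% floors are the example values quoted in R5/R6 and are treated as configurable defaults. Giving land-now precedence over RTL is an assumption consistent with R6 describing the hard floor.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"
    RTL = "rtl"
    LAND_NOW = "land_now"


def failsafe_action(link_lost_s: Optional[float], battery_pct: float,
                    grace_s: float = 30.0, rtl_floor: float = 25.0,
                    hard_floor: float = 15.0) -> Action:
    """Most severe action demanded by the battery floors or the lost-link grace period."""
    if battery_pct <= hard_floor:
        return Action.LAND_NOW
    if battery_pct <= rtl_floor:
        return Action.RTL
    if link_lost_s is not None and link_lost_s >= grace_s:
        return Action.RTL
    return Action.CONTINUE


def rtl_on_time(link_lost_at_s: float, rtl_at_s: float,
                grace_s: float = 30.0, tol_s: float = 1.0) -> bool:
    """R4 timing assertion: RTL issued grace_s after link loss, within +/- tol_s."""
    return abs((rtl_at_s - link_lost_at_s) - grace_s) <= tol_s
```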
+
+## Resources & Data
+
+Source ACs: `acceptance_criteria.md → Resources & Data`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| Re1 | `<DEFERRED: long-running scenario — system's full onboard workload active for 5 min, monitored via process RSS; inline-authorable harness>` | Onboard memory budget (everything autopilot owns, excluding Tier 1) | combined RSS on the deployed compute device | threshold_max | ≤ 6 GB | N/A |
+| Re2 | Same as Re1 with concurrent Tier-1 traffic | Tier-1 non-degradation | Tier-1 ms/frame delta vs baseline (L1) | numeric_tolerance | ± 5 ms | N/A |
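
A minimal sketch of the two resource gates, assuming the harness samples per-process RSS in bytes at fixed ticks (for example from `/proc/<pid>/status` on Linux) and records Tier-1 ms/frame with and without the autopilot workload. Reading "6 GB" as decimal gigabytes and "delta vs baseline" as a difference of means are both assumptions; a GiB budget or a percentile comparison would be equally defensible.

```python
from statistics import mean

BUDGET_BYTES = 6 * 10**9  # assumption: decimal GB; use 6 * 1024**3 if the budget means GiB


def peak_combined_rss(samples: list[dict[int, int]]) -> int:
    """Re1: highest total RSS (pid -> bytes) across owned processes at any single tick."""
    return max((sum(tick.values()) for tick in samples), default=0)


def memory_budget_ok(samples: list[dict[int, int]], budget_bytes: int = BUDGET_BYTES) -> bool:
    return peak_combined_rss(samples) <= budget_bytes


def tier1_non_degraded(baseline_ms: list[float], loaded_ms: list[float], tol_ms: float = 5.0) -> bool:
    """Re2: mean Tier-1 ms/frame under load stays within +/- tol_ms of the baseline mean."""
    return abs(mean(loaded_ms) - mean(baseline_ms)) <= tol_ms
```

Summing per tick, rather than summing each process's own peak, avoids overstating usage when processes peak at different moments.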
+
+## Map Reconciliation
+
+Source ACs: `acceptance_criteria.md → Map Reconciliation`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| Mp1 | `<DEFERRED: mock central area-map service — 30 km × 30 km region, ~10000 map objects; ref services.md §2>` | Pre-flight pull | wall-clock GET → local copy hydrated | threshold_max | ≤ 30 s | N/A |
+| Mp2 | `<DEFERRED: same mock but unreachable (timeout); ref services.md §2>` | Cache-fallback path | system falls back to last-known cached state; reports `map_sync == "cached_fallback"`; operator MUST acknowledge before takeoff | exact (state) + exact (BIT requires explicit ack) | N/A | N/A |
+| Mp3 | `<DEFERRED: simulated 60-minute mission pass diff (~5000 NEW + ~2000 MOVED + ~500 REMOVED + ~10000 CONFIRMED-EXISTING); ref services.md §2>` | Post-flight push | wall-clock POST → 200 OK | threshold_max | ≤ 120 s | N/A |
+| Mp4 | `<DEFERRED: same as Mp3 but POST returns 5xx; ref services.md §2>` | Persist-on-disk + bounded retry | pending diff written to on-device storage; operator-visible warning surfaced; retry attempts logged | exact (file exists) + exact (warning surfaced) + threshold_max (retries ≤ configured cap) | N/A | N/A |
+| Mp5 | `<DEFERRED: two map updates with conflicting state for same (spatial-cell, class_group) — append-only log scenario; ref services.md §2, Q8>` | Conflict-resolution rule (Q8 placeholder) | append-only observation log + computed current view; conflict resolution per documented rule | json_diff | N/A | `<DEFERRED: expected_results/mapobjects_conflict_resolution.json — pending Q8>` |
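
Mp5 describes an append-only observation log with a computed current view. A minimal sketch, assuming each observation carries a spatial-cell id, a class group, a state drawn from the Mp3 categories, and a timestamp. The conflict rule is still open (Q8), so the latest-timestamp-wins rule below, with ties broken by log order, is a hypothetical placeholder and not the project's documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    cell: str           # spatial-cell id (format assumed)
    class_group: str
    state: str          # e.g. "NEW", "MOVED", "REMOVED", "CONFIRMED-EXISTING"
    observed_at: float  # seconds since epoch


class ObservationLog:
    """Append-only log; the current view is always recomputed, never edited in place."""

    def __init__(self) -> None:
        self._entries: list[Observation] = []

    def __len__(self) -> int:
        return len(self._entries)

    def append(self, obs: Observation) -> None:
        self._entries.append(obs)

    def current_view(self) -> dict[tuple[str, str], Observation]:
        view: dict[tuple[str, str], Observation] = {}
        for obs in self._entries:  # equal timestamps: the later append wins
            key = (obs.cell, obs.class_group)
            held = view.get(key)
            if held is None or obs.observed_at >= held.observed_at:
                view[key] = obs
        # Placeholder Q8 rule: latest observation wins; REMOVED drops the object.
        return {k: v for k, v in view.items() if v.state != "REMOVED"}
```

Keeping the raw log intact means a different Q8 rule can later be applied by swapping `current_view` alone, with no data migration.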
+
+---
+
+## Coverage Status (auto-recomputed 2026-05-19)
+
+- **Total rows**: 56 (L1–L9, T1–T3, D1–D6, M1–M4, S1–S5, O1–O10, R1–R9, Re1–Re2, Mp1–Mp5).
+- **Fully bound to real fixtures**: L1, T3, D2, D6 = **4 rows (~7%)**.
+- **Bound to derived inline fixture** (no external acquisition needed): L2 = **+1 row (5 total, ~9%)**.
+- **Bound to starter/partial fixtures** (visual reference only — assertions need additional deferred inputs to be meaningful): D3, D4, D5, M1, M2, M3, M4 = **+7 rows (12 total partial, ~21%)**.
+- **Inline-authorable but not yet authored** (no external dependency — can be unblocked anytime by writing the fixture): T1, S4, O1–O7, R1–R3, R8, Re1, Re2 = **15 rows (~27%)**. Lifting these alone would bring effective coverage to ~48%.
+- **Blocked on external acquisition** (real recordings, SITL, annotated eval sets, mock services): L3–L9 (minus L6/L7 partial), T2, D1, M1–M4 (CSV pairs), S1, S2, S3, S5, R4–R7, R9, Mp1–Mp5 = **~24 rows (~43%)**.
+- **Blocked on architecture questions**: O8 (partially, on Q9), O9, O10 (Q9), M4 (Q14), Mp5 (Q8) = **5 rows**.
+
+**Decision (project policy)**: rather than block on the Phase 3 75% gate, each deferred row is now registered with a structured `<DEFERRED:>` tag and surfaces in `data_parameters.md → "Gaps that block /test-spec downstream"`. `/test-spec` Phase 2 can author scenarios for all 56 rows; deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.
+
+## Notes on this spec
+
+- Every row carries a quantifiable comparison + tolerance — no row is "should work".
+- Where the AC depends on hardware (the deployed compute device, ViewPro A40), the test must run on representative hardware OR a benchmarked replay; pure-emulator runs are NOT acceptable for L1–L9, T1–T3, Re1–Re2.
+- Where the AC depends on an external service (`../detections`, `missions`, Ground Station), the test runs against either (a) the real service in the suite e2e (`../e2e/docker-compose.suite-e2e.yml`), or (b) a recorded replay fixture for isolation tests. Both modes are valid; the test scenario states which.
+- Q-tagged rows (M4 → Q14, Mp5 → Q8, O8–O10 → Q9) depend on open architecture questions. Their tolerance ranges may sharpen once those questions resolve; the existence of each row is non-negotiable.
+- M1–M4 visual-reference bindings (`fixtures/movement/video0[1-4].mp4`) are usable for harness smoke testing but DO NOT satisfy the assertion semantics — paired `gimbal.csv` + `telemetry.csv` are required for ego-motion compensation to be verifiable. This is the single highest-priority fixture gap.
@@ -0,0 +1,90 @@
+# Fixture manifest
+
+All fixtures live **inside this workspace** so the autopilot repo is self-sufficient — downstream test runners must never reach into a sibling repo at `../`. When you add or refresh a fixture, update the matching SHA-256 in this manifest AND the rows in `../expected_results/results_report.md` that consume it.
+
+Total on-disk size: ~57 MB.
+
+## Files
+
+### Still-image aerial frames — `images/`
+
+Used as Tier-1 input frames for detection-quality assertions.
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `images/4d6e1830d211ad50.jpg` | 152 KB | `4c396495af64aaf9aac5ecb92431bf0c75db42b0bdb8e4eec1937f9995acee42` | `../detections/data/images/` (re-copied 2026-05-19) | L1, D6 |
+| `images/54f6459dbddb93d8.jpg` | 6.7 MB | `cd65c76a080ef72ce3528031f003f067fca6091c067a86d527a1ae91cd78be59` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+| `images/6dd601b7d2dc1b30.jpg` | 1.4 MB | `45edd83a357a9f852e14e5845265cd09c20b4b99b1828c160cb3298f0e160181` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+| `images/805bcf1e9f271a58.jpg` | 176 KB | `fe696899225fc04f2335e87acf6a3ad8a00cd3950c5940d5e73e5ce438f36257` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+| `images/f997d0934726b555.jpg` | 232 KB | `5d1c9c551c0680e5b3d0aab261bca71e724c78f6db3580da598c680b4f7d4d79` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+
+### Reconnaissance video — `videos/`
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `videos/94d42580bd1ad6ff.mp4` | 12 MB | `602b22a42515a754313551847caa6d6a6d7b3cde1d857cbd08ebc5543fb8cf7c` | `../detections/data/videos/` (re-copied 2026-05-19) | T3 (frame-rate floor scenario) |
+
+### Movement-detection clips — `movement/`
+
+Wide-area reconnaissance clips intended for movement-detection visual baselines. **Important**: these clips DO NOT have paired `gimbal.csv` / `telemetry.csv` files — ego-motion compensation assertions (M1–M4) cannot run against them. They are useful for visual harness work, frame-count assertions, and as visual reference for the movement-detection scenarios.
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `movement/video01.mp4` | 5.3 MB | `6f37186f5e9be97109db8d0d220df96d21cac9ce5b50b576234c6f7ee369d2bb` | local; provenance pre-existing in workspace | M1 (visual reference only — no telemetry) |
+| `movement/video02.mp4` | 5.9 MB | `7de7981e511e21e1e72f506d44541b44a4c27a995c9505ef8e3b48e69b416367` | local; provenance pre-existing in workspace | M2 (visual reference only — no telemetry) |
+| `movement/video03.mp4` | 6.1 MB | `df441164da7f37d715968212b95e9bf53c8e37384f20ddfab61cd6d0d18b4f3a` | local; provenance pre-existing in workspace | M3 (visual reference only — no telemetry) |
+| `movement/video04.mp4` | 5.8 MB | `36445bf1c86c5afa524000b5b2da7fc9cb3d39c745f9ad830b3d60f6868948e7` | local; provenance pre-existing in workspace | M4 (visual reference only — no telemetry) |
+
+### Semantic reference frames — `semantic/`
+
+Annotated reference examples for concealed-position semantic targets. **Not a graded eval set** — these are 4 hand-picked examples of footpath-to-concealment patterns, intended as visual reference for what the system should recognise. Detection-quality gates (D1, D3, D4, D5) need a full annotated multi-season eval set; these 4 PNGs are insufficient for those gates and serve as starter reference only.
+
+| File | Size | SHA-256 | Description | `results_report.md` rows |
+|---|---|---|---|---|
+| `semantic/semantic01.png` | 3.1 MB | `339ad4d35ab36052828f05652ab7249801bcd5d7bb04522f0ab9cbf6f0ca008a` | Footpath leading to branch-pile hideout in winter forest | D3, D4, D5 (starter only — full multi-season set still required) |
+| `semantic/semantic02.png` | 5.1 MB | `ffe3c49f5f1833724ce46083d212e714422e664b635cdd48b63311adefcd7b1f` | Footpath to FPV launch clearing, branch mass at forest edge | D3, D4, D5 (starter only) |
+| `semantic/semantic03.png` | 1.0 MB | `ce89c139815e9a80679237008f7cfc3039bbd53f162d48017e840ff91e57b109` | Footpath to squared hideout structure | D3, D4, D5 (starter only) |
+| `semantic/semantic04.png` | 1.3 MB | `b25c689b7aa543ec15858e4b5edfa32387ced4930130eb280d952c555f104e69` | Footpath terminating at tree-branch concealment | D3, D4, D5 (starter only) |
+| `semantic/data_parameters.md` | 2 KB | n/a (text) | Description of the four reference examples + the new YOLO primitive classes that motivate them | reference only |
+
+### Detection contract schemas — `schemas/`
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `schemas/expected_detections.json` | 1.4 KB | `ce60c105d697efe0359d2e6b1b46fc63e53d3789b067d53501f9c76aad9bd1ae` | `../e2e/fixtures/` (re-copied 2026-05-19) | D6 (sample Tier-1 response) |
+| `schemas/expected_detections.schema.json` | 2.4 KB | `a7174e0b083dcbf42fa8672acd3e1807d11ea0629cc636ff958a4d77168733b9` | `../e2e/fixtures/` (re-copied 2026-05-19) | D6 (JSON-schema for the Tier-1 contract) |
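
The `_meta.tolerance` block in `expected_detections.json` (`count_delta`, `bbox_iou_min`, `confidence_delta`) implies a range comparison rather than exact matching. A minimal sketch of that comparison, assuming boxes use the `bbox_samples` fields (normalised centre x/y, width, height) and that each expected sample is matched to the observed box with the highest IoU at the same `time_sec`. The matching strategy is an assumption; the baseline file does not specify one.

```python
def iou_cxcywh(a: dict, b: dict) -> float:
    """Intersection-over-union of two boxes given as centre x/y, width, height."""
    def corners(box: dict) -> tuple[float, float, float, float]:
        hw, hh = box["width"] / 2, box["height"] / 2
        return box["center_x"] - hw, box["center_y"] - hh, box["center_x"] + hw, box["center_y"] + hh

    ax1, ay1, ax2, ay2 = corners(a)
    bx1, by1, bx2, by2 = corners(b)
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def sample_matches(expected: dict, observed: list[dict], tol: dict) -> bool:
    """True if the best-IoU observed box at the same time_sec meets the IoU and confidence tolerances."""
    candidates = [o for o in observed if o["time_sec"] == expected["time_sec"]]
    if not candidates:
        return False
    best = max(candidates, key=lambda o: iou_cxcywh(expected, o))
    if iou_cxcywh(expected, best) < tol["bbox_iou_min"]:
        return False
    if "confidence" in expected:
        return abs(best.get("confidence", 0.0) - expected["confidence"]) <= tol["confidence_delta"]
    return True


def count_ok(expected_count: int, observed_count: int, tol: dict) -> bool:
    """Per-class count within +/- count_delta of the baseline."""
    return abs(observed_count - expected_count) <= tol["count_delta"]
```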
+
+### Database init script — `sql/`
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `sql/init.sql` | 3.7 KB | `b61e452c549f7b006db88d265f4346837e0a33d1abd4d977ebf3d48d8c943439` | `../e2e/fixtures/` (re-copied 2026-05-19) | suite-only reference; no autopilot AC row asserts against this |
+
+## Copy vs reference
+
+Fixtures were COPIED (not moved). The sibling repos still own the originals — keeping autopilot's copy in sync when an upstream changes is a manual chore today (the `monorepo-e2e` skill at the suite root will eventually own this drift; see `_docs/_process_leftovers/` if a sync is pending).
+
+When an upstream fixture changes:
+
+1. Recompute the SHA-256 in the source repo.
+2. Re-copy into the matching `fixtures/` subdirectory here.
+3. Update this manifest's SHA-256 column.
+4. If the change invalidates an assertion in `../expected_results/results_report.md`, fix the row's expected result too — do not let assertions drift silently against new data.
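
Steps 1 and 3 are easy to get wrong by hand. A minimal sketch of a drift checker that re-hashes every file listed in this manifest, assuming it is given the manifest path and the `fixtures/` directory as its root (both are parameters because neither location is fixed here). Rows whose hash column is not a 64-character hex digest, such as `n/a (text)`, are skipped.

```python
import hashlib
import re
from pathlib import Path

# Matches manifest rows of the form: | `path` | size | `64-hex sha` | ...
ROW = re.compile(r"^\|\s*`([^`]+)`\s*\|[^|]*\|\s*`([0-9a-f]{64})`\s*\|")


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks keep memory flat
            digest.update(chunk)
    return digest.hexdigest()


def manifest_drift(manifest: Path, fixtures_root: Path) -> list[str]:
    """List manifest rows whose file is missing or whose SHA-256 no longer matches."""
    problems: list[str] = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        match = ROW.match(line)
        if not match:
            continue  # headings, prose, and rows without a hash
        rel, expected = match.groups()
        target = fixtures_root / rel
        if not target.is_file():
            problems.append(f"{rel}: missing")
        elif sha256_of(target) != expected:
            problems.append(f"{rel}: sha256 mismatch")
    return problems
```

Running it in CI and failing on a non-empty result would catch a re-copied fixture whose manifest row was not updated.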
+
+## Gaps still pending fixture acquisition
+
+The authoritative per-service acquisition catalogue lives in `../services.md`. Summary of the still-open gaps (each is also tagged on its row in `../expected_results/results_report.md` with a structured `<DEFERRED: ...>` marker, and a `_docs/_process_leftovers/` entry records the replay obligation):
+
+| Gap | What's missing | Blocks AC rows | Acquisition status |
+|---|---|---|---|
+| Paired gimbal+telemetry CSVs for the 4 movement clips | `gimbal.csv` + `telemetry.csv` aligned to each video's frame timestamps | M1–M4, tightens L6/L7 | **Confirmed unavailable today** (user 2026-05-19); requires a re-flight or new recording with the gimbal-feedback channel captured |
+| Annotated eval set across all four seasons | Hundreds–thousands of labelled images per season for concealed-position + footpath gates | D1, D3, D4, D5 | needs annotation campaign (1.5 months at 5 hrs/day target per `semantic/data_parameters.md`) |
+| Per-zoom-band frame sequences | Same kind of clip as `movement/` but recorded at light, medium, and high zoom bands | tightens M2, L7, S2 | needs flight time + zoom-band metadata in the recorder |
+| Mock `missions` HTTPS exchanges | Recorded JSON request/response pairs for mission GET/POST + mapobjects GET/POST | Mp1–Mp5 | inline-authorable against the `mission-schema`; not yet authored |
+| Mock Ground Station session traces | Scripted timing trace (connect / push / drop / reconnect / lost-link) | R4, O8 | inline-authorable; not yet authored |
+| ArduPilot SITL traces | Recorded MAVLink streams for waypoint upload, geofence INCLUSION + EXCLUSION, RTL on lost-link, RTL on battery floor | R4, R5, R6, R7, R9 + project SITL conformance gate | needs SITL run |
+| Operator-command envelopes | Valid / expired / replayed / malformed envelopes under the chosen Q9 auth scheme | O9, O10 | **blocked on Q9** (`_docs/02_document/architecture.md §8`) |
+| VLM I/O pairs | Bounded ROI in → structured `VlmAssessment` out + schema-violation cases | L3, S5 | inline-authorable against the assessment schema once the local model is pinned |
+| GPS / NTP drift scenarios | Scripted offset / lock-loss traces | R8 | inline-authorable |
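
Once paired CSVs land, each video frame has to be matched to the gimbal/telemetry sample closest in time before ego-motion assertions can run. A minimal sketch using binary search over sorted sample timestamps. The CSV column layout is not defined yet, so the function takes bare timestamp lists, and the `max_skew_s` tolerance is a hypothetical parameter.

```python
from bisect import bisect_left
from typing import Optional


def align_frames(frame_ts: list[float], sample_ts: list[float],
                 max_skew_s: float = 0.02) -> list[Optional[int]]:
    """For each frame timestamp, index of the nearest sample (sample_ts sorted), or None if too far."""
    out: list[Optional[int]] = []
    for t in frame_ts:
        i = bisect_left(sample_ts, t)
        # The nearest sample is either just before or at/after the insertion point.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(sample_ts)]
        if not neighbours:
            out.append(None)
            continue
        best = min(neighbours, key=lambda j: abs(sample_ts[j] - t))
        out.append(best if abs(sample_ts[best] - t) <= max_skew_s else None)
    return out
```

Returning `None` instead of the nearest sample regardless of distance makes telemetry gaps visible to the test rather than silently pairing a frame with stale gimbal state.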
+
+When a fixture from this list lands, copy it under `fixtures/<category>/`, add a row to the relevant subsection above, and bind the matching `<DEFERRED>` row in `../expected_results/results_report.md` to its new local path.
@@ -0,0 +1,32 @@
+{
+  "$schema": "./expected_detections.schema.json",
+  "_meta": {
+    "fixture_version": "0.1.0-placeholder",
+    "video": "sample.mp4",
+    "video_sha256": "TBD-after-fixture-recording",
+    "model": {
+      "_comment": "Pinned model + classes that detections must run when this baseline applies. Refresh this block (and counts/bboxes below) whenever detections ships a new model.",
+      "name": "TBD",
+      "revision": "TBD",
+      "classes_source": "annotations/src/Database/DatabaseMigrator.cs (ids 0..18)"
+    },
+    "tolerance": {
+      "_comment": "Spec asserts ranges, not exact values. INT8 calibration drift can move pixel positions by a few units; absolute count can drift by ±1 across re-runs of the same engine on the same Jetson.",
+      "count_delta": 1,
+      "bbox_iou_min": 0.8,
+      "confidence_delta": 0.1
+    }
+  },
+  "expected": {
+    "total_annotations": 0,
+    "by_class": [
+      {
+        "class_id": 0,
+        "class_name": "ArmorVehicle",
+        "count": 0,
+        "bbox_samples": []
+      }
+    ],
+    "_placeholder_note": "Replace this block with the real baseline once sample.mp4 is recorded. Each entry under `by_class` carries: class_id, class_name (must match detection_classes.name), count, and bbox_samples (an array of {time_sec, center_x, center_y, width, height, confidence} entries the spec uses for IoU comparison)."
+  }
+}
@@ -0,0 +1,66 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "Suite e2e expected detections baseline",
+  "type": "object",
+  "required": ["_meta", "expected"],
+  "properties": {
+    "$schema": { "type": "string" },
+    "_meta": {
+      "type": "object",
+      "required": ["fixture_version", "video", "video_sha256", "model", "tolerance"],
+      "properties": {
+        "fixture_version": { "type": "string" },
+        "video": { "type": "string" },
+        "video_sha256": { "type": "string" },
+        "model": {
+          "type": "object",
+          "required": ["name", "revision", "classes_source"],
+          "additionalProperties": true
+        },
+        "tolerance": {
+          "type": "object",
+          "required": ["count_delta", "bbox_iou_min", "confidence_delta"],
+          "properties": {
+            "count_delta": { "type": "integer", "minimum": 0 },
+            "bbox_iou_min": { "type": "number", "minimum": 0, "maximum": 1 },
+            "confidence_delta": { "type": "number", "minimum": 0, "maximum": 1 }
+          }
+        }
+      }
+    },
+    "expected": {
+      "type": "object",
+      "required": ["total_annotations", "by_class"],
+      "properties": {
+        "total_annotations": { "type": "integer", "minimum": 0 },
+        "by_class": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "required": ["class_id", "class_name", "count"],
+            "properties": {
+              "class_id": { "type": "integer", "minimum": 0 },
+              "class_name": { "type": "string" },
+              "count": { "type": "integer", "minimum": 0 },
+              "bbox_samples": {
+                "type": "array",
+                "items": {
+                  "type": "object",
+                  "required": ["time_sec", "center_x", "center_y", "width", "height"],
+                  "properties": {
+                    "time_sec": { "type": "number", "minimum": 0 },
+                    "center_x": { "type": "number" },
+                    "center_y": { "type": "number" },
+                    "width": { "type": "number", "minimum": 0 },
+                    "height": { "type": "number", "minimum": 0 },
+                    "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
+                  }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
@@ -0,0 +1,45 @@
+# Semantic And Movement Detection Training Data
+
+# Source
+- Aerial imagery from reconnaissance winged UAVs at 600–1000m altitude
+- ViewPro A40 camera, 1080p resolution, various zoom levels
+- Extracted from video frames and still images
+- Movement detection requires frame sequences, not only still images; include camera/gimbal telemetry where available to separate target motion from UAV motion.
+
+# Target Classes
+- Footpaths / trails (linear features on snow, mud, forest floor)
+- Fresh footpaths (distinct edges, undisturbed surroundings, recent track marks)
+- Stale footpaths (partially covered by snow/vegetation, faded edges)
+- Concealed structures: branch pile hideouts, dugout entrances, squared/circular openings
+- Tree rows (potential concealment lines)
+- Open clearings connected to paths (FPV launch points)
+- Moving point/cluster candidates at wide or light/medium zoom
+
+# YOLO Primitive Classes (new)
+- Black entrances to hideouts (various sizes)
+- Piles of tree branches
+- Footpaths
+- Roads
+- Trees, tree blocks
+
+# Annotation Format
+- Managed by existing annotation tooling in separate repository
+- Expected: bounding boxes and/or segmentation masks depending on model architecture
+- Footpaths may require polyline or segmentation annotation rather than bounding boxes
+
+# Seasonal Coverage Required
+- Winter: snow-covered terrain (footpaths as dark lines on white)
+- Spring: mud season (footpaths as compressed/disturbed soil)
+- Summer: full vegetation (paths through grass/undergrowth)
+- Autumn: mixed leaf cover, partial snow
+
+# Volume
+- Target: hundreds to thousands of annotated images/sequences
+- Available effort: 1.5 months, 5 hours/day
+- Potential for annotation process automation
+
+# Reference Examples
+- semantic01.png — footpath leading to branch-pile hideout in winter forest
+- semantic02.png — footpath to FPV launch clearing, branch mass at forest edge
+- semantic03.png — footpath to squared hideout structure
+- semantic04.png — footpath terminating at tree-branch concealment
@@ -0,0 +1,104 @@
+-- Suite e2e database seed.
+--
+-- Loaded by the `db-seed` service in docker-compose.suite-e2e.yml after
+-- annotations has run its own DatabaseMigrator (which creates the schema +
+-- inserts the canonical detection_classes 0..18). This file therefore only
+-- adds rows that the e2e scenario depends on but the production runtime does
+-- NOT seed automatically.
+--
+-- Idempotency: every statement uses ON CONFLICT / IF NOT EXISTS so re-running
+-- the seed (e.g. on a `down -v` followed by `up`) lands the same final state.
+--
+-- Schema reference: annotations/src/Database/DatabaseMigrator.cs.
+
+\set ON_ERROR_STOP on
+
+-- Wait until annotations has populated its schema. The db-seed container starts
+-- only after postgres-local is healthy, but annotations may still be spinning
+-- up its tables. A bounded poll keeps the seed deterministic.
+DO $$
+DECLARE
+  attempt int := 0;
+BEGIN
+  WHILE attempt < 60 LOOP
+    PERFORM 1
+    FROM information_schema.tables
+    WHERE table_schema = 'public' AND table_name = 'detection_classes';
+    IF FOUND THEN
+      EXIT;
+    END IF;
+    PERFORM pg_sleep(1);
+    attempt := attempt + 1;
+  END LOOP;
+
+  IF attempt >= 60 THEN
+    RAISE EXCEPTION 'detection_classes table not found after 60s — annotations migration did not complete';
+  END IF;
+END $$;
+
+-- Default system_settings row. Annotations starts without one, but several
+-- spec assertions rely on `silent_detection = false` and known thumbnail dims
+-- so overlay rendering is reproducible.
+INSERT INTO system_settings (
+  id, name, military_unit,
+  default_camera_width, default_camera_fov,
+  thumbnail_width, thumbnail_height, thumbnail_border,
+  generate_annotated_image, silent_detection
+) VALUES (
+  '00000000-0000-0000-0000-00000000aaaa',
+  'azaion-suite-e2e',
+  'e2e-unit',
+  3840, 70,
+  240, 135, 10,
+  true, false
+) ON CONFLICT (id) DO NOTHING;
+
+-- Default directory_settings row. Annotations writes media files under the
+-- paths defined here; the e2e-runner doesn't read these directly but the
+-- service requires the row to exist on first hit.
+INSERT INTO directory_settings (
+  id, videos_dir, images_dir, labels_dir, results_dir,
+  thumbnails_dir, gps_sat_dir, gps_route_dir
+) VALUES (
+  '00000000-0000-0000-0000-00000000bbbb',
+  '/data/videos', '/data/images', '/data/labels', '/data/results',
+  '/data/thumbnails', '/data/gps_sat', '/data/gps_route'
+) ON CONFLICT (id) DO NOTHING;
+
+-- Default camera_settings row used by detections to size bbox-to-meters.
+INSERT INTO camera_settings (
+  id, altitude, focal_length, sensor_width
+) VALUES (
+  '00000000-0000-0000-0000-00000000cccc',
+  100, 50, 36
+) ON CONFLICT (id) DO NOTHING;
+
+-- Stable e2e user. The UUID is referenced by the spec when asserting
+-- annotation rows. Annotations does not own a `users` table — user identity
+-- is carried in JWTs minted with JWT_SECRET; the user_id here just needs to
+-- be deterministic and stable across runs.
+-- Stored in user_settings so the spec can `SELECT user_id` to confirm the
+-- seed ran.
+INSERT INTO user_settings (
+  id, user_id,
+  annotations_left_panel_width, annotations_right_panel_width,
+  dataset_left_panel_width,    dataset_right_panel_width
+) VALUES (
+  '00000000-0000-0000-0000-00000000dddd',
+  '00000000-0000-0000-0000-0000e2e2e2e2',
+  300, 400, 320, 320
+) ON CONFLICT (id) DO NOTHING;
+
+-- Sanity check — fail loudly if the canonical detection_classes are missing.
+-- annotations/src/Database/DatabaseMigrator.cs inserts ids 0..18 unconditionally.
+DO $$
+DECLARE
+  cnt int;
+BEGIN
+  SELECT COUNT(*) INTO cnt FROM detection_classes WHERE id BETWEEN 0 AND 18;
+  IF cnt < 19 THEN
+    RAISE EXCEPTION 'expected canonical detection_classes 0..18 (count=19), got %', cnt;
+  END IF;
+END $$;
+
+\echo 'suite-e2e seed complete'
@@ -0,0 +1,113 @@
+# External Services — Test-Mock Requirements
+
+Black-box catalogue of every external system autopilot depends on at runtime, with the **test-fixture / mock shape required for each**. Service-side design (protocols, component contracts, ownership boundaries) lives in `_docs/02_document/architecture.md` — this file owns ONLY the test-data dependency view (per `.cursor/rules/artifact-srp.mdc`, `_docs/00_problem/input_data/` is a test-data concern).
+
+Runtime input shapes (frame rates, message types) are described in `data_parameters.md`. This file extends them with the **acquisition status of the corresponding test fixture**.
+
+## Index
+
+| # | External system | Production role | Test-mock shape needed | Acquisition status |
+|---|---|---|---|---|
+| 1 | Tier-1 detection (`../detections`) | Primitive YOLO inference on every frame; returns class + bbox + confidence | Recorded bi-stream replay file (`request frame` → `response detections`) | **MISSING** — no replay recorded yet |
+| 2 | Mission planner (`missions` API) | Mission JSON pull at start; middle-waypoint POST on operator-confirm; pre-flight area-map pull + post-flight diff push | Mock HTTPS exchanges for GET/POST + sample mission + sample mapobjects state | **MISSING** — schema known (`mission-schema`), no fixture recorded |
+| 3 | Ground Station (modem) | Continuous push of camera + telemetry + bbox overlay; return path carries operator commands (confirm / decline / target-follow / abort) | Scripted session traces: nominal session, modem drop at T=N, reconnect at T=M, lost-link sustained ≥30 s | **MISSING** — authorable inline (no external dependency) |
+| 4 | Airframe autopilot (ArduPilot / PX4) | MAVLink v2 transport for the ~10–15 commands in `architecture.md §7.7`; battery + position telemetry; geofence enforcement | ArduPilot SITL traces: waypoint upload, geofence INCLUSION + EXCLUSION, RTL on lost-link, RTL on battery floor | **MISSING** — needs SITL run with scripted scenarios |
+| 5 | ViewPro A40 camera (frames) | H.264/265 1080p RTSP video feed at 30/60 fps | Recorded frame sequences (`.mp4`) — wide-zoom, light-zoom, medium-zoom, high-zoom variants | **PARTIAL** — 4 wide-zoom clips on disk (`fixtures/movement/video0[1-4].mp4`); zoom-band variants missing |
+| 6 | ViewPro A40 gimbal (control) | Vendor UDP control protocol; yaw / pitch / zoom telemetry per tick | Per-frame-sequence `gimbal.csv` paired with the matching video; per-tick yaw/pitch/zoom + timestamp | **MISSING** — no `gimbal.csv` paired with the 4 movement videos; ego-motion compensation (M1–M4) is unfalsifiable without this |
+| 7 | Deep-analysis VLM (local IPC) | Optional Tier-3 confirmation over bounded ROI; structured `VlmAssessment` response | Recorded I/O pairs (ROI in → `VlmAssessment` out) + schema-violation cases for fail-closed tests | **MISSING** — depends on the local model choice; can be authored against the assessment schema once the model is pinned |
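+| 8 | Operator-command envelopes (Ground Station return path) | Confirm / decline / target-follow / abort commands; authenticated, signed, and replay-protected | Per-envelope fixtures: valid, expired, replayed, malformed, unsigned | **MISSING**; blocked on Q9 (auth scheme), authorable inline once the scheme is chosen |
+| 9 | Time source (GPS / NTP) | Wall-clock; drift triggers the R8 health-yellow gate | Scripted drift scenarios (no real GPS/NTP hardware needed) — clock offset, jump, source loss | **MISSING** — authorable inline |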
+
+## Per-service detail — what acquisition would look like
+
+The table above is the index; the subsections below explain the shape and acquisition path of each fixture so the gaps can be planned one at a time.
+
+### 1. Tier-1 detection replay (`../detections`)
+
+- Production transport: bi-directional gRPC. The autopilot streams frames out; `../detections` streams `Detections` messages back.
+- Mock shape: a `.replay` file (one per scenario) recording timestamped frames + the exact `Detections` responses the model emitted. Used by `detection_client` integration tests in isolation — no need to boot the real Tier-1 service.
+- Acquisition path: record one replay against the currently pinned `../detections` model. Re-record whenever the upstream model changes (the `monorepo-e2e` skill should eventually own re-recording on model drift; see the suite's leftovers).
+- Blocks AC rows: every row that needs a deterministic detection stream — practically L1, L2, D1, D2, D6 in isolation; in suite-e2e mode these run live against the real `../detections`.
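+
+The `.replay` layout is not defined yet. A minimal sketch, assuming a hypothetical JSON-lines format with one record per frame (timestamp, frame id, and the detections Tier 1 returned for it), shows how a `detection_client` test could serve recorded responses without booting `../detections`. The field names and the `ReplaySource` class are illustrative assumptions, not the production `Detections` wire format.
+
+```python
+import io
+import json
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Detection:
+    cls: str        # class label emitted by Tier 1
+    bbox: tuple     # (x, y, w, h) in pixels (assumed convention)
+    conf: float     # model confidence in [0, 1]
+
+
+class ReplaySource:
+    """Serves recorded Tier-1 responses keyed by frame id (hypothetical format)."""
+
+    def __init__(self, stream):
+        self._by_frame = {}
+        for line in stream:
+            line = line.strip()
+            if not line:
+                continue  # tolerate blank lines between records
+            rec = json.loads(line)
+            dets = [Detection(d["cls"], tuple(d["bbox"]), float(d["conf"]))
+                    for d in rec["detections"]]
+            self._by_frame[rec["frame_id"]] = (float(rec["t"]), dets)
+
+    def detections_for(self, frame_id):
+        # A frame missing from the recording is a fixture bug: fail loudly instead of
+        # returning an empty list that a test would read as "no objects in frame".
+        if frame_id not in self._by_frame:
+            raise KeyError(f"frame {frame_id!r} not in replay")
+        return self._by_frame[frame_id][1]
+
+
+SAMPLE = io.StringIO(
+    '{"t": 0.000, "frame_id": 0, "detections": []}\n'
+    '{"t": 0.033, "frame_id": 1, "detections": '
+    '[{"cls": "vehicle", "bbox": [10, 20, 30, 40], "conf": 0.91}]}\n'
+)
+replay = ReplaySource(SAMPLE)
+print(replay.detections_for(1))
+```
+
+Keying by frame id rather than timestamp keeps the replay deterministic even if the test feeds frames faster or slower than they were recorded.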
+
+### 2. Mission + MapObjects mock (`missions` API)
+
+- Production transport: HTTPS REST.
+- Mock shape: JSON fixtures per endpoint + a small mock HTTP server (or replay-style fixtures consumed by a test double). Endpoints in scope:
+  - `GET /missions/{id}` — mission JSON, validated against `mission-schema`.
+  - `POST /missions/{id}` — middle-waypoint insertion (200 OK + updated mission).
+  - `GET /missions/{id}/mapobjects` — pre-flight area-map pull (response shape: map-object records keyed by spatial cell; volume target ~10000 objects for the 30×30 km gate Mp1).
+  - `POST /missions/{id}/mapobjects` — post-flight diff push (NEW / MOVED / REMOVED / CONFIRMED-EXISTING; volume target per Mp3 ~17500 records).
+- Acquisition path: author JSON fixtures against the known schema; record real exchanges once `missions` is reachable from the test bench.
+- Blocks AC rows: Mp1–Mp5 (all 5 map-reconciliation rows).
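+
+For the mock-HTTP-server option, a minimal standard-library sketch follows. It assumes hypothetical in-memory fixtures (`MISSIONS`, `MAPOBJECTS`) and illustrative request bodies (`index` + `waypoint` for middle-waypoint insertion, a list of records for the diff push); real fixtures must be authored against `mission-schema`. It routes only the four endpoints above and answers 404 for everything else.
+
+```python
+import json
+import re
+import threading
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+from urllib.error import HTTPError
+from urllib.request import Request, urlopen
+
+# Hypothetical fixtures; real ones are authored against mission-schema.
+MISSIONS = {"m1": {"id": "m1", "waypoints": [[50.0, 30.0], [50.1, 30.1]]}}
+MAPOBJECTS = {"m1": []}
+ROUTE = re.compile(r"^/missions/(?P<mid>[^/]+)(?P<sub>/mapobjects)?$")
+
+
+class MissionsMock(BaseHTTPRequestHandler):
+    def _reply(self, status, body):
+        data = json.dumps(body).encode()
+        self.send_response(status)
+        self.send_header("Content-Type", "application/json")
+        self.send_header("Content-Length", str(len(data)))
+        self.end_headers()
+        self.wfile.write(data)
+
+    def _match(self):
+        m = ROUTE.match(self.path)
+        if not m or m["mid"] not in MISSIONS:
+            self._reply(404, {"error": "not found"})
+            return None
+        return m
+
+    def do_GET(self):
+        m = self._match()
+        if m:
+            store = MAPOBJECTS if m["sub"] else MISSIONS
+            self._reply(200, store[m["mid"]])
+
+    def do_POST(self):
+        m = self._match()
+        if not m:
+            return
+        body = json.loads(self.rfile.read(int(self.headers.get("Content-Length", 0))))
+        if m["sub"]:
+            # Post-flight diff push: store NEW / MOVED / REMOVED / CONFIRMED-EXISTING records.
+            MAPOBJECTS[m["mid"]].extend(body)
+            self._reply(200, {"accepted": len(body)})
+        else:
+            # Middle-waypoint insertion: return the updated mission.
+            mission = MISSIONS[m["mid"]]
+            mission["waypoints"].insert(body["index"], body["waypoint"])
+            self._reply(200, mission)
+
+    def log_message(self, *args):
+        pass  # keep test output quiet
+
+
+server = ThreadingHTTPServer(("127.0.0.1", 0), MissionsMock)  # port 0 picks a free port
+threading.Thread(target=server.serve_forever, daemon=True).start()
+BASE = f"http://127.0.0.1:{server.server_address[1]}"
+
+
+def call(method, path, body=None):
+    """Return (status, parsed JSON body) for one request against the mock."""
+    data = None if body is None else json.dumps(body).encode()
+    req = Request(BASE + path, data=data, method=method,
+                  headers={"Content-Type": "application/json"})
+    try:
+        with urlopen(req) as resp:
+            return resp.status, json.loads(resp.read())
+    except HTTPError as err:
+        return err.code, json.loads(err.read())
+
+
+status, mission = call("GET", "/missions/m1")
+print(status, len(mission["waypoints"]))
+```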
+
+### 3. Ground Station session trace
+
+- Production transport: continuous push over modem (suite-level protocol).
+- Mock shape: scripted timing trace per scenario. Each scenario is a list of `(t, event)` pairs: connect, push frame, push telemetry, operator-click, modem-drop, reconnect, lost-link.
+- Acquisition path: authorable inline from `architecture.md §7` and `acceptance_criteria.md §Reliability & Safety`. No external dependency — just a fixture generator.
+- Blocks AC rows: R4 (lost-link → RTL at 30 s); O8, O9, O10 (operator command lifecycle on the return path, **but** O9/O10 also depend on Q9 for the auth scheme).
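+
+A scenario can be authored as a plain list of `(t, event)` pairs. The sketch below assumes the 30 s lost-link grace named for R4 and uses illustrative event strings (not a fixed vocabulary); `expected_rtl_time` is a hypothetical helper that computes when a test should expect the RTL failsafe to fire, namely at the first modem drop that is not followed by a reconnect within the grace period.
+
+```python
+LOST_LINK_GRACE_S = 30.0  # R4 value; the authoritative threshold lives in acceptance_criteria.md
+
+
+def expected_rtl_time(trace, grace_s=LOST_LINK_GRACE_S):
+    """Return the time at which RTL should fire, or None if the link always recovers in time.
+
+    `trace` is a list of (t_seconds, event) pairs in ascending time order. If the trace
+    ends while disconnected, the link is assumed to stay down after the last event.
+    """
+    drop_started = None
+    for t, event in trace:
+        if drop_started is not None and t - drop_started >= grace_s:
+            return drop_started + grace_s   # grace expired before this event
+        if event == "modem-drop" and drop_started is None:
+            drop_started = t
+        elif event == "reconnect":
+            drop_started = None             # link recovered inside the grace window
+    if drop_started is not None:
+        return drop_started + grace_s
+    return None
+
+
+NOMINAL = [(0.0, "connect"), (1.0, "push-frame"), (2.0, "push-telemetry"), (5.0, "operator-click")]
+DROP_AND_RECOVER = [(0.0, "connect"), (10.0, "modem-drop"), (25.0, "reconnect"), (40.0, "push-frame")]
+LOST_LINK = [(0.0, "connect"), (10.0, "modem-drop"), (45.0, "push-telemetry")]
+
+print(expected_rtl_time(LOST_LINK))  # → 40.0
+```
+
+The same trace then drives both the mock Ground Station and the assertion, so the expected failsafe time never drifts from the scripted events.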
+
+### 4. MAVLink SITL trace
+
+- Production transport: MAVLink v2 over UDP or serial.
+- Mock shape: ArduPilot SITL recording capturing the autopilot's command stream + the airframe's response stream. One trace per scenario: waypoint upload, geofence INCLUSION violation, geofence EXCLUSION violation, lost-link RTL, battery soft-floor RTL, battery hard-floor land-now.
+- Acquisition path: run ArduPilot SITL with a scripted mission; capture the full MAVLink stream with mavlink-router or equivalent.
+- Blocks AC rows: R4 (RTL exact timing), R5, R6, R7, R9; plus the project-level "MAVLink command surface MUST pass SITL conformance" gate.
+
+### 5. Camera frame sequences (ViewPro A40)
+
+- Production transport: RTSP/RTP over TCP/UDP, 1080p H.264/265 at 30/60 fps.
+- Current local fixtures: `fixtures/movement/video0[1-4].mp4` (4 clips, ~5–6 MB each), `fixtures/videos/94d42580bd1ad6ff.mp4` (one reconnaissance clip used for T3 frame-rate floor).
+- Mock-shape gap: zoom-band coverage. Each AC scenario that names a zoom level (wide, light, medium, high) needs a representative clip at that zoom band. The 4 movement clips are not labelled with the zoom band each one represents; this needs documenting per clip OR re-recording with zoom-band labels.
+- Acquisition path: existing clips usable for movement-detection visual baselines; new recordings at each zoom band require flight time.
+
+### 6. Gimbal telemetry CSV (paired with frames)
+
+- Production transport: ViewPro A40 vendor protocol over UDP; per-tick yaw/pitch/zoom updates.
+- Mock shape: `gimbal.csv` with columns `(t, yaw_deg, pitch_deg, zoom_band, focal_mm)`, one CSV per video file, with timestamps aligned to the frame timestamps within one frame period.
+- Acquisition path: requires re-flying the recording with the gimbal-feedback channel captured alongside. CANNOT be back-fitted to existing videos.
+- Blocks AC rows: M1, M2, M3, M4 (movement-detection ego-motion compensation); also tightens L6, L7 (movement candidate enqueue latency).
+- **Confirmed not available today (user-stated 2026-05-19).**
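+
+Once a paired recording exists, a fixture validator for this shape is cheap to write. A minimal sketch, assuming the five columns listed above, timestamps in seconds on the same clock as the video, and a known frame rate; `load_gimbal_csv` and `misaligned_frames` are hypothetical helpers. It checks the header and reports every frame whose nearest gimbal sample is more than one frame period away.
+
+```python
+import bisect
+import csv
+import io
+
+EXPECTED_COLUMNS = ["t", "yaw_deg", "pitch_deg", "zoom_band", "focal_mm"]
+
+
+def load_gimbal_csv(text):
+    reader = csv.DictReader(io.StringIO(text))
+    if reader.fieldnames != EXPECTED_COLUMNS:
+        raise ValueError(f"unexpected columns: {reader.fieldnames}")
+    rows = [dict(r, t=float(r["t"])) for r in reader]
+    ts = [r["t"] for r in rows]
+    if ts != sorted(ts):
+        raise ValueError("gimbal timestamps must be non-decreasing")
+    return rows
+
+
+def misaligned_frames(frame_ts, gimbal_ts, fps):
+    """Return indices of frames with no gimbal sample within one frame period."""
+    period = 1.0 / fps
+    bad = []
+    for i, ft in enumerate(frame_ts):
+        j = bisect.bisect_left(gimbal_ts, ft)
+        # The nearest sample is either just before or at/after the frame timestamp.
+        neighbours = gimbal_ts[max(j - 1, 0):j + 1]
+        if not neighbours or min(abs(ft - g) for g in neighbours) > period:
+            bad.append(i)
+    return bad
+
+
+SAMPLE = """t,yaw_deg,pitch_deg,zoom_band,focal_mm
+0.000,12.0,-45.0,wide,4.25
+0.033,12.1,-45.0,wide,4.25
+0.067,12.2,-44.9,wide,4.25
+0.200,12.5,-44.8,wide,4.25
+"""
+rows = load_gimbal_csv(SAMPLE)
+frame_ts = [0.000, 0.033, 0.067, 0.100, 0.133]  # 30 fps frame timestamps
+print(misaligned_frames(frame_ts, [r["t"] for r in rows], fps=30))  # → [4]
+```
+
+Frame 4 is flagged because the telemetry gap between 0.067 s and 0.200 s leaves it more than 1/30 s from any sample; a real validator would fail the fixture on any non-empty result.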
+
+### 7. VLM I/O pairs
+
+- Production transport: Unix-domain socket IPC to local-onboard VLM (NanoLLM / VILA1.5-3B per architecture §1).
+- Mock shape: paired `(roi.png, prompt.txt, vlm_response.json)` per scenario + a small set of schema-violation cases (truncated JSON, wrong field types, missing required fields) for fail-closed tests.
+- Acquisition path: depends on the local VLM model choice. Once pinned, capture real I/O during a flight or scripted run; schema-violation cases authored inline.
+- Blocks AC rows: L3 (Tier-3 ≤5 s latency on bounded ROI), S5 (deep-analysis hold-cap interaction).
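+
+The schema-violation cases above exercise a fail-closed projection. A minimal sketch, assuming a hypothetical `VlmAssessment` with only two fields (`verdict` from a fixed set, `confidence` in [0, 1]); the real schema belongs to the component spec, not to this file. Any parse error, missing or extra field, wrong type, or out-of-range value yields `None`, which callers must treat as "not confirmed".
+
+```python
+import json
+from dataclasses import dataclass
+
+ALLOWED_VERDICTS = {"target", "not_target", "uncertain"}  # hypothetical values
+
+
+@dataclass(frozen=True)
+class VlmAssessment:
+    verdict: str
+    confidence: float
+
+
+def parse_assessment(raw: str):
+    """Project raw VLM text onto VlmAssessment; return None on any violation (fail closed)."""
+    try:
+        obj = json.loads(raw)
+    except json.JSONDecodeError:
+        return None                      # truncated or non-JSON output
+    if not isinstance(obj, dict) or set(obj) != {"verdict", "confidence"}:
+        return None                      # missing or unexpected fields
+    verdict, conf = obj["verdict"], obj["confidence"]
+    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
+        return None
+    # bool is a subclass of int in Python, so reject it explicitly.
+    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
+        return None
+    if not 0.0 <= conf <= 1.0:           # also rejects NaN
+        return None
+    return VlmAssessment(verdict, float(conf))
+
+
+print(parse_assessment('{"verdict": "target", "confid'))  # → None
+```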
+
+### 8. Operator-command envelopes
+
+- Production transport: arrives at autopilot over the Ground Station modem return path.
+- Mock shape: per envelope, a `(scheme, payload, signature, sequence_id)` tuple. One fixture per case: valid, expired, replayed (same envelope sent twice), malformed (signature mismatch), unsigned.
+- Acquisition path: **blocked on Q9** (operator-command auth scheme — open in `_docs/02_document/architecture.md §8`). Once the scheme is chosen, envelopes are authorable inline.
+- Blocks AC rows: O9 (replay protection), O10 (signature validation); strengthens O8 (confirm pathway).
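+
+Until Q9 picks the scheme, fixtures can be generated against a placeholder. The sketch below uses HMAC-SHA256 from the Python standard library purely as a stand-in (ed25519 or a MAVLink-2 extension are equally possible outcomes of Q9). The shared test key, the `issued_at` field inside the payload, the 5 s validity window, and the strictly increasing `sequence_id` rule are all assumptions for illustration; `sign` and `EnvelopeValidator` are hypothetical names. It shows how each of the five fixture cases maps to a distinct rejection reason.
+
+```python
+import hashlib
+import hmac
+import json
+
+KEY = b"test-only-session-key"   # hypothetical per-session key; never a production secret
+MAX_AGE_S = 5.0                   # hypothetical validity window
+
+
+def sign(payload: dict, sequence_id: int) -> dict:
+    body = json.dumps(payload, sort_keys=True)
+    mac = hmac.new(KEY, f"{sequence_id}|{body}".encode(), hashlib.sha256).hexdigest()
+    return {"scheme": "hmac-sha256", "payload": body, "signature": mac,
+            "sequence_id": sequence_id}
+
+
+class EnvelopeValidator:
+    def __init__(self):
+        self.last_seq = -1  # highest sequence_id accepted so far in this session
+
+    def check(self, env: dict, now: float) -> str:
+        """Return 'ok' or a rejection reason; only 'ok' may change state."""
+        if not env.get("signature"):
+            return "unsigned"
+        if env.get("scheme") != "hmac-sha256":
+            return "unknown-scheme"
+        msg = f"{env['sequence_id']}|{env['payload']}".encode()
+        expected = hmac.new(KEY, msg, hashlib.sha256).hexdigest()
+        if not hmac.compare_digest(expected, env["signature"]):
+            return "bad-signature"       # covers the "malformed" fixture
+        if env["sequence_id"] <= self.last_seq:
+            return "replayed"
+        if now - json.loads(env["payload"])["issued_at"] > MAX_AGE_S:
+            return "expired"
+        self.last_seq = env["sequence_id"]
+        return "ok"
+
+
+validator = EnvelopeValidator()
+env = sign({"cmd": "confirm", "issued_at": 100.0}, sequence_id=1)
+print(validator.check(env, now=101.0))  # ok
+print(validator.check(env, now=101.5))  # replayed
+```
+
+Binding `sequence_id` into the MAC means an attacker cannot replay an old payload under a fresh sequence number without invalidating the signature.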
+
+### 9. GPS / NTP drift scripts
+
+- Production transport: kernel-level wall clock + GPS lock state.
+- Mock shape: scripted offset injection — bump the clock by N ms, drop GPS lock, change time source.
+- Acquisition path: authorable inline; no external dependency.
+- Blocks AC rows: R8.
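+
+Drift scenarios need no hardware because the test can own a fake clock. A minimal sketch, assuming the 200 ms yellow threshold stated in `security_approach.md` and hypothetical names (`FakeClock`, `health`, and the `clock_source` values `gps` / `ntp` / `none`). It injects gradual drift, an abrupt jump, and a source loss, and records `clock_source` + `last_sync_at` as the binding rule requires. Health is derived from drift only; what source loss itself should signal is left to the AC.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+DRIFT_YELLOW_MS = 200.0  # threshold named in security_approach.md (enforced by the R8 AC)
+
+
+@dataclass
+class FakeClock:
+    true_time_s: float = 0.0
+    offset_ms: float = 0.0               # autopilot wall-clock minus reference time
+    clock_source: str = "gps"            # "gps", "ntp" or "none" (illustrative values)
+    last_sync_at: Optional[float] = 0.0
+
+    def advance(self, dt_s: float, drift_ms_per_s: float = 0.0):
+        """Move reference time forward, optionally accumulating drift."""
+        self.true_time_s += dt_s
+        self.offset_ms += drift_ms_per_s * dt_s
+
+    def jump(self, delta_ms: float):
+        self.offset_ms += delta_ms       # abrupt clock step
+
+    def lose_source(self):
+        self.clock_source = "none"       # last_sync_at is kept for forensics
+
+    def resync(self, source: str):
+        self.offset_ms = 0.0
+        self.clock_source = source
+        self.last_sync_at = self.true_time_s
+
+
+def health(clock: FakeClock) -> str:
+    return "yellow" if abs(clock.offset_ms) > DRIFT_YELLOW_MS else "green"
+
+
+clock = FakeClock()
+clock.advance(60.0, drift_ms_per_s=2.0)  # 120 ms accumulated drift
+clock.jump(150.0)                         # step to 270 ms
+print(clock.clock_source, round(clock.offset_ms), health(clock))  # gps 270 yellow
+```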
+
+## Coverage summary by service
+
+| Service | Rows covered (real fixture) | Rows blocked on this service | Acquisition priority |
+|---|---|---|---|
+| Tier-1 replay | L1, D2, D6 (live; replay desirable for isolation) | none independently blocked | low (can use live `../detections` in suite-e2e) |
+| `missions` mock | none | Mp1–Mp5 (5 rows) | medium |
+| Ground Station trace | none | R4, O8 (2 rows) | low (inline-authorable) |
+| MAVLink SITL | none | R4, R5, R6, R7, R9 (5 rows) + project conformance gate | high |
+| Frame sequences | L1 (with image), T3 (with video) | enriches L6/L7 with telemetry | medium |
+| Gimbal CSV | none | M1–M4 (4 rows) + L6, L7 | **high — explicit user gap** |
+| VLM I/O pairs | none | L3, S5 (2 rows) | low (model-choice gated) |
+| Operator envelopes | none | O9, O10 (2 rows) | blocked on Q9 |
+| GPS/NTP drift | none | R8 | low (inline-authorable) |
+
+Per-row binding lives in `expected_results/results_report.md`. The status of each gap is mirrored in `_docs/_process_leftovers/` so the next `/autodev` run can replay the missing-fixture decision.
+
+## What this file does NOT own
+
+- Component design (how `detection_client` talks to Tier-1, how `mission_client` retries, etc.) — `_docs/02_document/architecture.md` and `_docs/02_document/components/*/description.md`.
+- Production data shapes (frame rate, MAVLink message types) — `data_parameters.md` already has these.
+- AC text — `_docs/00_problem/acceptance_criteria.md`.
+- The choice of which mocks to use during a given test run (live vs replay vs scripted) — `_docs/02_document/tests/` (test strategy doc, authored by `/test-spec` Phase 2).
@@ -0,0 +1,55 @@
+# Problem
+
+## What is being built
+
+`autopilot` is the onboard mission executor for a reconnaissance winged UAV. It runs on the airframe's edge compute device. It receives its mission from the `missions` service, controls the airframe, drives the camera + gimbal to inspect terrain, and feeds a remote human operator with everything the operator needs to confirm or decline each candidate target.
+
+## What problem it solves
+
+The reconnaissance UAV detects vehicles and military equipment well enough today, but the current high-value targets are **camouflaged positions** — FPV-operator hideouts, hidden artillery emplacements, dugouts masked by branches. These cannot be found by visual similarity to known object classes alone.
+
+Three observation gaps must be closed:
+
+- **Visual sweep coverage** — the camera must follow the planned route and keep eyes on the terrain it overflies, not only on already-known targets.
+- **Movement detection on a moving camera platform** — small movers must be surfaced as they appear, even while the airframe and gimbal are themselves moving and even at higher zoom levels.
+- **Context-aware target recognition** — a candidate position has to be assessed against scene context (footpaths arriving at it, fresh-vs-stale tracks, concealment patterns), not just shape.
+
+Every candidate the system surfaces must reach a human operator quickly enough for the operator to act, without overwhelming the operator with too many candidates at once, and with confidence-scaled urgency so that high-confidence targets are not buried in a queue of low-confidence noise.
+
+## Who uses it
+
+- **Operators** — single primary, optional remote secondary. They see camera feed + telemetry + candidate overlays in a browser at a Ground Station and respond with confirm / decline / target-follow / abort. Their decisions must be authenticated, signed, and replay-protected because the radio link is hostile territory.
+- **Mission planners** — define the mission region and consume the post-mission diff of what was found.
+- **Airframe / Ground-Station crews** — depend on the system to safely abort or RTL when the operator link is lost, and to refuse takeoff if the system is not in a flight-ready state.
+- **Suite operations** — need to know when the airframe is in flight so that other ground-side housekeeping (model updates, OTA) does not interfere.
+
+## The operational reality this problem lives in
+
+Stated as fact, not as a design choice. (Design lives in `_docs/01_solution/solution.md` and `_docs/02_document/architecture.md`.)
+
+- The airframe is a reconnaissance winged UAV flying at 600–1000 m altitude.
+- Missions cover all four seasons and all common terrain types (winter snow, spring mud, summer vegetation, autumn; forest, open field, urban edges, mixed terrain).
+- The link between the airframe and the Ground Station is a modem radio that can degrade or drop entirely mid-flight; the system has to keep flying safely when this happens.
+- The operator is remote, watches a browser UI on the Ground Station, and is not co-located with the airframe.
+- Primitive (Tier 1) object detection is the responsibility of a separate service running alongside the autopilot on the same compute, accessible over a local interface — this split is fixed at the suite level, not something autopilot can choose.
+- Mission state and the area-level map of previously-seen objects come from a separate `missions` service over the network and are reconciled before takeoff and after landing.
+
+## What this system is NOT for
+
+(Scope-clarifying so the reader does not project unrelated concerns onto autopilot.)
+
+- Multi-airframe coordination, fleet management, swarm logic.
+- Mission planning across regions.
+- GPS-denied navigation algorithms (a separate suite service provides corrected GPS).
+- Annotation tooling, model training, dataset curation.
+- The operator browser UI itself (the Ground Station hosts it; autopilot feeds it).
+- Cloud-hosted inference of any kind.
+
+## Where to read further
+
+- `_docs/00_problem/restrictions.md` — the hard constraints (hardware, environment, regulatory).
+- `_docs/00_problem/acceptance_criteria.md` — measurable success criteria.
+- `_docs/00_problem/security_approach.md` — threat model + security non-negotiables.
+- `_docs/00_problem/input_data/` — runtime inputs + test fixture references.
+- `_docs/01_solution/solution.md` — the chosen solution shape (component breakdown, tech stack rationale).
+- `_docs/02_document/architecture.md` — the full architectural design.
@@ -0,0 +1,54 @@
+# Restrictions
+
+Externally imposed constraints the system MUST satisfy. Design choices — even frozen ones — live in `_docs/02_document/architecture.md`, not here. (Audited against `.cursor/rules/artifact-srp.mdc`.)
+
+## Hardware (fixed at the suite level — autopilot does not choose)
+
+- Compute device: **Jetson Orin Nano Super** (aarch64), 67 TOPS INT8, **8 GB shared LPDDR5**. Tier 1 detection consumes ~2 GB of that, leaving ~6 GB for everything autopilot owns.
+- Primary camera: **ViewPro A40**. 1080p (1920×1080), 40× optical zoom, f=4.25–170 mm, Sony 1/2.8" CMOS (IMX462LQR), HDMI or IP output at 1080p 30/60 fps. The A40's vendor control protocol is the only way to drive its pan/tilt/zoom — autopilot must speak it.
+- Alternative camera: **ViewPro Z40K** (higher cost; the system must remain compatible).
+- Thermal sensor (640×512, NETD ≤50 mK) may be added later; the system must not assume it is present today.
+- 40× optical zoom traversal takes 1–2 s wall-clock. Any sub-2-second zoom-out → zoom-in product behaviour must account for this physical floor.
+
+## Operational
+
+- Flight altitude: 600–1000 m.
+- All seasons in scope: winter snow, spring mud, summer vegetation, autumn. A winter-first-only scope was rejected (frozen 2026-05-06).
+- All terrain types in scope: forest, open field, urban edges, mixed terrain.
+- The operator/Ground-Station radio link is a modem with intermittent reliability — the system must tolerate degradation and full loss mid-flight.
+
+## Software environment (externally imposed)
+
+- The chosen onboard inference path must run on Jetson Orin Nano Super within the 6 GB residual RAM budget (after Tier 1).
+- **Models use FP16 precision** (frozen 2026-05-06; INT8 is rejected for MVP). Applies to every model loaded onto Jetson.
+- **No cloud egress for inference.** Any model larger than the in-binary footprint must run locally on the same Jetson, not in the cloud. Network calls for inference are forbidden.
+- Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant. (This is a hardware-resource fact; how the system serialises them is design.)
+- The mission file format is the shared `mission-schema` artefact owned jointly by autopilot and the `missions` service. Autopilot MUST consume that schema; it cannot fork it.
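+
+Because the serialisation mechanism is a design choice, a test can still check the hardware fact itself. A minimal sketch, assuming both GPU users can log `(owner, start_s, end_s)` wall-clock intervals for every GPU execution (a hypothetical log format; Tier 1 runs in a separate service, so the two logs would be merged first). `gpu_overlaps` is a hypothetical helper that reports every pair of intervals that overlap; an empty result means the one-at-a-time constraint held for that run.
+
+```python
+def gpu_overlaps(intervals):
+    """Return (owner_a, owner_b, overlap_start, overlap_end) for every overlapping pair.
+
+    `intervals` is a list of (owner, start_s, end_s) tuples. Intervals that merely
+    touch (one ends exactly when the next starts) do not count as overlapping.
+    """
+    ordered = sorted(intervals, key=lambda iv: iv[1])
+    overlaps = []
+    for i, (owner_a, start_a, end_a) in enumerate(ordered):
+        for owner_b, start_b, end_b in ordered[i + 1:]:
+            if start_b >= end_a:
+                break  # sorted by start, so no later interval can overlap this one
+            overlaps.append((owner_a, owner_b, start_b, min(end_a, end_b)))
+    return overlaps
+
+
+LOG = [("tier1", 0.00, 0.08), ("vlm", 0.08, 2.50), ("tier1", 2.50, 2.58)]
+print(gpu_overlaps(LOG))  # → []
+```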
+
+## Suite-level architectural splits (autopilot does not own these decisions)
+
+- Tier 1 primitive object detection runs in the sibling **`../detections`** service. Autopilot consumes its output; autopilot does NOT host Tier 1.
+- Mission state (waypoints, region, etc.) comes from the **`missions`** service. Autopilot does not author missions.
+- Central map of previously-detected objects lives in **`missions`** (extension `/missions/{id}/mapobjects`). Autopilot reconciles with it pre-flight and post-flight; in-flight, autopilot is authoritative for its mission's area.
+- GPS coordinates come from a separate **GPS-denied service** (`../gps-denied-onboard` / `../gps-denied-desktop`). Autopilot does NOT implement GPS-denied algorithms.
+- Operator browser UI is owned by the **Ground Station**. Autopilot pushes the data; it does NOT render the UI.
+- Annotation tooling + model training live in **separate repos** (`../annotations`, `../ai-training`). Autopilot does NOT own them.
+
+## Reliability & Safety obligations (mandatory)
+
+These are existence-of-the-rule constraints. The specific numeric thresholds (RTL grace, drift bound, retry count) are measured success criteria and live in `acceptance_criteria.md`.
+
+- **Pre-flight self-test (BIT) MUST gate takeoff.** The airframe must not take off until every dependency the mission needs is verifiably healthy or the operator has explicitly accepted a known degraded state (e.g. cached MapObjects fallback).
+- **Lost operator-link failsafe MUST be deterministic and bounded.** Loss of the operator/Ground-Station radio link cannot result in undefined behaviour. The eventual outcome must be a known mission-safe state (RTL by default, configurable per mission).
+- **Airframe MAVLink link loss MUST surface health-red immediately** and defer behaviour to the autopilot stack on the airframe (ArduPilot / PX4).
+- **Battery / fuel thresholds MUST trigger pre-defined safety behaviour** (RTL above a soft floor; land-now below a hard floor). Only operator override may bypass.
+- **Geofence enforcement MUST be symmetric** — both INCLUSION and EXCLUSION polygons honoured.
+- **Operator commands MUST be authenticated, signed, and replay-protected.** Modem-link encryption alone is not sufficient. (Threat model + open scheme choice live in `security_approach.md`.)
+- **On-device storage MUST be bounded.** Persistent-store full is a takeoff-blocker; mid-flight eviction policy is mandatory.
+- **No silent error swallowing.** Every dependency state MUST surface through a health endpoint.
+- **Wall-clock MUST be bound to GPS time once GPS is locked, or NTP at boot.** Forensic timestamping of operator commands depends on this.
+- **MAVLink command surface MUST conform** to whatever ArduPilot/PX4 actually accepts (SITL is the conformance reference). Inventing MAVLink semantics is not permitted.
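+
+Two of these obligations reduce to small pure functions that a test can compare against SITL behaviour. A minimal sketch, assuming a flat local x/y frame in metres, a single INCLUSION polygon plus any number of EXCLUSION polygons (multi-inclusion semantics must be checked against ArduPilot / PX4), and floor values passed in by the caller (the real thresholds live in `acceptance_criteria.md`). `safety_action` and its action strings are hypothetical, not the autopilot's API. Note the symmetry: a position violates the fence if it is outside the INCLUSION polygon or inside any EXCLUSION polygon, and operator override bypasses only the battery floors.
+
+```python
+def point_in_polygon(x, y, poly):
+    """Ray-casting test; poly is a list of (x, y) vertices. Boundary points are unspecified."""
+    inside = False
+    n = len(poly)
+    for i in range(n):
+        x1, y1 = poly[i]
+        x2, y2 = poly[(i + 1) % n]
+        if (y1 > y) != (y2 > y):
+            # x-coordinate where this edge crosses the horizontal line through (x, y)
+            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
+            if x < x_cross:
+                inside = not inside
+    return inside
+
+
+def fence_violated(pos, inclusion, exclusions):
+    x, y = pos
+    return (not point_in_polygon(x, y, inclusion)
+            or any(point_in_polygon(x, y, p) for p in exclusions))
+
+
+def safety_action(battery_pct, soft_floor_pct, hard_floor_pct, pos,
+                  inclusion, exclusions, operator_override=False):
+    """Return the pre-defined safety behaviour for one tick (illustrative priority order)."""
+    if not operator_override:            # only the operator may bypass the battery floors
+        if battery_pct < hard_floor_pct:
+            return "land-now"
+        if battery_pct < soft_floor_pct:
+            return "rtl"
+    if fence_violated(pos, inclusion, exclusions):
+        return "fence-breach"
+    return "continue"
+
+
+FIELD = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]      # INCLUSION polygon
+NO_GO = [[(400, 400), (600, 400), (600, 600), (400, 600)]]  # EXCLUSION polygons
+print(safety_action(25, 30, 15, (100, 100), FIELD, NO_GO))  # → rtl
+```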
+
+## Out of scope — see `problem.md → "What this system is NOT for"`
+
+Scope-exclusion statements are owned by `problem.md`. Not duplicated here.
@@ -0,0 +1,52 @@
+# Security Approach
+
+Threat model + non-negotiable security principles. Specific schemes / libraries / algorithms (HMAC vs ed25519, Unix-domain socket peer-cred mechanism, etc.) are design choices and live in `_docs/02_document/architecture.md` + per-component specs. (Audited against `.cursor/rules/artifact-srp.mdc`.)
+
+## Threat model
+
+The autopilot runs onboard a flying UAV. The threats it must defend against on the MVP timeline:
+
+1. **Hijack of operator commands over the radio link.** Even with modem-level link encryption, an attacker who acquires session state could replay a confirm / decline / target-follow / abort command and seize the system's behaviour. The radio link is hostile territory; link encryption alone cannot be the entire defence.
+2. **Crafted input payloads** (image / video crops sent to onboard models, malformed messages on the airframe link, oversize attachments to any onboard service) exploiting decoders, memory bugs, or causing resource exhaustion.
+3. **Unstructured model output** corrupting downstream decisions and producing false operator-facing confidence (e.g. a free-form VLM text response treated as a trusted downstream API).
+4. **Mid-flight peer spoofing** — a fake sibling service (Tier 1 detection, mission service, or any local IPC peer) impersonating a trusted dependency.
+5. **Forensic / audit gaps** — wall-clock drift breaking operator-command timestamping, post-mission diff attribution, or replay-protection windowing.
+
+**Out of scope** (lives elsewhere in the suite or is not relevant to the airborne payload):
+
+- Cloud-hosted secret management — autopilot does not call cloud services.
+- Multi-tenancy — single mission per flight; single operator-or-paired-operator session per flight.
+- Web-attack surface — the operator browser UI lives in the Ground Station, not in autopilot.
+- OTA update signing — Watchtower at the suite level owns it; autopilot only consumes signed images.
+
+## Non-negotiable security principles
+
+These are existence-of-the-rule constraints. The chosen mechanism for each is a design decision and lives in `_docs/02_document/architecture.md`.
+
+- **Operator commands MUST be authenticated, signed, and replay-protected.** Every confirm / decline / target-follow / abort command MUST carry a session-bound, replay-resistant signature that is validated before any state change. Failures are logged at WARN or above and discarded without triggering any state-machine transition; they are never permitted to take effect.
+- **No cloud egress for inference.** Tier 2 + Tier 3 (if enabled) MUST run on the same compute as the rest of autopilot. No HTTP / external network call originating from autopilot for inference is permitted.
+- **No silent error swallowing for security-relevant failures.** Signature invalid, peer-credential mismatch, schema violation, oversize payload rejected — each MUST surface through the health endpoint and the structured log.
+- **Bounded input for any model call.** Every model input is bounded by a crop-size limit, a format allow-list, and patched image decoders. Crafted-input and resource-exhaustion mitigation is mandatory; "accept anything and hope the decoder handles it" is not acceptable.
+- **Schema validation for any non-deterministic model output.** Free-form generative output (e.g. VLM text) MUST be projected onto a fixed structured schema before it crosses any decision boundary inside autopilot. Schema violation MUST fail closed.
+- **Local IPC peer authorisation.** Any onboard IPC peer that autopilot trusts MUST be identifiable as the expected local process (not just "anyone who can reach the socket"). The mechanism is a design choice.
+- **Health endpoint MUST reflect security state.** Pre-flight BIT covers reachability + warm-up of every external dependency; the same endpoint surfaces in-flight security signals (repeated signature failures, peer-credential mismatch, schema-violation rate).
+- **Wall-clock binding requirement.** Operator-command timestamping requires a trusted clock source. Wall-clock MUST be bound to GPS time once GPS is locked, or NTP at boot. Both sources MUST be recorded with `clock_source` + `last_sync_at`. Drift > 200 ms surfaces health yellow (the AC enforces the threshold; this rule mandates the binding).
+- **Airframe MAVLink integrity.** Whether the airframe link MUST use MAVLink-2 message signing depends on whether the link is physically isolated. If it is not physically isolated, message signing MUST be enabled. (The decision and the mechanism are tracked as Q6 in `architecture.md §8`.)
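+
+For the bounded-input rule, a pre-decode gate can reject many crafted payloads before any decoder sees them. A minimal sketch, assuming a hypothetical allow-list of PNG and JPEG (identified by their standard magic bytes) and hypothetical caps on byte size and PNG pixel dimensions; `gate_crop` is an illustrative name. PNG width and height are read from the IHDR chunk, which the PNG specification requires to be the first chunk.
+
+```python
+import struct
+
+PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
+JPEG_MAGIC = b"\xff\xd8\xff"
+MAX_BYTES = 4 * 1024 * 1024   # hypothetical per-crop byte cap
+MAX_SIDE_PX = 1280            # hypothetical per-crop dimension cap
+
+
+def gate_crop(data: bytes) -> str:
+    """Return 'ok' or a rejection reason; runs before any image decoder."""
+    if len(data) > MAX_BYTES:
+        return "oversize-bytes"
+    if data.startswith(PNG_MAGIC):
+        # 8-byte signature, 4-byte chunk length, then the "IHDR" chunk type.
+        if len(data) < 24 or data[12:16] != b"IHDR":
+            return "malformed-png"
+        width, height = struct.unpack(">II", data[16:24])  # big-endian uint32 pair
+        if not (0 < width <= MAX_SIDE_PX and 0 < height <= MAX_SIDE_PX):
+            return "oversize-dimensions"
+        return "ok"
+    if data.startswith(JPEG_MAGIC):
+        return "ok"  # JPEG dimensions need SOF parsing; left to the patched decoder here
+    return "format-not-allowed"
+```
+
+A crop that passes the gate still goes through the patched decoder; the gate only narrows what the decoder is exposed to.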
+
+## What this system does NOT own
+
+- Modem-link encryption setup — handled at the radio layer below autopilot.
+- Suite-wide TLS / certificate provisioning — delegated to suite-level deployment (`../_infra/`).
+- OTA update signing — Watchtower; autopilot consumes already-signed images. Boot-time self-check + rollback policy is an open suite-level question (Q10 in `architecture.md §8`).
+- Annotation / training-data security — lives in the `ai-training` repo.
+- Operator browser UI auth — Ground Station owns it; the modem-side handshake is jointly specified per the operator-command auth scheme (Q9).
+
+## Open security decisions (tracked in `_docs/02_document/architecture.md §8`)
+
+- **Q6** — MAVLink-2 message signing on the airframe link.
+- **Q9** — Operator-command authentication scheme (HMAC / ed25519 / MAVLink-2-extension / separate envelope).
+- **Q10** — Software rollback policy on the airframe (boot-time self-check, A/B partition, watchdog rollback).
+- **Q11** — Multi-operator session policy (single active operator vs quorum).
+- **Q12** — Comms blackout during banking turns (tolerate vs suppress lost-link failsafe during known turn arcs).
+
+None of these block the rest of the design. Each affected component spec calls out the question it depends on and the temporary contract used until the question resolves.