[AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-19 11:02:01 +03:00
parent f7d6cb4a3a
commit bc40ea7300
235 changed files with 12585 additions and 15097 deletions
+58
View File
@@ -0,0 +1,58 @@
# Input Data
Runtime inputs the autopilot consumes when flying, plus reference fixtures + expected-output assertions for tests. **All fixtures live inside this workspace** (`fixtures/`) — never reach into sibling repos at `../` for inputs. The autopilot repo is self-sufficient.
## Layout
| Path | Owns |
|---|---|
| `data_parameters.md` | Description of runtime input shapes (camera, telemetry, gRPC, mission JSON, operator commands, VLM IPC) + the categories of reference data tests need + Tier-1/Tier-2 class catalogue. |
| `services.md` | Per-external-service test-mock requirements: what shape of mock/fixture each of the 7 external systems needs and the acquisition status of each. |
| `fixtures/README.md` | File-by-file manifest of every fixture in this directory: SHA-256, size, upstream provenance, which `expected_results/results_report.md` rows consume it. |
| `fixtures/images/` | Real aerial frames (5 images, ~9 MB total) — Tier-1 inputs for detection-quality assertions (L1, D2, D6). |
| `fixtures/videos/` | Real reconnaissance video (1 clip, 12 MB) for frame-rate floor + sequence tests (T3). |
| `fixtures/movement/` | Wide-area movement-detection visual reference clips (4 clips, ~23 MB total). **No paired `gimbal.csv` / `telemetry.csv`** — ego-motion compensation (M1M4) cannot run against these alone. |
| `fixtures/semantic/` | Concealed-position semantic reference frames (4 PNGs, ~11 MB total) + `data_parameters.md` describing the new YOLO primitive classes the examples motivate. **Starter set only**, not a graded eval set. |
| `fixtures/schemas/` | Detection-result contract schemas (JSON + JSON-schema) for D6. |
| `fixtures/sql/` | Database init script — reference only; not directly asserted by an autopilot AC. |
| `expected_results/results_report.md` | The input → quantifiable-expected-output mapping consumed by `/test-spec` Phase 1. Every row keys off an AC in `../acceptance_criteria.md`; deferred rows carry a structured `<DEFERRED: <shape>; ref <pointer>>` tag. |
## Why fixtures are local
The autopilot repo MUST be self-sufficient — a developer with only the autopilot clone (no parent suite checked out) MUST be able to run the test specifications. Cross-repo `../` paths are forbidden in `results_report.md` and in any test runner script. When a sibling repo (`../detections/`, `../e2e/`, `../missions/`, etc.) is the upstream source of a fixture, we **copy** it in and SHA-pin it in `fixtures/README.md` so upstream drift is detectable.
## Suite-level coupling that still matters
Even though fixtures are local, the underlying contracts the fixtures embody come from suite-level decisions. When those decisions change, the fixtures here go stale:
- **Tier-1 detection model / classes** — when `../detections` ships a new model the `expected_detections.json` baseline goes stale; D1, D2, D6 rows in `results_report.md` must be re-recorded.
- **`mission-schema`** — shared between autopilot and the `missions` repo. Schema changes break the mission JSON contract; the mock fixtures for Mp1Mp5 (when authored) must re-pin.
- **Detection classes catalogue** — class IDs 0..18 are governed at the suite level. Autopilot's normalised-box output uses the same IDs. The 5 new Tier-1 classes documented in `data_parameters.md → "Class catalogue"` must land in the suite catalogue before D1 can be measured.
Today these couplings are tracked manually. The `monorepo-e2e` skill at the suite root will eventually own the drift detection.
## Fixture gaps and the project policy on `/test-spec` Phase 3
`/test-spec` Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results. Today's coverage is well below that gate (see `expected_results/results_report.md → "Coverage Status"`). **Project policy as of 2026-05-19**: rather than block the autodev flow at the gate, each deferred row is registered with a structured `<DEFERRED: <shape>; ref <pointer>>` tag in `results_report.md`, pointing at the per-service acquisition path in `services.md` or at an open architecture question (Q-tag). Deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement remains a hard release blocker.
Summary of open gaps (authoritative list lives in `services.md` and `fixtures/README.md`):
1. **Paired `gimbal.csv` + `telemetry.csv` for the 4 movement clips** — highest priority (blocks M1M4 + tightens L6/L7). **User-confirmed unavailable today (2026-05-19).**
2. Annotated multi-season eval set (concealed positions + footpaths).
3. Mock `missions` API exchanges + mock `/mapobjects` round-trip.
4. Mock Ground Station session traces.
5. ArduPilot SITL traces.
6. Operator-command envelopes (blocked on Q9).
7. VLM I/O pairs.
8. GPS / NTP drift scripts.
Closing each gap is its own workstream tracked in Jira; the autodev flow does not block on them.
## Adding new fixtures
1. Drop the file under `fixtures/<images|videos|movement|semantic|schemas|sql|gimbal|telemetry|mavlink|vlm|operator|mapobjects>/<descriptive-name>.<ext>` — create the subdirectory if it does not exist.
2. Compute SHA-256 (`shasum -a 256 <file>`).
3. Add a row to the matching subsection in `fixtures/README.md` (file path, size, SHA, upstream provenance, which `results_report.md` rows consume it).
4. Replace the matching `<DEFERRED: ...>` placeholder(s) in `expected_results/results_report.md` with the local path `fixtures/<...>`.
5. If the fixture replaces a service mock, also update `services.md → "Coverage summary by service"` to reflect the new acquisition status.
6. If the fixture is binary and large (> 50 MB) consider gitignoring it + adding an acquisition script per the e2e pattern; for everything in the current set, direct commit is fine.
@@ -0,0 +1,101 @@
# Input Data Parameters
Describes the **categories of input data** the system consumes at runtime, and the **categories of reference data** tests need. Internal component names, programming languages, IPC mechanisms, schema class names, and specific model choices are design and live in `_docs/02_document/architecture.md` — they do not belong in this file (per `.cursor/rules/artifact-srp.mdc`).
Local fixtures live in `fixtures/`; see `fixtures/README.md` for the manifest. External-service test-mock requirements live in `services.md`; the per-row binding to AC criteria lives in `expected_results/results_report.md`.
## Runtime inputs (what the system consumes when flying)
| Input | Source | Format | Cadence | Notes |
|---|---|---|---|---|
| Camera frames | ViewPro A40 (or alternative ViewPro Z40K) | H.264 / H.265 over RTSP, 1080p (1920×1080) | 30 / 60 fps | Frame timestamps are mandatory. |
| Primitive (Tier 1) detection responses | `../detections` service over a bi-directional streaming RPC contract | Bounding boxes with class id, confidence, normalised coordinates | Per frame | Same boxes feed Tier-2 ROI selection and the operator overlay. |
| UAV telemetry | Airframe via MAVLink v2 (UDP or serial) | MAVLink messages: position, attitude, velocity, battery, link health, GPS fix | ≥1 Hz (10 Hz target) | Source-of-truth for ego-motion compensation. |
| Gimbal feedback | ViewPro A40 vendor protocol over UDP | Yaw / pitch / zoom angle telemetry | per-tick | Source-of-truth for camera-pose compensation. |
| Mission JSON | `missions` service via HTTPS REST | Shared `mission-schema` JSON | Once at mission start + middle-waypoint updates | Validated against the shared schema. |
| Area-level map state | `missions` service extension `/missions/{id}/mapobjects` (GET) | Map-object records keyed by spatial cell | Once at mission start | Hydrates the system's local copy of the area map; cache-fallback on timeout. |
| Operator commands | Ground Station via modem (return path of the outbound telemetry stream) | Authenticated + signed + replay-protected command envelope (scheme open per Q9) | Event-driven | confirm / decline / target-follow start / target-follow release / abort. |
| Deep-analysis responses (optional) | Local-onboard model accessed via local IPC | Structured assessment schema (validated) | Per zoomed-in endpoint hold (when deep-analysis is enabled) | Schema-violation fails closed. |
## Class catalogue (Tier-1 + Tier-2)
Detection-quality acceptance criteria (`acceptance_criteria.md → Detection Quality`) are evaluated against a class catalogue that combines pre-existing suite-level classes with new autopilot-driven additions. Class IDs are governed at the suite level (`../detections` owns the catalogue); autopilot only consumes the IDs.
### New Tier-1 (YOLO primitive) classes — to be added to the suite catalogue
| # | Class name | Annotation hint | Motivated by |
|---|---|---|---|
| 1 | Black entrances | Bounding box; various sizes (small hideout openings to dugout entrances) | Concealed-position detection (D3, D4) |
| 2 | Branch piles | Bounding box | Concealment material around hideouts (D3, D4) |
| 3 | Footpaths | **Polyline / segmentation preferred over bbox** for linear features | Footpath recall gate (D5) |
| 4 | Roads | Polyline / segmentation | Distinguishing roads from footpaths in the same scene |
| 5 | Trees / tree blocks | Bounding box; tree-block annotation may use larger box for clusters | Concealment-context anchor; reduces false positives around tree-rows in movement detection (M1) |
### Tier-2 semantic attributes — composed by `semantic_analyzer`, NOT added to YOLO catalogue
| # | Attribute | Composed from | Used by |
|---|---|---|---|
| 1 | Footpath freshness (fresh / stale) | Footpath bbox + texture/edge analysis + seasonal context | Decision-window scoring, D5 partial coverage |
| 2 | Concealed-structure inference | Black-entrance + branch-piles + footpath-approach proximity | POI surfacing for D3/D4 (the structure itself is composed, not directly labelled) |
| 3 | Open clearing connected to path | Cleared-terrain texture + footpath endpoint | FPV-launch-point flagging |
### Existing classes (already in the suite catalogue)
The existing-class baseline (P=0.816, R=0.852 per the AC) covers the suite's pre-autopilot class set (vehicles, military equipment, etc.). Autopilot must not degrade these — see D2.
### Reference for IDs
The 19-id catalogue (0..18) is owned by `../detections`. Autopilot's normalised-box output uses the same IDs. When `../detections` ships a new model or renumbers IDs, the `expected_detections.json` baseline goes stale and D1, D2, D6 rows must be re-recorded.
## Reference data needed for testing
### Local fixtures already on disk
See `fixtures/README.md` for the SHA-pinned manifest. Categorised summary:
| Local fixture category | Files | Purpose | Bound to AC rows |
|---|---|---|---|
| `fixtures/images/*.jpg` | 5 aerial frames | Tier-1 detection contract; existing-class regression; normalised-box conformance | L1, D2, D6 |
| `fixtures/videos/94d42580bd1ad6ff.mp4` | 1 reconnaissance clip | Frame-rate floor scenario, reserved for future movement-sequence tests | T3 |
| `fixtures/schemas/expected_detections.{json,schema.json}` | 2 schema files | Detection-result contract shape reference | D6 |
| `fixtures/sql/init.sql` | 1 SQL file | Suite-e2e DB seed reference | (suite-only; no autopilot AC) |
| `fixtures/movement/video0[1-4].mp4` | 4 wide-area clips | Visual reference for movement-detection scenarios — **no paired telemetry CSVs**, ego-motion assertions unfalsifiable until those land | M1M4 (visual reference only) |
| `fixtures/semantic/semantic0[1-4].png` | 4 reference frames | Visual reference for concealed-position semantic targets — **starter set only, not a graded eval set** | D3, D4, D5 (starter only) |
### Reference shapes still needed but not yet on disk
The per-service mock catalogue is in `services.md` (authoritative). Summary of categories tests need:
| Reference shape | Why it's needed | See |
|---|---|---|
| Frame sequences with synchronised `gimbal.csv` + `telemetry.csv` | Ego-motion compensation at zoom-out AND zoomed-in inspection | `services.md §6 Gimbal telemetry CSV` |
| Concealed-position image set across all four seasons (annotated) | Concealed-position recall ≥60% and precision ≥20% | `services.md §5 Camera frame sequences` |
| Footpath sequences (fresh, stale, all four seasons, polyline-annotated) | Footpath recall ≥70% | `services.md §5` |
| New-class evaluation set (5 new classes above) | New-class per-class P/R ≥80% without degrading existing-class performance | `services.md §1 Tier-1 detection replay` (plus annotation campaign owned by `../ai-training` repo) |
| Mock Tier-1 streaming-RPC replays | Detection-consumer isolation tests | `services.md §1` |
| Mock Ground Station session traces | Lost-link failsafe ladder + operator-link reconnect | `services.md §3` |
| MAVLink SITL traces | MAVLink conformance + waypoint insertion + geofence enforcement | `services.md §4` |
| Mock central area-map service responses | Pre-flight pull / post-flight push round-trip; conflict cases (Q8) | `services.md §2` |
| Operator-command envelopes | Signature + replay-protection tests (once Q9 resolves) | `services.md §8` |
| VLM I/O pairs | Bounded ROI inputs + structured assessment outputs + schema-violation cases | `services.md §7` |
| GPS / NTP drift scenarios | Wall-clock drift health-yellow gate | `services.md §9` |
## Data volume targets
- Training data: hundreds to thousands of annotated images/sequences total.
- Seasonal coverage: winter (snow), spring (mud), summer (vegetation), autumn (mixed leaf + partial snow).
- Available assembly effort: 1.5 months at 5 hours/day.
- Movement detection requires **frame sequences** (not still images only) with synchronised camera + gimbal + UAV telemetry.
- Footpaths require polyline or segmentation annotation rather than bounding boxes (see "Class catalogue" above).
## Gaps that block `/test-spec` downstream
`/test-spec` Phase 1 will pass on prerequisite existence (`expected_results/results_report.md` is non-empty). Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results.
**Current coverage state** (re-computed 2026-05-19 after fixture restoration):
- Rows bound to real local fixtures: L1, D2, D6, T3 (~4 rows) — these are also the rows whose fixtures were restored on 2026-05-19 from sibling repos.
- Rows bound to **starter-only** fixtures (insufficient on their own): D3, D4, D5 (semantic PNGs), M1M4 (movement videos without CSV).
- Rows still deferred for fixture acquisition: see `fixtures/README.md → "Gaps still pending fixture acquisition"` and `services.md` for the authoritative list.
**Project policy on the Phase 3 gate**: rather than block `/test-spec` at the 75% gate, the autodev flow registers each deferred row with a structured `<DEFERRED: needs <shape>; blocks AC <id>>` tag in `expected_results/results_report.md`. Test-spec authoring proceeds; deferred rows become release-gate items, not development-gate items. The acceptance_criteria.md project-level gate ("MUST pass before product implementation begins") still applies for the hardware/replay benchmark — that remains a hard release blocker, not deferred.
@@ -0,0 +1,153 @@
# Expected Results
Maps every quantifiable acceptance criterion from `_docs/00_problem/acceptance_criteria.md` to an input fixture + a measurable expected result. Consumed by `/test-spec` Phase 1.
Per `.cursor/rules/artifact-srp.mdc`, this file uses **role / observable-behaviour language**, not internal component slugs. The system's externally observable behaviour is what's tested. Implementation names (component slugs, libraries, model names) live in `_docs/02_document/`.
**Fixture sourcing**: all fixtures live in `fixtures/` (sibling-repo `../` paths are forbidden). Where no fixture exists yet, the `Input` cell carries a structured `<DEFERRED: <shape>; ref services.md §N>` tag. Phase 3 has a hard 75% coverage gate — the autodev flow registers deferred rows as release-gate items rather than blocking on the gate; see `data_parameters.md → "Gaps that block /test-spec downstream"`.
**Comparison vocabulary**: see `.cursor/skills/test-spec/templates/expected-results.md` for canonical methods (`exact`, `numeric_tolerance`, `threshold_min`, `threshold_max`, `range`, `regex`, `substring`, `set_contains`, `json_diff`, `file_reference`).
**Deferred-tag legend**: `<DEFERRED: <shape>; ref <pointer>>` where `<pointer>` is a section in `../services.md` (per-service mock requirements), an open architecture question (e.g. `Q9`), or `inline-authorable` (no external dependency — just not yet written).
---
## Latency
Source ACs: `acceptance_criteria.md → Latency`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| L1 | `fixtures/images/4d6e1830d211ad50.jpg` | Single 1280 px aerial frame consumed through the Tier-1 contract; measure end-to-end | per-frame end-to-end latency | threshold_max | ≤ 100 ms | N/A |
| L2 | derived ROI ~640×640 from `fixtures/images/4d6e1830d211ad50.jpg` (inline-cropped by the test runner) | Tier-2 semantic confirmation over a single ROI | per-ROI latency | threshold_max | ≤ 200 ms | N/A |
| L3 | `<DEFERRED: bounded ROI crop matching the deep-analysis input contract; ref services.md §7>` | Tier-3 deep-analysis (when enabled) local-IPC call | per-ROI call latency | threshold_max | ≤ 5000 ms | N/A |
| L4 | `<DEFERRED: SITL or hardware-in-loop ViewPro A40 zoom command (medium→high); ref services.md §5>` | A40 physical zoom transition | wall-clock transition duration | threshold_max | ≤ 2000 ms | N/A |
| L5 | `<DEFERRED: scripted scan decision event followed by camera physical motion; ref services.md §3, §5>` | Decision-to-movement latency end-to-end | wall-clock decision→motion duration | threshold_max | ≤ 500 ms | N/A |
| L6 | `fixtures/movement/video01.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv; ref services.md §6>` | Movement candidate enqueue at the wide-area sweep | detection→enqueue duration | threshold_max | ≤ 1000 ms | N/A |
| L7 | `fixtures/movement/video02.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band; ref services.md §6>` | Movement candidate enqueue during zoomed inspection | detection→enqueue duration | threshold_max | ≤ 1500 ms | N/A |
| L8 | `<DEFERRED: full sweep → zoomed-inspection transition (POI detected → ROI fully zoomed); ref services.md §3, §5>` | Scan-mode transition including physical zoom | wall-clock transition | threshold_max | ≤ 2000 ms | N/A |
| L9 | `<DEFERRED: scripted operator-click → outbound command emitted by the system (modem RTT excluded); ref services.md §3>` | Operator command → action latency | wall-clock click→outbound | threshold_max | ≤ 500 ms | N/A |
## Throughput / Rate
Source ACs: `acceptance_criteria.md → Throughput / Rate`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| T1 | `<DEFERRED: long synthetic POI feed sustained above the cap (e.g. 20 POIs/min); inline-authorable>` | Cap enforcement on POIs surfaced to operator | POI rate surfaced | threshold_max | ≤ 5 / min | N/A |
| T2 | `<DEFERRED: airframe MAVLink telemetry replay over a 60 s window; ref services.md §4>` | Position telemetry consumed from the airframe link | reported position rate | range | 1 Hz ≤ rate ≤ 10 Hz (10 Hz target) | N/A |
| T3 | `fixtures/videos/94d42580bd1ad6ff.mp4` replayed with throttled-decode + frame-drop injection to drop below 10 fps for ≥5 s | Frame-rate floor trigger | zoom-in transitions suppressed AND overall health surfaces yellow | exact (suppression bool) + exact (health = yellow) | N/A | N/A |
## Detection Quality
Source ACs: `acceptance_criteria.md → Detection Quality`. Evaluation runs against the Tier-1 detection pipeline that the system consumes; autopilot's role is correct consumption + re-emission of the normalised-box contract. Class catalogue (5 new Tier-1 classes + 3 Tier-2 attributes) is defined in `../data_parameters.md → "Class catalogue"`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| D1 | `<DEFERRED: new-class eval set across all four seasons (black entrances, branch piles, footpaths, roads, trees, tree blocks); ref services.md §1, annotation campaign in ../ai-training>` | Per-class precision/recall for added classes | per-class precision ≥ 0.80 AND recall ≥ 0.80 | threshold_min (both) | N/A | `<DEFERRED: expected_results/new_classes_pr.json>` |
| D2 | `fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg` (5 frames) | Existing-class regression — must not degrade vs documented baseline P=0.816, R=0.852 | per-class precision + recall delta vs baseline | numeric_tolerance | ± 0.02 absolute | `<DEFERRED: expected_results/existing_classes_baseline.json — to be recorded against the pinned ../detections model>` |
| D3 | `fixtures/semantic/semantic0[1-4].png` (4 starter frames — 1 winter, 3 unmarked season) + `<DEFERRED: full multi-season annotated concealed-position set; ref services.md §5>` | Concealed-position recall (initial gate, accepting high FP) | recall | threshold_min | ≥ 0.60 | `<DEFERRED: expected_results/concealed_positions.json>` |
| D4 | Same as D3 | Concealed-position precision (operators filter) | precision | threshold_min | ≥ 0.20 | same as D3 |
| D5 | `fixtures/semantic/semantic0[1-4].png` (all 4 feature footpaths leading to concealment — starter set) + `<DEFERRED: footpath sequences (fresh + stale, all four seasons), polyline-annotated; ref services.md §5>` | Footpath recall | recall | threshold_min | ≥ 0.70 | `<DEFERRED: expected_results/footpaths.json>` |
| D6 | `fixtures/images/4d6e1830d211ad50.jpg` | Single-frame Tier-1 contract — system must consume the bbox stream and re-emit normalised-box format | output box stream conforms to the suite-level class catalogue (ids 0..18) + normalised coordinates ∈ [0,1] | schema_match + range | each coord ∈ [0,1] | `fixtures/schemas/expected_detections.schema.json` |
## Movement Detection Behaviour
Source ACs: `acceptance_criteria.md → Movement Detection`. Latency aspects (L6, L7) live under Latency.
**Note**: M1M4 each have a visual-reference video on disk but NO paired `gimbal.csv` / `telemetry.csv`. Ego-motion compensation cannot be verified against these videos — the visual binding is provided so a smoke harness can run, but the assertions in this section require the deferred CSVs to be meaningful. User confirmed 2026-05-19: paired CSVs do not exist today.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| M1 | `fixtures/movement/video01.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv; scene must contain 1 stable tree row + 1 moving vehicle; ref services.md §6>` | Ego-motion compensation — stable objects rejected | system emits exactly 1 movement candidate (the vehicle); does NOT emit a candidate for the tree row | set_contains | candidate set == {vehicle}; ∉ tree row | N/A |
| M2 | `fixtures/movement/video02.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band; 1 small mover; ref services.md §6>` | Movement detection continues during zoomed-in hold | system enqueues 1 candidate while the camera is in the zoomed-in hold; current ROI is not preempted unless the candidate's priority exceeds it | exact | 1 candidate enqueued; ROI preempt decision matches priority rule | N/A |
| M3 | `fixtures/movement/video03.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv simulating per-zoom-band threshold edge (cluster persistence one frame below threshold); ref services.md §6>` | Per-zoom-band threshold honoured (no false candidate) | no candidate emitted | exact | count == 0 | N/A |
| M4 | `fixtures/movement/video04.mp4` (visual reference) + `<DEFERRED: zoom-out + zoomed-in benchmark suite measuring false-positive rate at each band; ref services.md §6, Q14>` | Movement zoomed-in benchmark gate (Q14 fallback trigger) | false-positive rate per zoom band | threshold_max | ≤ per-zoom-band budget (configurable; default ≤ 0.5 / minute at zoomed-in) | `<DEFERRED: expected_results/movement_benchmark_caps.json>` |
## Scan & Camera Control Behaviour
Source ACs: `acceptance_criteria.md → Scan and Camera Control`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| S1 | `<DEFERRED: scripted mission with planned route + simulated POI detected mid-sweep; ref services.md §3, §4>` | Sweep → zoomed-inspection transition within 2 s (L8) AND POI properly enqueued | transition completes; ROI matches POI bbox; queue length increments | exact (multiple) | N/A | N/A |
| S2 | `<DEFERRED: zoomed-inspection hold scenario with footpath polyline overlapping the ROI; ref services.md §5, §6>` | Camera lock + pan along footpath while airframe flies | camera commands keep the footpath in the centre 50% of frame for the duration of the hold | numeric_tolerance | centre offset ≤ 25% per frame | N/A |
| S3 | `<DEFERRED: operator-confirmed target + 60 s follow window; ref services.md §3>` | Target-follow centre-window | target inside centre 25% of frame while visible | threshold_max | per-frame |dx,dy| ≤ 0.125 × frame_size | N/A |
| S4 | `<DEFERRED: queue with 3 POIs at varied confidence × proximity scores; inline-authorable>` | POI queue ordering | system pops POIs in order of `confidence × proximity × age_factor` (relative order matches) | exact (order) | N/A | N/A |
| S5 | `<DEFERRED: hold endpoint with deep-analysis enabled — assessment returns within 2 s; ref services.md §7>` | Zoomed-in hold timeout default 5 s/POI; deep-analysis hold capped at 2 s | hold ends at min(5 s, deep_analysis_complete) | exact | N/A | N/A |
## Operator Workflow
Source ACs: `acceptance_criteria.md → Operator Workflow`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| O1 | `<DEFERRED: synthetic POI at confidence = 0.40; inline-authorable>` | Confidence-scaled decision window lower bound | window duration | exact | 30 s | N/A |
| O2 | `<DEFERRED: synthetic POI at confidence = 1.00; inline-authorable>` | Confidence-scaled decision window upper bound | window duration | exact | 120 s | N/A |
| O3 | `<DEFERRED: synthetic POI at confidence = 0.70; inline-authorable>` | Linear interpolation (40% → 30 s, 100% → 120 s) | window duration ≈ 30 + (0.70-0.40)/(1.00-0.40) × (120-30) = 75 s | numeric_tolerance | ± 0.5 s | N/A |
| O4 | `<DEFERRED: synthetic POI at confidence = 0.39; inline-authorable>` | Below-threshold suppression | POI NOT surfaced to operator | exact | count surfaced == 0 | N/A |
| O5 | `<DEFERRED: surfaced POI followed by operator decline event; inline-authorable>` | Decline → ignored-item entry persisted | ignored-item appended with `(MGRS, class_group)` matching the declined POI | exact (count delta +1) + schema_match | N/A | N/A |
| O6 | `<DEFERRED: new detection whose (MGRS, class_group) matches an existing ignored-item; inline-authorable>` | Ignored-item suppression | POI NOT surfaced | exact | count surfaced == 0 | N/A |
| O7 | `<DEFERRED: surfaced POI + no operator response, > decision-window; inline-authorable>` | Timeout = forget (NOT blacklisted) | POI removed from queue; no ignored-item written | exact (queue 1) + exact (ignored-item count unchanged) | N/A | N/A |
| O8 | `<DEFERRED: operator confirm command — valid + signed + within sequence; ref services.md §3, §8 (Q9)>` | Confirm → middle waypoint inserted; mode transitions to target-follow | mission update POSTed; scan-mode reports target-follow | exact (HTTP 200) + exact (mode) | N/A | N/A |
| O9 | `<DEFERRED: replayed operator command — same envelope a second time; ref services.md §8 (blocked on Q9)>` | Replay protection | command rejected; security WARN logged; no state change | exact (state unchanged) + substring (log contains "replay") | N/A | N/A |
| O10 | `<DEFERRED: malformed / unsigned operator command; ref services.md §8 (blocked on Q9)>` | Signature validation | command rejected; security WARN logged | exact (state unchanged) + substring (log contains "invalid") | N/A | N/A |
## Reliability & Safety
Source ACs: `acceptance_criteria.md → Reliability & Safety` + lost-link failsafe ladder.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| R1 | `<DEFERRED: BIT scenario — every dependency healthy; inline-authorable>` | Pre-flight self-test passes | health endpoint returns all green; takeoff permitted | exact (state) + exact (health.all == "green") | N/A | N/A |
| R2 | `<DEFERRED: BIT scenario — Tier-1 detection unreachable; inline-authorable>` | BIT fails the takeoff gate | takeoff NOT permitted; detection dependency reports red | exact (takeoff inhibited) | N/A | N/A |
| R3 | `<DEFERRED: BIT scenario — persistent-store ≥95% full; inline-authorable>` | Storage floor BIT failure | takeoff NOT permitted; storage dependency reports red | exact (takeoff inhibited) | N/A | N/A |
| R4 | `<DEFERRED: in-flight operator/Ground-Station modem-link loss + 30 s elapsed; ref services.md §3, §4>` | Lost-link failsafe ladder (default 30 s grace → RTL) | system issues RTL at exactly 30 s; operator-link dependency reports red | exact (RTL command at 30s ± 1s) | ± 1 s | N/A |
| R5 | `<DEFERRED: mid-flight battery sample at RTL-floor (e.g. 25%); ref services.md §4>` | RTL trigger | system issues RTL; health → yellow | exact (RTL command) + exact (health == yellow) | N/A | N/A |
| R6 | `<DEFERRED: mid-flight battery sample at hard-floor (e.g. 15%); ref services.md §4>` | Land-now trigger (only operator-overridable) | system issues land-now | exact (land_now command) | N/A | N/A |
| R7 | `<DEFERRED: airframe link command + simulated bounded retry/backoff with peer not responding through max-retries; ref services.md §4>` | Watchdog flips health red on exhaustion | airframe-link dependency reports red after configured max-retry | exact (health == red) | N/A | N/A |
| R8 | `<DEFERRED: wall-clock drift > 200 ms simulation (GPS lock present, NTP disabled); ref services.md §9>` | Drift alarm | time-source dependency reports yellow; `clock_source` + `last_sync_at` reflect the drift | exact (health == yellow) | N/A | N/A |
| R9 | `<DEFERRED: geofence EXCLUSION polygon crossed by simulated waypoint; ref services.md §4>` | Symmetric geofence enforcement | waypoint refused; RTL triggered | exact (waypoint rejected) + exact (RTL) | N/A | N/A |
## Resources & Data
Source ACs: `acceptance_criteria.md → Resources & Data`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| Re1 | `<DEFERRED: long-running scenario — system's full onboard workload active for 5 min, monitored via process RSS; inline-authorable harness>` | Onboard memory budget (everything autopilot owns, excluding Tier 1) | combined RSS on the deployed compute device | threshold_max | ≤ 6 GB | N/A |
| Re2 | Same as Re1 with concurrent Tier-1 traffic | Tier-1 non-degradation | Tier-1 ms/frame delta vs baseline (L1) | numeric_tolerance | ± 5 ms | N/A |
## Map Reconciliation
Source ACs: `acceptance_criteria.md → Map Reconciliation`.
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|---|---|---|---|---|---|
| Mp1 | `<DEFERRED: mock central area-map service — 30 km × 30 km region, ~10000 map objects; ref services.md §2>` | Pre-flight pull | wall-clock GET → local copy hydrated | threshold_max | ≤ 30 s | N/A |
| Mp2 | `<DEFERRED: same mock but unreachable (timeout); ref services.md §2>` | Cache-fallback path | system falls back to last-known cached state; reports `map_sync == "cached_fallback"`; operator MUST acknowledge before takeoff | exact (state) + exact (BIT requires explicit ack) | N/A | N/A |
| Mp3 | `<DEFERRED: simulated 60-minute mission pass diff (~5000 NEW + ~2000 MOVED + ~500 REMOVED + ~10000 CONFIRMED-EXISTING); ref services.md §2>` | Post-flight push | wall-clock POST → 200 OK | threshold_max | ≤ 120 s | N/A |
| Mp4 | `<DEFERRED: same as Mp3 but POST returns 5xx; ref services.md §2>` | Persist-on-disk + bounded retry | pending diff written to on-device storage; operator-visible warning surfaced; retry attempts logged | exact (file exists) + exact (warning surfaced) + threshold_max (retries ≤ configured cap) | N/A | N/A |
| Mp5 | `<DEFERRED: two map updates with conflicting state for same (spatial-cell, class_group) — append-only log scenario; ref services.md §2, Q8>` | Conflict-resolution rule (Q8 placeholder) | append-only observation log + computed current view; conflict resolution per documented rule | json_diff | N/A | `<DEFERRED: expected_results/mapobjects_conflict_resolution.json — pending Q8>` |
---
## Coverage Status (auto-recomputed 2026-05-19)
- **Total rows**: 56 (L1L9, T1T3, D1D6, M1M4, S1S5, O1O10, R1R9, Re1Re2, Mp1Mp5).
- **Fully bound to real fixtures**: L1, T3, D2, D6 = **4 rows (~7%)**.
- **Bound to derived inline fixture** (no external acquisition needed): L2 = **+1 row (5 total, ~9%)**.
- **Bound to starter/partial fixtures** (visual reference only — assertions need additional deferred inputs to be meaningful): D3, D4, D5, M1, M2, M3, M4 = **+7 rows (12 total partial, ~21%)**.
- **Inline-authorable but not yet authored** (no external dependency — can be unblocked anytime by writing the fixture): T1, S4, O1O7, R1R3, R8, Re1, Re2 = **15 rows (~27%)**. Lifting these alone would bring effective coverage to ~48%.
- **Blocked on external acquisition** (real recordings, SITL, annotated eval sets, mock services): L3L9 (minus L6/L7 partial), T2, D1, M1M4 (CSV pairs), S1, S2, S3, S5, R4R7, R9, Mp1Mp5 = **~24 rows (~43%)**.
- **Blocked on architecture questions**: O8 (depends on Q9 partially), O9, O10 (Q9), M4 (Q14), Mp5 (Q8) = **4 rows**.
**Decision (project policy)**: rather than block on the Phase 3 75% gate, each deferred row is now registered with a structured `<DEFERRED:>` tag and surfaces in `data_parameters.md → "Gaps that block /test-spec downstream"`. `/test-spec` Phase 2 can author scenarios for all 56 rows; deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.
## Notes on this spec
- Every row carries a quantifiable comparison + tolerance — no row is "should work".
- Where the AC depends on hardware (the deployed compute device, ViewPro A40), the test must run on representative hardware OR a benchmarked replay; pure-emulator runs are NOT acceptable for L1L9, T1T3, Re1Re2.
- Where the AC depends on an external service (`../detections`, `missions`, Ground Station), the test runs against either (a) the real service in the suite e2e (`../e2e/docker-compose.suite-e2e.yml`), or (b) a recorded replay fixture for isolation tests. Both modes are valid; the test scenario states which.
- Q-tagged rows (M4 → Q14, Mp5 → Q8, O8O10 → Q9) depend on open architecture questions. Their tolerance ranges may sharpen once those questions resolve; the existence of each row is non-negotiable.
- M1M4 visual-reference bindings (`fixtures/movement/video0[1-4].mp4`) are usable for harness smoke testing but DO NOT satisfy the assertion semantics — paired `gimbal.csv` + `telemetry.csv` are required for ego-motion compensation to be verifiable. This is the single highest-priority fixture gap.
@@ -0,0 +1,90 @@
# Fixture manifest
All fixtures live **inside this workspace** so the autopilot repo is self-sufficient — downstream test runners must never reach into a sibling repo at `../`. When you add or refresh a fixture, update the matching SHA-256 in this manifest AND the rows in `../expected_results/results_report.md` that consume it.
Total on-disk size: ~57 MB.
## Files
### Still-image aerial frames — `images/`
Used as Tier-1 input frames for detection-quality assertions.
| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
|---|---|---|---|---|
| `images/4d6e1830d211ad50.jpg` | 152 KB | `4c396495af64aaf9aac5ecb92431bf0c75db42b0bdb8e4eec1937f9995acee42` | `../detections/data/images/` (re-copied 2026-05-19) | L1, D6 |
| `images/54f6459dbddb93d8.jpg` | 6.7 MB | `cd65c76a080ef72ce3528031f003f067fca6091c067a86d527a1ae91cd78be59` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
| `images/6dd601b7d2dc1b30.jpg` | 1.4 MB | `45edd83a357a9f852e14e5845265cd09c20b4b99b1828c160cb3298f0e160181` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
| `images/805bcf1e9f271a58.jpg` | 176 KB | `fe696899225fc04f2335e87acf6a3ad8a00cd3950c5940d5e73e5ce438f36257` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
| `images/f997d0934726b555.jpg` | 232 KB | `5d1c9c551c0680e5b3d0aab261bca71e724c78f6db3580da598c680b4f7d4d79` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
### Reconnaissance video — `videos/`
| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
|---|---|---|---|---|
| `videos/94d42580bd1ad6ff.mp4` | 12 MB | `602b22a42515a754313551847caa6d6a6d7b3cde1d857cbd08ebc5543fb8cf7c` | `../detections/data/videos/` (re-copied 2026-05-19) | T3 (frame-rate floor scenario) |
### Movement-detection clips — `movement/`
Wide-area reconnaissance clips intended for movement-detection visual baselines. **Important**: these clips DO NOT have paired `gimbal.csv` / `telemetry.csv` files — ego-motion compensation assertions (M1M4) cannot run against them. They are useful for visual harness work, frame-count assertions, and as visual reference for the movement-detection scenarios.
| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
|---|---|---|---|---|
| `movement/video01.mp4` | 5.3 MB | `6f37186f5e9be97109db8d0d220df96d21cac9ce5b50b576234c6f7ee369d2bb` | local; provenance pre-existing in workspace | M1 (visual reference only — no telemetry) |
| `movement/video02.mp4` | 5.9 MB | `7de7981e511e21e1e72f506d44541b44a4c27a995c9505ef8e3b48e69b416367` | local; provenance pre-existing in workspace | M2 (visual reference only — no telemetry) |
| `movement/video03.mp4` | 6.1 MB | `df441164da7f37d715968212b95e9bf53c8e37384f20ddfab61cd6d0d18b4f3a` | local; provenance pre-existing in workspace | M3 (visual reference only — no telemetry) |
| `movement/video04.mp4` | 5.8 MB | `36445bf1c86c5afa524000b5b2da7fc9cb3d39c745f9ad830b3d60f6868948e7` | local; provenance pre-existing in workspace | M4 (visual reference only — no telemetry) |
### Semantic reference frames — `semantic/`
Annotated reference examples for concealed-position semantic targets. **Not a graded eval set** — these are 4 hand-picked examples of footpath-to-concealment patterns, intended as visual reference for what the system should recognise. Detection-quality gates (D1, D3, D4, D5) need a full annotated multi-season eval set; these 4 PNGs are insufficient for those gates and serve as starter reference only.
| File | Size | SHA-256 | Description | `results_report.md` rows |
|---|---|---|---|---|
| `semantic/semantic01.png` | 3.1 MB | `339ad4d35ab36052828f05652ab7249801bcd5d7bb04522f0ab9cbf6f0ca008a` | Footpath leading to branch-pile hideout in winter forest | D3, D4, D5 (starter only — full multi-season set still required) |
| `semantic/semantic02.png` | 5.1 MB | `ffe3c49f5f1833724ce46083d212e714422e664b635cdd48b63311adefcd7b1f` | Footpath to FPV launch clearing, branch mass at forest edge | D3, D4, D5 (starter only) |
| `semantic/semantic03.png` | 1.0 MB | `ce89c139815e9a80679237008f7cfc3039bbd53f162d48017e840ff91e57b109` | Footpath to squared hideout structure | D3, D4, D5 (starter only) |
| `semantic/semantic04.png` | 1.3 MB | `b25c689b7aa543ec15858e4b5edfa32387ced4930130eb280d952c555f104e69` | Footpath terminating at tree-branch concealment | D3, D4, D5 (starter only) |
| `semantic/data_parameters.md` | 2 KB | n/a (text) | Description of the four reference examples + the new YOLO primitive classes that motivate them | reference only |
### Detection contract schemas — `schemas/`
| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
|---|---|---|---|---|
| `schemas/expected_detections.json` | 1.4 KB | `ce60c105d697efe0359d2e6b1b46fc63e53d3789b067d53501f9c76aad9bd1ae` | `../e2e/fixtures/` (re-copied 2026-05-19) | D6 (sample Tier-1 response) |
| `schemas/expected_detections.schema.json` | 2.4 KB | `a7174e0b083dcbf42fa8672acd3e1807d11ea0629cc636ff958a4d77168733b9` | `../e2e/fixtures/` (re-copied 2026-05-19) | D6 (JSON-schema for the Tier-1 contract) |
### Database init script — `sql/`
| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
|---|---|---|---|---|
| `sql/init.sql` | 3.7 KB | `b61e452c549f7b006db88d265f4346837e0a33d1abd4d977ebf3d48d8c943439` | `../e2e/fixtures/` (re-copied 2026-05-19) | suite-only reference; no autopilot AC row asserts against this |
## Copy vs reference
Fixtures were COPIED (not moved). The sibling repos still own the originals — keeping autopilot's copy in sync when an upstream changes is a manual chore today (the `monorepo-e2e` skill at the suite root will eventually own this drift; see `_docs/_process_leftovers/` if a sync is pending).
When an upstream fixture changes:
1. Recompute the SHA-256 in the source repo.
2. Re-copy into the matching `fixtures/` subdirectory here.
3. Update this manifest's SHA-256 column.
4. If the change invalidates an assertion in `../expected_results/results_report.md`, fix the row's expected result too — do not let assertions drift silently against new data.
## Gaps still pending fixture acquisition
The authoritative per-service acquisition catalogue lives in `../services.md`. Summary of the still-open gaps (each is also tagged on its row in `../expected_results/results_report.md` with a structured `<DEFERRED: ...>` marker, and a `_docs/_process_leftovers/` entry records the replay obligation):
| Gap | What's missing | Blocks AC rows | Acquisition status |
|---|---|---|---|
| Paired gimbal+telemetry CSVs for the 4 movement clips | `gimbal.csv` + `telemetry.csv` aligned to each video frame timestamps | M1M4, tightens L6/L7 | **Confirmed unavailable today** (user 2026-05-19) — requires re-flight or new recording with gimbal-feedback channel captured |
| Annotated eval set across all four seasons | Hundredsthousands of labelled images per season for concealed-position + footpath gates | D1, D3, D4, D5 | needs annotation campaign (1.5 months at 5 hrs/day target per `semantic/data_parameters.md`) |
| Per-zoom-band frame sequences | Same kind of clip as `movement/` but recorded at light, medium, and high zoom bands | tightens M2, L7, S2 | needs flight time + zoom-band metadata in the recorder |
| Mock `missions` HTTPS exchanges | Recorded JSON request/response pairs for mission GET/POST + mapobjects GET/POST | Mp1Mp5 | inline-authorable against the `mission-schema`; not yet authored |
| Mock Ground Station session traces | Scripted timing trace (connect / push / drop / reconnect / lost-link) | R4, O8 | inline-authorable; not yet authored |
| ArduPilot SITL traces | Recorded MAVLink streams for waypoint upload, geofence INCLUSION + EXCLUSION, RTL on lost-link, RTL on battery floor | R4, R5, R6, R7, R9 + project SITL conformance gate | needs SITL run |
| Operator-command envelopes | Valid / expired / replayed / malformed envelopes under the chosen Q9 auth scheme | O9, O10 | **blocked on Q9** (`_docs/02_document/architecture.md §8`) |
| VLM I/O pairs | Bounded ROI in → structured `VlmAssessment` out + schema-violation cases | L3, S5 | inline-authorable against the assessment schema once the local model is pinned |
| GPS / NTP drift scenarios | Scripted offset / lock-loss traces | R8 | inline-authorable |
When a fixture from this list lands, copy it under `fixtures/<category>/`, add a row to the relevant subsection above, and bind the matching `<DEFERRED>` row in `../expected_results/results_report.md` to its new local path.
Binary file not shown.

After

Width:  |  Height:  |  Size: 149 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.7 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.4 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 173 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 231 KiB

@@ -0,0 +1,32 @@
{
"$schema": "./expected_detections.schema.json",
"_meta": {
"fixture_version": "0.1.0-placeholder",
"video": "sample.mp4",
"video_sha256": "TBD-after-fixture-recording",
"model": {
"_comment": "Pinned model + classes that detections must run when this baseline applies. Refresh this block (and counts/bboxes below) whenever detections ships a new model.",
"name": "TBD",
"revision": "TBD",
"classes_source": "annotations/src/Database/DatabaseMigrator.cs (ids 0..18)"
},
"tolerance": {
"_comment": "Spec asserts ranges, not exact values. INT8 calibration drift can move pixel positions by a few units; absolute count can drift by ±1 across re-runs of the same engine on the same Jetson.",
"count_delta": 1,
"bbox_iou_min": 0.8,
"confidence_delta": 0.1
}
},
"expected": {
"total_annotations": 0,
"by_class": [
{
"class_id": 0,
"class_name": "ArmorVehicle",
"count": 0,
"bbox_samples": []
}
],
"_placeholder_note": "Replace this block with the real baseline once sample.mp4 is recorded. Each entry under `by_class` carries: class_id, class_name (must match detection_classes.name), count, and bbox_samples (an array of {time_sec, center_x, center_y, width, height, confidence} entries the spec uses for IoU comparison)."
}
}
@@ -0,0 +1,66 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Suite e2e expected detections baseline",
"type": "object",
"required": ["_meta", "expected"],
"properties": {
"$schema": { "type": "string" },
"_meta": {
"type": "object",
"required": ["fixture_version", "video", "video_sha256", "model", "tolerance"],
"properties": {
"fixture_version": { "type": "string" },
"video": { "type": "string" },
"video_sha256": { "type": "string" },
"model": {
"type": "object",
"required": ["name", "revision", "classes_source"],
"additionalProperties": true
},
"tolerance": {
"type": "object",
"required": ["count_delta", "bbox_iou_min", "confidence_delta"],
"properties": {
"count_delta": { "type": "integer", "minimum": 0 },
"bbox_iou_min": { "type": "number", "minimum": 0, "maximum": 1 },
"confidence_delta": { "type": "number", "minimum": 0, "maximum": 1 }
}
}
}
},
"expected": {
"type": "object",
"required": ["total_annotations", "by_class"],
"properties": {
"total_annotations": { "type": "integer", "minimum": 0 },
"by_class": {
"type": "array",
"items": {
"type": "object",
"required": ["class_id", "class_name", "count"],
"properties": {
"class_id": { "type": "integer", "minimum": 0 },
"class_name": { "type": "string" },
"count": { "type": "integer", "minimum": 0 },
"bbox_samples": {
"type": "array",
"items": {
"type": "object",
"required": ["time_sec", "center_x", "center_y", "width", "height"],
"properties": {
"time_sec": { "type": "number", "minimum": 0 },
"center_x": { "type": "number" },
"center_y": { "type": "number" },
"width": { "type": "number", "minimum": 0 },
"height": { "type": "number", "minimum": 0 },
"confidence": { "type": "number", "minimum": 0, "maximum": 1 }
}
}
}
}
}
}
}
}
}
}
@@ -0,0 +1,45 @@
# Semantic And Movement Detection Training Data
# Source
- Aerial imagery from reconnaissance winged UAVs at 6001000m altitude
- ViewPro A40 camera, 1080p resolution, various zoom levels
- Extracted from video frames and still images
- Movement detection requires frame sequences, not still images only; include camera/gimbal telemetry where available to separate target motion from UAV motion.
# Target Classes
- Footpaths / trails (linear features on snow, mud, forest floor)
- Fresh footpaths (distinct edges, undisturbed surroundings, recent track marks)
- Stale footpaths (partially covered by snow/vegetation, faded edges)
- Concealed structures: branch pile hideouts, dugout entrances, squared/circular openings
- Tree rows (potential concealment lines)
- Open clearings connected to paths (FPV launch points)
- Moving point/cluster candidates at wide or light/medium zoom
# YOLO Primitive Classes (new)
- Black entrances to hideouts (various sizes)
- Piles of tree branches
- Footpaths
- Roads
- Trees, tree blocks
# Annotation Format
- Managed by existing annotation tooling in separate repository
- Expected: bounding boxes and/or segmentation masks depending on model architecture
- Footpaths may require polyline or segmentation annotation rather than bounding boxes
# Seasonal Coverage Required
- Winter: snow-covered terrain (footpaths as dark lines on white)
- Spring: mud season (footpaths as compressed/disturbed soil)
- Summer: full vegetation (paths through grass/undergrowth)
- Autumn: mixed leaf cover, partial snow
# Volume
- Target: hundreds to thousands of annotated images/sequences
- Available effort: 1.5 months, 5 hours/day
- Potential for annotation process automation
# Reference Examples
- semantic01.png — footpath leading to branch-pile hideout in winter forest
- semantic02.png — footpath to FPV launch clearing, branch mass at forest edge
- semantic03.png — footpath to squared hideout structure
- semantic04.png — footpath terminating at tree-branch concealment
Binary file not shown.

After

Width:  |  Height:  |  Size: 2.1 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.3 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.0 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.3 MiB

@@ -0,0 +1,104 @@
-- Suite e2e database seed.
--
-- Loaded by the `db-seed` service in docker-compose.suite-e2e.yml after
-- annotations has run its own DatabaseMigrator (which creates the schema +
-- inserts the canonical detection_classes 0..18). This file therefore only
-- adds rows that the e2e scenario depends on but the production runtime does
-- NOT seed automatically.
--
-- Idempotency: every statement uses ON CONFLICT / IF NOT EXISTS so re-running
-- the seed (e.g. on a `down -v` followed by `up`) lands the same final state.
--
-- Schema reference: annotations/src/Database/DatabaseMigrator.cs.
\set ON_ERROR_STOP on
-- Wait until annotations has populated its schema. The db-seed container starts
-- only after postgres-local is healthy, but annotations may still be spinning
-- up its tables. A bounded poll keeps the seed deterministic.
DO $$
DECLARE
attempt int := 0;
BEGIN
WHILE attempt < 60 LOOP
PERFORM 1
FROM information_schema.tables
WHERE table_schema = 'public' AND table_name = 'detection_classes';
IF FOUND THEN
EXIT;
END IF;
PERFORM pg_sleep(1);
attempt := attempt + 1;
END LOOP;
IF attempt >= 60 THEN
RAISE EXCEPTION 'detection_classes table not found after 60s — annotations migration did not complete';
END IF;
END $$;
-- Default system_settings row. Annotations starts without one, but several
-- spec assertions rely on `silent_detection = false` and known thumbnail dims
-- so overlay rendering is reproducible.
INSERT INTO system_settings (
id, name, military_unit,
default_camera_width, default_camera_fov,
thumbnail_width, thumbnail_height, thumbnail_border,
generate_annotated_image, silent_detection
) VALUES (
'00000000-0000-0000-0000-00000000aaaa',
'azaion-suite-e2e',
'e2e-unit',
3840, 70,
240, 135, 10,
true, false
) ON CONFLICT (id) DO NOTHING;
-- Default directory_settings row. Annotations writes media files under the
-- paths defined here; the e2e-runner doesn't read these directly but the
-- service requires the row to exist on first hit.
INSERT INTO directory_settings (
id, videos_dir, images_dir, labels_dir, results_dir,
thumbnails_dir, gps_sat_dir, gps_route_dir
) VALUES (
'00000000-0000-0000-0000-00000000bbbb',
'/data/videos', '/data/images', '/data/labels', '/data/results',
'/data/thumbnails', '/data/gps_sat', '/data/gps_route'
) ON CONFLICT (id) DO NOTHING;
-- Default camera_settings row used by detections to size bbox-to-meters.
INSERT INTO camera_settings (
id, altitude, focal_length, sensor_width
) VALUES (
'00000000-0000-0000-0000-00000000cccc',
100, 50, 36
) ON CONFLICT (id) DO NOTHING;
-- Stable e2e user. The UUID is referenced by the spec when asserting
-- annotation rows. Annotations does not own a `users` table — user identity
-- is carried in JWTs minted with JWT_SECRET; the user_id here just needs to
-- be deterministic and stable across runs.
-- Stored in user_settings so the spec can `SELECT user_id` to confirm the
-- seed ran.
INSERT INTO user_settings (
id, user_id,
annotations_left_panel_width, annotations_right_panel_width,
dataset_left_panel_width, dataset_right_panel_width
) VALUES (
'00000000-0000-0000-0000-00000000dddd',
'00000000-0000-0000-0000-0000e2e2e2e2',
300, 400, 320, 320
) ON CONFLICT (id) DO NOTHING;
-- Sanity check — fail loudly if the canonical detection_classes are missing.
-- annotations/src/Database/DatabaseMigrator.cs inserts ids 0..18 unconditionally.
DO $$
DECLARE
cnt int;
BEGIN
SELECT COUNT(*) INTO cnt FROM detection_classes WHERE id BETWEEN 0 AND 18;
IF cnt < 19 THEN
RAISE EXCEPTION 'expected canonical detection_classes 0..18 (count=19), got %', cnt;
END IF;
END $$;
\echo 'suite-e2e seed complete'
+113
View File
@@ -0,0 +1,113 @@
# External Services — Test-Mock Requirements
Black-box catalogue of every external system autopilot depends on at runtime, with the **test-fixture / mock shape required for each**. Service-side design (protocols, component contracts, ownership boundaries) lives in `_docs/02_document/architecture.md` — this file owns ONLY the test-data dependency view (per `.cursor/rules/artifact-srp.mdc`, `_docs/00_problem/input_data/` is a test-data concern).
Runtime input shapes (frame rates, message types) are described in `data_parameters.md`. This file extends them with the **acquisition status of the corresponding test fixture**.
## Index
| # | External system | Production role | Test-mock shape needed | Acquisition status |
|---|---|---|---|---|
| 1 | Tier-1 detection (`../detections`) | Primitive YOLO inference on every frame; returns class + bbox + confidence | Recorded bi-stream replay file (`request frame``response detections`) | **MISSING** — no replay recorded yet |
| 2 | Mission planner (`missions` API) | Mission JSON pull at start; middle-waypoint POST on operator-confirm; pre-flight area-map pull + post-flight diff push | Mock HTTPS exchanges for GET/POST + sample mission + sample mapobjects state | **MISSING** — schema known (mission-schema), no fixture recorded |
| 3 | Ground Station (modem) | Continuous push of camera + telemetry + bbox overlay; return path carries operator commands (confirm / decline / target-follow / abort) | Scripted session traces: nominal session, modem drop at T=N, reconnect at T=M, lost-link sustained ≥30 s | **MISSING** — authorable inline (no external dependency) |
| 4 | Airframe autopilot (ArduPilot / PX4) | MAVLink v2 transport for the ~1015 commands in `architecture.md §7.7`; battery + position telemetry; geofence enforcement | ArduPilot SITL traces: waypoint upload, geofence INCLUSION + EXCLUSION, RTL on lost-link, RTL on battery floor | **MISSING** — needs SITL run with scripted scenarios |
| 5 | ViewPro A40 camera (frames) | H.264/265 1080p RTSP video feed at 30/60 fps | Recorded frame sequences (`.mp4`) — wide-zoom, light-zoom, medium-zoom, high-zoom variants | **PARTIAL** — 4 wide-zoom clips on disk (`fixtures/movement/video0[1-4].mp4`); zoom-band variants missing |
| 6 | ViewPro A40 gimbal (control) | Vendor UDP control protocol; yaw / pitch / zoom telemetry per tick | Per-frame-sequence `gimbal.csv` paired with the matching video; per-tick yaw/pitch/zoom + timestamp | **MISSING** — no `gimbal.csv` paired with the 4 movement videos; ego-motion compensation (M1M4) is unfalsifiable without this |
| 7 | Deep-analysis VLM (local IPC) | Optional Tier-3 confirmation over bounded ROI; structured `VlmAssessment` response | Recorded I/O pairs (ROI in → `VlmAssessment` out) + schema-violation cases for fail-closed tests | **MISSING** — depends on the local model choice; can be authored against the assessment schema once the model is pinned |
| 8 | Time source (GPS / NTP) | Wall-clock; drift triggers the R8 health-yellow gate | Scripted drift scenarios (no real GPS/NTP hardware needed) — clock offset, jump, source loss | **MISSING** — authorable inline |
## Per-service detail — what acquisition would look like
The table above is the index; the rows below explain the shape and acquisition path so the gaps can be planned out one at a time.
### 1. Tier-1 detection replay (`../detections`)
- Production transport: bi-directional gRPC. The autopilot streams frames out; `../detections` streams `Detections` messages back.
- Mock shape: a `.replay` file (one per scenario) recording timestamped frames + the exact `Detections` responses the model emitted. Used by `detection_client` integration tests in isolation — no need to boot the real Tier-1 service.
- Acquisition path: record one replay against the currently pinned `../detections` model. Re-record when the upstream model changes (the `monorepo-e2e` skill should eventually own this drift; see the suite's leftovers).
- Blocks AC rows: every row that needs a deterministic detection stream — practically L1, L2, D1, D2, D6 in isolation; in suite-e2e mode these run live against the real `../detections`.
### 2. Mission + MapObjects mock (`missions` API)
- Production transport: HTTPS REST.
- Mock shape: JSON fixtures per endpoint + a small mock HTTP server (or replay-style fixtures consumed by a test double). Endpoints in scope:
- `GET /missions/{id}` — mission JSON, validated against `mission-schema`.
- `POST /missions/{id}` — middle-waypoint insertion (200 OK + updated mission).
- `GET /missions/{id}/mapobjects` — pre-flight area-map pull (response shape: map-object records keyed by spatial cell; volume target ~10000 objects for the 30×30 km gate Mp1).
- `POST /missions/{id}/mapobjects` — post-flight diff push (NEW / MOVED / REMOVED / CONFIRMED-EXISTING; volume target per Mp3 ~17500 records).
- Acquisition path: author JSON fixtures against the known schema; record real exchanges once `missions` is reachable from the test bench.
- Blocks AC rows: Mp1Mp5 (all 5 map-reconciliation rows).
### 3. Ground Station session trace
- Production transport: continuous push over modem (suite-level protocol).
- Mock shape: scripted timing trace per scenario. Each scenario is a list of `(t, event)` pairs: connect, push frame, push telemetry, operator-click, modem-drop, reconnect, lost-link.
- Acquisition path: authorable inline from `architecture.md §7` and `acceptance_criteria.md §Reliability & Safety`. No external dependency — just a fixture generator.
- Blocks AC rows: R4 (lost-link → RTL at 30 s); O8, O9, O10 (operator command lifecycle on the return path, **but** O9/O10 also depend on Q9 for the auth scheme).
### 4. MAVLink SITL trace
- Production transport: MAVLink v2 over UDP or serial.
- Mock shape: ArduPilot SITL recording capturing the autopilot's command stream + the airframe's response stream. One trace per scenario: waypoint upload, geofence INCLUSION violation, geofence EXCLUSION violation, lost-link RTL, battery RTL-floor RTL, battery hard-floor land-now.
- Acquisition path: run ArduPilot SITL with a scripted mission; capture the full MAVLink stream with mavlink-router or equivalent.
- Blocks AC rows: R4 (RTL exact timing), R5, R6, R7, R9; plus the project-level "MAVLink command surface MUST pass SITL conformance" gate.
### 5. Camera frame sequences (ViewPro A40)
- Production transport: RTSP/RTP over TCP/UDP, 1080p H.264/265 at 30/60 fps.
- Current local fixtures: `fixtures/movement/video0[1-4].mp4` (4 clips, ~56 MB each), `fixtures/videos/94d42580bd1ad6ff.mp4` (one reconnaissance clip used for T3 frame-rate floor).
- Mock-shape gap: zoom-band coverage. Each AC scenario that names a zoom level (wide, light, medium, high) needs a representative clip at that zoom band. The 4 movement clips do not enumerate which zoom band each represents — this needs documenting per clip OR re-recording with zoom-band labels.
- Acquisition path: existing clips usable for movement-detection visual baselines; new recordings at each zoom band require flight time.
### 6. Gimbal telemetry CSV (paired with frames)
- Production transport: ViewPro A40 vendor protocol over UDP; per-tick yaw/pitch/zoom updates.
- Mock shape: `gimbal.csv` with columns `(t, yaw_deg, pitch_deg, zoom_band, focal_mm)`, one CSV per video file, timestamps aligned to frame timestamps within ≤ 1 frame.
- Acquisition path: requires re-flying the recording with the gimbal-feedback channel captured alongside. CANNOT be back-fitted to existing videos.
- Blocks AC rows: M1, M2, M3, M4 (movement-detection ego-motion compensation); also tightens L6, L7 (movement candidate enqueue latency).
- **Confirmed not available today (user-stated 2026-05-19).**
### 7. VLM I/O pairs
- Production transport: Unix-domain socket IPC to local-onboard VLM (NanoLLM / VILA1.5-3B per architecture §1).
- Mock shape: paired `(roi.png, prompt.txt, vlm_response.json)` per scenario + a small set of schema-violation cases (truncated JSON, wrong field types, missing required fields) for fail-closed tests.
- Acquisition path: depends on the local VLM model choice. Once pinned, capture real I/O during a flight or scripted run; schema-violation cases authored inline.
- Blocks AC rows: L3 (Tier-3 ≤5 s latency on bounded ROI), S5 (deep-analysis hold-cap interaction).
### 8. Operator-command envelopes
- Production transport: comes back to autopilot via Ground Station modem return path.
- Mock shape: per envelope, a `(scheme, payload, signature, sequence_id)` tuple. One fixture per case: valid, expired, replayed (same envelope sent twice), malformed (signature mismatch), unsigned.
- Acquisition path: **blocked on Q9** (operator-command auth scheme — open in `_docs/02_document/architecture.md §8`). Once the scheme is chosen, envelopes are authorable inline.
- Blocks AC rows: O9 (replay protection), O10 (signature validation); strengthens O8 (confirm pathway).
### 9. GPS / NTP drift scripts
- Production transport: kernel-level wall clock + GPS lock state.
- Mock shape: scripted offset injection — bump the clock by N ms, drop GPS lock, change time source.
- Acquisition path: authorable inline; no external dependency.
- Blocks AC rows: R8.
## Coverage summary by service
| Service | Rows covered (real fixture) | Rows blocked on this service | Acquisition priority |
|---|---|---|---|
| Tier-1 replay | L1, D2, D6 (live; replay desirable for isolation) | none independently blocked | low (can use live `../detections` in suite-e2e) |
| `missions` mock | none | Mp1Mp5 (5 rows) | medium |
| Ground Station trace | none | R4, O8 (2 rows) | low (inline-authorable) |
| MAVLink SITL | none | R4, R5, R6, R7, R9 (5 rows) + project conformance gate | high |
| Frame sequences | L1 (with image), T3 (with video) | enriches L6/L7 with telemetry | medium |
| Gimbal CSV | none | M1M4 (4 rows) + L6, L7 | **high — explicit user gap** |
| VLM I/O pairs | none | L3, S5 (2 rows) | low (model-choice gated) |
| Operator envelopes | none | O9, O10 (2 rows) | blocked on Q9 |
| GPS/NTP drift | none | R8 | low (inline-authorable) |
Per-row binding lives in `expected_results/results_report.md`. The status of each gap is mirrored in `_docs/_process_leftovers/` so the next `/autodev` run can replay the missing-fixture decision.
## What this file does NOT own
- Component design (how `detection_client` talks to Tier-1, how `mission_client` retries, etc.) — `_docs/02_document/architecture.md` and `_docs/02_document/components/*/description.md`.
- Production data shapes (frame rate, MAVLink message types) — `data_parameters.md` already has these.
- AC text — `_docs/00_problem/acceptance_criteria.md`.
- The choice of which mocks to use during a given test run (live vs replay vs scripted) — `_docs/02_document/tests/` (test strategy doc, authored by `/test-spec` Phase 2).