[AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-19 11:02:01 +03:00
parent f7d6cb4a3a
commit bc40ea7300
235 changed files with 12585 additions and 15097 deletions
@@ -0,0 +1,374 @@
# Initial Project Structure
**Task**: AZ-640_initial_structure
**Name**: Initial Structure
**Description**: Scaffold the Rust cargo workspace — per-component crates, shared crate, runtime composition root, Dockerfile + docker-compose for dev/test, Woodpecker CI pipeline, observability scaffold, on-device state directory, env config, and replay-based integration test layout.
**Complexity**: 5 points
**Dependencies**: None
**Component**: Bootstrap
**Tracker**: AZ-640
**Epic**: AZ-626
## Project Folder Layout
```
autopilot/
├── Cargo.toml # cargo workspace manifest
├── Cargo.lock
├── rust-toolchain.toml # pin stable channel + components
├── .cargo/
│ └── config.toml # cross-compile target = aarch64-unknown-linux-gnu
├── .woodpecker.yml # CI pipeline (per deployment/ci_cd_pipeline.md)
├── .dockerignore
├── Dockerfile # multi-stage; non-root; pinned l4t-base for prod, ubuntu:22.04 for emul
├── docker-compose.yml # dev: autopilot + mock detections + mock missions + mock ground-station
├── docker-compose.test.yml # blackbox: autopilot + ArduPilot SITL + mock detections + replay sources
├── .env.example # documented environment variables
├── config/
│ ├── autopilot.dev.toml # dev profile (mock endpoints)
│ ├── autopilot.staging.toml # staging profile (real endpoints, non-flight)
│ └── autopilot.prod.toml # prod template (Jetson on-airframe)
├── crates/
│ ├── autopilot/ # binary crate — runtime composition root
│ │ ├── Cargo.toml # `[[bin]] name = "autopilot"`
│ │ ├── src/
│ │ │ ├── main.rs # CLI parse, config load, wire actors, run
│ │ │ ├── runtime.rs # actor topology, health aggregator, shutdown
│ │ │ └── health_server.rs # HTTP /health endpoint (port from config)
│ │ └── tests/ # cross-crate integration tests (replay-based)
│ ├── shared/
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs # re-exports
│ │ ├── models/ # canonical entities from data_model.md
│ │ │ ├── mod.rs
│ │ │ ├── frame.rs # Frame, BoundingBox
│ │ │ ├── detection.rs # Detection, DetectionBatch
│ │ │ ├── movement.rs # MovementCandidate
│ │ │ ├── tier2.rs # Tier2Evidence
│ │ │ ├── vlm.rs # VlmAssessment
│ │ │ ├── poi.rs # POI
│ │ │ ├── mapobject.rs # MapObject, MapObjectObservation, MapObjectsBundle, IgnoredItem
│ │ │ ├── mission.rs # MissionItem, MissionWaypoint, Geofence, Coordinate
│ │ │ ├── operator.rs # OperatorCommand
│ │ │ └── gimbal.rs # GimbalState
│ │ ├── config/ # toml loader + typed config sections
│ │ ├── error.rs # AutopilotError enum, Result alias
│ │ ├── health.rs # ComponentHealth, AggregatedHealth
│ │ ├── observability/ # tracing-subscriber init + log field constants
│ │ └── clock.rs # monotonic + wall-clock binding (GPS / NTP)
│ ├── frame_ingest/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs # public API trait + actor handle
│ │ ├── src/internal/ # decoder, RTSP client
│ │ └── tests/ # replay-based unit tests against fixture RTSP clips
│ ├── detection_client/
│ │ ├── Cargo.toml
│ │ ├── build.rs # tonic-build for ../detections .proto
│ │ ├── proto/ # copy of ../detections gRPC contract
│ │ ├── src/lib.rs
│ │ └── tests/
│ ├── movement_detector/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/ # ego-motion, optical-flow, per-zoom-band thresholds
│ │ └── tests/ # replay fixtures, zoom-out + zoom-in
│ ├── semantic_analyzer/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/ # primitive graph, ROI CNN call
│ │ └── tests/
│ ├── vlm_client/
│ │ ├── Cargo.toml # feature = ["vlm"] — see autopilot/Cargo.toml
│ │ ├── src/lib.rs # default impl returns VlmAssessment{status=vlm_disabled}
│ │ ├── src/internal/ # UDS client + peer-cred check
│ │ └── tests/
│ ├── scan_controller/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/state_machine/ # ZoomedOut / ZoomedIn / TargetFollow types
│ │ ├── src/poi_queue/ # priority queue + ≤5 POIs/min cap
│ │ └── tests/ # behaviour-tree scenarios from system-flows.md §F4
│ ├── mapobjects_store/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/h3_index/ # h3rs wrapper
│ │ ├── src/internal/engine/ # engine trait + in-memory+snapshot default impl (Q3)
│ │ └── tests/
│ ├── gimbal_controller/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/a40_protocol/ # ViewPro A40 UDP vendor protocol
│ │ └── tests/ # mock A40 over UDP
│ ├── operator_bridge/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/auth/ # OperatorCommand envelope validation (Q9 — stubbed)
│ │ └── tests/
│ ├── mission_executor/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/multirotor/ # multirotor variant FSM
│ │ ├── src/internal/fixed_wing/ # fixed-wing variant FSM
│ │ ├── src/internal/geofence/ # INCLUSION + EXCLUSION enforcement
│ │ ├── src/internal/failsafe/ # lost-link ladder, battery thresholds
│ │ └── tests/ # ArduPilot SITL fixtures
│ ├── mavlink_layer/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/codec/ # MAVLink v2 encode/decode (only §7.7 surface)
│ │ ├── src/internal/transport/ # UDP and serial connection abstraction
│ │ └── tests/ # SITL conformance fixtures
│ ├── mission_client/
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs
│ │ ├── src/internal/missions_api/ # HTTPS REST client; pull + middle-waypoint POST
│ │ ├── src/internal/mapobjects_sync/ # pre-flight GET + post-flight POST of /mapobjects bundles
│ │ └── tests/ # mock missions API
│ └── telemetry_stream/
│ ├── Cargo.toml
│ ├── src/lib.rs
│ ├── src/internal/uplink/ # modem push of frames + telemetry + bbox overlay
│ └── tests/ # mock Ground Station receiver
├── tests/
│ └── e2e/ # cross-crate blackbox scenarios (used by docker-compose.test.yml)
├── benches/
│ ├── tier1_latency.rs # benchmark-gate harness for §6 NFRs
│ ├── tier2_latency.rs
│ ├── gimbal_zoom.rs
│ └── movement_fpr.rs # per-zoom-band FPR replay benchmark
├── fixtures/
│ ├── rtsp/ # pre-recorded RTSP clips
│ ├── mavlink/ # ArduPilot SITL replay scripts
│ ├── missions/ # mission JSON fixtures
│ └── detections/ # deterministic Tier-1 response fixtures
├── deploy/
│ ├── systemd/
│ │ └── autopilot.service # per deployment/containerization.md §3
│ └── jetson/
│ └── README.md # on-airframe install steps
└── README.md
```
### Layout Rationale
- **Cargo workspace with one crate per component** matches the recommended Rust layout in `_docs/02_document/decompose/templates/module-layout.md` (`crates/<component>/`). It enforces module boundaries: a crate's internals (`internal/`, private modules) are unreachable from sibling components — only its `lib.rs` public surface is.
- **Single binary crate `crates/autopilot/`** is the runtime composition root (per `deployment/containerization.md` — "single Rust binary"). It depends on every component crate and wires the actor topology in `runtime.rs`.
- **`crates/shared/`** owns the canonical entity catalogue from `data_model.md` and cross-cutting concerns (config, error, health, observability, clock). All component crates may import from it; it imports from no one.
- **`fixtures/` separate from `tests/`** because the same fixtures feed unit tests, replay-based integration tests, blackbox tests, and benchmark gates.
- **`vlm_client` crate exists unconditionally**; the optional behaviour is implemented via a default `VlmAssessment` provider that returns `status=vlm_disabled` when the `vlm` feature is off (per `architecture.md §7.6` "Optionality model").
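The optionality model in the last bullet can be sketched with a trait whose default method body is the "VLM off" answer. This is a minimal illustration, not the crate's actual API: `VlmStatus`, `DisabledVlm`, the `summary` field, the `Roi` fields, and the `String` error type are all assumed placeholders; only `VlmProvider`, `VlmAssessment`, and the `vlm_disabled` status come from the spec.

```rust
/// Assumed shape of the DTO; the real fields come from `data_model.md`.
#[derive(Debug, Clone, PartialEq)]
pub enum VlmStatus {
    VlmDisabled,
    Assessed,
}

#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub status: VlmStatus,
    pub summary: Option<String>,
}

/// Region of interest handed to the VLM (placeholder field).
#[derive(Debug, Clone)]
pub struct Roi {
    pub frame_id: u64,
}

pub trait VlmProvider {
    /// Default body: report `vlm_disabled`, so callers never branch on the
    /// cargo feature themselves.
    fn assess(&self, _roi: &Roi) -> Result<VlmAssessment, String> {
        Ok(VlmAssessment {
            status: VlmStatus::VlmDisabled,
            summary: None,
        })
    }
}

/// Provider wired in when the `vlm` feature is off; it inherits the default.
pub struct DisabledVlm;

impl VlmProvider for DisabledVlm {}
```

A real IPC-backed provider would override `assess`; `scan_controller` would only ever see the trait.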
## DTOs and Interfaces
### Shared DTOs (live in `crates/shared/src/models/`)
| DTO | Source spec | Used by components |
|---|---|---|
| `Frame`, `BoundingBox` | `data_model.md §2` | `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, `telemetry_stream` |
| `Detection`, `DetectionBatch` | `data_model.md §2` | `detection_client`, `scan_controller`, `telemetry_stream`, `operator_bridge` |
| `MovementCandidate` | `data_model.md §2` | `movement_detector`, `scan_controller` |
| `Tier2Evidence` | `data_model.md §2` | `semantic_analyzer`, `scan_controller` |
| `VlmAssessment` | `data_model.md §2` | `vlm_client`, `scan_controller` |
| `POI` | `data_model.md §3` | `scan_controller`, `operator_bridge`, `telemetry_stream` |
| `MapObject`, `MapObjectObservation`, `MapObjectsBundle`, `IgnoredItem` | `data_model.md §3` | `mapobjects_store`, `mission_client`, `scan_controller` |
| `Coordinate`, `Geofence`, `MissionItem` | `data_model.md §4` | `mission_client`, `mission_executor`, `operator_bridge` |
| `MissionWaypoint` | `data_model.md §4` | `mission_executor`, `mavlink_layer` |
| `OperatorCommand` | `data_model.md §4` | `operator_bridge`, `scan_controller`, `mission_executor` |
| `GimbalState` | `data_model.md §4` | `gimbal_controller`, `frame_ingest`, `movement_detector` |
| `AutopilotError`, `Result<T>` | new | every crate |
| `ComponentHealth`, `AggregatedHealth` | new (per `containerization.md §7`) | every crate + `autopilot/runtime.rs` |
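As a rough sketch of the two "new" rows above, the health types could aggregate by taking the worst component level, and the error type could be a plain enum with a `Result` alias. The field names, the `HealthLevel` enum, the error variants, and the choice to treat an empty report set as red are assumptions for illustration; `containerization.md §7` remains the reference for the real breakdown.

```rust
use std::fmt;

/// Ordered so that `max()` yields the worst level.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum HealthLevel {
    Green,
    Yellow,
    Red,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ComponentHealth {
    pub component: &'static str,
    pub level: HealthLevel,
    pub detail: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AggregatedHealth {
    pub status: HealthLevel,
    pub components: Vec<ComponentHealth>,
}

impl AggregatedHealth {
    /// Overall status is the worst component level. An empty report set is
    /// treated as red (fail closed); that choice is an assumption.
    pub fn from_components(components: Vec<ComponentHealth>) -> Self {
        let status = components
            .iter()
            .map(|c| c.level)
            .max()
            .unwrap_or(HealthLevel::Red);
        Self { status, components }
    }
}

/// Illustrative variants only; the real enum will grow per component.
#[derive(Debug)]
pub enum AutopilotError {
    Config(String),
    Io(std::io::Error),
}

impl fmt::Display for AutopilotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AutopilotError::Config(msg) => write!(f, "config error: {msg}"),
            AutopilotError::Io(err) => write!(f, "io error: {err}"),
        }
    }
}

impl std::error::Error for AutopilotError {}

pub type Result<T> = std::result::Result<T, AutopilotError>;
```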
### Component Public APIs (live in each component's `lib.rs`)
Each component exposes an actor handle plus its narrow request/response trait. Inter-component communication runs over Tokio channels owned inside the component; consumers receive a typed handle, not the underlying `tokio::sync::*` types.
| Component | Public surface (handle methods) | Exposed to |
|---|---|---|
| `frame_ingest` | `FrameIngestHandle::subscribe() -> FrameStream`, `health()` | `detection_client`, `movement_detector`, `telemetry_stream` |
| `detection_client` | `DetectionClientHandle::request(Frame) -> Result<DetectionBatch>`, `health()` | `scan_controller`, `movement_detector`, `telemetry_stream` |
| `movement_detector` | `MovementDetectorHandle::candidates() -> CandidateStream`, `health()` | `scan_controller` |
| `semantic_analyzer` | `SemanticAnalyzerHandle::analyze(Roi) -> Result<Tier2Evidence>`, `health()` | `scan_controller` |
| `vlm_client` (trait) | `VlmProvider::assess(Roi) -> Result<VlmAssessment>` (default impl returns `vlm_disabled`) | `scan_controller` |
| `scan_controller` | `ScanControllerHandle::tick(), submit_operator_cmd(OperatorCommand)`, `health()` | `autopilot::runtime` |
| `mapobjects_store` | `MapObjectsStoreHandle::classify(Detection) -> Classification`, `apply_decline(Poi)`, `dump_pending() -> MapObjectsBundle`, `hydrate(MapObjectsBundle)`, `health()` | `scan_controller`, `mission_client` |
| `gimbal_controller` | `GimbalControllerHandle::set_pose(GimbalCommand), zoom(level), state() -> GimbalState`, `health()` | `scan_controller` |
| `operator_bridge` | `OperatorBridgeHandle::surface_poi(POI) -> OperatorDecision`, `cmds() -> CommandStream`, `health()` | `scan_controller`, `mission_executor` |
| `mission_executor` | `MissionExecutorHandle::start(Mission), insert_middle_waypoint(Coordinate), failsafe_trigger(FailsafeKind)`, `health()` | `scan_controller`, `operator_bridge` |
| `mavlink_layer` | `MavlinkHandle::send(Command), telemetry() -> TelemetryStream`, `health()` | `mission_executor`, `telemetry_stream` |
| `mission_client` | `MissionClientHandle::pull_mission() -> Mission`, `post_middle_waypoint(Coordinate)`, `pull_mapobjects(MissionId) -> MapObjectsBundle`, `push_mapobjects(MapObjectsBundle)`, `health()` | `mission_executor`, `mapobjects_store` |
| `telemetry_stream` | `TelemetryStreamHandle::push_frame(Frame, Overlay), push_telemetry(Sample)`, `health()` | `frame_ingest`, `detection_client`, `mavlink_layer`, `operator_bridge` |
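The handle pattern described for this table can be illustrated as below. To keep the sketch dependency-free it uses `std::sync::mpsc` and a thread in place of Tokio channels and tasks, which is a substitution, not the planned runtime. The `Msg` enum, `spawn()`, the `GimbalState` fields, and the return types are assumptions; only the handle name and the `zoom` / `state` method names come from the table.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Placeholder fields; the canonical `GimbalState` lives in `shared`.
#[derive(Debug, Clone, PartialEq)]
pub struct GimbalState {
    pub pan_deg: f32,
    pub tilt_deg: f32,
    pub zoom: f32,
}

/// Private message type: consumers never see the channel or its payloads.
enum Msg {
    Zoom(f32),
    State(Sender<GimbalState>),
}

#[derive(Clone)]
pub struct GimbalControllerHandle {
    tx: Sender<Msg>,
}

impl GimbalControllerHandle {
    /// Starts the actor loop and returns the only public way to talk to it.
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Msg>();
        thread::spawn(move || {
            let mut state = GimbalState { pan_deg: 0.0, tilt_deg: 0.0, zoom: 1.0 };
            // The loop ends once every handle (sender) has been dropped.
            for msg in rx {
                match msg {
                    Msg::Zoom(level) => state.zoom = level,
                    Msg::State(reply) => {
                        let _ = reply.send(state.clone());
                    }
                }
            }
        });
        Self { tx }
    }

    /// Returns false if the actor is no longer running.
    pub fn zoom(&self, level: f32) -> bool {
        self.tx.send(Msg::Zoom(level)).is_ok()
    }

    /// Request/response over a one-shot reply channel.
    pub fn state(&self) -> Option<GimbalState> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(Msg::State(reply_tx)).ok()?;
        reply_rx.recv().ok()
    }
}
```

Because `Msg` and the sender stay private, swapping the channel implementation later would not change any caller.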
## CI/CD Pipeline
Single Woodpecker pipeline (per `deployment/ci_cd_pipeline.md §2`). Stages run sequentially; a failed stage stops the run.
| Stage | Purpose | Tool / Command |
|---|---|---|
| Fetch | Clone, restore Cargo cache | `cargo fetch` with remote cache key |
| Lint | Formatting and lint check; hard fail on any warning | `cargo fmt --check`; `cargo clippy --all-targets --all-features -- -D warnings` |
| Unit Tests | Host-arch unit tests; most logic is platform-independent | `cargo test --workspace` |
| Build arm64 | Cross-compile for `aarch64-unknown-linux-gnu` | `cross` or `cargo zigbuild`; produce binary + debug symbols |
| Build no-vlm | Enforces VLM optionality contract | `cargo build --workspace --no-default-features` |
| Integration Tests | Replay-based, no hardware | `cargo test --test '*'` (tests marked `#[ignore]` are skipped by default; `--include-ignored` is a bare flag and takes no value); fixtures from `fixtures/` |
| SITL Conformance | ArduPilot SITL + autopilot binary in containers, fixed mission, asserts §7.7 surface + geofence | `docker compose -f docker-compose.test.yml up --abort-on-container-exit` |
| Security Scan | Dependency CVE scan | `cargo audit` + `cargo deny check` |
| Benchmark Gate (manual / nightly) | Tier 1 / 2 / VLM / gimbal latency on real Jetson | Runs on self-hosted Jetson Orin Nano runner |
| Package | Build container image | Multi-arch tag `azaion/autopilot:<branch>-arm64` |
| Sign | Cosign for image; OS signing flow for binary | Tagged builds only |
| Publish | Push image + binary to internal registry | Tagged builds only |
### Pipeline Configuration Notes
- Cache `~/.cargo/registry/`, `~/.cargo/git/`, and `target/` between runs keyed on `Cargo.lock` hash.
- `--features vlm` and the no-feature path are both built and tested to enforce the optionality contract.
- `dev` and `main` branches are protected; force-push forbidden; merges require a green pipeline.
- Benchmark gate is opt-in (manual approval or nightly cron) because it requires a Jetson runner.
## Environment Strategy
| Environment | Purpose | Configuration Notes |
|---|---|---|
| Development (local) | Run autopilot locally against mock detections + mock missions + mock Ground Station; iterate on logic | `docker compose -f docker-compose.yml up`; `config/autopilot.dev.toml`; `RUST_LOG=info,autopilot=debug` |
| Staging | Pre-production: real `../detections`, real `missions` API, real `Ground Station`, but no airframe MAVLink (SITL instead) | `config/autopilot.staging.toml`; secrets via `EnvironmentFile=` |
| Production (airframe) | Native systemd on Jetson Orin Nano per `containerization.md §3` | `/etc/azaion/autopilot/config.toml`; `/etc/systemd/system/autopilot.service`; `/var/lib/autopilot/`; `/run/azaion/in-flight` flight-gate marker |
| CI (Tier-1) | Lint + unit + replay-based integration on amd64 | GitHub-hosted runner; no GPU |
| CI (Tier-2) | Benchmark gate on real Jetson | Self-hosted Jetson Orin Nano Super runner; pinned JetPack + power mode |
### Environment Variables
| Variable | Dev | Staging | Production | Description |
|---|---|---|---|---|
| `AUTOPILOT_CONFIG` | `./config/autopilot.dev.toml` | `/etc/azaion/autopilot/config.toml` | `/etc/azaion/autopilot/config.toml` | Path to TOML config |
| `RUST_LOG` | `info,autopilot=debug` | `info` | `info` | `tracing-subscriber` filter |
| `AUTOPILOT_MISSION_ID` | (per-flight CLI arg) | (per-flight CLI arg) | (per-flight CLI arg) | Active mission UUID; CLI arg, not env |
| `AUTOPILOT_HEALTH_BIND` | `127.0.0.1:8080` | `127.0.0.1:8080` | `127.0.0.1:8080` | HTTP `/health` bind address |
| `AUTOPILOT_VLM_ENABLED` | `false` | `false` (until benchmark passes) | per benchmark | Runtime VLM flag; binary must also build with `--features vlm` |
| `MISSIONS_API_TOKEN` | (mock) | from `EnvironmentFile=` | from `EnvironmentFile=` | Bearer token; never in `config.toml` |
| `GROUND_STATION_TOKEN` | (mock) | from `EnvironmentFile=` | from `EnvironmentFile=` | Bearer / session token |
All non-secret configuration lives in `config.toml` (per `containerization.md §6`). Secrets come from `EnvironmentFile=` under systemd and from compose `secrets:` in containers.
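One way to keep that split visible in code is to resolve secrets through an environment lookup only, so no config struct ever carries a token. The sketch below is hedged: the function names are hypothetical, the lookup is injected as a closure so it can be tested without touching the process environment, and falling back to the production path when `AUTOPILOT_CONFIG` is unset is an assumption, not something the spec states.

```rust
/// Resolve the missions API bearer token from the environment only.
/// In the binary the lookup would be `|k| std::env::var(k).ok()`.
pub fn missions_api_token<F>(lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    match lookup("MISSIONS_API_TOKEN") {
        Some(token) if !token.trim().is_empty() => Ok(token),
        _ => Err("MISSIONS_API_TOKEN is not set".to_string()),
    }
}

/// Resolve the TOML config path. The fallback value is an assumption.
pub fn config_path<F>(lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    lookup("AUTOPILOT_CONFIG")
        .unwrap_or_else(|| "/etc/azaion/autopilot/config.toml".to_string())
}
```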
## Database Migration Approach
**Migration tool**: none — autopilot has **no traditional database**.
**Persistence strategy**: the only persisted data is the on-device `mapobjects_store`. Its engine is open (`architecture.md §8 Q3`); the bootstrap default is **in-memory + snapshot to `/var/lib/autopilot/mapobjects/`** (file-backed, no schema migrations). When Q3 resolves toward SQLite + H3 or another engine, the `mapobjects_store` crate's engine module is swapped without changing its public API. The central `missions` API owns its own Postgres schema (per `architecture.md §7.13`) — autopilot does NOT migrate central tables.
### Initial Persisted Surface
| Subsystem | What is persisted | Where | Format |
|---|---|---|---|
| `mapobjects_store` | `current_state`, `pending_observations`, `pending_ignored`, `sync_state` | `/var/lib/autopilot/mapobjects/` | engine-defined; default = JSON snapshots + append-only log |
| `operator_bridge` audit log | accepted/rejected `OperatorCommand` envelopes | `/var/lib/autopilot/audit/` | newline-delimited JSON |
| `mission_client` deferred uploads | post-flight push payload on push failure | `/var/lib/autopilot/pending_pushes/` | JSON files keyed by mission ID |
Disk quota for `/var/lib/autopilot/` is configured in `config.toml`; persistent-store-full at pre-flight BIT is a takeoff blocker (per `architecture.md §5`).
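A minimal sketch of creating the three persisted subdirectories listed above and failing closed on the first error. The function and constant names are hypothetical, and the sketch does not set ownership or enforce the disk quota; it only shows the create-or-abort step.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Subdirectories from the "Initial Persisted Surface" table.
pub const STATE_SUBDIRS: [&str; 3] = ["mapobjects", "audit", "pending_pushes"];

/// Create every state subdirectory under `root`.
/// Returns the first I/O error instead of continuing, so the caller can
/// refuse to start and report the failure through its health surface.
pub fn ensure_state_dirs(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut created = Vec::with_capacity(STATE_SUBDIRS.len());
    for name in STATE_SUBDIRS {
        let dir = root.join(name);
        // `create_dir_all` succeeds if the directory already exists.
        fs::create_dir_all(&dir)?;
        created.push(dir);
    }
    Ok(created)
}
```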
## Test Structure
```
crates/<component>/
└── tests/ # crate-level integration tests; per-crate
└── <scenario>.rs
tests/
└── e2e/ # workspace-level end-to-end (uses docker-compose.test.yml)
├── sitl_conformance.rs # SITL gate per ci_cd_pipeline.md §5
├── geofence_inclusion.rs
├── geofence_exclusion.rs # explicit regression vs earlier silent-ignore behaviour
├── lost_link_failsafe.rs
└── operator_command_replay.rs
fixtures/
├── rtsp/<clip>.h264
├── mavlink/<replay>.tlog
├── missions/<mission>.json
└── detections/<deterministic>.json
benches/
├── tier1_latency.rs # benchmark-gate harness
├── tier2_latency.rs
├── gimbal_zoom.rs
└── movement_fpr.rs # per-zoom-band FPR replay
```
### Test Configuration Notes
- **Unit tests** live alongside each component's source in `#[cfg(test)] mod tests { ... }` within `src/` files. They MUST run in <5 s on a developer workstation; no network, no Docker.
- **Crate-level integration tests** live in `crates/<component>/tests/`. They may use fixtures from `fixtures/` but MUST NOT cross component boundaries — that's what workspace e2e is for.
- **Workspace e2e** in `tests/e2e/` exercises the full binary against a docker-compose-managed stack (ArduPilot SITL, mock missions API, mock detections gRPC, replay RTSP).
- **Replay-driven debugging**: all non-trivial decisions are reconstructable from logs + size-capped raw inputs (per `observability.md §6`). Replay fixtures are the foundation of regression tests.
- **Test runner**: `cargo test --workspace` for unit + integration; `docker compose -f docker-compose.test.yml up --abort-on-container-exit` for e2e; `cargo bench` (for example with a `criterion` harness) for benchmark-gate measurements.
- **Mock-data discipline**: mocks live in `tests/` directories only — never in production crates (per `coderule.mdc`).
## Implementation Order
| Order | Component | Reason |
|---|---|---|
| 1 | `shared` (models + config + error + health + observability + clock) | Every other crate depends on it; it depends on no other workspace crate. Must land first. |
| 2 | `mavlink_layer` | Self-contained transport; required by `mission_executor` and `telemetry_stream`; SITL conformance lands the first hard gate early. |
| 3 | `mission_client` | Self-contained REST client; required by `mission_executor` and `mapobjects_store` sync. |
| 4 | `mission_executor` | Combines `mavlink_layer` + `mission_client` + geofence/failsafe logic; gates takeoff via BIT. |
| 5 | `gimbal_controller` | Self-contained A40 UDP driver; required by `scan_controller`. |
| 6 | `frame_ingest` | RTSP decoder; required by all perception crates. |
| 7 | `detection_client` | gRPC client to `../detections`; required by `scan_controller` and `telemetry_stream`. |
| 8 | `movement_detector` | Depends on `frame_ingest` + `GimbalState`; standalone otherwise. |
| 9 | `mapobjects_store` | Engine choice may be deferred; default in-memory+snapshot unblocks `scan_controller`. |
| 10 | `semantic_analyzer` | Tier 2; depends on `Frame` + `Detection`. |
| 11 | `vlm_client` | Optional; default impl returns `vlm_disabled`. Real IPC implementation can land later. |
| 12 | `telemetry_stream` | Pure egress; ready once `frame_ingest`, `detection_client`, `mavlink_layer` exist. |
| 13 | `operator_bridge` | Depends on `telemetry_stream` + `mapobjects_store`; envelope auth scheme is Q9-stubbed. |
| 14 | `scan_controller` | Sits on top of everything in Perception + Action; lands last. |
| 15 | `autopilot` binary (composition root) | Wires every component handle; runs the actor topology. |
## Acceptance Criteria
**AC-1: Workspace scaffolded**
Given the structure plan above
When the implementer executes this task
Then `cargo metadata` lists all 15 workspace crates shown in the folder layout (`shared`, `autopilot`, and the 13 component crates, `vlm_client` included) and `cargo check --workspace` succeeds with no compile errors.
**AC-2: Stub tests runnable**
Given the scaffolded workspace
When `cargo test --workspace` runs on a developer workstation (no Docker, no GPU)
Then every crate's stub test (e.g. `it_compiles()`) passes within 5 seconds total.
**AC-3: CI pipeline configured**
Given the scaffolded workspace
When the Woodpecker pipeline runs on a feature branch push
Then `fetch → lint → unit-test → build-arm64 → build-no-vlm → integration-test → sitl-conformance` all complete successfully on a known-good baseline commit.
**AC-4: Dev compose boots**
Given `docker-compose.yml`
When `docker compose -f docker-compose.yml up -d` runs on a fresh workstation
Then the autopilot container starts, the `/health` endpoint returns HTTP 200 with `status: green | yellow` (red is acceptable here only for components without a mock target), and the mock detections + mock missions services are reachable.
**AC-5: Blackbox compose boots with SITL**
Given `docker-compose.test.yml`
When `docker compose -f docker-compose.test.yml up --abort-on-container-exit` runs
Then ArduPilot SITL + autopilot + mock detections + replay RTSP all start, and the SITL conformance e2e test exits 0.
**AC-6: Optionality contract enforced**
Given the scaffolded workspace
When `cargo build --workspace --no-default-features` runs
Then the binary builds and links without the `vlm` feature; `cargo test --workspace --no-default-features` passes; the `VlmProvider` default impl returns `VlmAssessment{status=vlm_disabled}`.
**AC-7: Cross-compile target ready**
Given `.cargo/config.toml` configured for `aarch64-unknown-linux-gnu`
When `cross build --target aarch64-unknown-linux-gnu --release` (or `cargo zigbuild` equivalent) runs in CI
Then an aarch64 binary is produced and stored as an artifact.
**AC-8: Flight-gate marker wiring exists**
Given `deploy/systemd/autopilot.service`
When systemd parses the unit
Then `ExecStartPre` asserts `/run/azaion/in-flight` is created and `ExecStopPost` removes it (per `containerization.md §3` and the suite-level flight-gate convention).
**AC-9: Observability scaffold initialised**
Given the autopilot binary
When it starts
Then `tracing-subscriber` emits JSON-formatted logs to stdout with the per-line fields enumerated in `observability.md §2` (`ts`, `ts_mono_ns`, `level`, `target`, `event`), and the `/health` endpoint returns the per-component breakdown documented in `containerization.md §7`.
**AC-10: Persistent state directory created**
Given `/var/lib/autopilot/` (or its container-mounted equivalent)
When autopilot starts in dev or prod
Then the binary creates `mapobjects/`, `audit/`, and `pending_pushes/` subdirectories with the owning user, fails closed if any directory cannot be created, and surfaces the failure to `/health` (red on `mapobjects_store`).
@@ -0,0 +1,80 @@
# MAVLink Transport and Heartbeat
**Task**: AZ-641_mavlink_transport_and_heartbeat
**Name**: MAVLink transport + heartbeat
**Description**: Single connection abstraction (UDP or serial, picked at startup), 1 Hz outgoing HEARTBEAT, bounded reconnect on transport loss, autopilot-heartbeat timeout detection.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure
**Component**: mavlink_layer
**Tracker**: AZ-641
**Epic**: AZ-637
## Problem
`mavlink_layer` needs a single, stable connection abstraction to the airframe autopilot (ArduPilot / PX4). The connection is either UDP or serial — picked once at startup from the connection URI (`udp://...` or `serial:///dev/...`); no runtime URI swap. The link must self-heal on transport loss with bounded backoff and surface link health to the rest of the system without silent failure.
## Outcome
- A `MavlinkConnection` opens once at startup and reconnects automatically on transport loss with bounded exponential backoff (≤2 s on serial / ≤5 s on UDP).
- A 1 Hz outgoing `HEARTBEAT` keeps the autopilot's GCS-link path alive.
- Autopilot heartbeats received on the inbound stream are timestamped; a configurable wall-clock timeout flips link state to `lost` and surfaces it via `health()` and a typed signal consumed by `mission_executor`.
- Health surface includes `connected`, `last_heartbeat_age_ms`, `signing_enabled`.
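The bounded exponential backoff in the first outcome could look roughly like this. The `Backoff` type, its method names, and the 250 ms base used in the usage note are assumptions; the cap would come from configuration.

```rust
use std::time::Duration;

/// Doubling delay, clamped to a cap.
pub struct Backoff {
    base: Duration,
    cap: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, cap: Duration) -> Self {
        Self { base, cap, attempt: 0 }
    }

    /// Delay to wait before the next reconnect attempt: base * 2^attempt,
    /// never more than `cap`. Saturating arithmetic avoids overflow after
    /// many consecutive failures.
    pub fn next_delay(&mut self) -> Duration {
        let factor = 1u32.checked_shl(self.attempt).unwrap_or(u32::MAX);
        let delay = self.base.saturating_mul(factor).min(self.cap);
        self.attempt = self.attempt.saturating_add(1);
        delay
    }

    /// Call after a successful (re)connect so the next outage starts small.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }
}
```

With a 250 ms base and a 2 s cap, successive delays are 250 ms, 500 ms, 1 s, 2 s, 2 s, and so on until `reset()` is called.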
## Scope
### Included
- Connection-URI parser (`udp://host:port` and `serial:///dev/...`).
- UDP socket and serial port concrete transports behind a single `Transport` trait.
- Bounded exponential backoff on transport-open failure and on read failure.
- 1 Hz outgoing `HEARTBEAT` timer.
- Inbound heartbeat timestamping + wall-clock timeout → `link_lost` signal.
- `ComponentHealth` surface fields above.
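For the connection-URI parser in the list above, a small sketch under stated assumptions: the `ConnectionUri` enum, the function name, and the `String` error type are hypothetical, and IPv6 literals are not handled.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectionUri {
    Udp { host: String, port: u16 },
    Serial { device: String },
}

/// Parse `udp://host:port` or `serial:///dev/...`.
pub fn parse_connection_uri(uri: &str) -> Result<ConnectionUri, String> {
    if let Some(rest) = uri.strip_prefix("udp://") {
        let (host, port) = rest
            .rsplit_once(':')
            .ok_or_else(|| format!("missing port in {uri}"))?;
        if host.is_empty() {
            return Err(format!("missing host in {uri}"));
        }
        let port = port
            .parse::<u16>()
            .map_err(|e| format!("bad port in {uri}: {e}"))?;
        Ok(ConnectionUri::Udp { host: host.to_string(), port })
    } else if let Some(device) = uri.strip_prefix("serial://") {
        // After the scheme, the remainder must be an absolute device path.
        if !device.starts_with('/') {
            return Err(format!("serial device must be an absolute path: {uri}"));
        }
        Ok(ConnectionUri::Serial { device: device.to_string() })
    } else {
        Err(format!("unsupported scheme: {uri}"))
    }
}
```

Parsing once at startup into an enum fits the "no runtime URI swap" rule: the transport is chosen by matching on the variant a single time.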
### Excluded
- Message encoding / decoding (separate task 03, AZ-642).
- Command-ack demux and retry (separate task 04, AZ-643).
- MAVLink-2 signing (separate task 04, AZ-643; only the `signing_enabled` flag is plumbed here).
## Acceptance Criteria
**AC-1: UDP connection opens and survives drop**
Given a configured `udp://127.0.0.1:14550` endpoint
When the autopilot is not listening at process start
Then `MavlinkLayer::run()` retries with exponential backoff up to its cap and reports `connected = false` via `health()`; when the autopilot becomes reachable, the link reconnects within ≤5 s.
**AC-2: Serial connection opens and survives drop**
Given a configured `serial:///dev/pts/N` endpoint backed by a `socat` pair (or equivalent)
When the peer end is closed and reopened
Then `mavlink_layer` reconnects within ≤2 s and resumes heartbeat emission.
**AC-3: Heartbeat emitted at 1 Hz**
Given a healthy link
When the connection is open for 10 s
Then 10 ± 1 outbound `HEARTBEAT` frames are observed by the peer.
**AC-4: Autopilot heartbeat loss flips link state**
Given a healthy link on which the peer has been sending heartbeats
When the peer stops sending heartbeats
Then within the configured timeout (default 3 s) `health()` reports `link_lost = true` and a typed `LinkLost` signal is emitted on the public output channel.
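AC-4 can be sketched as a small state tracker that is fed timestamps and reports the transition to "lost" exactly once. `LinkMonitor` and its method names are hypothetical. The sketch takes `std::time::Instant` values (a monotonic clock) as input so it is deterministic to test; the spec text above asks for a wall-clock timeout, so treat the clock source here as an assumption.

```rust
use std::time::{Duration, Instant};

/// Typed signal emitted once per loss event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LinkLost;

pub struct LinkMonitor {
    timeout: Duration,
    last_heartbeat: Instant,
    lost: bool,
}

impl LinkMonitor {
    pub fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_heartbeat: now, lost: false }
    }

    /// Record an inbound peer HEARTBEAT; this also clears the lost state.
    pub fn on_heartbeat(&mut self, now: Instant) {
        self.last_heartbeat = now;
        self.lost = false;
    }

    /// Call periodically. Returns `Some(LinkLost)` only on the transition
    /// from healthy to lost, so consumers are not flooded with repeats.
    pub fn poll(&mut self, now: Instant) -> Option<LinkLost> {
        let age = now.saturating_duration_since(self.last_heartbeat);
        if !self.lost && age > self.timeout {
            self.lost = true;
            Some(LinkLost)
        } else {
            None
        }
    }

    pub fn is_lost(&self) -> bool {
        self.lost
    }

    /// Value for the `last_heartbeat_age_ms` health field.
    pub fn last_heartbeat_age_ms(&self, now: Instant) -> u64 {
        now.saturating_duration_since(self.last_heartbeat).as_millis() as u64
    }
}
```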
## Non-Functional Requirements
**Performance**
- Reconnect latency: ≤2 s serial, ≤5 s UDP.
- Heartbeat cadence: 1 Hz ± 50 ms.
**Reliability**
- No infinite retry — bounded backoff cap is configurable (default 30 s).
- Transport-open failure surfaces to health → red; never silently absorbed.
## Constraints
- Hand-rolled — no third-party MAVLink SDK (per `architecture.md §5`).
- Single connection per process; no runtime URI swap.
## Runtime Completeness
- **Named capability**: MAVLink emission (HEARTBEAT) and link liveness.
- **Production code that must exist**: real UDP socket and real serial port transports.
- **Allowed external stubs**: in CI / integration tests, the peer end may be `socat` for serial or a loopback UDP listener.
- **Unacceptable substitutes**: a "fake transport" that swallows writes and synthesises heartbeats is not allowed in production code — only as a test double under `#[cfg(test)]`.
@@ -0,0 +1,79 @@
# MAVLink Message Codec (§7.7 Surface)
**Task**: AZ-642_mavlink_codec
**Name**: MAVLink v2 encode/decode for the §7.7 surface
**Description**: Encode and decode the ~10–15 MAVLink v2 messages this codebase needs (the §7.7 surface only) with strict validation.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure
**Component**: mavlink_layer
**Tracker**: AZ-642
**Epic**: AZ-637
## Problem
Autopilot speaks a deliberately narrow MAVLink command surface (per `architecture.md §7.7`, ~10–15 messages). Adding messages outside that list requires explicit design review. A hand-rolled MAVLink v2 codec must encode outbound messages with correct sequence numbers, system / component IDs, and (when enabled) signing, and decode inbound messages with strict validation, rejecting malformed frames, unknown IDs, and signing failures.
## Outcome
- Outbound encoder produces wire-correct MAVLink v2 frames for the message surface in §7.7 with monotonically incrementing per-link sequence numbers.
- Inbound decoder parses the same surface, rejecting malformed frames and unknown message IDs; sequence-number gaps are detected and logged, not hard-failed.
- Decoded messages are exposed as a typed `MavlinkMessage` enum (one variant per supported message kind) on the inbound channel.
- Per-message-kind parse error counters are exposed via `health()`.
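The sequence-number handling in these outcomes comes down to wrapping `u8` arithmetic. The sketch below uses hypothetical type names and tracks a single inbound sender; a real decoder would presumably keep one tracker per (system ID, component ID) pair.

```rust
/// Outbound per-link sequence counter; wraps 255 -> 0.
#[derive(Default)]
pub struct TxSeq(u8);

impl TxSeq {
    pub fn starting_at(seq: u8) -> Self {
        Self(seq)
    }

    /// Returns the sequence number for the next outbound frame.
    pub fn next_seq(&mut self) -> u8 {
        let current = self.0;
        self.0 = self.0.wrapping_add(1);
        current
    }
}

/// Inbound gap detector for one sender.
#[derive(Default)]
pub struct RxSeqTracker {
    last: Option<u8>,
    pub gaps_total: u64,
}

impl RxSeqTracker {
    /// Returns how many frames were missed before `seq` (0 when in order).
    /// A gap is only counted; the frame itself is still accepted.
    pub fn observe(&mut self, seq: u8) -> u8 {
        let missing = match self.last {
            Some(prev) => seq.wrapping_sub(prev.wrapping_add(1)),
            None => 0,
        };
        if missing > 0 {
            self.gaps_total += 1;
        }
        self.last = Some(seq);
        missing
    }
}
```

Note that with an 8-bit counter a duplicate or reordered frame is indistinguishable from a very large gap, which is one reason to log rather than reject.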
## Scope
### Included
- Encode + decode for `HEARTBEAT` (bidir), `COMMAND_LONG` outbound subset (arm/disarm, takeoff, set-mode, change-speed, change-alt, land, RTL), `COMMAND_ACK` inbound, `MISSION_COUNT`, `MISSION_REQUEST_INT`, `MISSION_ITEM_INT`, `MISSION_ACK`, `MISSION_SET_CURRENT`, `MISSION_CURRENT`, `MISSION_ITEM_REACHED`, `MISSION_CLEAR_ALL`, `GLOBAL_POSITION_INT`, `ATTITUDE`, `SYS_STATUS`, `EXTENDED_SYS_STATE`, `STATUSTEXT`, `SET_MODE`.
- Per-link outbound `tx_seq` counter with wrap-around handling.
- Strict size + CRC validation; reject malformed frames.
- Unknown message IDs counted and dropped (not hard-failed).
- Sequence-number gap detection (logged, not fatal).
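For the CRC validation item, MAVLink's frame checksum is CRC-16/MCRF4XX (initial value `0xFFFF`, reflected polynomial `0x8408`, no final XOR), computed over the frame bytes after the magic byte plus a per-message `CRC_EXTRA` byte, and sent low byte first. The bitwise sketch below uses hypothetical function names and is meant as an illustration to check against the MAVLink serialization documentation, not as the codec's final form.

```rust
/// Feed `data` into a running CRC-16/MCRF4XX value.
pub fn crc16_update(mut crc: u16, data: &[u8]) -> u16 {
    for &byte in data {
        crc ^= byte as u16;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x8408 } else { crc >> 1 };
        }
    }
    crc
}

/// CRC-16/MCRF4XX of a whole buffer.
pub fn crc16_mcrf4xx(data: &[u8]) -> u16 {
    crc16_update(0xFFFF, data)
}

/// Checksum for one frame: `body` is everything after the magic byte up to
/// the end of the payload; `crc_extra` is the per-message seed byte.
pub fn frame_checksum(body: &[u8], crc_extra: u8) -> u16 {
    crc16_update(crc16_update(0xFFFF, body), &[crc_extra])
}

/// Compare against the two checksum bytes as they appear on the wire
/// (little-endian: low byte first).
pub fn checksum_matches(body: &[u8], crc_extra: u8, wire: [u8; 2]) -> bool {
    frame_checksum(body, crc_extra).to_le_bytes() == wire
}
```

The standard check value for this CRC variant is `0x6F91` for the ASCII input `123456789`, which makes a convenient unit test.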
### Excluded
- Transport and reconnect (task 02, AZ-641).
- Heartbeat scheduling (task 02, AZ-641).
- Ack demultiplexing to callers (task 04, AZ-643).
- MAVLink-2 signing (task 04, AZ-643).
- Any message not in the §7.7 surface — adding new messages requires design review.
## Acceptance Criteria
**AC-1: Round-trip every supported message**
Given the encoder produces a frame for each message kind in the §7.7 surface with deterministic field values
When the same frame is fed back through the decoder
Then the typed `MavlinkMessage` matches the original fields and `parse_errors_total` does not increment.
**AC-2: Malformed frame is rejected**
Given a byte buffer with a truncated payload or a wrong CRC
When the decoder consumes it
Then the frame is dropped, `parse_errors_total{kind="crc" | "truncated"}` increments by 1, and the codec continues processing subsequent bytes.
**AC-3: Unknown message ID is counted, not fatal**
Given an inbound frame with a message ID outside the §7.7 surface
When the decoder consumes it
Then the frame is dropped, `parse_errors_total{kind="unknown_id"}` increments by 1, and decoding continues.
**AC-4: SITL round-trip**
Given an ArduPilot SITL instance configured for `udp://127.0.0.1:14550`
When `mavlink_layer` emits a `COMMAND_LONG` for `MAV_CMD_NAV_RETURN_TO_LAUNCH`
Then SITL receives the command and replies with a matching `COMMAND_ACK`; the decoder emits a `MavlinkMessage::CommandAck` with `result = MAV_RESULT_ACCEPTED`.
## Non-Functional Requirements
**Performance**
- Per-message encode + decode round-trip: ≤50 ms p99 on a healthy link (per `description.md §8`).
**Reliability**
- No silent acceptance of malformed or signature-mismatched frames.
## Constraints
- Hand-rolled — no third-party MAVLink SDK.
- Adding any message outside the §7.7 surface requires an explicit design review noted in the PR description.
## Runtime Completeness
- **Named capability**: MAVLink v2 wire-correct encode/decode for the §7.7 command surface.
- **Production code that must exist**: real byte-level encoder + decoder; CRC computation; sequence number handling.
- **Allowed external stubs**: ArduPilot SITL is the conformance reference for the SITL round-trip AC.
- **Unacceptable substitutes**: a JSON or human-readable "MAVLink-like" envelope is not acceptable — the wire format must be MAVLink v2.
@@ -0,0 +1,74 @@
# MAVLink Ack Demux, Retry, and Signing
**Task**: AZ-643_mavlink_ack_demux_and_signing
**Name**: Command-ack demux + retry handle + optional MAVLink-2 signing
**Description**: Map outbound `COMMAND_LONG` requests to their `COMMAND_ACK` responses by `command_id`, enforce ack timeout, surface result to the originating caller; optionally enable MAVLink-2 message signing.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-641_mavlink_transport_and_heartbeat, AZ-642_mavlink_codec
**Component**: mavlink_layer
**Tracker**: AZ-643
**Epic**: AZ-637
## Problem
Outbound MAVLink commands are async with respect to their acks. `mission_executor` (and other callers) need a synchronous-feeling `send_command(...) -> Result<CommandAck>` API that times out at a configurable wall-clock deadline (default 1 s) — the retry decision then belongs to the caller, not to `mavlink_layer`. Separately, when the autopilot link supports it, MAVLink-2 message signing should be enabled for outbound frames and validated for inbound frames; mismatched signatures are rejected.
## Outcome
- `MavlinkHandle::send_command(cmd) -> Result<CommandAck, AckTimeout>` resolves when a matching `COMMAND_ACK` arrives within the deadline, or returns `AckTimeout` otherwise.
- An in-flight command map (`command_id → (caller, deadline)`) is correctly populated and cleared on success and on timeout (no leaks).
- When `signing_enabled = true` at config time, outbound frames are signed; inbound frames with bad signatures are rejected and counted (`parse_errors_total{kind="signing_mismatch"}`).
- `signing_enabled` is reported in `health()`.
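
The in-flight map described above might look roughly like the sketch below. It assumes a plain `HashMap` keyed by `command_id` and a caller-supplied clock; the handle back to the waiting caller (for example a one-shot channel sender) is left out, and the `InFlight` type and its method names are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-flight commands keyed by `command_id`, each with its ack deadline.
#[derive(Debug, Default)]
pub struct InFlight {
    deadlines: HashMap<u16, Instant>,
}

impl InFlight {
    /// Track a new command. Returns false if the same command id is
    /// already waiting for an ack (assumption: one in flight per id).
    pub fn register(&mut self, command_id: u16, now: Instant, timeout: Duration) -> bool {
        if self.deadlines.contains_key(&command_id) {
            return false;
        }
        self.deadlines.insert(command_id, now + timeout);
        true
    }

    /// Called when a `COMMAND_ACK` arrives. Returns true if a caller was
    /// waiting; the entry is removed either way, so nothing leaks.
    pub fn resolve(&mut self, command_id: u16) -> bool {
        self.deadlines.remove(&command_id).is_some()
    }

    /// Remove every entry whose deadline has passed and return the ids,
    /// so each waiting caller can be completed with `AckTimeout`.
    pub fn evict_expired(&mut self, now: Instant) -> Vec<u16> {
        let mut expired: Vec<u16> = self
            .deadlines
            .iter()
            .filter(|&(_, &deadline)| deadline <= now)
            .map(|(&id, _)| id)
            .collect();
        for id in &expired {
            self.deadlines.remove(id);
        }
        expired.sort();
        expired
    }

    /// Backs the `commands_in_flight` health field.
    pub fn len(&self) -> usize {
        self.deadlines.len()
    }
}
```

Passing `now` in explicitly keeps the eviction logic deterministic in tests.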
## Scope
### Included
- In-flight command map with deadline-driven eviction.
- Public `send_command(...) -> Result<CommandAck>` API.
- MAVLink-2 outbound signature + inbound signature validation (off-by-default; on when configured).
- Health fields: `commands_in_flight`, `signing_enabled`.
### Excluded
- The decision to retry on `AckTimeout` (belongs to `mission_executor`).
- Encoding the new commands themselves (task 03).
## Acceptance Criteria
**AC-1: Command-ack happy path**
Given a healthy SITL link
When `send_command(MAV_CMD_NAV_RETURN_TO_LAUNCH)` is called
Then within ≤1 s the result resolves with `MAV_RESULT_ACCEPTED` and `commands_in_flight` returns to 0.
**AC-2: Ack timeout returns explicit error**
Given a SITL instance that is configured not to ack commands (or is paused)
When `send_command(...)` is called with the default 1 s deadline
Then the call resolves with `Err(AckTimeout)`; the in-flight map is cleared; the link stays open.
**AC-3: Signing rejection counted**
Given `signing_enabled = true` and an inbound frame whose signature does not match
When the decoder runs on the frame
Then the frame is rejected, `parse_errors_total{kind="signing_mismatch"}` increments by 1, and the link stays open.
**AC-4: Optional signing — disabled path**
Given `signing_enabled = false`
When inbound frames arrive (signed or unsigned)
Then the signature field is ignored and `parse_errors_total{kind="signing_mismatch"}` stays at 0.
## Non-Functional Requirements
**Performance**
- Ack demux lookup: O(1); does not contribute measurably to the ≤50 ms per-message round-trip target.
**Reliability**
- No leaked entries in the in-flight map; every `send_command` either resolves or times out.
## Constraints
- Signing scheme decision (Q6) lives elsewhere — this task only wires the on/off mechanism using the spec-defined MAVLink-2 signing.
## Runtime Completeness
- **Named capability**: MAVLink-2 message signing (when enabled) + COMMAND_ACK demux.
- **Production code that must exist**: real signature computation + verification; in-flight map keyed by `command_id`.
- **Allowed external stubs**: SITL with signing disabled is the default test fixture; a separate fixture exercises the signing path.
- **Unacceptable substitutes**: signature stub that always returns "valid" is not acceptable in production.
@@ -0,0 +1,81 @@
# Mission Pull + Schema Validation
**Task**: AZ-644_mission_client_pull_and_schema
**Name**: HTTPS mission fetch + schema validation
**Description**: HTTPS REST client to the external `missions` API, mission fetch by `mission_id` on startup, validate the response against the shared `mission-schema`, bounded retry on transient connection loss.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure
**Component**: mission_client
**Tracker**: AZ-644
**Epic**: AZ-638
## Problem
`autopilot` does not own the missions database — it fetches the mission by ID from the external `missions` API at startup. The response must validate against the shared `mission-schema`; on schema-invalid the mission MUST be rejected (no silent downcast). On transient connectivity failure, fetch is retried with bounded exponential backoff; on exceeding the cap, the mission start is refused and health flips to red.
## Outcome
- `MissionClient::fetch(mission_id) -> Result<Mission, FetchError>` performs an HTTPS GET against the configured `missions` endpoint, validates the response against the bundled `mission-schema` (schema version recorded), and returns a typed `Mission` (`{ waypoints, geofences, return_point, mission_id, schema_version }`).
- Transient failures (timeout, 5xx, DNS) are retried with bounded exponential backoff; max attempts configurable (default 5).
- On schema mismatch the call returns `Err(SchemaInvalid)` with a size-capped sample of the raw response for offline analysis.
- Health surface includes `last_fetch_ts`, `fetch_errors_total`, `schema_version`, `connection_state`.
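
As an illustration of the bounded exponential backoff above, here is a standard-library-only sketch. The HTTP call is abstracted as a closure, the sleep is injected so tests stay fast, and `AttemptError`, `backoff_delay` and `retry_with_backoff` are hypothetical names; only `SchemaInvalid` and `MaxRetriesExceeded` come from this spec.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    SchemaInvalid,
    MaxRetriesExceeded,
}

/// Outcome of a single failed attempt (hypothetical classification).
pub enum AttemptError {
    /// Timeout, 5xx, DNS failure: worth retrying.
    Transient,
    /// Anything that must be surfaced immediately, e.g. schema mismatch.
    Fatal(FetchError),
}

/// Delay after the `attempt`-th failure (1-based): base * 2^(attempt-1),
/// clamped to `cap`.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 1u32
        .checked_shl(attempt.saturating_sub(1))
        .unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}

/// Run `op` up to `max_attempts` times, sleeping between transient failures.
pub fn retry_with_backoff<T, F, S>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut op: F,
    mut sleep: S,
) -> Result<T, FetchError>
where
    F: FnMut() -> Result<T, AttemptError>,
    S: FnMut(Duration),
{
    for attempt in 1..=max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(AttemptError::Fatal(err)) => return Err(err),
            Err(AttemptError::Transient) => {
                // No point sleeping after the final attempt.
                if attempt < max_attempts {
                    sleep(backoff_delay(attempt, base, cap));
                }
            }
        }
    }
    Err(FetchError::MaxRetriesExceeded)
}
```

In production the `sleep` argument would presumably be the async runtime's timer; the closure form is just a way to observe the delays.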
## Scope
### Included
- HTTPS client (`reqwest` or `hyper` — pick the one already pinned in `shared`).
- Auth header plumb-through (concrete scheme deferred to `../_docs/02_missions.md`; passed as opaque `Authorization` header).
- Schema validation against `mission-schema` (bundled in `shared/contracts/`).
- Bounded exponential backoff.
### Excluded
- Middle-waypoint POST (task 06).
- MapObjects pre-flight pull (task 07).
- MapObjects post-flight push and durable queue (task 08).
## Acceptance Criteria
**AC-1: Happy path fetch**
Given a fixture `missions` API that returns a schema-valid mission JSON for `mission_id = M1`
When `MissionClient::fetch("M1")` is called
Then it returns `Ok(Mission { ... })` and `health()` reports `last_fetch_ts` updated, `connection_state = "ok"`.
**AC-2: Schema-invalid is rejected**
Given a fixture `missions` API that returns a valid HTTP 200 but the JSON body has a missing required field
When `MissionClient::fetch("M1")` is called
Then it returns `Err(SchemaInvalid)` and `health()` records the failure; the raw response excerpt is logged size-capped.
**AC-3: Transient failure retries within budget**
Given the missions API returns `503` for the first two attempts and `200` on the third
When `MissionClient::fetch("M1")` is called
Then it returns `Ok` after the third attempt; backoff is observed between attempts.
**AC-4: Cap exhaustion refuses start**
Given the missions API is unreachable for all 5 default attempts
When `MissionClient::fetch("M1")` is called
Then it returns `Err(MaxRetriesExceeded)` and `health()` is red.
## Non-Functional Requirements
**Performance**
- Startup fetch completes within ≤5 s on healthy connectivity.
**Reliability**
- No silent downcast on schema mismatch.
- No infinite retry — bounded backoff cap is configurable.
## Constraints
- Mission schema is shared with the external `missions` repo; the schema file lives in `shared/contracts/mission-schema.json` (bundled at build time).
## Contract
- `mission-schema.json` is the authoritative wire contract. Owner: `../_docs/02_missions.md`. Bundled copy in `shared/contracts/mission-schema.json`.
- Canonical typed model: `data_model.md §MissionItem`, `§MissionWaypoint`, `§Geofence`.
## Runtime Completeness
- **Named capability**: HTTPS REST to the external `missions` API + JSON Schema validation.
- **Production code that must exist**: real HTTPS request; real JSON Schema validator (e.g. `jsonschema` crate).
- **Allowed external stubs**: in tests, the missions API can be a local `wiremock`/`mockito` server.
- **Unacceptable substitutes**: skipping schema validation in production "for speed" is not acceptable; validation is a safety boundary.
@@ -0,0 +1,64 @@
# Middle-Waypoint POST
**Task**: AZ-645_mission_client_waypoint_post
**Name**: Middle-waypoint POST to missions API
**Description**: POST the updated mission (with operator-confirmed middle waypoint inserted) to the external `missions` API; bounded retry; surface failure to `mission_executor`.
**Complexity**: 2 points
**Dependencies**: AZ-640_initial_structure, AZ-644_mission_client_pull_and_schema
**Component**: mission_client
**Tracker**: AZ-645
**Epic**: AZ-638
## Problem
When the operator confirms a POI, `scan_controller` hands a middle-waypoint hint to `mission_executor`, which computes the patched mission (`current_position → middle_waypoint → resume_original_route`). That patched mission must be POSTed to the external `missions` API for persistence and traceability. If the POST fails, the executor decides whether to halt, RTL, or continue with the in-memory mission — `mission_client` only surfaces the failure.
## Outcome
- `MissionClient::post_middle_waypoint(mission_id, patched_mission) -> Result<MissionUpdateAck, PostError>` performs a `POST /missions/{id}/middle-waypoint` (exact path per `../_docs/02_missions.md`) and awaits an ack.
- Bounded exponential backoff on transient failure (default 3 attempts).
- On final failure returns a typed error; never silent.
- Health field `last_middle_waypoint_post_status` updated.
## Scope
### Included
- POST endpoint call with the patched mission body.
- Bounded retry on 5xx / timeout.
- Error surface to caller.
### Excluded
- The decision to RTL on failure (`mission_executor`).
- Recomputing the patched mission (`mission_executor`).
## Acceptance Criteria
**AC-1: Happy path POST**
Given a fixture missions API that accepts the POST and returns `200`
When `post_middle_waypoint("M1", patched)` is called
Then it returns `Ok(MissionUpdateAck { ... })` within ≤2 s and `health.last_middle_waypoint_post_status = "ok"`.
**AC-2: Transient failure retries**
Given the API returns `503` once then `200`
When the call is made
Then it returns `Ok` on the second attempt.
**AC-3: Cap exhaustion bubbles error**
Given the API returns `500` for all 3 default attempts
When the call is made
Then it returns `Err(MaxRetriesExceeded)` and the error is surfaced to the caller; no silent absorption.
## Non-Functional Requirements
**Performance**
- Single happy-path POST completes in ≤2 s on healthy connectivity.
**Reliability**
- Bounded backoff; no infinite retry.
## Runtime Completeness
- **Named capability**: middle-waypoint POST against the external `missions` API.
- **Production code that must exist**: real HTTPS POST.
- **Allowed external stubs**: `wiremock`/`mockito` for tests.
- **Unacceptable substitutes**: swallowing the error and proceeding is not acceptable.
@@ -0,0 +1,76 @@
# MapObjects Pre-Flight Pull
**Task**: AZ-646_mission_client_mapobjects_pull
**Name**: Pre-flight MapObjects GET + cached-fallback handshake
**Description**: After mission fetch succeeds, GET `/missions/{id}/mapobjects` (and `/ignored` if separated). Surface the bundle to `mapobjects_store`. On failure, surface BIT degradation — operator must acknowledge cached fallback or abort. Never silent.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-644_mission_client_pull_and_schema
**Component**: mission_client
**Tracker**: AZ-646
**Epic**: AZ-638
## Problem
The MapObjects working copy is hydrated pre-flight from the central `missions` API. The pull must complete before `mission_executor` proceeds past `BIT_OK`. On pull failure the system must NOT silently proceed; instead, `mission_executor`'s BIT (F9) surfaces a degraded state — the operator either acknowledges cached fallback (signed acknowledgement per Q9) or aborts.
## Outcome
- `MissionClient::pull_mapobjects(mission_id) -> Result<MapObjectsBundle, PullError>` performs a `GET /missions/{id}/mapobjects` (and `/ignored` if the API splits them) and returns a typed `MapObjectsBundle { map_objects, ignored_items, fetched_at, schema_version, fallback_used: bool }`.
- On 200, the bundle is handed to `mapobjects_store` for hydration; `mapobjects_pull_state = synced`.
- On error or timeout, `pull_state = failed`; the typed error is surfaced to `mission_executor` (F9 BIT degrades, never silent).
- Health fields: `mapobjects_pull_state`, `last_mapobjects_pull_ts`.
## Scope
### Included
- GET endpoint(s) call.
- Schema validation of the bundle (using the shared MapObjects schema in `shared/contracts/`).
- Cached-fallback semantics: the **cache** itself lives in `mapobjects_store` (task 28); this task only sets `fallback_used = true` when the cached copy is used after operator acknowledgement.
- Health surface fields above.
### Excluded
- The cache storage itself (lives in `mapobjects_store`).
- Operator-acknowledgement flow (`operator_bridge`).
- BIT orchestration (`mission_executor`).
## Acceptance Criteria
**AC-1: Happy path pull**
Given a fixture API that returns a schema-valid MapObjects bundle
When `pull_mapobjects("M1")` is called
Then it returns `Ok(bundle)`, `pull_state = synced`, and the bundle reaches `mapobjects_store` for hydration.
**AC-2: Schema-invalid is rejected**
Given the API returns a 200 with a missing required field
When `pull_mapobjects("M1")` is called
Then it returns `Err(SchemaInvalid)` and `pull_state = failed`; no silent acceptance.
**AC-3: Network failure surfaces to F9**
Given the API is unreachable
When `pull_mapobjects("M1")` is called
Then it returns `Err(Unreachable)`, `pull_state = failed`, and the error is observable by `mission_executor`'s BIT path.
**AC-4: 30 km × 30 km area completes within budget**
Given a fixture bundle the size of a 30 km × 30 km mission area
When the pull is performed on a 100 Mbps loopback link
Then the call completes in ≤30 s.
## Non-Functional Requirements
**Performance**
- ≤30 s for a 30 km × 30 km mission area on healthy connectivity (per `description.md §8`).
**Reliability**
- Never silent on failure.
## Contract
- MapObjects bundle schema: `shared/contracts/mapobjects-bundle.json`. Owner: `../_docs/02_missions.md` §7.13 extension.
- Canonical typed model: `data_model.md §MapObjectsBundle`.
## Runtime Completeness
- **Named capability**: HTTPS GET against the central MapObjects extension + schema validation.
- **Production code that must exist**: real HTTPS GET; real schema validator.
- **Allowed external stubs**: `wiremock`/`mockito`.
- **Unacceptable substitutes**: skipping schema validation in production.
@@ -0,0 +1,84 @@
# MapObjects Post-Flight Push + Durable Queue
**Task**: AZ-647_mission_client_mapobjects_push
**Name**: Post-flight MapObjects push with durable queue and crash-recovery push
**Description**: On `mission_executor` terminal state, drain `mapobjects_store`'s pending diff and POST to `/missions/{id}/mapobjects` + `/missions/{id}/mapobjects/ignored`. Independent retry per endpoint. Persist pending diff on disk for 24 h durable retry. At startup, replay any non-empty pending diff from a previously terminated mission BEFORE BIT for any new mission begins.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-644_mission_client_pull_and_schema, AZ-646_mission_client_mapobjects_pull
**Component**: mission_client
**Tracker**: AZ-647
**Epic**: AZ-638
## Problem
The full pass diff (NEW / MOVED / EXISTING / REMOVED-candidate observations + IgnoredItem appends) must reach the central API after the mission ends. In-flight central writes are forbidden (Frozen choice 6 — `architecture.md §7.3`). The post-flight push must survive transient failure (independent retry per endpoint), persistent failure (operator-visible warning + manual replay), and crash mid-mission (next-boot push of pending diff). The durable queue is the disk-backed safety net.
## Outcome
- `MissionClient::push_mapobjects_diff(mission_id, diff) -> PushReport` posts the observations and ignored-items independently; partial success does not roll back the successful endpoint.
- The pending diff is persisted on disk at `${state_dir}/mapobjects_push/<mission_id>.json` BEFORE the push starts (write-ahead).
- Per-endpoint bounded exponential backoff (24 h durable retry window; configurable).
- Persistent failure: `sync_state = degraded`; operator-visible warning; entry stays on disk for manual replay.
- At startup, if `${state_dir}/mapobjects_push/` has any non-empty file, run the push for those missions BEFORE BIT for any new mission begins (crash-recovery path).
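
The write-ahead and crash-recovery items above could be sketched with the standard library alone, as below. The function names are hypothetical, the diff is treated as opaque bytes, and the temp-file-then-rename step is the usual way to get an atomic replace on POSIX filesystems (a fully durable version would probably also fsync the directory).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Persist the pending diff for `mission_id` before any network call.
/// Writes to a temporary file, flushes it to disk, then renames it into
/// place so a crash never leaves a half-written `<mission_id>.json`.
pub fn persist_pending(dir: &Path, mission_id: &str, body: &[u8]) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let final_path = dir.join(format!("{}.json", mission_id));
    let tmp_path = dir.join(format!("{}.json.tmp", mission_id));
    let mut file = File::create(&tmp_path)?;
    file.write_all(body)?;
    file.sync_all()?;
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}

/// Startup sweep: mission ids that still have a non-empty pending diff.
/// Leftover `.tmp` files and empty files are ignored.
pub fn pending_missions(dir: &Path) -> io::Result<Vec<String>> {
    let mut ids = Vec::new();
    if !dir.exists() {
        return Ok(ids);
    }
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let is_json = path.extension().map_or(false, |ext| ext == "json");
        if is_json && entry.metadata()?.len() > 0 {
            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
                ids.push(stem.to_string());
            }
        }
    }
    ids.sort();
    Ok(ids)
}
```

At boot, the caller would run the push for every id returned by `pending_missions` before BIT starts for a new mission.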
## Scope
### Included
- Two POST endpoints, called independently with separate retry/backoff state.
- Write-ahead persistence of the pending diff before the network call.
- Crash-recovery sweep at startup.
- `PushReport { observations: PerEndpointStatus, ignored: PerEndpointStatus }`.
- Health surface: `mapobjects_push_pending`, `last_push_ts`, per-endpoint last error.
### Excluded
- Building the pending diff (`mapobjects_store` — task 28 owns `pending_observations` + `pending_ignored`).
- Choosing what's a terminal state (`mission_executor`).
- Operator UI for the manual-replay warning (`operator_bridge` / Ground Station).
## Acceptance Criteria
**AC-1: Happy path push**
Given the mission ended with N observations and M ignored items
When `push_mapobjects_diff("M1", diff)` is called and both endpoints return 200
Then both succeed, the disk file is cleared, and `sync_state = synced`.
**AC-2: Partial success — independent retry**
Given `/mapobjects` returns 200 and `/mapobjects/ignored` returns 503
When the push runs
Then the observations endpoint is reported success, the ignored endpoint is queued for retry, and the disk file retains ONLY the ignored portion.
**AC-3: Persistent failure persists for manual replay**
Given both endpoints return 503 for all 24 h of bounded retry
When the retry window closes
Then `sync_state = degraded`, the disk file remains intact, and a manual-replay warning is observable in `health()`.
**AC-4: Crash-recovery push at startup**
Given a previous run terminated with a non-empty disk file at `${state_dir}/mapobjects_push/M0.json`
When the process starts a new run for mission `M1`
Then the push for `M0` is attempted before BIT begins for `M1`; the order is observable via logs.
**AC-5: 60-min mission push within budget**
Given a fixture pass diff sized for a 60-min mission
When the push is performed on a 100 Mbps loopback link
Then both endpoints complete in ≤2 min.
## Non-Functional Requirements
**Performance**
- ≤2 min for a 60-min mission's pass diff (per `description.md §8`).
**Reliability**
- 24 h durable retry window.
- Crash-mid-mission: nothing is lost on disk.
## Contract
- MapObjects POST schemas: `shared/contracts/mapobjects-observations.json` and `shared/contracts/mapobjects-ignored.json`. Owner: `../_docs/02_missions.md` §7.13 extension.
- Canonical typed model: `data_model.md §MapObjectObservation`, `§IgnoredItem`.
## Runtime Completeness
- **Named capability**: durable on-disk queue + post-flight push to the central `missions` API.
- **Production code that must exist**: real disk write-ahead (atomic rename); real HTTPS POST; real backoff state machine; real crash-recovery sweep.
- **Allowed external stubs**: `wiremock`/`mockito` for tests; `tempfile` for the disk-queue tests.
- **Unacceptable substitutes**: an in-memory-only queue is not acceptable (crash recovery requires disk).
@@ -0,0 +1,82 @@
# Mission Executor State Machine (Both Variants)
**Task**: AZ-648_mission_executor_state_machine
**Name**: Variant-aware mission state machine
**Description**: Typed state machine for both multirotor and fixed-wing variants. Transitions are explicit and fully enumerated; bounded retry per transition with explicit max-retry. No infinite retry. State is in-process only; restart re-runs from `DISCONNECTED`.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-641_mavlink_transport_and_heartbeat, AZ-642_mavlink_codec, AZ-643_mavlink_ack_demux_and_signing
**Component**: mission_executor
**Tracker**: AZ-648
**Epic**: AZ-636
## Problem
`mission_executor` drives the airframe through a typed state machine. The flow differs per variant (multirotor vs fixed-wing); both variants share the same transition discipline and observability surface. Every transition has a bounded retry budget — on cap exhaustion health flips to red and the failure surfaces via `operator_bridge`. **No infinite retry** is permitted (per `architecture.md §5`).
## Outcome
- A typed `MissionState` enum encodes:
  - Multirotor: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
  - Fixed-wing: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
- `MissionExecutor::tick(now, telemetry)` advances the state machine; each transition is gated by an explicit guard.
- Per-transition retry counter + last-failure reason; on cap exhaustion the machine pauses and health → red.
- Health surface: current state, `state_duration_ms`, `transition_failures_by_state`, retry counts.
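
The two state graphs above can be encoded as a typed transition table. The sketch below covers only the happy-path ordering; guards, retry budgets and events are omitted, and `next_state` / `full_path` are hypothetical helper names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Variant {
    Multirotor,
    FixedWing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MissionState {
    Disconnected,
    Connected,
    HealthOk,
    BitOk,
    Armed,
    TakeOff,
    MissionUploaded,
    WaitAuto,
    FlyMission,
    Land,
    PostFlightSync,
    Done,
}

/// Successor of `state` in the given variant's graph, or `None` when the
/// state is terminal or does not belong to that variant.
pub fn next_state(variant: Variant, state: MissionState) -> Option<MissionState> {
    use self::MissionState::*;
    use self::Variant::*;
    let next = match (variant, state) {
        (_, Disconnected) => Connected,
        (_, Connected) => HealthOk,
        (_, HealthOk) => BitOk,
        (Multirotor, BitOk) => Armed,
        (Multirotor, Armed) => TakeOff,
        (Multirotor, TakeOff) => MissionUploaded,
        (Multirotor, MissionUploaded) => FlyMission,
        (FixedWing, BitOk) => MissionUploaded,
        (FixedWing, MissionUploaded) => WaitAuto,
        (FixedWing, WaitAuto) => FlyMission,
        (_, FlyMission) => Land,
        (_, Land) => PostFlightSync,
        (_, PostFlightSync) => Done,
        (_, Done)
        | (Multirotor, WaitAuto)
        | (FixedWing, Armed)
        | (FixedWing, TakeOff) => return None,
    };
    Some(next)
}

/// Walk the graph from `Disconnected` to the terminal state.
pub fn full_path(variant: Variant) -> Vec<MissionState> {
    let mut state = MissionState::Disconnected;
    let mut path = vec![state];
    while let Some(next) = next_state(variant, state) {
        path.push(next);
        state = next;
    }
    path
}
```

Because the `match` is exhaustive over `(Variant, MissionState)`, adding a state without deciding its transitions fails to compile.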
## Scope
### Included
- Both variant state graphs.
- Bounded retry per transition (configurable; default 3 attempts).
- `Variant` enum (`Multirotor`, `FixedWing`) wired from startup config.
- State-transition events published on an output channel for `scan_controller` and `telemetry_stream`.
- Mission re-upload sequence (`MISSION_CLEAR_ALL` → upload waypoints → `MISSION_SET_CURRENT`) — invoked from `MISSION_UPLOADED` entry guards.
### Excluded
- BIT (F9) — separate task 11.
- Lost-link failsafe ladder (F10) — separate task 12.
- Geofence + battery enforcement — separate task 13.
- Middle-waypoint re-upload logic (separate task 13); only the base mission upload sequence is exercised here.
- Post-flight push trigger — separate task 13.
## Acceptance Criteria
**AC-1: Happy-path multirotor flow against SITL**
Given a multirotor SITL + `mavlink_layer` healthy + a valid in-memory mission
When `mission_executor::run()` is started
Then it reaches `DONE` traversing the multirotor state graph; transitions are observable as events; mission progress reaches all waypoints.
**AC-2: Happy-path fixed-wing flow against SITL**
Given a fixed-wing SITL + the operator's GCS sets AUTO mode externally
When `mission_executor::run()` is started
Then it traverses the fixed-wing graph (no `ARMED`/`TAKE_OFF`; `WAIT_AUTO` waits for the AUTO transition) and reaches `DONE`.
**AC-3: Bounded retry on mission-upload rejection**
Given SITL is configured to reject the mission upload (non-accepting `MISSION_ACK`) on the first attempt and accept it on the second
When the executor reaches `MISSION_UPLOADED`
Then the retry counter increments to 1, the second attempt succeeds, and the machine proceeds.
**AC-4: Cap exhaustion flips health to red**
Given SITL is configured to reject the mission upload (non-accepting `MISSION_ACK`) on all 3 default attempts
When the executor reaches `MISSION_UPLOADED`
Then the machine pauses, health → red, and the failure is observable on the output channel; no transition past `MISSION_UPLOADED`.
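
AC-3 and AC-4 both hinge on a per-transition retry counter with a hard cap. A minimal sketch follows, assuming the cap counts total attempts (default 3) and that exhaustion is what flips health to red; `TransitionRetry`, `RetryDecision` and `Health` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Green,
    Red,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RetryDecision {
    /// Budget remains; the transition may be attempted again.
    Retry { attempts_used: u32 },
    /// Cap reached; the machine must pause at the current state.
    Exhausted,
}

/// Retry bookkeeping for one transition (e.g. entry into MISSION_UPLOADED).
#[derive(Debug)]
pub struct TransitionRetry {
    max_attempts: u32,
    failures: u32,
    last_failure: Option<String>,
}

impl TransitionRetry {
    pub fn new(max_attempts: u32) -> Self {
        TransitionRetry {
            max_attempts,
            failures: 0,
            last_failure: None,
        }
    }

    /// Record one failed attempt and report whether another is allowed.
    pub fn record_failure(&mut self, reason: &str) -> RetryDecision {
        self.failures += 1;
        self.last_failure = Some(reason.to_string());
        if self.failures >= self.max_attempts {
            RetryDecision::Exhausted
        } else {
            RetryDecision::Retry {
                attempts_used: self.failures,
            }
        }
    }

    /// Red once the budget is spent, green otherwise.
    pub fn health(&self) -> Health {
        if self.failures >= self.max_attempts {
            Health::Red
        } else {
            Health::Green
        }
    }

    /// Last failure reason, for the health surface.
    pub fn last_failure(&self) -> Option<&str> {
        self.last_failure.as_deref()
    }
}
```

There is deliberately no code path that resets the counter and loops, which is one way to make "no infinite retry" structural rather than a convention.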
## Non-Functional Requirements
**Performance**
- Mission-upload retry budget: configurable; default 3 attempts.
- State-machine tick: ≤10 ms p99.
**Reliability**
- No infinite retry anywhere.
## Constraints
- `mavlink_layer::send_command` is the only path to the airframe.
- Variant is fixed at startup; no runtime swap.
## Runtime Completeness
- **Named capability**: variant-aware state machine + mission upload via MAVLink.
- **Production code that must exist**: explicit transition guards; real retry counters; real mission-upload sequence.
- **Allowed external stubs**: ArduPilot SITL is the conformance target (both `arducopter` and `arduplane`).
- **Unacceptable substitutes**: a generic "if-else cascade" instead of typed state transitions is not acceptable.
@@ -0,0 +1,65 @@
# Telemetry Forwarding from Mission Executor
**Task**: AZ-649_mission_executor_telemetry_forwarding
**Name**: Telemetry forwarding to scan, movement, telemetry, BIT input
**Description**: Forward decoded MAVLink telemetry (position, attitude, mode, sys-status) from `mavlink_layer` to `scan_controller` (proximity + middle-waypoint computation), `movement_detector` (ego-motion compensation), and `telemetry_stream` (operator overlay). Provide a typed `UavTelemetry` snapshot for BIT consumption.
**Complexity**: 2 points
**Dependencies**: AZ-640_initial_structure, AZ-648_mission_executor_state_machine
**Component**: mission_executor
**Tracker**: AZ-649
**Epic**: AZ-636
## Problem
`mission_executor` is the only component subscribed to the raw decoded MAVLink stream — it owns the airframe relationship. Downstream components (`scan_controller`, `movement_detector`, `telemetry_stream`) and the BIT path need the same telemetry, but in a typed, projection-friendly form (`UavTelemetry { position, attitude, mode, sys_status, monotonic_ts }`). Forwarding must not duplicate decode work and must not drop messages silently.
## Outcome
- `UavTelemetry` is published on three lossy broadcast channels (one per downstream consumer) with monotonic timestamps; consumers that fall behind get drops counted, not blocking.
- `UavTelemetrySnapshot` (latest-state view) is exposed for BIT and health-check consumers.
- Health surface: `last_telemetry_ts`, per-consumer drop counters.
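
To illustrate the "drops counted, not blocking" behaviour above without pulling in an async runtime, the sketch below uses one bounded `std::sync::mpsc` channel per consumer and `try_send`. This is a stand-in technique, not the Tokio broadcast channel named in the scope: it drops the newest item when a consumer is full and counts drops on the publisher side. `FanOut` is a hypothetical name.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Lossy fan-out: one bounded queue per consumer, never blocks the publisher.
pub struct FanOut<T: Clone> {
    senders: Vec<SyncSender<T>>,
    drops: Vec<u64>,
}

impl<T: Clone> FanOut<T> {
    /// Create `consumers` queues, each holding up to `capacity` items
    /// (capacity should be at least 1).
    pub fn new(consumers: usize, capacity: usize) -> (Self, Vec<Receiver<T>>) {
        let mut senders = Vec::with_capacity(consumers);
        let mut receivers = Vec::with_capacity(consumers);
        for _ in 0..consumers {
            let (tx, rx) = sync_channel(capacity);
            senders.push(tx);
            receivers.push(rx);
        }
        let fan_out = FanOut {
            senders,
            drops: vec![0; consumers],
        };
        (fan_out, receivers)
    }

    /// Offer `item` to every consumer. A full or disconnected consumer
    /// loses this item and has its drop counter incremented; the other
    /// consumers are unaffected.
    pub fn publish(&mut self, item: &T) {
        for (i, tx) in self.senders.iter().enumerate() {
            match tx.try_send(item.clone()) {
                Ok(()) => {}
                Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                    self.drops[i] += 1;
                }
            }
        }
    }

    /// Per-consumer drop counters, in consumer order.
    pub fn drops(&self) -> &[u64] {
        &self.drops
    }
}
```

With a real broadcast channel the lag is typically reported on the receiving side instead, so the drop counters would be maintained by each consumer.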
## Scope
### Included
- Subscribe to the typed `MavlinkMessage` enum from `mavlink_layer`.
- Project to `UavTelemetry` (`data_model.md §UavTelemetry`).
- Publish on three Tokio broadcast channels.
- Maintain an atomic latest-snapshot for synchronous reads.
### Excluded
- Decoding MAVLink (task 03).
- Geofence/battery checks (task 13).
- BIT logic (task 11).
## Acceptance Criteria
**AC-1: Telemetry reaches all three consumers**
Given a healthy SITL link
When `GLOBAL_POSITION_INT` and `ATTITUDE` arrive at 10 Hz
Then `UavTelemetry` is observed at ≥10 Hz on all three downstream channels, with monotonic timestamps.
**AC-2: Slow consumer drops, fast consumers unaffected**
Given a slow consumer that yields every 500 ms while telemetry arrives at 10 Hz
When the slow consumer's channel buffer fills
Then the slow consumer's drop counter increments while the other two channels deliver every frame.
**AC-3: Latest-snapshot is monotonic**
Given a sequence of telemetry messages with monotonically advancing timestamps
When `latest_snapshot()` is read concurrently
Then every read returns a snapshot whose `monotonic_ts` is `>=` the previously observed value.
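
One way to satisfy AC-3 is to refuse any update whose timestamp is older than the stored one. The sketch below assumes an `RwLock` around the snapshot (a lock-free swap would also work) and invents the snapshot fields and the millisecond unit purely for illustration; only `monotonic_ts` and `latest_snapshot()` are named in this spec.

```rust
use std::sync::RwLock;

/// Illustrative subset of the snapshot; field names other than
/// `monotonic_ts` are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UavTelemetrySnapshot {
    pub monotonic_ts: u64, // assumed to be milliseconds
    pub lat_deg: f64,
    pub lon_deg: f64,
    pub alt_m: f32,
}

/// Latest-state holder for synchronous readers (BIT, health checks).
#[derive(Debug, Default)]
pub struct LatestSnapshot {
    inner: RwLock<Option<UavTelemetrySnapshot>>,
}

impl LatestSnapshot {
    /// Store `snap` unless it is older than the current snapshot.
    /// Returns true when the stored value was replaced.
    pub fn publish(&self, snap: UavTelemetrySnapshot) -> bool {
        let mut guard = self.inner.write().unwrap();
        let stale = guard.map_or(false, |cur| snap.monotonic_ts < cur.monotonic_ts);
        if stale {
            return false;
        }
        *guard = Some(snap);
        true
    }

    /// Copy of the most recent snapshot, if any has been published.
    pub fn latest_snapshot(&self) -> Option<UavTelemetrySnapshot> {
        *self.inner.read().unwrap()
    }
}
```

Since the stored timestamp never decreases, any sequence of reads observes non-decreasing `monotonic_ts` values.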
## Non-Functional Requirements
**Performance**
- Telemetry republish adds ≤2 ms to the MAVLink decode-to-consumer path.
**Reliability**
- Slow consumer never blocks fast consumers (lossy broadcast).
- Drops are counted, never silent.
## Runtime Completeness
- **Named capability**: typed telemetry fan-out to three concurrent consumers.
- **Production code that must exist**: real Tokio broadcast or equivalent; real atomic snapshot.
- **Unacceptable substitutes**: blocking single-consumer queue is not acceptable (it would gate the slowest downstream).
@@ -0,0 +1,69 @@
# Pre-Flight BIT (F9)
**Task**: AZ-650_mission_executor_bit_f9
**Name**: Built-In Test gate before ARMED/WAIT_AUTO
**Description**: Pre-flight Built-In Test (F9). Gates the transition to `ARMED` (multirotor) or `WAIT_AUTO` (fixed-wing). Covers every dependency in `architecture.md §5` plus mission load + MapObjects pre-flight pull (cached fallback acknowledged) + persistent-store free space + wall-clock binding. On FAIL no transition. On DEGRADED, surface to operator for signed acknowledgement (per Q9).
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-648_mission_executor_state_machine, AZ-649_mission_executor_telemetry_forwarding, AZ-644_mission_client_pull_and_schema, AZ-646_mission_client_mapobjects_pull
**Component**: mission_executor
**Tracker**: AZ-650
**Epic**: AZ-636
## Problem
The airframe must not be armed until every load-bearing dependency is verified healthy and every load-bearing input has been ingested. The BIT is the deliberate gate that captures `architecture.md §5` "BIT mandatory" + `system-flows.md §F9`. On FAIL the executor MUST refuse to transition past `BIT_OK`. On DEGRADED the executor surfaces a signed acknowledgement requirement to the operator (per Q9) and only proceeds when ack is observed.
## Outcome
- `Bit::evaluate(env) -> BitReport { items: Vec<BitItem { name, status: Pass | Degraded | Fail, detail }> }` returns a structured report.
- BIT items cover (at minimum): `mavlink_link`, `gimbal_link`, `camera_rtsp`, `detection_grpc`, `movement_telemetry_sync_ready`, `mapobjects_synced_or_cached_acked`, `mission_loaded`, `state_dir_free_space`, `wall_clock_bound`, `tier2_session_ready` (if enabled), `vlm_session_ready` (if enabled), `operator_bridge_session`.
- On `Fail` for any item, the state machine does NOT transition past `BIT_OK`; the report surfaces via `operator_bridge`.
- On `Degraded` items, the state machine waits for a signed `BitDegradedAck` from `operator_bridge` (matching the report id); on ack, proceeds; on timeout (configurable; default 5 min), surfaces failure.
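
The status fusion implied above (worst item wins) and the resulting gate decision might be sketched like this. The `overall` field follows the acceptance criteria; `fuse`, `gate` and `Gate` are hypothetical names, and treating an empty report as `Fail` is an assumption made here for conservatism.

```rust
/// Declaration order matters: the derived `Ord` gives Pass < Degraded < Fail.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BitStatus {
    Pass,
    Degraded,
    Fail,
}

#[derive(Debug, Clone)]
pub struct BitItem {
    pub name: &'static str,
    pub status: BitStatus,
    pub detail: String,
}

#[derive(Debug, Clone)]
pub struct BitReport {
    pub items: Vec<BitItem>,
    pub overall: BitStatus,
}

/// Fuse item statuses into an overall status: the worst item wins.
/// An empty report is treated as `Fail` (nothing was verified).
pub fn fuse(items: Vec<BitItem>) -> BitReport {
    let overall = items
        .iter()
        .map(|item| item.status)
        .max()
        .unwrap_or(BitStatus::Fail);
    BitReport { items, overall }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Proceed,
    AwaitSignedAck,
    Blocked,
}

/// Decide whether the state machine may move on. `ack_received` stands for
/// an already-validated signed `BitDegradedAck` matching this report.
pub fn gate(report: &BitReport, ack_received: bool) -> Gate {
    match report.overall {
        BitStatus::Pass => Gate::Proceed,
        BitStatus::Degraded if ack_received => Gate::Proceed,
        BitStatus::Degraded => Gate::AwaitSignedAck,
        BitStatus::Fail => Gate::Blocked,
    }
}
```

A single `Fail` item blocks regardless of any acknowledgement, which matches AC-2.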
## Scope
### Included
- BIT item evaluators (one per item).
- Report aggregation + status fusion.
- Signed `BitDegradedAck` handling (the auth check itself lives in `operator_bridge` — this task only consumes the validated event).
- Timeout for ack.
### Excluded
- BIT UI / operator overlay (Ground Station + `operator_bridge`).
- Operator-command auth validation (lives in `operator_bridge` — task 41).
## Acceptance Criteria
**AC-1: All-pass BIT proceeds**
Given every dependency is healthy
When the executor reaches `HEALTH_OK` and runs BIT
Then `BitReport.overall = Pass`, the machine transitions to `BIT_OK`, and proceeds to `ARMED` (multirotor) or `MISSION_UPLOADED` (fixed-wing).
**AC-2: Fail blocks transition**
Given `camera_rtsp` reports `Fail`
When BIT runs
Then `BitReport.overall = Fail`, the machine stays at `HEALTH_OK`, and the report is observable via `operator_bridge`.
**AC-3: Degraded requires signed ack**
Given `mapobjects_synced_or_cached_acked` reports `Degraded` (cached fallback)
When BIT runs
Then the executor waits; only after a signed `BitDegradedAck` matching the report id does the machine transition to `BIT_OK`.
**AC-4: Degraded ack timeout fails the BIT**
Given a Degraded report with no ack within the configured timeout (default 5 min)
When the timeout fires
Then `BitReport.overall = Fail`, the machine stays at `HEALTH_OK`, and the timeout is observable.
## Non-Functional Requirements
**Performance**
- BIT evaluation completes in ≤2 s when all dependencies are healthy.
**Reliability**
- No silent FAIL; every item's status is observable.
## Runtime Completeness
- **Named capability**: F9 BIT — production gate before arming.
- **Production code that must exist**: real evaluators that read live health from each dependency; real signed-ack consumption path.
- **Unacceptable substitutes**: a hardcoded "BIT always passes" path in production is unacceptable.
@@ -0,0 +1,72 @@
# Lost-Link Failsafe Ladder (F10)
**Task**: AZ-651_mission_executor_lost_link_ladder
**Name**: Lost-link ladder LinkOk → LinkDegraded → LinkLost → LinkLostInFollow
**Description**: Per-tick evaluation of the operator/Ground-Station modem link state. Default RTL after 30 s grace. Configurable. MAVLink-link loss to ArduPilot itself is a separate, more severe event — health → red, airframe failsafe takes over (we do NOT override it).
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-648_mission_executor_state_machine, AZ-649_mission_executor_telemetry_forwarding
**Component**: mission_executor
**Tracker**: AZ-651
**Epic**: AZ-636
## Problem
The operator's modem link is critical to safe operation but inherently flaky. The failsafe must escalate predictably from `LinkOk` to `LinkDegraded` (5–30 s) to `LinkLost` (>30 s) to `LinkLostInFollow` (the special case where target-follow is active), each step with a defined behaviour. Default action on `LinkLost` is RTL after a grace window. Crucially, MAVLink-link loss to ArduPilot is a different event: autopilot does NOT override the airframe's built-in failsafe in that case.
## Outcome
- `LostLinkLadder::tick(now, link_state)` updates an enum `LadderState ∈ {LinkOk, LinkDegraded, LinkLost, LinkLostInFollow}` deterministically based on the elapsed time since the last operator-link heartbeat.
- `LinkDegraded` for 5–30 s: health → yellow; events queued; no command to airframe.
- `LinkLost` for >30 s (configurable): trigger RTL via `mavlink_layer`; transition to `LAND`.
- `LinkLostInFollow` (active `TargetFollow` + >30 s): 30 s grace, then RTL.
- MAVLink-link loss to ArduPilot: detected via `mavlink_layer`'s `LinkLost`; health → red; do NOT issue RTL (airframe handles it).
- Health surface: current `LadderState`, time-in-state, RTL trigger count.
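The ladder described in this list can be sketched as a pure function of heartbeat silence. This is an illustrative sketch under assumptions, not the task's final API: `LadderConfig`, `TickOutput` and the `tick` parameters are hypothetical, and the follow grace is read here as an additional window that starts once the lost threshold is crossed (one possible reading of the text above).

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LadderState {
    LinkOk,
    LinkDegraded,
    LinkLost,
    LinkLostInFollow,
}

/// Hypothetical config; defaults mirror the documented 5 s / 30 s / 30 s.
#[derive(Debug, Clone, Copy)]
pub struct LadderConfig {
    pub degraded_after: Duration,
    pub lost_after: Duration,
    pub follow_grace: Duration,
}

impl Default for LadderConfig {
    fn default() -> Self {
        Self {
            degraded_after: Duration::from_secs(5),
            lost_after: Duration::from_secs(30),
            follow_grace: Duration::from_secs(30),
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TickOutput {
    pub state: LadderState,
    pub issue_rtl: bool,
}

pub struct LostLinkLadder {
    cfg: LadderConfig,
    rtl_issued: bool,
}

impl LostLinkLadder {
    pub fn new(cfg: LadderConfig) -> Self {
        Self { cfg, rtl_issued: false }
    }

    /// `silence` is the time since the last operator-link heartbeat.
    /// `mavlink_ok = false` means the link to ArduPilot itself is down,
    /// in which case no RTL is issued (the airframe failsafe owns the response).
    pub fn tick(&mut self, silence: Duration, in_follow: bool, mavlink_ok: bool) -> TickOutput {
        let state = if silence < self.cfg.degraded_after {
            LadderState::LinkOk
        } else if silence <= self.cfg.lost_after {
            LadderState::LinkDegraded
        } else if in_follow {
            LadderState::LinkLostInFollow
        } else {
            LadderState::LinkLost
        };
        if state == LadderState::LinkOk {
            self.rtl_issued = false; // re-arm after recovery
        }
        let rtl_due = match state {
            LadderState::LinkLost => true,
            LadderState::LinkLostInFollow => silence > self.cfg.lost_after + self.cfg.follow_grace,
            _ => false,
        };
        // Issue RTL at most once per outage.
        let issue_rtl = rtl_due && mavlink_ok && !self.rtl_issued;
        if issue_rtl {
            self.rtl_issued = true;
        }
        TickOutput { state, issue_rtl }
    }
}
```

Passing elapsed silence rather than reading a clock inside `tick` keeps the ladder deterministic under replay-based tests.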
## Scope
### Included
- Ladder state machine.
- Subscribe to operator-link state from `telemetry_stream` (forwarded by `operator_bridge` health).
- Subscribe to MAVLink-link state from `mavlink_layer`.
- Configurable thresholds (defaults: degraded=5 s, lost=30 s, follow-grace=30 s).
- RTL command issuance via `mavlink_layer::send_command(MAV_CMD_NAV_RETURN_TO_LAUNCH)`.
### Excluded
- Operator command auth checks (`operator_bridge`).
- Target-follow state ownership (`scan_controller`).
## Acceptance Criteria
**AC-1: Operator-link degraded then recovers**
Given a healthy link
When the operator-link heartbeat stops for 10 s and resumes
Then the ladder reports `LinkOk → LinkDegraded → LinkOk` with correct dwell times; no RTL is issued.
**AC-2: Operator-link lost triggers RTL**
Given a healthy link
When the operator-link heartbeat stops for 31 s
Then the ladder reports `LinkLost`, `send_command(MAV_CMD_NAV_RETURN_TO_LAUNCH)` is issued exactly once, and the state machine transitions to `LAND`.
**AC-3: Lost-in-follow grace then RTL**
Given the system is in `TargetFollow` and the operator-link drops
When the link is down for 30 s (grace), then continues to be down past the grace
Then RTL is triggered after the grace fires, not earlier.
**AC-4: MAVLink loss does NOT trigger autopilot-side RTL**
Given the MAVLink link to ArduPilot is lost (`mavlink_layer` reports `LinkLost`)
When the ladder tick runs
Then health → red, no `MAV_CMD_NAV_RETURN_TO_LAUNCH` is issued by autopilot (airframe failsafe owns the response), and the event is observable.
## Non-Functional Requirements
**Performance**
- Ladder tick: ≤5 ms.
**Reliability**
- All thresholds configurable; no hardcoded values other than the defaults documented above.
## Runtime Completeness
- **Named capability**: F10 lost-link failsafe ladder.
- **Production code that must exist**: real state machine; real RTL command issuance.
- **Unacceptable substitutes**: omitting the `LinkLostInFollow` grace is not acceptable (an operator may have momentary glitches mid-follow).
@@ -0,0 +1,92 @@
# Geofence + Battery Enforcement + Middle-Waypoint Re-Upload + Post-Flight Trigger
**Task**: AZ-652_mission_executor_safety_and_resume
**Name**: Geofence + battery thresholds + middle-waypoint re-upload + post-flight push trigger
**Description**: Continuous safety enforcement (INCLUSION + EXCLUSION geofences honoured equally; battery thresholds with operator override). Mission re-upload on middle-waypoint hint. Mission revert on target-follow ending. Trigger post-flight MapObjects push on `POST_FLIGHT_SYNC` entry.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-648_mission_executor_state_machine, AZ-649_mission_executor_telemetry_forwarding, AZ-643_mavlink_ack_demux_and_signing, AZ-647_mission_client_mapobjects_push
**Component**: mission_executor
**Tracker**: AZ-652
**Epic**: AZ-636
## Problem
The state machine alone is not enough — three continuous concerns must run on every tick:
1. **Geofence enforcement**: both INCLUSION and EXCLUSION violations trigger RTL. The earlier C++ behaviour silently ignored EXCLUSION; the new design rejects that.
2. **Battery / fuel thresholds**: RTL at `battery ≤ rtl_threshold` (default 25 %); land-now at `battery ≤ hard_floor` (default 15 %); operator override only via signed command.
3. **Middle-waypoint re-upload + target-follow revert**: on operator confirm, recompute and re-upload (`MISSION_CLEAR_ALL` → re-upload → `MISSION_SET_CURRENT(0)`); on target-follow ending, recompute and re-upload the original mission from current position.
Plus the post-flight trigger: on `POST_FLIGHT_SYNC` entry, hand off to `mission_client::push_mapobjects_diff`.
## Outcome
- `GeofenceMonitor::tick(uav_telemetry, mission_geofences)` triggers RTL on INCLUSION exit or EXCLUSION entry within ≤500 ms; alert is observable.
- `BatteryMonitor::tick(sys_status, ext_sys_state)` triggers RTL at `≤rtl_threshold`, land-now at `≤hard_floor`; signed operator-override is honoured and audit-logged.
- `MissionRePlanner::on_middle_waypoint(hint)` computes the patched mission and issues the re-upload sequence; result is observable.
- `MissionRePlanner::on_target_follow_release(reason)` recomputes the original mission from the current position and re-uploads.
- On entry to `POST_FLIGHT_SYNC`, the executor calls `mission_client::push_mapobjects_diff(mission_id, diff)`; result is logged; the machine still reaches `DONE` even on push failure (a failed push surfaces a manual-replay warning).
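The symmetric INCLUSION / EXCLUSION check can be illustrated with a hand-rolled ray-casting test. This is a sketch under assumptions: the task allows the `geo` crate or equivalent, and this std-only version is only a stand-in; `Point`, `Geofence` and `violated` are hypothetical names, and coordinates are assumed to be already projected into a planar local frame.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FenceKind {
    Inclusion,
    Exclusion,
}

pub struct Geofence {
    pub kind: FenceKind,
    pub vertices: Vec<Point>,
}

/// Even-odd ray casting: count how many polygon edges a ray from `p`
/// towards +x crosses; an odd count means `p` is inside.
pub fn point_in_polygon(p: Point, poly: &[Point]) -> bool {
    let n = poly.len();
    if n < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (a, b) = (poly[i], poly[j]);
        // The edge straddles the ray's y level, so a.y != b.y and the division is safe.
        if (a.y > p.y) != (b.y > p.y) {
            let x_cross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if p.x < x_cross {
                inside = !inside;
            }
        }
        j = i;
    }
    inside
}

/// Returns the kind of the first violated fence, if any.
/// Leaving an INCLUSION fence and entering an EXCLUSION fence are treated equally.
pub fn violated(p: Point, fences: &[Geofence]) -> Option<FenceKind> {
    for fence in fences {
        let inside = point_in_polygon(p, &fence.vertices);
        match fence.kind {
            FenceKind::Inclusion if !inside => return Some(FenceKind::Inclusion),
            FenceKind::Exclusion if inside => return Some(FenceKind::Exclusion),
            _ => {}
        }
    }
    None
}
```

A `Some(_)` result from `violated` is what would trigger the RTL path in `GeofenceMonitor::tick`; both fence kinds go through the same branch, which is the parity the task asks for.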
## Scope
### Included
- Continuous geofence check using `geo` crate or equivalent (point-in-polygon).
- Continuous battery check using `SYS_STATUS` + `EXTENDED_SYS_STATE`.
- Re-upload sequence helpers.
- Post-flight push trigger.
### Excluded
- Middle-waypoint computation algorithm (`scan_controller` provides the hint with `target_mgrs` + `target_class`; the executor only handles re-upload mechanics).
- Operator signature validation (`operator_bridge`).
- The actual push (`mission_client` task 08).
- The audit log persistence layer (lives in `shared::audit`).
## Acceptance Criteria
**AC-1: INCLUSION geofence exit triggers RTL**
Given a multirotor flying inside an INCLUSION polygon
When the UAV position crosses outside the polygon
Then RTL is triggered within ≤500 ms; the alert is observable; the state machine transitions to `LAND`.
**AC-2: EXCLUSION geofence entry triggers RTL**
Given a multirotor flying outside an EXCLUSION polygon
When the UAV position crosses into the polygon
Then RTL is triggered within ≤500 ms (parity with INCLUSION); the alert is observable.
**AC-3: Battery thresholds**
Given a multirotor flying with battery at 30 %
When `SYS_STATUS` reports battery at 24 %
Then RTL is triggered; transition to `LAND`.
When (in a separate scenario) `SYS_STATUS` drops below 15 %
Then `MAV_CMD_NAV_LAND` is issued (land-now); health → red.
**AC-4: Signed operator override of battery RTL**
Given the battery monitor would otherwise RTL at 24 %
When a signed `BatteryOverride { until_ts }` is received from `operator_bridge`
Then RTL is suppressed until `until_ts`; the override is recorded with operator id + rationale in the audit log.
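The threshold logic in AC-3 and AC-4 can be sketched as a single pure function. This is a hedged illustration: `BatteryAction`, `BatteryConfig` and `evaluate` are hypothetical names, timestamps are plain integers for brevity, and the sketch assumes the signed override suppresses only the RTL threshold, never the hard floor (the text above does not state this explicitly).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatteryAction {
    Continue,
    Rtl,
    LandNow,
}

/// Hypothetical config; defaults mirror the documented 25 % and 15 %.
#[derive(Debug, Clone, Copy)]
pub struct BatteryConfig {
    pub rtl_threshold_pct: u8,
    pub hard_floor_pct: u8,
}

impl Default for BatteryConfig {
    fn default() -> Self {
        Self { rtl_threshold_pct: 25, hard_floor_pct: 15 }
    }
}

/// An override that `operator_bridge` has already validated as signed.
#[derive(Debug, Clone, Copy)]
pub struct BatteryOverride {
    pub until_ts: u64,
}

pub fn evaluate(
    cfg: &BatteryConfig,
    battery_pct: u8,
    now_ts: u64,
    active_override: Option<BatteryOverride>,
) -> BatteryAction {
    // Assumption: the hard floor is never overridable.
    if battery_pct <= cfg.hard_floor_pct {
        return BatteryAction::LandNow;
    }
    if battery_pct <= cfg.rtl_threshold_pct {
        // RTL is suppressed only while the override window is still open.
        let suppressed = matches!(active_override, Some(o) if now_ts < o.until_ts);
        return if suppressed { BatteryAction::Continue } else { BatteryAction::Rtl };
    }
    BatteryAction::Continue
}
```

Audit logging of the override (operator id and rationale) is deliberately left out of the sketch; it belongs to `shared::audit`.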
**AC-5: Middle-waypoint re-upload sequence**
Given a confirmed POI yields a middle-waypoint hint
When `on_middle_waypoint` is invoked
Then the sequence `MISSION_CLEAR_ALL` → upload all waypoints → `MISSION_SET_CURRENT(0)` is issued in order, completing in ≤2 s end-to-end.
**AC-6: Post-flight push trigger**
Given the executor enters `POST_FLIGHT_SYNC`
When the entry guard runs
Then `mission_client::push_mapobjects_diff(mission_id, diff)` is called exactly once; the executor reaches `DONE` regardless of push success.
## Non-Functional Requirements
**Performance**
- Geofence response time: ≤500 ms from violation detection to RTL command.
- Middle-waypoint re-upload: ≤2 s end-to-end.
**Reliability**
- Both geofence variants enforced; symmetric behaviour.
- No infinite retry on re-upload — bounded by the executor's transition-retry budget.
## Runtime Completeness
- **Named capability**: geofence enforcement (both variants) + battery thresholds + re-upload sequence + post-flight push trigger.
- **Production code that must exist**: real point-in-polygon; real `SYS_STATUS` decode; real `MAV_CMD_*` issuance.
- **Unacceptable substitutes**: ignoring EXCLUSION (the pre-existing C++ bug) is unacceptable; ignoring battery overrides without signed proof is unacceptable.
@@ -0,0 +1,79 @@
# ViewPro A40 Vendor Transport
**Task**: AZ-653_gimbal_a40_transport
**Name**: ViewPro A40 vendor protocol UDP transport
**Description**: UDP transport, frame encode/decode, CRC16 (vendor spec), bounded retry on command timeout. Surface vendor faults to health.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure
**Component**: gimbal_controller
**Tracker**: AZ-653
**Epic**: AZ-634
## Problem
The gimbal is a ViewPro A40 vendor product reachable over UDP using a vendor-specified frame format with CRC16. The transport layer must encode and decode every command/response frame this codebase issues (yaw, pitch, zoom, feedback request, mode commands), validate CRC on inbound frames, and re-issue on timeout with bounded retry. The vendor protocol is fixed by the camera — the device's binary protocol is a `restrictions.md` constraint, not a design choice.
## Outcome
- `A40Transport::send(cmd) -> Result<A40Response, A40Error>` writes a CRC-correct vendor frame to the configured UDP endpoint and awaits the matching response within a deadline.
- Inbound frames are CRC-validated; mismatches are dropped and counted as `vendor_faults_total{kind="crc"}`.
- Bounded retry on timeout (default 3 attempts; configurable).
- Health surface: `commands_per_min`, `vendor_faults_total`, `last_command_in_flight`.
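The CRC validation and bounded retry described above can be sketched as follows. Everything vendor-specific here is a placeholder: the real A40 polynomial and frame layout are not given in this document, so the sketch uses CRC-16/CCITT-FALSE (polynomial `0x1021`, initial value `0xFFFF`) and a hypothetical "payload followed by big-endian CRC" layout purely for illustration. `encode_frame`, `decode_frame` and `send_with_retry` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    TooShort,
    CrcMismatch,
}

#[derive(Debug, PartialEq, Eq)]
pub enum A40Error {
    MaxRetriesExceeded,
}

/// CRC-16/CCITT-FALSE. ASSUMPTION: stands in for the vendor polynomial.
pub fn crc16(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

/// Hypothetical layout: payload bytes followed by the CRC, big-endian.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = payload.to_vec();
    frame.extend_from_slice(&crc16(payload).to_be_bytes());
    frame
}

/// Validates the trailing CRC and returns the payload on success.
/// A caller would count `CrcMismatch` into `vendor_faults_total{kind="crc"}`.
pub fn decode_frame(frame: &[u8]) -> Result<&[u8], FrameError> {
    if frame.len() < 2 {
        return Err(FrameError::TooShort);
    }
    let (payload, tail) = frame.split_at(frame.len() - 2);
    let received = u16::from_be_bytes([tail[0], tail[1]]);
    if crc16(payload) == received {
        Ok(payload)
    } else {
        Err(FrameError::CrcMismatch)
    }
}

/// Bounded retry: `attempt` returns `None` on timeout and receives the
/// 1-based attempt number. Never loops more than `max_attempts` times.
pub fn send_with_retry<T, F>(max_attempts: u32, mut attempt: F) -> Result<T, A40Error>
where
    F: FnMut(u32) -> Option<T>,
{
    for n in 1..=max_attempts {
        if let Some(response) = attempt(n) {
            return Ok(response);
        }
    }
    Err(A40Error::MaxRetriesExceeded)
}
```

In the real transport the closure passed to `send_with_retry` would wrap a UDP send plus a deadline-bounded receive; swapping in the vendor CRC only changes `crc16`.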
## Scope
### Included
- UDP socket (single endpoint).
- CRC16 (vendor polynomial) encode/decode helpers.
- Frame encoders for yaw / pitch / zoom commands + feedback request.
- Frame decoders for yaw / pitch / zoom feedback + vendor fault frames.
- Bounded retry on timeout.
### Excluded
- Sweep pattern primitive (task 15).
- Smooth-pan plan execution (task 16).
- Centre-on-target primitive (task 17).
- Vendor protocol *specification* — assumed to be reverse-engineered or vendor-supplied separately; this task implements against the documented frame layout in `misc/camera/a8/` (which is the predecessor model A8; A40 differs in command codes per architecture.md).
## Acceptance Criteria
**AC-1: CRC round-trip**
Given the encoder produces a yaw command frame for `yaw = 30°`
When the same frame is fed back through the decoder
Then the decoded command matches and `vendor_faults_total{kind="crc"} = 0`.
**AC-2: CRC mismatch counted**
Given an inbound frame with corrupted CRC
When the decoder consumes it
Then the frame is dropped and `vendor_faults_total{kind="crc"}` increments by 1.
**AC-3: Command timeout retries**
Given a fake A40 endpoint that drops the first command silently
When `send(yaw_cmd)` is called with default 3 attempts
Then the call succeeds on retry; `vendor_faults_total{kind="timeout"}` reports 1.
**AC-4: Cap exhaustion returns explicit error**
Given the endpoint never responds
When `send(yaw_cmd)` is called
Then after 3 attempts the call returns `Err(MaxRetriesExceeded)` and the error surfaces to the caller.
## Non-Functional Requirements
**Performance**
- Single command round-trip: ≤200 ms on a healthy link (well under the ≤500 ms decision-to-movement budget).
**Reliability**
- CRC mismatches counted, never silent.
- Bounded retry; no infinite retry.
## Constraints
- Vendor protocol is fixed; no negotiation.
- One A40 per autopilot instance.
## Runtime Completeness
- **Named capability**: ViewPro A40 vendor protocol on UDP.
- **Production code that must exist**: real CRC16; real UDP socket; real per-command encoder/decoder.
- **Allowed external stubs**: in tests, a UDP echo with vendor-frame replay can simulate the camera.
- **Unacceptable substitutes**: a generic "send raw bytes and assume success" path is unacceptable — the protocol's frame format and CRC are non-negotiable.
@@ -0,0 +1,64 @@
# Zoom-Out Sweep Pattern
**Task**: AZ-654_gimbal_zoom_out_sweep
**Name**: Zoom-out sweep pattern primitive
**Description**: Run the zoom-out sweep pattern when `scan_controller` is in `ZoomedOut`. The exact pattern (pendulum / raster / lawn-mower) is gated by `architecture.md §8 Q1`; this task implements one selectable default with the pattern enum and exposes the choice through config.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-653_gimbal_a40_transport
**Component**: gimbal_controller
**Tracker**: AZ-654
**Epic**: AZ-634
## Problem
In `ZoomedOut`, the gimbal must sweep its FOV continuously to maximise coverage. The exact pattern is an open architecture question (Q1); this task implements the `SweepEngine` abstraction and ships `Pendulum` as the safe default, with `Raster` and `LawnMower` enum variants reserved. Switching pattern is config-only — no API change to consumers.
## Outcome
- `SweepEngine::next_step(state) -> GimbalCommand` produces a sequence of yaw / pitch / zoom commands implementing the configured sweep pattern with bounded jitter and no overshoot beyond configured FOV bounds.
- Default pattern is `Pendulum`; `Raster` and `LawnMower` are wired as enum variants (one implemented; the others reserved).
- Sweep config (FOV per zoom tier, dwell time per direction, step size) is loaded from startup config.
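A pendulum sweep with bound clamping and dwell can be sketched as a small state machine. This is a simplified illustration, not the task's API: it emits only yaw, expresses dwell in ticks rather than milliseconds, and the field names in `SweepConfig` are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SweepPattern {
    Pendulum,
    Raster,
    LawnMower,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SweepError {
    NotImplemented,
}

/// Hypothetical config: yaw bounds in degrees, step per tick, dwell in ticks.
#[derive(Debug, Clone, Copy)]
pub struct SweepConfig {
    pub min_yaw: f32,
    pub max_yaw: f32,
    pub step_deg: f32,
    pub dwell_ticks: u32,
}

pub struct SweepEngine {
    pattern: SweepPattern,
    cfg: SweepConfig,
    yaw: f32,
    dir: f32, // +1.0 sweeping towards max_yaw, -1.0 towards min_yaw
    dwell_left: u32,
}

impl SweepEngine {
    pub fn new(pattern: SweepPattern, cfg: SweepConfig) -> Self {
        Self { pattern, cfg, yaw: cfg.min_yaw, dir: 1.0, dwell_left: 0 }
    }

    /// Returns the next yaw target. Reserved patterns fail loudly
    /// instead of silently falling back to Pendulum.
    pub fn next_step(&mut self) -> Result<f32, SweepError> {
        match self.pattern {
            SweepPattern::Pendulum => {}
            SweepPattern::Raster | SweepPattern::LawnMower => {
                return Err(SweepError::NotImplemented)
            }
        }
        if self.dwell_left > 0 {
            self.dwell_left -= 1;
            return Ok(self.yaw); // hold at the bound
        }
        let next = self.yaw + self.dir * self.cfg.step_deg;
        if next >= self.cfg.max_yaw {
            // Clamp to the bound (no overshoot), reverse, start dwelling.
            self.yaw = self.cfg.max_yaw;
            self.dir = -1.0;
            self.dwell_left = self.cfg.dwell_ticks;
        } else if next <= self.cfg.min_yaw {
            self.yaw = self.cfg.min_yaw;
            self.dir = 1.0;
            self.dwell_left = self.cfg.dwell_ticks;
        } else {
            self.yaw = next;
        }
        Ok(self.yaw)
    }
}
```

Because the engine holds no clock and no randomness, the same config always yields the same command sequence, which matches the determinism requirement.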
## Scope
### Included
- `SweepPattern` enum with all three variants declared; default impl for `Pendulum`.
- `SweepEngine` struct holding the current direction + dwell counter.
- Bounded-jitter command emission.
- FOV-bound enforcement.
### Excluded
- The pattern selection rationale (Q1 — resolved separately).
- Smooth-pan plan execution (task 16).
- Centre-on-target (task 17).
## Acceptance Criteria
**AC-1: Pendulum sweep emits a bounded-jitter command stream**
Given `SweepEngine::new(SweepPattern::Pendulum, config)`
When `next_step()` is called 100 times
Then the yaw values stay within `[config.min_yaw, config.max_yaw]`, never overshoot, and reverse direction at each bound.
**AC-2: Dwell at bounds is respected**
Given a config with `dwell_ms = 500`
When the sweep reaches a yaw bound
Then `next_step()` returns the same yaw for at least 500 ms before reversing direction.
**AC-3: Pattern enum exhaustiveness**
Given the `SweepPattern` enum
When client code matches on it exhaustively
Then the compiler enforces coverage of `Pendulum`, `Raster`, `LawnMower`; unimplemented variants return `Err(NotImplemented)` at runtime and never silently fall back.
## Non-Functional Requirements
**Performance**
- `next_step()` p99 ≤1 ms.
**Reliability**
- Bounded jitter; no overshoot.
## Runtime Completeness
- **Named capability**: zoom-out sweep pattern (default `Pendulum`).
- **Production code that must exist**: real bounded sweep state machine.
- **Unacceptable substitutes**: random walk is not acceptable — sweep coverage must be deterministic and bounded.
@@ -0,0 +1,64 @@
# Smooth-Pan Path-Tracking Plan Executor
**Task**: AZ-655_gimbal_smooth_pan_plan
**Name**: Smooth-pan plan executor (zoom-in path-follow)
**Description**: Accept a pan plan (sequence of yaw / pitch / zoom goals with timing) from `semantic_analyzer` via `scan_controller` and execute it smoothly. Used for follow-the-footpath behaviour during the zoom-in level.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-653_gimbal_a40_transport
**Component**: gimbal_controller
**Tracker**: AZ-655
**Epic**: AZ-634
## Problem
When `scan_controller` is in `ZoomedIn` and `semantic_analyzer` recommends `PanFollowFootpath`, a sequence of yaw/pitch/zoom goals with timing arrives. The executor must interpolate between goals smoothly (no step jumps) and respect the vendor's command rate — if the plan is too dense, drop the lowest-priority goals rather than blocking the queue.
## Outcome
- `PlanExecutor::load(plan: PanPlan)` accepts an ordered sequence `Vec<(yaw, pitch, zoom, at_ts)>`.
- `next_step(now)` returns the interpolated `GimbalCommand` to issue at `now`; goals whose `at_ts` has already passed are skipped; between the last passed goal and the next upcoming goal the command is interpolated linearly.
- The executor self-throttles: emits at most one command per `min_cmd_interval_ms` (default 50 ms), dropping intermediate interpolations.
- Health: `plan_loaded_at`, `commands_emitted_total`, `commands_dropped_to_throttle_total`.
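Interpolation plus self-throttling can be sketched in a few lines. This is a minimal illustration under assumptions: timestamps are plain milliseconds instead of monotonic instants, `Goal` and the `PlanExecutor` fields are hypothetical shapes (the real `PanPlan` is defined in `data_model.md`), and the lowest-priority-goal dropping mentioned in the Problem section is not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Goal {
    pub yaw: f64,
    pub pitch: f64,
    pub zoom: f64,
    pub at_ms: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GimbalCommand {
    pub yaw: f64,
    pub pitch: f64,
    pub zoom: f64,
}

pub struct PlanExecutor {
    goals: Vec<Goal>,
    min_cmd_interval_ms: u64,
    last_emit_ms: Option<u64>,
    pub commands_emitted_total: u64,
    pub commands_dropped_to_throttle_total: u64,
}

impl PlanExecutor {
    pub fn new(min_cmd_interval_ms: u64) -> Self {
        Self {
            goals: Vec::new(),
            min_cmd_interval_ms,
            last_emit_ms: None,
            commands_emitted_total: 0,
            commands_dropped_to_throttle_total: 0,
        }
    }

    pub fn load(&mut self, mut goals: Vec<Goal>) {
        goals.sort_by_key(|g| g.at_ms);
        self.goals = goals;
    }

    /// Linear interpolation between the two goals that bracket `now_ms`,
    /// clamped to the first goal before the plan starts and the last after it ends.
    fn interpolate(&self, now_ms: u64) -> Option<GimbalCommand> {
        let as_cmd = |g: &Goal| GimbalCommand { yaw: g.yaw, pitch: g.pitch, zoom: g.zoom };
        let first = self.goals.first()?;
        let last = self.goals.last()?;
        if now_ms <= first.at_ms {
            return Some(as_cmd(first));
        }
        if now_ms >= last.at_ms {
            return Some(as_cmd(last));
        }
        for w in self.goals.windows(2) {
            let (a, b) = (w[0], w[1]);
            if a.at_ms <= now_ms && now_ms < b.at_ms {
                let t = (now_ms - a.at_ms) as f64 / (b.at_ms - a.at_ms) as f64;
                let lerp = |x: f64, y: f64| x + (y - x) * t;
                return Some(GimbalCommand {
                    yaw: lerp(a.yaw, b.yaw),
                    pitch: lerp(a.pitch, b.pitch),
                    zoom: lerp(a.zoom, b.zoom),
                });
            }
        }
        Some(as_cmd(last))
    }

    /// Emits at most one command per `min_cmd_interval_ms`; throttled calls are counted.
    pub fn next_step(&mut self, now_ms: u64) -> Option<GimbalCommand> {
        let cmd = self.interpolate(now_ms)?;
        if let Some(last) = self.last_emit_ms {
            if now_ms.saturating_sub(last) < self.min_cmd_interval_ms {
                self.commands_dropped_to_throttle_total += 1;
                return None;
            }
        }
        self.last_emit_ms = Some(now_ms);
        self.commands_emitted_total += 1;
        Some(cmd)
    }
}
```

Returning `None` for a throttled call, rather than repeating the previous command, keeps the vendor command rate bounded by construction.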
## Scope
### Included
- `PanPlan` data type (`data_model.md §PanPlan`).
- Linear interpolation between adjacent goals.
- Self-throttling.
### Excluded
- Generating the plan (`semantic_analyzer`).
- Sweep pattern (task 15).
- Centre-on-target (task 17).
## Acceptance Criteria
**AC-1: Linear interpolation between goals**
Given a plan with two goals 1 s apart and yaw `0° → 30°`
When `next_step(now=500ms)` is called
Then the returned `yaw` is `15°` ± a defined epsilon.
**AC-2: Self-throttle drops intermediate calls**
Given `min_cmd_interval_ms = 100`
When `next_step()` is called every 10 ms for 1 s
Then approximately 10 commands are emitted (the rest counted as throttled).
**AC-3: Plan past its end clamps to last goal**
Given a plan whose last `at_ts` is in the past
When `next_step(now)` is called
Then the returned command equals the last goal's `(yaw, pitch, zoom)`; no error.
## Non-Functional Requirements
**Performance**
- `next_step()` p99 ≤1 ms.
**Reliability**
- Throttle drops are counted, never silent.
## Runtime Completeness
- **Named capability**: smooth-pan plan execution + interpolation.
- **Production code that must exist**: real interpolation; real self-throttle.
- **Unacceptable substitutes**: dispatching every plan goal directly without interpolation/throttling is not acceptable (causes jerky panning).
@@ -0,0 +1,64 @@
# Centre-On-Target Primitive + GimbalState Publish
**Task**: AZ-656_gimbal_centre_on_target
**Name**: Centre-on-target primitive + timestamped GimbalState publish
**Description**: During `TargetFollow`, accept a centre-on-target stream (target bbox normalized) from `scan_controller` and command the gimbal to keep the target inside the centre 25 % of frame while visible. Stamp every emitted command + reported state with a monotonic timestamp so `movement_detector` can synchronise.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-653_gimbal_a40_transport
**Component**: gimbal_controller
**Tracker**: AZ-656
**Epic**: AZ-634
## Problem
During target-follow, the gimbal must continuously re-aim to keep the target inside the centre 25 % of frame. The control loop must converge without overshoot, and every emitted command + every reported `GimbalState` must carry a monotonic timestamp so `movement_detector` can synchronise gimbal motion with the per-frame ego-motion estimate.
## Outcome
- `CentreOnTarget::tick(bbox_normalized, current_state) -> GimbalCommand` produces the yaw/pitch command needed to nudge the target toward frame centre; convergence within ≤3 ticks under nominal latency.
- Reported `GimbalState { yaw, pitch, zoom, ts_monotonic, command_in_flight }` is published on the state channel for `frame_ingest` (telemetry tagging) and `movement_detector` (ego-motion sync) consumption.
- If the target bbox is missing for 3 consecutive ticks, emit a `target_lost` signal to `scan_controller`.
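A proportional centring step with a debounced `target_lost` signal might look like the sketch below. Several things are assumptions rather than facts from this task: the bbox is taken as normalized `(centre_x, centre_y, w, h)`, "centre 25 %" is read as within ±0.125 of frame centre on each axis, image y grows downward (so a target below centre yields a negative pitch delta), and the field-of-view parameters are hypothetical. The `current_state` input and timestamp stamping are omitted for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bbox {
    pub cx: f64,
    pub cy: f64,
    pub w: f64,
    pub h: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TickOutput {
    Command { yaw_delta_deg: f64, pitch_delta_deg: f64 },
    Hold,
    TargetLost,
}

pub struct CentreOnTarget {
    gain: f64,
    hfov_deg: f64,
    vfov_deg: f64,
    centre_half_extent: f64,
    missing_ticks: u32,
    lost_emitted: bool,
}

impl CentreOnTarget {
    pub fn new(gain: f64, hfov_deg: f64, vfov_deg: f64) -> Self {
        Self {
            gain,
            hfov_deg,
            vfov_deg,
            centre_half_extent: 0.125, // ASSUMPTION: 25 % of each frame dimension
            missing_ticks: 0,
            lost_emitted: false,
        }
    }

    pub fn tick(&mut self, bbox: Option<Bbox>) -> TickOutput {
        match bbox {
            None => {
                self.missing_ticks += 1;
                // Debounce: signal once after 3 consecutive missing bboxes.
                if self.missing_ticks >= 3 && !self.lost_emitted {
                    self.lost_emitted = true;
                    TickOutput::TargetLost
                } else {
                    TickOutput::Hold
                }
            }
            Some(b) => {
                self.missing_ticks = 0;
                self.lost_emitted = false;
                let (err_x, err_y) = (b.cx - 0.5, b.cy - 0.5);
                if err_x.abs() <= self.centre_half_extent
                    && err_y.abs() <= self.centre_half_extent
                {
                    return TickOutput::Hold; // already inside the centre region
                }
                // Proportional step: normalized error scaled to degrees via the FOV.
                TickOutput::Command {
                    yaw_delta_deg: self.gain * err_x * self.hfov_deg,
                    pitch_delta_deg: -self.gain * err_y * self.vfov_deg,
                }
            }
        }
    }
}
```

With a gain below 1 each tick removes a fixed fraction of the remaining error, so the loop approaches centre without overshoot in this idealised model; the real convergence budget depends on gimbal latency.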
## Scope
### Included
- Centre-25% control loop (proportional, configurable gain).
- Monotonic timestamp stamping (single source of truth: `Instant::now()` at emit point).
- `GimbalState` publisher.
- `target_lost` signal on 3 consecutive missing bboxes.
### Excluded
- Target-follow state ownership (`scan_controller`).
- Sweep (task 15) and pan plan (task 16).
## Acceptance Criteria
**AC-1: Centre convergence**
Given a target initially at bbox `(0.7, 0.5, 0.1, 0.1)` (right side of frame) and a healthy A40
When `tick()` is invoked over 3 cycles at 100 ms each
Then by the third cycle the target bbox centre is within the centre 25 % region.
**AC-2: GimbalState carries monotonic timestamp**
Given a sequence of `tick()` calls
When the resulting `GimbalState` is observed
Then `ts_monotonic` is strictly monotonically increasing across observations.
**AC-3: Target loss signals after 3 missing ticks**
Given the target bbox stream goes empty
When 3 consecutive ticks have no bbox
Then a `target_lost` signal is published exactly once; subsequent ticks do not re-emit.
## Non-Functional Requirements
**Performance**
- `tick()` p99 ≤2 ms.
- Centre convergence within ≤3 ticks at 10 Hz.
**Reliability**
- `target_lost` debounced — never spurious.
## Runtime Completeness
- **Named capability**: target-follow centre-25% loop + timestamped GimbalState publish.
- **Production code that must exist**: real control loop; real monotonic timestamping.
- **Unacceptable substitutes**: open-loop "send target position once" is not acceptable — the loop must close.
@@ -0,0 +1,71 @@
# RTSP Session + Reconnect + AI-Lock Signal
**Task**: AZ-657_frame_ingest_rtsp_session
**Name**: RTSP session lifecycle + bounded reconnect + AI-lock plumb-through
**Description**: Open the RTSP session to the ViewPro A40, recover from transient connection loss with bounded exponential backoff (1 s → 30 s cap), and plumb through the `bringCameraDown`/`bringCameraUp` AI-lock signal so downstream consumers can skip detection.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure
**Component**: frame_ingest
**Tracker**: AZ-657
**Epic**: AZ-627
## Problem
The RTSP session is the foundation of the perception pipeline. It must (a) open against the camera at startup, (b) recover from drops with bounded backoff (no infinite retry), and (c) carry the `ai_locked` flag through to every emitted `Frame` so that downstream consumers (`detection_client`, `movement_detector`) know to skip detection while the local supervisor is asserting an RC-takeover lock.
## Outcome
- `RtspSession::open(config) -> Result<Self, OpenError>` opens with TCP or UDP transport per camera config.
- On stream loss the session reopens with exponential backoff `1 s → 2 s → 4 s ...` capped at 30 s.
- A subscription to `bringCameraDown` / `bringCameraUp` toggles `ai_locked` on every subsequently emitted frame.
- Health surface: `reopens_total`, `last_frame_age_ms`, `session_state ∈ {closed, connecting, streaming, failing}`, `ai_locked`.
- Camera output-format mismatch (unexpected SPS/PPS) hard-fails at session open with an explicit error; never silently picks a wrong decode path.
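The capped exponential backoff named above (`1 s → 2 s → 4 s ...`, capped at 30 s) can be sketched as a tiny helper. `Backoff` is a hypothetical name; the actual reconnect loop and session state tracking are out of scope for the sketch.

```rust
use std::time::Duration;

/// Doubling backoff with an upper cap.
pub struct Backoff {
    initial: Duration,
    cap: Duration,
    current: Duration,
}

impl Backoff {
    pub fn new(initial: Duration, cap: Duration) -> Self {
        let initial = initial.min(cap);
        Self { initial, cap, current: initial }
    }

    /// Returns the delay to wait before the next reopen attempt,
    /// then doubles the stored delay up to the cap.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(self.cap);
        delay
    }

    /// Call after a successful reopen so the next outage starts from `initial` again.
    pub fn reset(&mut self) {
        self.current = self.initial;
    }
}
```

Resetting on success matters for AC-2: without it, a second drop shortly after recovery would start at the capped delay and miss the ≤5 s reconnect target.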
## Scope
### Included
- RTSP client (FFmpeg / GStreamer binding or pure-Rust client — pick what `shared` pins).
- Backoff state machine.
- AI-lock signal source subscription (the supervisor channel is implementation-defined; the local supervisor signals over a unix-domain socket per `architecture.md`).
- Session state surface.
### Excluded
- Frame decoding (task 19).
- Multi-consumer publisher (task 20).
## Acceptance Criteria
**AC-1: Open against ViewPro A40 (fixture)**
Given a fixture RTSP server (e.g. `MediaMTX`) replaying a sample stream
When `RtspSession::open(...)` is called
Then it returns `Ok` within ≤2 s and `session_state = "streaming"`.
**AC-2: Reconnect on drop**
Given a healthy session for 5 s
When the fixture RTSP server is killed and restarted
Then the session reopens within ≤5 s and `reopens_total` increments by 1.
**AC-3: SPS/PPS mismatch hard-fails**
Given a fixture stream that announces an unsupported codec profile
When `RtspSession::open(...)` is called
Then it returns `Err(UnsupportedProfile { details })`; no silent decode-path selection.
**AC-4: AI-lock toggles ai_locked flag**
Given a healthy session emitting frames
When `bringCameraDown` is asserted
Then subsequent emitted frames have `ai_locked = true`; when `bringCameraUp` is asserted, they revert to `false`.
## Non-Functional Requirements
**Performance**
- Reconnect latency: ≤5 s from camera availability (per `description.md §8`).
**Reliability**
- Bounded backoff cap configurable; no infinite retry.
## Runtime Completeness
- **Named capability**: RTSP transport against ViewPro A40 + AI-lock signal plumb.
- **Production code that must exist**: real RTSP session; real AI-lock subscription.
- **Allowed external stubs**: `MediaMTX` or `live555-test` as fixture in dev/CI.
- **Unacceptable substitutes**: bypassing AI-lock entirely is unacceptable — it is a safety boundary.
@@ -0,0 +1,72 @@
# Frame Decoder (NVDEC + Software Fallback)
**Task**: AZ-658_frame_ingest_decoder
**Name**: H.264/265 decoder (NVDEC primary, software fallback) + monotonic timestamps
**Description**: Decode H.264/265 to raw frames using NVDEC on Jetson Orin Nano, with software fallback. Stamp each frame with a monotonic capture timestamp + sequence number at the earliest practical point in the pipeline.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-657_frame_ingest_rtsp_session
**Component**: frame_ingest
**Tracker**: AZ-658
**Epic**: AZ-627
## Problem
Every frame downstream needs a monotonic capture timestamp so `movement_detector` can detect telemetry skew. Decoding must use the hardware decoder (NVDEC on Jetson) where present and fall back to software otherwise, without changing the emitted `Frame` shape. Decode errors on a single frame must be dropped (counted), not abort the stream — cold-start latency is observable once but not an alert by itself.
## Outcome
- `FrameDecoder::decode(packet) -> Result<Frame, DecodeError>` emits a `Frame { seq, capture_ts_monotonic, decode_ts_monotonic, pixels: Arc<Bytes>, width, height, pix_fmt, ai_locked }`.
- NVDEC code path is used when available; software fallback otherwise (selection is automatic and observable in health).
- Single-frame errors are dropped and counted as `decode_errors_total`; the stream is never aborted on a single frame.
- Cold-start latency (first-frame decode time) is surfaced as `decode_ms_first_frame` once per session open.
- Health surface: `decode_ms_p50`, `decode_ms_p99`, `decoder_backend ∈ {NVDEC, Software}`, `decode_errors_total`.
## Scope
### Included
- NVDEC binding (via Jetson Multimedia API or GStreamer `nvv4l2decoder`).
- Software decoder fallback (FFmpeg `libavcodec`).
- Monotonic timestamping at the earliest point in the decode pipeline.
- Sequence-number generation (monotonic u64 per session).
- Single-frame error handling.
### Excluded
- RTSP session lifecycle (task 18).
- Multi-consumer publisher (task 20).
## Acceptance Criteria
**AC-1: Software-path decode of a sample stream**
Given a sample H.264 RTSP stream at 1080p / 30 fps and a host without NVDEC
When the decoder runs for 10 s
Then ≥285 frames are emitted; `decoder_backend = "Software"`; sequence numbers are strictly monotonic.
**AC-2: NVDEC-path selection on Jetson**
Given the host has NVDEC available
When the decoder is initialized
Then `decoder_backend = "NVDEC"`; functional correctness is identical to software path.
**AC-3: Single-frame decode error does not abort the stream**
Given the input contains one corrupted frame
When the decoder runs
Then that single frame is dropped, `decode_errors_total` increments by 1, and subsequent frames continue to be emitted.
**AC-4: Monotonic timestamps**
Given a sequence of decoded frames
When their `capture_ts_monotonic` is read
Then values are strictly monotonically increasing.
## Non-Functional Requirements
**Performance**
- End-to-end RTSP-rx → publish ≤30 ms p99 on Jetson Orin Nano (per `description.md §8`); decoder portion of that budget ≤20 ms p99.
**Reliability**
- Single-frame errors do not abort the stream.
- Cold-start latency surfaced once; not an alert.
## Runtime Completeness
- **Named capability**: H.264/265 decode (NVDEC primary, software fallback) — production decode path required.
- **Production code that must exist**: real NVDEC binding; real software fallback; real monotonic timestamping.
- **Unacceptable substitutes**: shipping without the NVDEC code path is unacceptable. Software decode on Jetson is acceptable only as a fallback; the NVDEC path MUST exist (otherwise the latency target cannot be met).
@@ -0,0 +1,64 @@
# Multi-Consumer Frame Publisher + Back-Pressure Drops
**Task**: AZ-659_frame_ingest_publisher
**Name**: Tokio broadcast publisher + per-consumer drop counters + zero-copy `Arc<Bytes>`
**Description**: Publish `Frame`s through a single multi-consumer channel using `Arc<Bytes>` for pixel data so consumers do not copy. Drop frames when downstream consumers fall behind beyond a configured queue depth; record per-consumer drop counters with reason tags.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-657_frame_ingest_rtsp_session, AZ-658_frame_ingest_decoder
**Component**: frame_ingest
**Tracker**: AZ-659
**Epic**: AZ-627
## Problem
Three downstream consumers (`detection_client`, `movement_detector`, `telemetry_stream`) all need the same frames at the same rate. A single-consumer queue would serialise the slowest; a per-consumer fan-out with cloned pixel buffers would multiply memory. The right structure is a Tokio `broadcast` channel (or equivalent) carrying `Arc<Bytes>` so pixels are shared by reference. Slow consumers drop their oldest frame, with the drop counted (and reason-tagged) — never silently coalesced.
## Outcome
- `FramePublisher::subscribe() -> FrameReceiver` returns a per-consumer receiver.
- `Frame` carries `Arc<Bytes>` for `pixels` so consumers do not copy.
- When a consumer falls behind beyond `channel_depth` (configurable, default 4), the oldest frame is dropped for THAT consumer; per-consumer counters increment with reason tag (`{detection_client_slow, movement_detector_slow, telemetry_slow}`).
- Health surface: per-consumer drop counters, total publish count.
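The per-consumer drop-oldest behaviour can be modelled with the standard library alone. This is a single-threaded stand-in for the semantics only, not the intended implementation: the task calls for `tokio::sync::broadcast` (where a lagging receiver learns about missed messages through `RecvError::Lagged`), and `Arc<Vec<u8>>` is used here instead of `Arc<Bytes>` to avoid external crates. All type and method names are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Cloning a Frame clones the Arc handle, not the pixel buffer.
#[derive(Debug, Clone)]
pub struct Frame {
    pub seq: u64,
    pub pixels: Arc<Vec<u8>>,
}

struct ConsumerQueue {
    name: &'static str,
    queue: VecDeque<Frame>,
    dropped: u64,
}

pub struct FramePublisher {
    channel_depth: usize,
    consumers: Vec<ConsumerQueue>,
    pub published_total: u64,
}

impl FramePublisher {
    pub fn new(channel_depth: usize) -> Self {
        Self { channel_depth: channel_depth.max(1), consumers: Vec::new(), published_total: 0 }
    }

    /// Registers a consumer and returns its id.
    pub fn subscribe(&mut self, name: &'static str) -> usize {
        self.consumers.push(ConsumerQueue { name, queue: VecDeque::new(), dropped: 0 });
        self.consumers.len() - 1
    }

    /// Never blocks: a full consumer queue loses its oldest frame, and the
    /// loss is counted against that consumer only.
    pub fn publish(&mut self, frame: Frame) {
        self.published_total += 1;
        let depth = self.channel_depth;
        for consumer in &mut self.consumers {
            if consumer.queue.len() >= depth {
                consumer.queue.pop_front();
                consumer.dropped += 1;
            }
            consumer.queue.push_back(frame.clone());
        }
    }

    pub fn recv(&mut self, id: usize) -> Option<Frame> {
        self.consumers[id].queue.pop_front()
    }

    /// Drop counter plus a reason tag derived from the consumer name.
    pub fn dropped(&self, id: usize) -> (String, u64) {
        let c = &self.consumers[id];
        (format!("{}_slow", c.name), c.dropped)
    }
}
```

The property worth preserving in the real channel is the same one the model shows: a slow consumer costs only its own frames, and every consumer sees the same underlying pixel allocation.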
## Scope
### Included
- `tokio::sync::broadcast` (or equivalent) with `Arc<Bytes>` payload.
- Per-consumer drop counter (statically known three consumer ids; future-extensible).
- Channel-depth config.
### Excluded
- RTSP session (task 18).
- Decoder (task 19).
## Acceptance Criteria
**AC-1: Three consumers receive every frame at nominal rate**
Given three subscribers consuming at 30 fps and source at 30 fps
When the publisher runs for 10 s
Then each consumer observes ~300 frames; per-consumer drop counters = 0.
**AC-2: Slow consumer drops, fast consumers unaffected**
Given a slow consumer that yields every 200 ms while source is 30 fps and `channel_depth = 4`
When the publisher runs for 5 s
Then the slow consumer's drop counter increments and fast consumers continue to receive every frame.
**AC-3: Zero-copy under load**
Given a publisher emitting at 30 fps for 60 s with three subscribers
When peak memory is sampled
Then memory does not scale linearly with consumer count (i.e. `Arc<Bytes>` is correctly shared).
## Non-Functional Requirements
**Performance**
- Publish-to-consumer p99 ≤5 ms (helps keep total RTSP-rx-to-publish under the 30 ms p99 budget).
**Reliability**
- Drops are counted with reason; never silent.
- No unbounded memory growth on slow consumer.
## Runtime Completeness
- **Named capability**: lossy multi-consumer frame fan-out with `Arc<Bytes>`.
- **Production code that must exist**: real broadcast channel; real per-consumer drop accounting.
- **Unacceptable substitutes**: cloning pixel buffers per consumer is unacceptable (multiplies memory); blocking the publisher on a slow consumer is unacceptable (gates the whole pipeline).
@@ -0,0 +1,77 @@
# Detection gRPC Bi-Directional Stream + Frame Budgeting
**Task**: AZ-660_detection_client_grpc_stream
**Name**: Bi-directional gRPC stream to ../detections + drop-oldest frame budgeting
**Description**: Single bi-directional gRPC stream to the external `../detections` service. Reconnect on stream loss with bounded exponential backoff. Frame budgeting: drop older in-flight frames if a new frame arrives before the previous response, respecting the Tier-1 ≤100 ms/frame target.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-659_frame_ingest_publisher
**Component**: detection_client
**Tracker**: AZ-660
**Epic**: AZ-628
## Problem
`detection_client` is the only autopilot component talking to `../detections`. The contract is a bi-directional gRPC stream; the client must maintain it (reconnect with bounded backoff), respect the Tier-1 latency target by NOT queueing frames indefinitely (drop-oldest in-flight when a newer frame arrives), and never block the upstream `frame_ingest` publisher.
## Outcome
- `DetectionClient::run(frame_rx)` maintains one bi-directional gRPC stream to `../detections`; reconnect on stream loss with exponential backoff capped at 30 s.
- Outbound: send each `Frame` (skipping `ai_locked` ones) up to `max_concurrent_in_flight` (default 2); drop older in-flight frames when the budget is full and a new frame arrives (logged as `budget_drop`).
- Inbound: receive `DetectionBatch` and publish on the output channel; tag with the source frame's `monotonic_ts`.
- Health surface: `gRPC_connection_state`, `requests_in_flight`, `latency_p50/p99`, `errors_by_kind`, `budget_drops_total`.
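The reconnect backoff and the in-flight budget described above can be illustrated without any gRPC dependency. The sketch below is an assumption-laden outline: `Backoff`, `InFlightBudget`, `admit`, and `complete` are hypothetical names, the base delay is a free parameter, and the actual stream handling (for example with `tonic`) is left out entirely.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Exponential reconnect backoff with a hard cap (30 s in the task text).
pub struct Backoff {
    base: Duration,
    cap: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, cap: Duration) -> Self {
        Self { base, cap, attempt: 0 }
    }

    /// Delay before the next reconnect attempt: base * 2^attempt, capped.
    pub fn next_delay(&mut self) -> Duration {
        let factor = 1u32.checked_shl(self.attempt).unwrap_or(u32::MAX);
        self.attempt = self.attempt.saturating_add(1);
        self.base
            .checked_mul(factor)
            .map_or(self.cap, |delay| delay.min(self.cap))
    }

    /// Call after a successful reconnect.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }
}

/// Sliding window of in-flight `frame_seq` values with drop-oldest eviction.
pub struct InFlightBudget {
    max: usize,
    in_flight: VecDeque<u64>,
    pub budget_drops_total: u64,
}

impl InFlightBudget {
    pub fn new(max: usize) -> Self {
        assert!(max >= 1, "max_concurrent_in_flight must be at least 1");
        Self { max, in_flight: VecDeque::new(), budget_drops_total: 0 }
    }

    /// Register a frame about to be sent. Returns the evicted `frame_seq`
    /// when the budget was already full (the caller logs it as `budget_drop`).
    pub fn admit(&mut self, frame_seq: u64) -> Option<u64> {
        let evicted = if self.in_flight.len() >= self.max {
            self.budget_drops_total += 1;
            self.in_flight.pop_front()
        } else {
            None
        };
        self.in_flight.push_back(frame_seq);
        evicted
    }

    /// Mark a response as received. Returns false when the frame had already
    /// been evicted, so a late response can be ignored.
    pub fn complete(&mut self, frame_seq: u64) -> bool {
        match self.in_flight.iter().position(|&seq| seq == frame_seq) {
            Some(pos) => {
                self.in_flight.remove(pos);
                true
            }
            None => false,
        }
    }
}
```

Because `admit` never waits, the upstream publisher is not blocked when the server is slow; the cost shows up only in `budget_drops_total`.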
## Scope
### Included
- `tonic` (or equivalent) gRPC client + bi-directional streaming.
- Reconnect state machine.
- In-flight tracker (sliding window of `frame_seq`).
- Drop-oldest budgeting.
### Excluded
- Schema validation + model_version handling (task 22).
- The `../detections` service itself (separate repo).
## Acceptance Criteria
**AC-1: Happy path against fixture**
Given a fixture gRPC server that returns a `DetectionBatch` per request within 50 ms
When `DetectionClient::run` is started against a 30 fps frame source for 10 s
Then ≥285 `DetectionBatch` messages are observed on the output channel; `latency_p99` ≤100 ms; `budget_drops_total` = 0.
**AC-2: Reconnect after server restart**
Given a healthy stream
When the gRPC server is killed and restarted
Then the client reconnects within ≤2 s; subsequent frames flow through.
**AC-3: Budget drop on slow server**
Given the server takes 200 ms per response and the source is 30 fps
When the client runs for 5 s
Then `budget_drops_total > 0`, frames continue to flow, and the publisher is never blocked.
**AC-4: ai_locked frames are skipped**
Given a frame stream where every 5th frame has `ai_locked = true`
When the client runs
Then no requests are sent for `ai_locked` frames (observable via outgoing count).
## Non-Functional Requirements
**Performance**
- Per-frame round-trip ≤100 ms p99 (Tier-1 NFR; mostly owned by `../detections`).
- Reconnect latency: ≤2 s after `../detections` returns.
**Reliability**
- Drop-oldest never queues indefinitely.
- Reconnect is bounded.
## Contract
- gRPC service contract owner: `../_docs/03_detections.md`.
- Canonical typed model: `data_model.md §Detection`, `§DetectionBatch`.
## Runtime Completeness
- **Named capability**: bi-directional gRPC stream against `../detections`.
- **Production code that must exist**: real `tonic` (or equivalent) bi-directional stream; real budgeting.
- **Allowed external stubs**: a fixture gRPC server in tests; the real `../detections` for integration.
- **Unacceptable substitutes**: a unary call-per-frame instead of streaming is unacceptable (multiplies per-request overhead).
@@ -0,0 +1,62 @@
# Detection Schema Validation + Model-Version + Health
**Task**: AZ-661_detection_client_schema_and_health
**Name**: Response schema validation + model_version tracking + Tier-1 health degradation signal
**Description**: Validate every `DetectionBatch` response against the schema version the client was built against. Surface a hard error on schema mismatch (never silent downcast). Track `model_version`; on change, surface to `scan_controller` so per-class thresholds can be reloaded. Track sliding-window latency; on `latency_p99 > 100 ms` flip health → yellow so `scan_controller` can degrade to alternate-frame inference.
**Complexity**: 2 points
**Dependencies**: AZ-640_initial_structure, AZ-660_detection_client_grpc_stream
**Component**: detection_client
**Tracker**: AZ-661
**Epic**: AZ-628
## Problem
Schema drift between `../detections` and autopilot must be caught loudly — not silently downcast. The model version can change at runtime (model swap); when it does, the per-class confidence thresholds may need to be reloaded by `scan_controller`. The Tier-1 latency target (≤100 ms) is mostly owned by `../detections` but autopilot must observe drift and surface health degradation so the scan controller can take action.
## Outcome
- Every response is validated against the bundled schema; on mismatch, returns a hard error to the output channel and health → red.
- `last_model_version` is tracked; on change, a `ModelVersionChanged(new_version)` event is emitted on the output channel.
- A sliding-window latency tracker (e.g. last 1 min) emits a `Tier1Degraded { reason: HighLatency }` event when `latency_p99 > 100 ms`.
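One way to obtain the "exactly once" behaviour for both events is edge-triggering, sketched below with the standard library only. All type and method names here are hypothetical, the caller supplies a monotonic `now`, and the nearest-rank p99 is recomputed by sorting on every sample, which is acceptable for a sketch but probably not for the ≤1 ms budget in production.

```rust
use std::collections::VecDeque;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum DegradeReason {
    HighLatency,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Tier1Degraded {
    pub reason: DegradeReason,
}

/// Sliding-window latency tracker that fires only on the healthy-to-degraded edge.
pub struct LatencyTracker {
    window: Duration,
    threshold: Duration,
    samples: VecDeque<(Duration, Duration)>, // (monotonic timestamp, latency)
    degraded: bool,
}

impl LatencyTracker {
    pub fn new(window: Duration, threshold: Duration) -> Self {
        Self { window, threshold, samples: VecDeque::new(), degraded: false }
    }

    /// Nearest-rank p99 over the samples currently inside the window.
    pub fn p99(&self) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let mut latencies: Vec<Duration> = self.samples.iter().map(|&(_, l)| l).collect();
        latencies.sort_unstable();
        let rank = (latencies.len() * 99 + 99) / 100; // ceil(0.99 * n)
        Some(latencies[rank - 1])
    }

    pub fn observe(&mut self, now: Duration, latency: Duration) -> Option<Tier1Degraded> {
        self.samples.push_back((now, latency));
        // Evict samples that have aged out of the window.
        while let Some(&(ts, _)) = self.samples.front() {
            if now.saturating_sub(ts) > self.window {
                self.samples.pop_front();
            } else {
                break;
            }
        }
        let high = self.p99().map_or(false, |p| p > self.threshold);
        let fire = high && !self.degraded;
        self.degraded = high;
        if fire {
            Some(Tier1Degraded { reason: DegradeReason::HighLatency })
        } else {
            None
        }
    }
}

/// Emits the new version only when it differs from a previously seen one.
#[derive(Default)]
pub struct ModelVersionTracker {
    last: Option<String>,
}

impl ModelVersionTracker {
    pub fn observe(&mut self, version: &str) -> Option<String> {
        let changed = matches!(&self.last, Some(prev) if prev.as_str() != version);
        if changed || self.last.is_none() {
            self.last = Some(version.to_owned());
        }
        if changed { Some(version.to_owned()) } else { None }
    }
}
```

The first version seen on stream open only initialises the tracker, so it produces no event, which matches the reading of AC-2 assumed here.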
## Scope
### Included
- Schema validation hook on every response.
- `model_version` tracker.
- Sliding-window latency tracker + degradation signal.
### Excluded
- The reaction to `Tier1Degraded` (lives in `scan_controller`).
- The schema definition itself (lives in the contract).
## Acceptance Criteria
**AC-1: Schema mismatch surfaces as hard error**
Given the fixture server returns a `DetectionBatch` with an unknown field type
When the client validates the response
Then a hard error is emitted on the output channel and `errors_by_kind{kind="schema_mismatch"}` increments by 1.
**AC-2: Model version change is signalled**
Given the server reports `model_version = "v1.2"` on initial stream open
When a subsequent response reports `model_version = "v1.3"`
Then exactly one `ModelVersionChanged("v1.3")` event is emitted.
**AC-3: Latency degradation signal**
Given the server's response latency rises to 150 ms p99 over a 1-min window
When the latency tracker evaluates
Then `Tier1Degraded { reason: HighLatency }` is emitted exactly once until latency falls back below 100 ms.
## Non-Functional Requirements
**Performance**
- Validation overhead: ≤1 ms per response.
**Reliability**
- Schema mismatches never silent.
## Runtime Completeness
- **Named capability**: response schema validation + model-version awareness + latency-degradation signal.
- **Production code that must exist**: real schema validation; real model-version tracker; real percentile tracker.
- **Unacceptable substitutes**: silently downcasting an unknown response shape is unacceptable.
@@ -0,0 +1,64 @@
# Ego-Motion Estimator + Telemetry Sync Gate
**Task**: AZ-662_movement_detector_ego_motion
**Name**: OpenCV optical-flow / global-motion estimator + telemetry-skew gate
**Description**: Compute per-frame ego-motion using OpenCV (Lucas-Kanade optical flow or feature-based homography), refined by the synchronised gimbal + UAV telemetry. Drop frames whose telemetry skew exceeds the per-zoom-band tolerance; drops are never silent.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-659_frame_ingest_publisher, AZ-656_gimbal_centre_on_target, AZ-649_mission_executor_telemetry_forwarding
**Component**: movement_detector
**Tracker**: AZ-662
**Epic**: AZ-629
## Problem
Naive frame differencing is rejected — the UAV and gimbal are moving, so most pixel motion is ego-motion. The estimator must (a) recover camera motion from the frame stream and (b) cross-check against telemetry (gimbal + UAV) within a per-zoom-band skew tolerance. Frames whose telemetry skew exceeds the tolerance MUST be dropped (with a counter), never silently consumed — otherwise the compensation is wrong and false positives flood the operator.
## Outcome
- `EgoMotionEstimator::estimate(frame, gimbal_state, uav_telemetry) -> Result<EgoMotion, SkewExceeded>` returns the per-frame ego-motion vector (or homography) refined by telemetry, OR rejects the frame as skewed.
- Per-zoom-band tolerance from config (defaults per `description.md §5`): zoom-out 50 ms frame↔gimbal / 100 ms frame↔UAV; zoom-in 25 ms / 50 ms.
- Health surface: `telemetry_skew_drops_total`, `optical_flow_degenerate_total`, `current_zoom_band`.
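The skew gate on its own is a small comparison against per-band tolerances. The sketch below covers only that gate, using the default tolerances quoted above; `SkewGate`, `SkewSource`, and the fields of `SkewExceeded` are hypothetical, timestamps are assumed to share one monotonic clock, and the optical-flow estimation itself is not shown.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ZoomBand {
    ZoomedOut,
    ZoomedIn,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SkewSource {
    Gimbal,
    Uav,
}

#[derive(Debug, PartialEq, Eq)]
pub struct SkewExceeded {
    pub band: ZoomBand,
    pub source: SkewSource,
    pub skew: Duration,
}

impl ZoomBand {
    /// (frame-to-gimbal, frame-to-UAV) tolerances; defaults from the task text.
    pub fn tolerance(self) -> (Duration, Duration) {
        match self {
            ZoomBand::ZoomedOut => (Duration::from_millis(50), Duration::from_millis(100)),
            ZoomBand::ZoomedIn => (Duration::from_millis(25), Duration::from_millis(50)),
        }
    }
}

fn abs_diff(a: Duration, b: Duration) -> Duration {
    if a >= b { a - b } else { b - a }
}

/// Rejects frames whose telemetry is too far from the frame timestamp and
/// counts every rejection per band, so drops are never silent.
#[derive(Default)]
pub struct SkewGate {
    pub drops_zoomed_out: u64,
    pub drops_zoomed_in: u64,
}

impl SkewGate {
    pub fn check(
        &mut self,
        band: ZoomBand,
        frame_ts: Duration,
        gimbal_ts: Duration,
        uav_ts: Duration,
    ) -> Result<(), SkewExceeded> {
        let (gimbal_tol, uav_tol) = band.tolerance();
        let gimbal_skew = abs_diff(frame_ts, gimbal_ts);
        let uav_skew = abs_diff(frame_ts, uav_ts);
        let violation = if gimbal_skew > gimbal_tol {
            Some((SkewSource::Gimbal, gimbal_skew))
        } else if uav_skew > uav_tol {
            Some((SkewSource::Uav, uav_skew))
        } else {
            None
        };
        match violation {
            None => Ok(()),
            Some((source, skew)) => {
                match band {
                    ZoomBand::ZoomedOut => self.drops_zoomed_out += 1,
                    ZoomBand::ZoomedIn => self.drops_zoomed_in += 1,
                }
                Err(SkewExceeded { band, source, skew })
            }
        }
    }
}
```

Running the gate before any OpenCV work means a skewed frame costs almost nothing, which helps the 30 ms per-frame budget.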
## Scope
### Included
- OpenCV bindings (Rust crate `opencv`).
- Optical-flow primary path (dense Lucas-Kanade or feature-based homography; in the Rust `opencv` crate these are the snake_case bindings `opencv::video::calc_optical_flow_*` or `opencv::calib3d::find_homography`).
- Telemetry-skew gate per zoom band.
- Compensation output (the residual-pixel-motion field; downstream task 24 clusters it).
### Excluded
- Cluster persistence + candidate emission (task 24).
- Q14 fallback (task 25).
## Acceptance Criteria
**AC-1: Synthetic pure-pan: residual ≈ 0**
Given a synthetic frame pair where the camera panned by `dx` and the entire scene is static
When `estimate(frame, gimbal_state, uav_telemetry)` runs
Then the returned ego-motion captures `dx` and the residual motion field is ≈ 0 within epsilon.
**AC-2: Telemetry skew above zoom-out tolerance is dropped**
Given a frame whose gimbal-telemetry timestamp differs by 200 ms while `zoom_band = zoomed_out` (tolerance 50 ms)
When `estimate(...)` is called
Then it returns `Err(SkewExceeded)` and `telemetry_skew_drops_total{band="zoomed_out"}` increments by 1.
**AC-3: Optical-flow degenerate is observable**
Given a fully-saturated white frame
When `estimate(...)` runs
Then it returns `Err(OpticalFlowDegenerate)` and `optical_flow_degenerate_total` increments by 1.
## Non-Functional Requirements
**Performance**
- Per-frame ego-motion estimation: ≤30 ms p99 on Jetson Orin Nano (must coexist with Tier 1 + Tier 2 — per `description.md §9`).
**Reliability**
- Drops never silent.
## Runtime Completeness
- **Named capability**: ego-motion estimation using real OpenCV; telemetry-skew gating.
- **Production code that must exist**: real OpenCV optical-flow / homography path; real synchronisation logic.
- **Allowed external stubs**: synthetic frame pairs in tests; pinned `opencv` Rust crate in CI.
- **Unacceptable substitutes**: a fake/stub estimator that always returns "no motion" is unacceptable in production (would mask real movement candidates).
@@ -0,0 +1,79 @@
# Cluster Persistence + Candidate Emission
**Task**: AZ-663_movement_detector_clustering_and_emission
**Name**: Residual-motion clustering + per-zoom-band persistence + candidate emission with source_zoom_band
**Description**: Subtract estimated ego-motion from per-pixel motion; cluster residuals; emit clusters meeting per-zoom-band minimum size + persistence threshold as `MovementCandidate`s. Self-disable in `TargetFollow` (consume frames to keep history warm; emit nothing).
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-662_movement_detector_ego_motion
**Component**: movement_detector
**Tracker**: AZ-663
**Epic**: AZ-629
## Problem
Once ego-motion is compensated, the remaining residual pixel motion is the candidate signal. Residuals must be clustered (connected components or DBSCAN-like spatial cluster) and tracked across frames; only clusters that persist for the per-zoom-band threshold count as candidates. Single-frame noise blips MUST NOT surface to the operator.
The candidate emission also carries `source_zoom_band` (`zoomed_out | zoomed_in`) so `scan_controller` can apply zoom-band-aware queueing logic.
The component must self-disable when `scan_controller` is in `TargetFollow` — emit zero candidates but keep consuming frames so the motion-history buffer stays warm for the next state transition.
## Outcome
- `MovementClusterer::ingest(frame, residual_motion)` updates per-cluster persistence counters per zoom band.
- A cluster meeting `min_size_px` + `min_persistence_frames` emits a `MovementCandidate { frame_seq, bbox_normalized, residual_velocity_estimate, telemetry_quality, source_frame_ts, source_zoom_band }`.
- Per-zoom-band knobs (defaults per `description.md §5`):
  - zoom-out: persistence 3–5 frames; residual-velocity floor low.
  - zoom-in: persistence 6–10 frames; residual-velocity floor higher.
- Active-state hint `disable` (during `TargetFollow`) suppresses emission but keeps history.
- Health surface: `candidates_per_min_zoomed_out`, `candidates_per_min_zoomed_in`, `current_zoom_band`, `compensation_quality_per_band`.
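The persistence gating above can be sketched as a per-track counter. Several things here are assumptions: clusters arrive already associated to a stable `track_id` (the association step is not shown), persistence means consecutive frames, and a candidate is emitted on the first frame after the cluster has persisted for `min_persistence_frames` frames, which is how AC-3's "9th frame onwards" reads for a threshold of 8. `PersistenceTracker` and its fields are hypothetical names.

```rust
use std::collections::HashMap;

struct Track {
    consecutive: u32,
    emitted: bool,
}

/// Per-zoom-band persistence gate: one instance per band.
pub struct PersistenceTracker {
    min_size_px: u32,
    min_persistence_frames: u32,
    tracks: HashMap<u32, Track>,
}

impl PersistenceTracker {
    pub fn new(min_size_px: u32, min_persistence_frames: u32) -> Self {
        Self { min_size_px, min_persistence_frames, tracks: HashMap::new() }
    }

    /// `clusters` holds `(track_id, area_px)` for the current frame.
    /// Returns the track ids to emit as candidates on this frame.
    /// With `disabled = true` (TargetFollow) counters keep advancing,
    /// but nothing is emitted.
    pub fn ingest(&mut self, clusters: &[(u32, u32)], disabled: bool) -> Vec<u32> {
        let min_size = self.min_size_px;
        // Persistence must be consecutive: forget tracks missing from this frame.
        self.tracks.retain(|id, _| {
            clusters.iter().any(|&(cid, area)| cid == *id && area >= min_size)
        });

        let mut emit = Vec::new();
        for &(id, area) in clusters {
            if area < min_size {
                continue;
            }
            let track = self
                .tracks
                .entry(id)
                .or_insert(Track { consecutive: 0, emitted: false });
            track.consecutive += 1;
            // Boundary assumption: strictly more than the threshold (see AC-3).
            if !disabled && !track.emitted && track.consecutive > self.min_persistence_frames {
                track.emitted = true;
                emit.push(id);
            }
        }
        emit
    }
}
```

Because the counters keep running while emission is disabled, a cluster that has already persisted long enough is emitted on the first frame after the hint is lifted, which is one reading of "history is warm".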
## Scope
### Included
- Connected-component (or spatial cluster) extraction over the residual motion field.
- Per-cluster persistence tracker, per zoom band.
- Per-band motion-history buffer (a few seconds of frames + residuals; one per zoom band).
- Candidate emission with full metadata.
- Active-state hint handling.
### Excluded
- Ego-motion estimation (task 23).
- Q14 fallback (task 25).
- POI queue ordering (`scan_controller`).
## Acceptance Criteria
**AC-1: Single-frame blip is suppressed**
Given a single isolated 5×5 px residual motion blip in one frame at zoom-out
When the clusterer runs over 30 frames
Then no `MovementCandidate` is emitted (below `min_persistence_frames = 3`).
**AC-2: Persistent moving target emits a candidate**
Given a 20×20 px residual cluster persisting across 5 consecutive frames at zoom-out
When the clusterer runs
Then exactly one `MovementCandidate` is emitted with `source_zoom_band = "zoomed_out"` and `bbox_normalized` localised around the cluster centre.
**AC-3: Zoom-in stricter threshold**
Given the same persistent cluster but at zoom-in with `min_persistence_frames = 8`
When the clusterer runs for only 5 frames
Then no candidate is emitted; 9th frame onwards emits one.
**AC-4: TargetFollow suppresses emission, keeps history warm**
Given the active-state hint is `disable`
When 30 frames with persistent clusters arrive
Then zero candidates are emitted; `compensation_quality_per_band` is still updated; when `disable` is lifted, the next persistent cluster is emitted on the SAME zoom band's threshold (history is warm).
## Non-Functional Requirements
**Performance**
- Per-frame clustering + emission: ≤20 ms p99.
- Candidate enqueue latency: zoom-out ≤1 s, zoom-in ≤1.5 s (per `description.md §9`).
**Reliability**
- Single-frame blips never surface as candidates.
## Runtime Completeness
- **Named capability**: persistent-cluster candidate detection with per-zoom-band tuning.
- **Production code that must exist**: real residual clustering; real per-band persistence tracker.
- **Unacceptable substitutes**: emitting every residual blip without persistence gating is unacceptable (operator would be flooded).
@@ -0,0 +1,70 @@
# FP Cap Monitor + Q14 Fallback Hook
**Task**: AZ-664_movement_detector_fp_cap_and_q14_fallback
**Name**: FP cap monitor + Q14 fallback module hook
**Description**: Monitor per-zoom-band candidate floods. If the sustained `candidates_per_min` rate exceeds the configured cap, suppress that band's emission (zoom-in only at first; zoom-out down-ranks its lowest-confidence candidates instead). The Q14 fallback engages a learned-CV module behind a build-time feature flag: wire the `EgoMotionProvider` trait plus a stub fallback impl that returns `not_engaged`; the real ML module is a follow-up if the benchmark gate triggers Q14.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-662_movement_detector_ego_motion, AZ-663_movement_detector_clustering_and_emission
**Component**: movement_detector
**Tracker**: AZ-664
**Epic**: AZ-629
## Problem
If classical OpenCV optical flow fails to meet the per-zoom-band FP cap at zoom-in (Q14 trigger), the system must degrade safely: suppress zoom-in emission, keep zoom-out running, and engage the (optional) learned-CV fallback module if compiled in. The fallback's interface contract is fixed (`Frame + telemetry → Vec<MovementCandidate>`); the impl is a separate engineering effort gated on benchmark-gate results.
This task delivers the trait, the FP-cap monitor, the suppression behaviour, and a stub fallback impl. The real learned-CV impl is out of scope here (separate Q14 follow-up if and when the benchmark gate fires).
## Outcome
- `FpCapMonitor::tick(per_band_rate)` flags when `candidates_per_min_zoomed_in > cap`; suppresses zoom-in emission for the duration of the breach + a configurable hysteresis.
- Zoom-out FP-cap breach down-ranks lowest-confidence candidates rather than suppressing entirely (zoom-out is the only source for far-field threats).
- `EgoMotionProvider` trait with the fixed contract; default `OpenCvEgoMotion` impl wraps task 23; `LearnedCvFallback` stub returns `not_engaged`.
- Build-time feature flag `learned_cv_fallback` reserves the slot; if off, the build is otherwise identical and `OpenCvEgoMotion` is the only provider (the `LearnedCvFallback` stub is compiled in only when the feature is enabled).
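The zoom-in suppression with hysteresis can be sketched as a small state machine. The task text leaves the exact form of the hysteresis open; this sketch assumes a time-based hold, where emission resumes only after the rate has stayed at or below the cap for the whole hysteresis duration. `FpCapMonitor::tick` here takes a caller-supplied monotonic `now` plus the current zoom-in rate, which is a simplification of the `tick(per_band_rate)` signature named above.

```rust
use std::time::Duration;

/// Zoom-in FP-cap monitor with a time-based hysteresis (assumed form).
pub struct FpCapMonitor {
    cap_per_min: f64,
    hysteresis: Duration,
    suppressed: bool,
    below_since: Option<Duration>,
}

impl FpCapMonitor {
    pub fn new(cap_per_min: f64, hysteresis: Duration) -> Self {
        Self { cap_per_min, hysteresis, suppressed: false, below_since: None }
    }

    /// Feed the current sliding-window rate; returns true while zoom-in
    /// emission should be suppressed.
    pub fn tick(&mut self, now: Duration, candidates_per_min_zoomed_in: f64) -> bool {
        if candidates_per_min_zoomed_in > self.cap_per_min {
            // Breach (or renewed breach): suppress and restart the hold timer.
            self.suppressed = true;
            self.below_since = None;
        } else if self.suppressed {
            // Rate is back under the cap: resume only after the hold elapses.
            let since = *self.below_since.get_or_insert(now);
            if now.saturating_sub(since) >= self.hysteresis {
                self.suppressed = false;
                self.below_since = None;
            }
        }
        self.suppressed
    }

    pub fn is_suppressed(&self) -> bool {
        self.suppressed
    }
}
```

A rate that hovers around the cap keeps resetting the hold timer, so the monitor stays suppressed instead of toggling on every tick.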
## Scope
### Included
- `EgoMotionProvider` trait (re-exported from `shared::contracts`).
- `FpCapMonitor` with sliding-window per-band rate + hysteresis.
- Zoom-in suppression behaviour.
- Zoom-out down-rank behaviour.
- Stub `LearnedCvFallback` returning `not_engaged`.
### Excluded
- Real learned-CV implementation (Q14 follow-up, gated on benchmark results).
- Benchmark-gate orchestration (out of scope; manual decision based on benchmark data).
## Acceptance Criteria
**AC-1: Zoom-in suppression on flood**
Given `candidates_per_min_zoomed_in = 20` over 60 s while cap is 10
When the FP-cap monitor evaluates
Then zoom-in emission is suppressed and `health → yellow`; when rate falls below cap + hysteresis, emission resumes.
**AC-2: Zoom-out down-ranks instead of suppressing**
Given a similar zoom-out flood
When the monitor evaluates
Then no emission is suppressed; instead, the lowest-confidence candidates are down-ranked (counted as `down_ranked_total`).
**AC-3: Feature-flag absence does not break build**
Given the binary is built WITHOUT the `learned_cv_fallback` feature
When the build runs
Then the binary builds cleanly and `EgoMotionProvider` is satisfied by `OpenCvEgoMotion` exclusively.
**AC-4: Stub fallback returns not_engaged**
Given the `learned_cv_fallback` feature IS enabled and the stub is registered
When `LearnedCvFallback::estimate(...)` is called
Then it returns `Status::NotEngaged` immediately; no real ML is run.
## Non-Functional Requirements
**Reliability**
- FP-cap monitor never spuriously toggles (hysteresis required).
## Runtime Completeness
- **Named capability**: FP-cap monitor + Q14 fallback trait wiring. Real learned-CV impl is explicitly out of scope here.
- **Production code that must exist**: real FP-cap monitor + real suppression logic + real trait.
- **Allowed external stubs**: `LearnedCvFallback` is a stub by design until benchmark-gate triggers Q14.
- **Unacceptable substitutes**: silently dropping zoom-in candidates without an observable signal is unacceptable.
@@ -0,0 +1,81 @@
# H3 Indexing + Classify
**Task**: AZ-665_mapobjects_store_h3_classify
**Name**: H3 indexing + k-ring classify(detection) → new/moved/existing
**Description**: Compute H3 cell for each detection at the configured resolution (default 10; average cell area ~15 000 m², average edge roughly 65 to 75 m). Maintain in-memory `(H3_cell + class) → MapObject` hashmap. Answer `classify(detection)` using k-ring (k=2 default) lookup against `(distance_threshold_m, move_threshold_m, similar_classes)` config.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure
**Component**: mapobjects_store
**Tracker**: AZ-665
**Epic**: AZ-633
## Problem
The H3 spatial index is the foundation of new-vs-existing detection (`architecture.md §7.12`). Each detection's MGRS position is converted to an H3 cell at the configured resolution; the composite key `(H3_cell, class)` keys an in-memory map of known MapObjects. Classification answers `new | moved | existing` by querying the k-ring of cells (boundary correctness) and computing distance against move thresholds.
## Outcome
- `H3Index::cell_of(mgrs, resolution) -> H3Cell`.
- `MapObjectsStore::classify(detection) -> MapObjectClassification ∈ {New, Moved { from_mgrs, to_mgrs }, Existing { existing_id }}`.
- k-ring lookup (default k=2) over the in-memory hashmap.
- `distance_threshold_m` (default 30 m), `move_threshold_m` (default 50 m), `similar_classes` (configured set per `data_model.md §IgnoredItem` class groups) read from config.
- `classify` runs in O(1) expected time, with p99 ≤1 ms.
## Scope
### Included
- H3 binding (Rust crate `h3o` or equivalent).
- `MapObjectsStore` struct + in-memory hashmap.
- `classify` API.
- Config-driven thresholds.
### Excluded
- IgnoredItem suppression (task 27).
- Pre-flight hydrate + sync_state machine (task 28).
- Persistence (task 29).
- End-of-pass removed-candidate sweep (task 27).
## Acceptance Criteria
**AC-1: New detection at unseen MGRS**
Given an empty store
When `classify(detection_at_M1, class=A)` is called
Then it returns `Classification::New`.
**AC-2: Existing detection at known MGRS within threshold**
Given the store has a MapObject at `M1, class=A`
When `classify(detection_at_M1+5m, class=A)` is called and `distance_threshold_m = 30`
Then it returns `Classification::Existing { existing_id: ... }`.
**AC-3: Moved detection beyond move threshold**
Given the store has a MapObject at `M1, class=A`
When `classify(detection_at_M1+60m, class=A)` is called and `move_threshold_m = 50`
Then it returns `Classification::Moved { from_mgrs: M1, to_mgrs: M1+60m }`.
**AC-4: k-ring boundary lookup**
Given the store has a MapObject in cell `C1`
When a new detection falls in cell `C2` (boundary cell of `C1`)
Then with k=2 the lookup finds `C1` and returns `Existing` (not `New`).
**AC-5: Classify p99 ≤1 ms**
Given a store warmed with 10 000 MapObjects
When `classify` is called 1 000 times
Then p99 latency is ≤1 ms.
## Non-Functional Requirements
**Performance**
- O(1) classify p99 ≤1 ms (per `description.md §9`).
**Reliability**
- k-ring boundary correctness guaranteed by default config.
## Contract
- Canonical typed model: `data_model.md §MapObject`, `§MapObjectClassification`.
## Runtime Completeness
- **Named capability**: H3 spatial index + k-ring queries — production new/moved/existing dispatch.
- **Production code that must exist**: real H3 crate; real k-ring lookup.
- **Unacceptable substitutes**: Euclidean-distance-only naive search is unacceptable for production (loses boundary correctness and O(1) latency).
@@ -0,0 +1,64 @@
# IgnoredItem Set + End-of-Pass Sweep
**Task**: AZ-666_mapobjects_store_ignored_and_pass_sweep
**Name**: IgnoredItem set + end-of-pass removed-candidate sweep
**Description**: `IgnoredItem` set keyed by `(MGRS, class_group)`. `is_ignored(MGRS, class_group)` suppression query. End-of-pass sweep: after a region's pass ends, return objects in the region that were not re-observed as `removed_candidate`s.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-665_mapobjects_store_h3_classify
**Component**: mapobjects_store
**Tracker**: AZ-666
**Epic**: AZ-633
## Problem
When the operator declines a POI, the (MGRS, class_group) pair is added to the `IgnoredItem` set; subsequent detections matching the pair are suppressed BEFORE they reach the queue. Separately, when a scan pass over a region ends (signal from `scan_controller` / `mission_executor`), MapObjects that were known in the region but NOT re-observed during the pass should be flagged `removed_candidate` — the operator (not the system) decides actual removal.
## Outcome
- `IgnoredSet::append(item: IgnoredItem)` stores the entry.
- `is_ignored(mgrs, class_group) -> bool` answers in O(1).
- `MapObjectsStore::end_of_pass(region_bbox) -> Vec<RemovedCandidate>` returns objects in the region that were NOT re-observed since the pass started.
- Per-region pass tracker (start_ts, observed_ids) maintained.
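The suppression query and the end-of-pass sweep reduce to two hash-set lookups. The sketch below is illustrative only: a raw `u64` stands in for an H3 cell index, `ClassGroup` is a hypothetical enum built from the group names mentioned in this task, `append` takes the key parts directly rather than an `IgnoredItem`, and the caller is assumed to have already filtered known objects to the region's bounding box.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ClassGroup {
    MilitaryVehicle,
    ConcealedPosition,
    MovementCandidate,
}

/// Stand-in for an H3 cell index (assumption: a 64-bit id).
pub type Cell = u64;

#[derive(Default)]
pub struct IgnoredSet {
    items: HashSet<(Cell, ClassGroup)>,
}

impl IgnoredSet {
    pub fn append(&mut self, cell: Cell, group: ClassGroup) {
        self.items.insert((cell, group));
    }

    /// O(1) expected-time suppression query.
    pub fn is_ignored(&self, cell: Cell, group: ClassGroup) -> bool {
        self.items.contains(&(cell, group))
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct KnownObject {
    pub id: u32,
    pub cell: Cell,
    pub class_group: ClassGroup,
}

/// Tracks which known objects were re-observed during the current pass.
#[derive(Default)]
pub struct PassTracker {
    observed_ids: HashSet<u32>,
}

impl PassTracker {
    pub fn mark_observed(&mut self, id: u32) {
        self.observed_ids.insert(id);
    }

    /// Ids of objects in the region that were not re-observed and are not
    /// ignored; these are the removed-candidates surfaced to the operator.
    pub fn end_of_pass(&self, known_in_region: &[KnownObject], ignored: &IgnoredSet) -> Vec<u32> {
        known_in_region
            .iter()
            .filter(|obj| !self.observed_ids.contains(&obj.id))
            .filter(|obj| !ignored.is_ignored(obj.cell, obj.class_group))
            .map(|obj| obj.id)
            .collect()
    }
}
```

The sweep is linear in the number of known objects in the region, so the ≤1 000 objects quoted in the performance target stay well within a simple iteration.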
## Scope
### Included
- `IgnoredSet` using a `HashSet<(H3Cell, ClassGroup)>` keyed structure.
- Class-group resolution (read group from config; e.g. `military_vehicle_group`, `concealed_position_group`, `movement_candidate`).
- Per-region pass tracker.
- End-of-pass sweep query.
### Excluded
- H3 classify (task 26).
- Pre-flight hydrate (task 28).
- Persistence (task 29).
- Append to `pending_observations` / `pending_ignored` (task 28).
## Acceptance Criteria
**AC-1: Ignored item suppresses subsequent detections**
Given `append(IgnoredItem { mgrs: M1, class_group: G })`
When `is_ignored(M1, G)` is called
Then it returns `true`; calls for other pairs return `false`.
**AC-2: End-of-pass returns un-observed objects**
Given a store with MapObjects at `M1, M2, M3` in region `R`
When the pass starts at `t0`, only `M1` is re-observed, and `end_of_pass(R)` is called at `t1`
Then it returns `[M2, M3]` as `RemovedCandidate`s.
**AC-3: End-of-pass excludes ignored**
Given `M2` was un-observed AND `is_ignored(M2.mgrs, M2.class_group) == true`
When `end_of_pass(R)` is called
Then `M2` is NOT in the returned list (ignored objects are not surfaced as removed-candidates).
## Non-Functional Requirements
**Performance**
- `is_ignored` p99 ≤1 ms.
- `end_of_pass` p99 ≤50 ms for a 30 km × 30 km region with ≤1 000 known objects.
## Runtime Completeness
- **Named capability**: IgnoredItem suppression + end-of-pass sweep.
- **Production code that must exist**: real HashSet + real per-region pass tracker.
- **Unacceptable substitutes**: re-querying the store for every detection without an `IgnoredSet` cache is unacceptable (latency violation).
@@ -0,0 +1,80 @@
# Pre-Flight Hydrate + Sync State Machine + Pending Logs
**Task**: AZ-667_mapobjects_store_hydrate_and_pending
**Name**: Pre-flight hydrate from MapObjectsBundle + sync_state machine + pending_observations/pending_ignored append logs
**Description**: Hydrate the store from a `MapObjectsBundle` (from `mission_client`'s pull). Maintain a `sync_state` enum (`synced | cached_fallback | degraded | failed`). Append every NEW / MOVED / EXISTING / REMOVED-CANDIDATE / IgnoredItem event to `pending_observations` / `pending_ignored` for the post-flight push.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-665_mapobjects_store_h3_classify, AZ-666_mapobjects_store_ignored_and_pass_sweep
**Component**: mapobjects_store
**Tracker**: AZ-667
**Epic**: AZ-633
## Problem
The on-device working copy is hydrated pre-flight from the central API. The sync_state machine (`fresh_boot → synced | cached_fallback | degraded`) tracks the relationship to the central source of truth. During flight, every classification event is appended to `pending_observations` (or, for declines, `pending_ignored`) — central writes are forbidden mid-flight (Frozen choice 6). The pending logs feed the post-flight push.
## Outcome
- `hydrate(bundle: MapObjectsBundle) -> Result<()>` loads the bundle into the in-memory hashmap + IgnoredSet; sets `sync_state = synced` (or `cached_fallback` if `bundle.fallback_used`).
- `on_classify_result(classification, detection)` appends a `MapObjectObservation` to `pending_observations` for NEW / MOVED / EXISTING / REMOVED-CANDIDATE.
- `on_decline(ignored_item)` appends to `pending_ignored`.
- `drain_pending() -> (Vec<MapObjectObservation>, Vec<IgnoredItem>)` is called by `mission_client::push_mapobjects_diff` post-flight.
- Health surface: `sync_state`, `pending_observations_count`, `pending_ignored_count`, `last_pull_ts`, `last_push_ts`.
- On `DELETE /missions/{id}` cascade signal from `mission_client`, drop mission-scoped objects.
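The hydrate, append, and drain flow above can be outlined with plain vectors. This is a reduced sketch under stated assumptions: the types are trimmed to a few illustrative fields (the canonical models live in `data_model.md`), only three sync states are modelled, and `Store` is a hypothetical stand-in for the real `MapObjectsStore`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyncState {
    FreshBoot,
    Synced,
    CachedFallback,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ObservationKind {
    New,
    Moved,
    Existing,
    RemovedCandidate,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MapObjectObservation {
    pub kind: ObservationKind,
    pub object_id: u32,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct IgnoredItem {
    pub mgrs: String,
    pub class_group: String,
}

pub struct MapObjectsBundle {
    pub object_ids: Vec<u32>,
    pub ignored: Vec<IgnoredItem>,
    pub fallback_used: bool,
}

pub struct Store {
    pub sync_state: SyncState,
    pub object_ids: Vec<u32>,
    pub ignored: Vec<IgnoredItem>,
    pending_observations: Vec<MapObjectObservation>,
    pending_ignored: Vec<IgnoredItem>,
}

impl Store {
    pub fn new() -> Self {
        Self {
            sync_state: SyncState::FreshBoot,
            object_ids: Vec::new(),
            ignored: Vec::new(),
            pending_observations: Vec::new(),
            pending_ignored: Vec::new(),
        }
    }

    /// Load the pre-flight bundle and record how fresh it is.
    pub fn hydrate(&mut self, bundle: MapObjectsBundle) {
        self.sync_state = if bundle.fallback_used {
            SyncState::CachedFallback
        } else {
            SyncState::Synced
        };
        self.object_ids = bundle.object_ids;
        self.ignored = bundle.ignored;
    }

    /// In-flight events are only appended locally; nothing is written centrally.
    pub fn on_classify_result(&mut self, observation: MapObjectObservation) {
        self.pending_observations.push(observation);
    }

    pub fn on_decline(&mut self, item: IgnoredItem) {
        self.ignored.push(item.clone());
        self.pending_ignored.push(item);
    }

    pub fn pending_counts(&self) -> (usize, usize) {
        (self.pending_observations.len(), self.pending_ignored.len())
    }

    /// Hand both logs to the post-flight push and leave them empty.
    pub fn drain_pending(&mut self) -> (Vec<MapObjectObservation>, Vec<IgnoredItem>) {
        (
            std::mem::take(&mut self.pending_observations),
            std::mem::take(&mut self.pending_ignored),
        )
    }
}
```

`std::mem::take` swaps each log for an empty vector in one step, so the drained data and the reset counters cannot get out of step.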
## Scope
### Included
- `MapObjectsBundle` hydration (model = `data_model.md §MapObjectsBundle`).
- Sync-state enum + transitions.
- Append-only `pending_observations` + `pending_ignored` logs (in-memory; durable disk handoff lives in `mission_client` task 08).
- Drain API.
- Mission-cascade handler.
### Excluded
- H3 classify (task 26).
- Disk persistence (task 29) — this task keeps pending in memory + lets `mission_client` task 08 handle disk durability.
- Post-flight push (lives in `mission_client` task 08).
## Acceptance Criteria
**AC-1: Hydrate from bundle**
Given a `MapObjectsBundle` with N MapObjects and M IgnoredItems
When `hydrate(bundle)` is called
Then the store contains all N + M entries and `sync_state = "synced"`.
**AC-2: Fallback bundle sets cached_fallback**
Given a bundle with `fallback_used = true`
When `hydrate(bundle)` is called
Then `sync_state = "cached_fallback"`.
**AC-3: Classify appends pending observation**
Given the store hydrated and a detection that classifies as `New`
When `on_classify_result(New, detection)` is called
Then `pending_observations_count` increments by 1.
**AC-4: Drain returns and clears pending**
Given `pending_observations_count = 5`, `pending_ignored_count = 2`
When `drain_pending()` is called
Then it returns 5 observations + 2 ignored items; counts return to 0.
**AC-5: Cascade drops mission-scoped objects**
Given `M1` (mission A) and `M2` (mission B) objects in the store
When the cascade signal for mission A arrives
Then `M1` is dropped; `M2` remains.
## Non-Functional Requirements
**Performance**
- Hydrate from a 30 km × 30 km bundle: ≤2 s (peer of pre-flight pull's 30 s budget).
- Append per classification: ≤100 µs.
## Contract
- Canonical typed model: `data_model.md §MapObjectsBundle`, `§MapObjectObservation`.
## Runtime Completeness
- **Named capability**: hydrate + sync_state + pending event logs.
- **Production code that must exist**: real hydrate; real pending append; real drain.
- **Unacceptable substitutes**: central writes mid-flight are forbidden (Frozen choice 6).
@@ -0,0 +1,76 @@
# Persistence — In-Memory + JSON Snapshot (Q3 Default)
**Task**: AZ-668_mapobjects_store_persistence
**Name**: In-memory + JSON snapshot persistence (default per Q3)
**Description**: Crash-recovery and post-flight upload durability for the in-memory MapObjects state. Default engine: in-memory + atomic JSON snapshot to `${state_dir}/mapobjects/<mission_id>.json` per checkpoint. Q3 reserves the slot for SQLite+H3 / KV alternatives.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-665_mapobjects_store_h3_classify, AZ-667_mapobjects_store_hydrate_and_pending
**Component**: mapobjects_store
**Tracker**: AZ-668
**Epic**: AZ-633
## Problem
The in-memory hashmap is authoritative for the active mission, but a crash mid-mission must not lose the pending diff. The persistence engine choice is Q3 (open); the default is in-memory + JSON snapshot (atomic rename), which keeps the engine choice cleanly behind a `MapObjectsPersistence` trait so SQLite+H3 or RocksDB can swap in later without touching call sites.
## Outcome
- `MapObjectsPersistence` trait with `save_snapshot(state) -> Result<()>` and `load_snapshot(path) -> Result<State>`.
- `JsonSnapshotEngine` impl that writes to `${state_dir}/mapobjects/<mission_id>.json` via atomic rename (write to `.tmp` then rename).
- Snapshot cadence: configurable; default every 30 s OR on every N pending-observation appends, whichever first.
- Crash recovery: at startup, load the most recent snapshot for any mission that did not reach `POST_FLIGHT_SYNC`.
- Health surface: `last_snapshot_ts`, `snapshot_size_bytes`, `snapshot_errors_total`.
- Persistence corruption on startup: refuse to start with stale state; surface explicit error to the operator.
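The write-to-`.tmp`-then-rename pattern named above looks roughly like this with the standard library. The helper names `tmp_path_for` and `write_atomic` are hypothetical, the payload is passed in as already-serialised bytes (JSON encoding and corruption detection on load are not shown), and syncing the parent directory after the rename is left out for brevity.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Temp-file path next to the final snapshot: `<mission_id>.json.tmp`.
pub fn tmp_path_for(path: &Path) -> PathBuf {
    path.with_extension("json.tmp")
}

/// Write a snapshot so that readers see either the previous complete file
/// or the new complete file, never a partial one.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let tmp = tmp_path_for(path);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush file contents to disk before the rename makes them visible.
        file.sync_all()?;
    }
    // On POSIX filesystems a rename within one directory replaces the
    // destination atomically.
    fs::rename(&tmp, path)
}
```

If the process dies mid-write, only the `.tmp` file is incomplete; the last good snapshot is untouched and is what a restarted process would load.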
## Scope
### Included
- `MapObjectsPersistence` trait.
- `JsonSnapshotEngine` (default impl).
- Atomic rename pattern.
- Crash-recovery load.
- Snapshot cadence policy.
### Excluded
- SQLite+H3 alternative (Q3 follow-up if chosen later).
- KV alternative (Q3 follow-up).
- The post-flight push itself (`mission_client` task 08).
## Acceptance Criteria
**AC-1: Snapshot + reload round-trip**
Given a store with 100 MapObjects + 10 IgnoredItems + 5 pending observations
When `save_snapshot()` writes to disk and a fresh process calls `load_snapshot()`
Then the loaded state equals the saved state.
**AC-2: Atomic rename prevents partial writes**
Given a snapshot write is interrupted mid-write (simulated kill -9)
When a fresh process starts
Then it loads the previous good snapshot, not the partial one (no corruption observed).
**AC-3: Crash recovery loads pending**
Given a previous run terminated with non-empty pending_observations
When the new process calls `load_snapshot()` for the same mission_id
Then pending_observations is non-empty and matches the pre-crash count.
**AC-4: Corruption surfaces explicit error**
Given a snapshot file with truncated content
When `load_snapshot()` runs
Then it returns `Err(CorruptSnapshot)` and `snapshot_errors_total` increments; the store does NOT silently start empty.
## Non-Functional Requirements
**Performance**
- Snapshot of a 30 km × 30 km mission (≤1 000 MapObjects): ≤1 s.
- Crash recovery: ≤2 s to a usable state (per `description.md §9`).
**Reliability**
- Atomic rename — no partial-write corruption.
- Corruption never silent.
## Runtime Completeness
- **Named capability**: persistent MapObjects state with crash recovery — default engine in-memory + JSON snapshot per Q3.
- **Production code that must exist**: real disk write; real atomic rename; real corruption-detection on load.
- **Allowed external stubs**: `tempfile` for test fixtures.
- **Unacceptable substitutes**: a no-op persistence in production is unacceptable (crash mid-flight loses the diff).
@@ -0,0 +1,64 @@
# Primitive Graph Builder + Path Freshness Scoring
**Task**: AZ-669_semantic_analyzer_primitive_graph
**Name**: Primitive graph from Tier-1 detections + path-freshness scoring
**Description**: Build a small ROI-scoped primitive graph from Tier-1 detections (path nodes, endpoint nodes, context nodes). Score path freshness using texture, edge clarity, undisturbed-surroundings cues.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-660_detection_client_grpc_stream, AZ-661_detection_client_schema_and_health
**Component**: semantic_analyzer
**Tracker**: AZ-669
**Epic**: AZ-630
## Problem
Tier 2 reasons over zoom-in crops using a primitive graph built from Tier-1 detections. The graph captures footpaths (path nodes), branch piles / dark entrances / dugouts (endpoint nodes), and trees / tree-blocks (context nodes). Path-freshness scoring combines surface texture, edge clarity, and undisturbed-surroundings cues into a single freshness score consumed by the recommended-action policy.
## Outcome
- `PrimitiveGraph::build(roi, detections) -> Graph` builds the graph from Tier-1 detections inside the ROI.
- `FreshnessScorer::score(graph, frame_crop) -> PathFreshnessScore` returns a normalized 0–1 score per path node.
- Graph validation: disconnected paths trigger an explicit warning (consumed by task 32).
- Health surface: `graphs_built_total`, `freshness_score_p50/p99`, `disconnected_graphs_total`.
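
The per-class detection-to-node mapping could look roughly like the sketch below. The enum names and the exact class list are assumptions taken from the Problem statement (footpaths, branch piles, dark entrances, dugouts, trees, tree-blocks); the real class set comes from the Tier-1 detector schema.

```rust
/// Hypothetical Tier-1 class labels, taken from the Problem statement.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectionClass {
    Footpath,
    BranchPile,
    DarkEntrance,
    Dugout,
    Tree,
    TreeBlock,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Path,
    Endpoint,
    Context,
}

/// Map one detection class to the node type it contributes to the graph.
pub fn node_kind(class: DetectionClass) -> NodeKind {
    match class {
        DetectionClass::Footpath => NodeKind::Path,
        DetectionClass::BranchPile | DetectionClass::DarkEntrance | DetectionClass::Dugout => {
            NodeKind::Endpoint
        }
        DetectionClass::Tree | DetectionClass::TreeBlock => NodeKind::Context,
    }
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct NodeCounts {
    pub path: usize,
    pub endpoint: usize,
    pub context: usize,
}

/// Tally how many nodes of each kind a batch of detections produces.
pub fn count_nodes(classes: &[DetectionClass]) -> NodeCounts {
    let mut counts = NodeCounts::default();
    for &class in classes {
        match node_kind(class) {
            NodeKind::Path => counts.path += 1,
            NodeKind::Endpoint => counts.endpoint += 1,
            NodeKind::Context => counts.context += 1,
        }
    }
    counts
}
```

Because the `match` in `node_kind` has no catch-all arm, adding a new detection class forces a decision about which node type it maps to.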
## Scope
### Included
- Graph data structures (path / endpoint / context node types).
- Detection-to-node mapping (per-class).
- Freshness scoring (computer-vision-style: edge density, texture variance, surrounding undisturbed area).
- Graph validation.
### Excluded
- ROI CNN inference (task 31).
- Recommended-action policy (task 32).
- VLM (separate component).
## Acceptance Criteria
**AC-1: Graph contains all relevant detections**
Given a `DetectionBatch` with 3 footpath bboxes + 2 branch-pile bboxes + 5 tree bboxes inside the ROI
When `build(roi, batch)` runs
Then the graph contains 3 path nodes + 2 endpoint nodes + 5 context nodes.
**AC-2: Freshness score is bounded**
Given any valid graph + frame crop
When `score(graph, crop)` runs
Then every emitted freshness score is in `[0.0, 1.0]`.
**AC-3: Disconnected graph is flagged**
Given a graph with two unconnected path components
When validation runs
Then `disconnected_graphs_total` increments by 1 and the graph is marked invalid.
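
The disconnected-graph check in AC-3 amounts to counting connected components among the path nodes. A minimal union-find sketch follows, assuming path nodes are identified by indices `0..node_count` and edges are index pairs; both representations are illustrative, not the spec's data model.

```rust
/// Count connected components among `node_count` path nodes joined by `edges`.
/// Panics if an edge refers to an index >= `node_count`.
pub fn path_components(node_count: usize, edges: &[(usize, usize)]) -> usize {
    // Each node starts as its own component root.
    let mut parent: Vec<usize> = (0..node_count).collect();

    // Find the root of `x`, shortening the chain as we go (path halving).
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }

    let mut components = node_count;
    for &(a, b) in edges {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            // Merging two distinct components reduces the count by one.
            parent[ra] = rb;
            components -= 1;
        }
    }
    components
}

/// A graph with zero or one path component counts as connected.
pub fn is_connected(node_count: usize, edges: &[(usize, usize)]) -> bool {
    path_components(node_count, edges) <= 1
}
```

A validator built on this would mark the graph invalid and bump `disconnected_graphs_total` when `is_connected` returns `false`.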
## Non-Functional Requirements
**Performance**
- Graph build: ≤30 ms per ROI on Jetson Orin Nano.
- Freshness scoring: ≤50 ms per ROI.
## Runtime Completeness
- **Named capability**: primitive graph construction + path-freshness scoring — production reasoning path.
- **Production code that must exist**: real graph construction; real freshness scorer.
- **Allowed external stubs**: `opencv` for texture/edge feature extraction.
- **Unacceptable substitutes**: a constant-score scorer in production is unacceptable.
@@ -0,0 +1,70 @@
# ROI CNN Inference + Size/Timeout Bounds + Concealment Scoring
**Task**: AZ-670_semantic_analyzer_roi_cnn
**Name**: ONNX/TensorRT ROI CNN + ROI size/timeout enforcement + concealment scoring
**Description**: Lightweight CNN session (ONNX/TensorRT) for endpoint-candidate concealment scoring. Bound every inference by strict ROI size and timeout. Never run on a full frame.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-669_semantic_analyzer_primitive_graph
**Component**: semantic_analyzer
**Tracker**: AZ-670
**Epic**: AZ-630
## Problem
Endpoint candidates (branch piles, dark entrances, dugouts) need a concealment score that combines visual cues a primitive graph can't capture alone. A lightweight CNN session (ONNX or TensorRT) runs on bounded ROI crops with a strict timeout — never on a full frame. Oversize ROIs are rejected pre-decode. Inference timeout returns a structured `Tier2Evidence { status: timeout }` so `scan_controller` can decide to skip VLM and surface a low-evidence POI.
## Outcome
- `RoiInference::infer(roi_crop) -> Result<ConcealmentScore, RoiError>` runs the CNN session.
- ROI size check pre-decode: reject if larger than `max_roi_bytes` config; `RoiError::Oversize`.
- Wall-clock timeout `inference_timeout_ms` (default 200 ms); on timeout returns `RoiError::Timeout`.
- CNN backend: ONNX Runtime primary (CPU); TensorRT optional behind a build-time feature for Jetson.
- Health surface: `tier2_latency_p50/p99`, `roi_size_bytes_p99`, `errors_total`, `oversize_rejections_total`, `timeouts_total`.
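
The size check and wall-clock timeout described above can be sketched without any ONNX binding by treating the model as a closure. Everything here is an assumption for illustration: `infer_bounded`, the `ModelFailed` variant, and the clamp to `[0.0, 1.0]` are not taken from the spec, and the real session would wrap an `ort` (or equivalent) call.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum RoiError {
    Oversize,
    Timeout,
    /// Hypothetical variant: the worker ended without producing a score.
    ModelFailed,
}

#[derive(Debug, PartialEq)]
pub struct ConcealmentScore {
    pub value: f32,
}

/// Reject oversize ROIs before any work happens, then run the model on a
/// worker thread and wait at most `timeout` for its result.
pub fn infer_bounded<F>(
    roi_bytes: Vec<u8>,
    max_roi_bytes: usize,
    timeout: Duration,
    run_model: F,
) -> Result<ConcealmentScore, RoiError>
where
    F: FnOnce(Vec<u8>) -> f32 + Send + 'static,
{
    if roi_bytes.len() > max_roi_bytes {
        return Err(RoiError::Oversize);
    }

    let (tx, rx) = mpsc::channel::<f32>();
    thread::spawn(move || {
        // The send fails only if the caller already gave up; ignore that.
        let _ = tx.send(run_model(roi_bytes));
    });

    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(ConcealmentScore { value: value.clamp(0.0, 1.0) }),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RoiError::Timeout),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RoiError::ModelFailed),
    }
}
```

Note that a timed-out worker thread is not cancelled here; it simply finishes in the background and its result is discarded. Whether the real backend can abort an in-flight inference is outside what this sketch shows.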
## Scope
### Included
- ONNX Runtime binding (Rust crate `ort` or equivalent).
- TensorRT integration behind feature flag (defer real impl if not Jetson-ready).
- ROI size + timeout bounds.
- Concealment scoring (raw CNN output + post-process).
### Excluded
- Primitive graph + freshness scoring (task 30).
- Recommended-action policy (task 32).
- The CNN model weights themselves (treated as a build/deploy artefact; ONNX model file path is config).
## Acceptance Criteria
**AC-1: Inference happy path**
Given a 256×256 RGB ROI and a fixture CNN model
When `infer(roi)` runs
Then it returns `Ok(ConcealmentScore { value: f32, model_version })` within ≤200 ms p99.
**AC-2: Oversize ROI rejected pre-decode**
Given an ROI larger than `max_roi_bytes`
When `infer(roi)` is called
Then it returns `Err(RoiError::Oversize)` immediately; no decode happens.
**AC-3: Inference timeout returns explicit error**
Given a fixture CNN that takes 500 ms
When `infer(roi)` is called with `inference_timeout_ms = 200`
Then it returns `Err(RoiError::Timeout)` and `timeouts_total` increments by 1.
**AC-4: TensorRT feature absent does not break build**
Given the binary is built WITHOUT the `tensorrt` feature
When the build runs
Then it builds cleanly using ONNX Runtime only.
## Non-Functional Requirements
**Performance**
- Per-ROI inference: ≤200 ms p99 (per `description.md §8`).
- Concealed-position recall ≥60 %, precision ≥20 % (per `description.md §8`; both measured against the benchmark dataset, not asserted here).
## Runtime Completeness
- **Named capability**: ROI CNN inference on ONNX (TensorRT optional).
- **Production code that must exist**: real ONNX session; real ROI bounds; real timeout.
- **Allowed external stubs**: a tiny fixture ONNX model for unit tests.
- **Unacceptable substitutes**: running on the full frame instead of an ROI is unacceptable (memory + latency).
@@ -0,0 +1,69 @@
# Recommended-Action Policy + Pan Plan Emission
**Task**: AZ-671_semantic_analyzer_action_policy
**Name**: Tier2Evidence action policy + pan-plan emission for footpath-follow
**Description**: At intersections, recommend `PanFollowFootpath | HoldEndpoint | PanBroad | ReturnToZoomOut` based on the primitive graph + freshness + concealment scores + Tier2Evidence shape. Emit a pan plan (sequence of pan goals) when `PanFollowFootpath` is chosen.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-669_semantic_analyzer_primitive_graph, AZ-670_semantic_analyzer_roi_cnn
**Component**: semantic_analyzer
**Tracker**: AZ-671
**Epic**: AZ-630
## Problem
Once the primitive graph + freshness + concealment scores are computed, the policy must choose what the gimbal does next. At intersections, the freshest / most-promising branch is recommended for `gimbal_controller` to pan toward; an explicit `pan plan` (sequence of pan goals with timing) is emitted that keeps the path centered while the UAV moves.
## Outcome
- `ActionPolicy::recommend(graph, freshness, concealment, current_roi) -> Tier2Evidence` returns the typed evidence with `recommended_next_action`.
- For `PanFollowFootpath`, the evidence carries an attached `PanPlan` (sequence of `(yaw, pitch, zoom, at_ts)` goals) consumed by `gimbal_controller` (task 16).
- `HoldEndpoint`, `PanBroad`, `ReturnToZoomOut` are returned without a pan plan.
- For graph-invalid (disconnected) cases, returns `recommended_next_action: ReturnToZoomOut` + `path_freshness: undefined`.
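
A rule table of this kind might be sketched as below. The thresholds (`0.7` for freshness, `0.8` for concealment) are borrowed from the acceptance-criteria examples and are assumptions, as are the rule ordering, the `PanBroad` fallback, and the `PolicyInput` / `Recommendation` types. The real `Tier2Evidence` and `PanPlan` shapes live in `data_model.md`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    PanFollowFootpath,
    HoldEndpoint,
    PanBroad,
    ReturnToZoomOut,
}

/// Hypothetical, flattened view of the policy inputs.
pub struct PolicyInput<'a> {
    pub graph_valid: bool,
    /// One freshness score per path node (branch).
    pub branch_freshness: &'a [f32],
    /// Highest concealment score among endpoint nodes, if any exist.
    pub max_concealment: Option<f32>,
}

#[derive(Debug, PartialEq)]
pub struct Recommendation {
    pub action: Action,
    /// Index of the branch to follow; set only for `PanFollowFootpath`.
    pub follow_branch: Option<usize>,
}

// Assumed thresholds, taken from the acceptance-criteria examples.
pub const FRESHNESS_MIN: f32 = 0.7;
pub const CONCEALMENT_MIN: f32 = 0.8;

pub fn recommend(input: &PolicyInput<'_>) -> Recommendation {
    // Rule 1: an invalid (disconnected) graph always backs out.
    if !input.graph_valid {
        return Recommendation { action: Action::ReturnToZoomOut, follow_branch: None };
    }
    // Rule 2: a strongly concealed endpoint is held.
    if matches!(input.max_concealment, Some(c) if c > CONCEALMENT_MIN) {
        return Recommendation { action: Action::HoldEndpoint, follow_branch: None };
    }
    // Rule 3: follow the freshest branch if it is fresh enough.
    let freshest = input
        .branch_freshness
        .iter()
        .copied()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match freshest {
        Some((idx, f)) if f > FRESHNESS_MIN => Recommendation {
            action: Action::PanFollowFootpath,
            follow_branch: Some(idx),
        },
        // Fallback when nothing stands out.
        _ => Recommendation { action: Action::PanBroad, follow_branch: None },
    }
}
```

In the full task the chosen branch index would seed the pan-plan generator; this sketch stops at the branch choice.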
## Scope
### Included
- Policy rule table (graph-shape × scores × current ROI → action).
- Pan plan generator (footpath traversal sequence).
- Tier2Evidence type assembly.
### Excluded
- Plan execution (`gimbal_controller` task 16).
- VLM gating (lives in `scan_controller`).
## Acceptance Criteria
**AC-1: Single fresh footpath → PanFollowFootpath**
Given a graph with one path node, freshness > 0.7, no endpoint nodes
When `recommend(...)` runs
Then it returns `Tier2Evidence { recommended_next_action: PanFollowFootpath, pan_plan: Some(...) }`.
**AC-2: Branched intersection picks freshest branch**
Given a graph with three path nodes meeting at an intersection, freshness `[0.3, 0.9, 0.5]`
When `recommend(...)` runs
Then the emitted pan plan's first non-trivial pan goal lies along the branch with freshness 0.9.
**AC-3: High-concealment endpoint → HoldEndpoint**
Given a graph with one endpoint node, concealment > 0.8
When `recommend(...)` runs
Then it returns `Tier2Evidence { recommended_next_action: HoldEndpoint, pan_plan: None }`.
**AC-4: Disconnected graph → ReturnToZoomOut**
Given a graph marked invalid (disconnected paths)
When `recommend(...)` runs
Then it returns `Tier2Evidence { recommended_next_action: ReturnToZoomOut, path_freshness: undefined, pan_plan: None }`.
## Non-Functional Requirements
**Performance**
- Policy + pan-plan generation: ≤20 ms p99 (well within the ≤200 ms Tier 2 budget).
## Contract
- Canonical typed model: `data_model.md §Tier2Evidence`, `§PanPlan`.
## Runtime Completeness
- **Named capability**: Tier-2 action policy + pan-plan emission.
- **Production code that must exist**: real policy rules; real pan-plan generator.
- **Unacceptable substitutes**: an "always return PanFollowFootpath" placeholder is unacceptable in production.
@@ -0,0 +1,70 @@
# VLM Provider Trait + Disabled Default Impl + Feature Flag
**Task**: AZ-672_vlm_client_provider_trait
**Name**: VlmAssessmentProvider trait + default disabled impl + build-time feature gating
**Description**: Define `VlmAssessmentProvider` trait (in `shared::contracts`) and a default impl that always returns `status: disabled`. The `vlm_client` crate is behind a build-time feature flag; with the feature off the default impl is used and the binary builds + runs identically without `vlm_client`.
**Complexity**: 2 points
**Dependencies**: AZ-640_initial_structure
**Component**: vlm_client
**Tracker**: AZ-672
**Epic**: AZ-631
## Problem
VLM is optional in two ways: at runtime (`vlm_enabled` flag) and at build time (`vlm_client` Cargo feature). `scan_controller` depends only on the trait — never on the `vlm_client` crate directly — so the binary builds and runs with VLM absent. The default trait impl returns `status: disabled` so the call-site code path is identical whether VLM is enabled or absent.
## Outcome
- `VlmAssessmentProvider` trait in `shared::contracts::vlm`:
```text
trait VlmAssessmentProvider {
async fn assess(&self, roi_crop: &RoiCrop, prompt: &str) -> VlmAssessment;
}
```
- Default impl `DisabledVlmProvider` returns `VlmAssessment { status: Disabled, .. }` for every call.
- `vlm_client` Cargo feature gates inclusion of the real `vlm_client` crate; with feature off, only `DisabledVlmProvider` is registered.
- Runtime flag `vlm_enabled = false` causes the composition root to install `DisabledVlmProvider` even when the feature is compiled in.
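
The two-level optionality (build-time feature, runtime flag) can be illustrated with a small composition-root sketch. To stay runnable without an async runtime, the trait here is a synchronous simplification of the one above, and the build-time feature is modelled as a plain `bool` parameter; real code would more likely use `cfg!(feature = "vlm_client")` or `#[cfg(...)]`. `StubRealProvider`, `build_provider`, and `name` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Disabled,
}

#[derive(Debug, PartialEq, Eq)]
pub struct VlmAssessment {
    pub status: Status,
}

/// Synchronous simplification of the provider trait, for illustration only.
pub trait VlmAssessmentProvider {
    fn assess(&self, roi_crop: &[u8], prompt: &str) -> VlmAssessment;
    fn name(&self) -> &'static str;
}

/// Default impl: every call reports `Disabled`.
pub struct DisabledVlmProvider;

impl VlmAssessmentProvider for DisabledVlmProvider {
    fn assess(&self, _roi_crop: &[u8], _prompt: &str) -> VlmAssessment {
        VlmAssessment { status: Status::Disabled }
    }
    fn name(&self) -> &'static str {
        "disabled"
    }
}

/// Stand-in for the real client that the `vlm_client` crate would provide.
pub struct StubRealProvider;

impl VlmAssessmentProvider for StubRealProvider {
    fn assess(&self, _roi_crop: &[u8], _prompt: &str) -> VlmAssessment {
        VlmAssessment { status: Status::Ok }
    }
    fn name(&self) -> &'static str {
        "real-stub"
    }
}

/// Composition-root choice: the real provider is installed only when the
/// feature is compiled in AND the runtime flag is on.
pub fn build_provider(feature_compiled: bool, vlm_enabled: bool) -> Box<dyn VlmAssessmentProvider> {
    if feature_compiled && vlm_enabled {
        Box::new(StubRealProvider)
    } else {
        Box::new(DisabledVlmProvider)
    }
}
```

Call sites hold only a `Box<dyn VlmAssessmentProvider>`, so they take the same code path whichever provider is installed.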
## Scope
### Included
- Trait definition in `shared::contracts::vlm`.
- `DisabledVlmProvider` default impl (also in `shared` so it's available regardless of feature).
- Cargo feature flag wiring in `Cargo.toml` (workspace + binary).
- Runtime flag plumb from config.
### Excluded
- The real NanoLLM IPC client (task 34).
- Schema validation (task 35).
## Acceptance Criteria
**AC-1: Disabled default returns disabled status**
Given a `DisabledVlmProvider`
When `assess(roi, "...")` is called
Then it returns `VlmAssessment { status: Status::Disabled, .. }` immediately (≤1 ms).
**AC-2: Binary builds without vlm_client feature**
Given the binary is built with `--no-default-features` (or whatever toggles the `vlm_client` feature off)
When the build runs
Then it succeeds; the `vlm_client` crate is NOT a build dependency.
**AC-3: Runtime vlm_enabled = false uses disabled impl**
Given the binary is built WITH the `vlm_client` feature but config sets `vlm_enabled = false`
When the composition root constructs the provider
Then `DisabledVlmProvider` is installed; the real NanoLLM client is NOT constructed.
## Non-Functional Requirements
**Performance**
- `DisabledVlmProvider::assess` ≤1 ms.
## Contract
- Canonical typed model: `data_model.md §VlmAssessment`.
## Runtime Completeness
- **Named capability**: optional-VLM trait + disabled default.
- **Production code that must exist**: real trait; real disabled impl; real feature-flag wiring.
- **Unacceptable substitutes**: hardcoding `vlm_client` as a non-optional dependency is unacceptable per `description.md §9 Optionality Model`.
@@ -0,0 +1,72 @@
# NanoLLM UDS Client + Peer-Cred Check + Pre-Send Validation
**Task**: AZ-673_vlm_client_nanollm_ipc
**Name**: Unix-domain socket client to NanoLLM + peer-cred check + ROI pre-send validation
**Description**: Maintain the Unix-domain-socket connection to the NanoLLM process. Perform a peer-credential check on connect (where supported). Validate ROI payload (size, format) BEFORE sending across the IPC channel. No network egress — UDS only.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-672_vlm_client_provider_trait
**Component**: vlm_client
**Tracker**: AZ-673
**Epic**: AZ-631
## Problem
VLM runs as a local NanoLLM/VILA1.5-3B process. The link is a Unix-domain socket: no network egress, ever. The connection MUST be peer-credential-checked on connect (Linux `SO_PEERCRED`) to confirm the peer process belongs to the expected user / GID; failure is a hard error requiring operator intervention, not a silent retry. ROI payloads MUST be validated for size + format BEFORE crossing the socket, so no IPC time is spent on a payload that is already known to be too big.
## Outcome
- `NanoLlmClient::connect(socket_path) -> Result<Self, ConnectError>` opens a UDS connection and performs `SO_PEERCRED` check; mismatch returns `Err(PeerCredMismatch)`.
- `NanoLlmClient::assess(roi_crop, prompt) -> VlmAssessment` validates the ROI pre-send and sends a single request; awaits one response within ≤5 s; returns `VlmAssessment`.
- Bounded reconnect on transport loss; on peer-cred failure NO reconnect happens (operator intervention required).
- Health surface: `vlm_latency_p50/p99`, `errors_by_kind`, `peer_cred_check_pass_rate`.
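
The "no reconnect after peer-cred failure" rule can be modelled as a small latch, independent of the socket code. This sketch assumes the peer's UID/GID have already been read (on Linux, via `SO_PEERCRED`); the `PeerCredGate` type, the `BlockedUntilOperatorReset` variant, and `operator_reset` are hypothetical names, not part of the spec.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PeerCred {
    pub uid: u32,
    pub gid: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConnectError {
    PeerCredMismatch,
    /// Hypothetical: a previous mismatch latched the gate shut.
    BlockedUntilOperatorReset,
}

/// Decides whether a freshly connected peer may be used.
pub struct PeerCredGate {
    expected: PeerCred,
    latched: bool,
}

impl PeerCredGate {
    pub fn new(expected: PeerCred) -> Self {
        Self { expected, latched: false }
    }

    /// `actual` is the credential reported by the kernel for the peer.
    pub fn admit(&mut self, actual: PeerCred) -> Result<(), ConnectError> {
        if self.latched {
            // Never retry silently after a mismatch.
            return Err(ConnectError::BlockedUntilOperatorReset);
        }
        if actual != self.expected {
            self.latched = true;
            return Err(ConnectError::PeerCredMismatch);
        }
        Ok(())
    }

    /// Explicit intervention path, e.g. driven by a config change.
    pub fn operator_reset(&mut self) {
        self.latched = false;
    }
}
```

Ordinary transport loss would go through the bounded reconnect state machine instead and never touch this latch.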
## Scope
### Included
- UDS client (`tokio::net::UnixStream`).
- `SO_PEERCRED` check (Linux; on macOS dev hosts, log a warning and proceed for development purposes only — production target is Jetson Linux).
- Pre-send size + format validation.
- Reconnect state machine (bounded).
- Bounded request deadline.
### Excluded
- VlmAssessment schema validation (task 35).
- Provider trait wiring (task 33).
## Acceptance Criteria
**AC-1: Happy path against fixture NanoLLM**
Given a fixture NanoLLM process listening on a UDS path with correct peer-cred
When `connect` is called and then `assess(roi, "is this concealed?")` is called
Then `connect` returns Ok; `assess` returns `VlmAssessment { status: Ok, label, confidence, .. }` within ≤5 s.
**AC-2: Peer-cred mismatch hard-fails connect**
Given a fixture peer with wrong UID
When `connect` is called
Then it returns `Err(PeerCredMismatch)`; subsequent connect attempts are blocked until config-driven intervention (no automatic retry); health → red.
**AC-3: Oversize ROI rejected pre-send**
Given an ROI larger than `max_roi_bytes`
When `assess(...)` is called
Then it returns `VlmAssessment { status: SchemaInvalid, .. }` synchronously without writing to the socket.
**AC-4: Response timeout returns explicit status**
Given a fixture NanoLLM that never responds within 5 s
When `assess(...)` is called
Then it returns `VlmAssessment { status: Timeout, .. }` after ≤5 s; subsequent requests are not blocked.
## Non-Functional Requirements
**Performance**
- Per-ROI latency: ≤5 s p99 (per `description.md §8`).
**Reliability**
- No network egress (hard rule — UDS only).
- Peer-cred mismatch never silently retried.
## Runtime Completeness
- **Named capability**: NanoLLM/VILA1.5-3B IPC over UDS + peer-cred enforcement.
- **Production code that must exist**: real UDS connection; real `SO_PEERCRED`; real pre-send validation.
- **Allowed external stubs**: a Python NanoLLM stub script in tests that echoes a canned response.
- **Unacceptable substitutes**: TCP to localhost instead of UDS is unacceptable (violates the no-network-egress rule).
@@ -0,0 +1,72 @@
# VlmAssessment Schema Validation + Model-Version Tracking
**Task**: AZ-674_vlm_client_schema_and_model_version
**Name**: VlmAssessment schema validation + model_version tracking + status enum coverage
**Description**: Validate every NanoLLM response against the `VlmAssessment` schema. On schema-invalid, return `status: schema_invalid` + log the raw response (size-capped) for offline analysis. Capture `model_version` on every assessment for forensic correlation; log on change.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-673_vlm_client_nanollm_ipc
**Component**: vlm_client
**Tracker**: AZ-674
**Epic**: AZ-631
## Problem
The NanoLLM process emits free-form text, but the autopilot consumes ONLY a validated structured `VlmAssessment`. Schema-invalid responses MUST not propagate as malformed evidence — they're returned as `status: schema_invalid` with the raw response logged size-capped for offline analysis. Model-version capture supports forensic correlation when an assessment's quality is later disputed.
## Outcome
- `VlmAssessmentParser::parse(raw_response) -> VlmAssessment` validates the response against the schema; on failure returns `VlmAssessment { status: SchemaInvalid, .. }` and logs the raw response (size-capped to e.g. 4 KB) at warn level.
- `model_version` field is populated on every assessment from the NanoLLM-reported version; changes are logged at info level once per change.
- Status enum exhaustively covers `Ok | Inconclusive | Timeout | SchemaInvalid | IpcError | Disabled`; consumer match-exhaustion is enforced by the type.
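
Two of the smaller pieces above, model-version change detection and size-capped logging of the raw response, can be sketched with the standard library. The names `ModelVersionTracker` and `cap_for_log` are hypothetical, and treating the first observed version as "not a change" is an assumption chosen to match AC-3 (one log entry when `v1.0` becomes `v1.1`).

```rust
/// Remembers the last model version seen and reports changes.
#[derive(Debug, Default)]
pub struct ModelVersionTracker {
    last_seen: Option<String>,
}

impl ModelVersionTracker {
    /// Returns `true` exactly when `version` differs from the previously
    /// observed one; the caller would log at info level in that case.
    /// The very first observation is recorded but not reported as a change.
    pub fn observe(&mut self, version: &str) -> bool {
        if self.last_seen.as_deref() == Some(version) {
            return false;
        }
        let changed = self.last_seen.is_some();
        self.last_seen = Some(version.to_owned());
        changed
    }
}

/// Truncate `raw` to at most `max_bytes` bytes without splitting a UTF-8
/// character, so the excerpt is always a valid string slice.
pub fn cap_for_log(raw: &str, max_bytes: usize) -> &str {
    if raw.len() <= max_bytes {
        return raw;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !raw.is_char_boundary(end) {
        end -= 1;
    }
    &raw[..end]
}
```

With a 4 KB cap, `cap_for_log(raw, 4096)` would be the value handed to the warn-level log line on a schema-invalid response.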
## Scope
### Included
- Schema definition in `shared/contracts/vlm-assessment.json` (or equivalent Rust schema).
- Parser implementation.
- Model-version change detection.
- Size-capped raw-response logging.
### Excluded
- The UDS transport (task 34).
- Provider trait wiring (task 33).
## Acceptance Criteria
**AC-1: Valid response parses successfully**
Given a fixture NanoLLM response with all required fields
When `parse(raw)` runs
Then it returns `VlmAssessment { status: Ok, label, confidence, model_version, .. }`.
**AC-2: Schema-invalid response returns schema_invalid + logs**
Given a fixture response missing a required field
When `parse(raw)` runs
Then it returns `VlmAssessment { status: SchemaInvalid, .. }` and the raw response excerpt (size-capped) is observable in log output.
**AC-3: Model version change logged once**
Given an assessment with `model_version = "v1.0"` followed by another with `model_version = "v1.1"`
When the change is detected
Then a single log entry observes the change; subsequent assessments with `v1.1` do NOT re-log.
**AC-4: Status enum is exhaustive**
Given consumer code that matches on `VlmAssessment.status`
When a new variant is added (compile-time)
Then the compiler forces handling of the new variant; no `_ => …` catch-all in the policy code-path.
## Non-Functional Requirements
**Performance**
- Schema validation: ≤2 ms.
**Reliability**
- Schema mismatches never silent.
## Contract
- Canonical typed model: `data_model.md §VlmAssessment`. Schema lives at `shared/contracts/vlm-assessment.json`.
## Runtime Completeness
- **Named capability**: VlmAssessment schema validation + model-version awareness.
- **Production code that must exist**: real schema validator; real model-version tracker.
- **Unacceptable substitutes**: silently mapping a schema-invalid response to `status: Ok` with placeholder fields is unacceptable.
@@ -0,0 +1,68 @@
# Telemetry gRPC Server + Per-Client Lossy Subscriber
**Task**: AZ-675_telemetry_stream_grpc_server
**Name**: Tonic gRPC server bind + per-client lossy subscriber bounded queue
**Description**: Bring up the operator-bound telemetry gRPC server (Tonic). Per-client subscriber has a bounded queue. Slow clients drop oldest, count drops; never block the producer.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-649_mission_executor_telemetry_forwarding, AZ-657_frame_ingest_rtsp_session
**Component**: telemetry_stream
**Tracker**: AZ-675
**Epic**: AZ-637
## Problem
`telemetry_stream` is the operator-bound publisher for `TelemetrySample`, `GimbalState`, `DetectionEvent`, `MovementCandidate`, `MapObjectsBundle`. Per-client throttling MUST be lossy and per-client so a slow client never starves a healthy one. The server runs over the operator-link gRPC channel — same physical transport as `operator_bridge` but a separate logical service.
## Outcome
- Tonic gRPC server bound on `telemetry.listen_addr` exposing a single subscribe-style streaming RPC per topic (or a multiplex RPC).
- Each connected client has a `(bounded_queue, drop_counter, last_sent_seq)` state.
- Producer fan-out copies (refcount where possible) the message into each subscriber's queue. Full queue → drop oldest, increment `drops_total{client_id, topic}`.
- Disconnects cleanly tear down the subscriber.
- Health surface: `subscribed_clients`, `drops_total{client_id, topic}`, `bytes_out_per_topic`.
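
The drop-oldest back-pressure and refcounted fan-out described above can be sketched as follows. `LossyQueue` and `fan_out` are illustrative names; a real subscriber would sit behind a Tonic stream and probably use an async channel, which this std-only sketch leaves out.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Bounded per-client queue: when full, the oldest item is dropped so the
/// producer never blocks.
pub struct LossyQueue<T> {
    buf: VecDeque<T>,
    capacity: usize,
    drops_total: u64,
}

impl<T> LossyQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { buf: VecDeque::with_capacity(capacity), capacity, drops_total: 0 }
    }

    pub fn push(&mut self, item: T) {
        if self.buf.len() == self.capacity {
            // Full: discard the oldest entry and count the drop.
            self.buf.pop_front();
            self.drops_total += 1;
        }
        self.buf.push_back(item);
    }

    pub fn pop(&mut self) -> Option<T> {
        self.buf.pop_front()
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }

    pub fn drops_total(&self) -> u64 {
        self.drops_total
    }
}

/// Fan one message out to every subscriber; the payload is shared through
/// an `Arc`, so each queue stores a refcounted handle rather than a copy.
pub fn fan_out<T>(subscribers: &mut [LossyQueue<Arc<T>>], msg: T) {
    let shared = Arc::new(msg);
    for queue in subscribers.iter_mut() {
        queue.push(Arc::clone(&shared));
    }
}
```

Because each client owns its queue, a full queue for one client only affects that client's `drops_total`; other subscribers keep receiving every message.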
## Scope
### Included
- Tonic server bind + cleanup.
- Per-client subscriber state.
- Drop-oldest back-pressure.
- Disconnect handling.
### Excluded
- The .proto schema (lives in `shared/contracts/telemetry-stream.proto`; if absent, add it as a side-effect of this task).
- Diff-based snapshot emission for `MapObjectsBundle` (task 38).
- Operator commands (lives in `operator_bridge` component).
## Acceptance Criteria
**AC-1: Multiple subscribers receive the same stream**
Given 3 clients subscribed to `TelemetrySample`
When 100 samples are published
Then each client receives all 100 (assuming no slowness); ordering preserved.
**AC-2: Slow subscriber drops oldest, healthy unaffected**
Given client A reads slowly and client B reads at full speed
When producer pushes 1000 samples while A is paused
Then client A's queue grows up to `max_queue` and then drops oldest (drops_total{A} > 0); client B receives all 1000.
**AC-3: Disconnect cleanly removes subscriber**
Given a connected client
When the gRPC stream is canceled
Then `subscribed_clients` decrements by 1; producer fan-out no longer copies to that client.
## Non-Functional Requirements
**Performance**
- Per-message fan-out CPU: ≤2 ms p99 for ≤10 clients (per architecture NFR class).
- Tx tail latency end-to-end (producer → wire) ≤100 ms p95 over a healthy link.
**Reliability**
- No producer-side blocking on slow clients (hard rule).
## Runtime Completeness
- **Named capability**: Tonic gRPC operator telemetry stream with lossy per-client throttling.
- **Production code that must exist**: real gRPC server; real per-client subscriber state machine; real drop counters.
- **Allowed external stubs**: an in-process gRPC client in tests.
- **Unacceptable substitutes**: a single global queue (head-of-line blocking) is unacceptable.
@@ -0,0 +1,65 @@
# Video Path Selection (Forward RTSP vs Encoded Bytes) + AI-Lock Coordination
**Task**: AZ-676_telemetry_stream_video_path
**Name**: Operator-bound video path: forward RTSP URL OR carry encoded bytes; coordinate with frame_ingest ai_locked signal
**Description**: Two delivery modes for the operator video path (config-driven): (1) forward the RTSP URL to the operator (most common), (2) carry encoded bytes over the operator gRPC stream. Coordinate with `frame_ingest`'s `ai_locked` signal so AI inference is suppressed only while operator-led control occupies the frame budget.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-657_frame_ingest_rtsp_session, AZ-675_telemetry_stream_grpc_server
**Component**: telemetry_stream
**Tracker**: AZ-676
**Epic**: AZ-637
## Problem
The operator sees the camera feed. Two modes are supported because some operator stacks attach to the RTSP source directly (lower onboard cost, recommended default), and others need bytes carried over the same operator-link channel (no separate RTSP socket to the operator).
When the operator-bound feed is active (either mode), `ai_locked` MUST be raised to `true` so Tier-1 inference does not run on the same frames the operator is actively driving. The mechanism is a shared `Arc<AtomicBool>` (or equivalent) toggled by `telemetry_stream`'s session start/stop, read by `frame_ingest` (task 18) and `detection_client` (task 21).
## Outcome
- Config flag `video_path = "rtsp_forward" | "bytes_inline"`; default `rtsp_forward`.
- `rtsp_forward`: emit the canonical RTSP URL as part of the session-start telemetry.
- `bytes_inline`: take frames from `frame_ingest`'s broadcast channel and forward bytes to subscribed operator clients.
- `ai_locked` shared flag plumbed at startup; flipped to `true` while at least one operator session is consuming the video path, `false` otherwise.
- Health surface: `video_path_mode`, `ai_locked_state`, `bytes_inline_drops_total`.
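
The session-count gating of `ai_locked` might be wired roughly as below. `VideoSessions` and its method names are hypothetical. The session counter sits behind a `Mutex` and the flag is only written while that lock is held, so a concurrent start and stop cannot leave the flag out of step with the count; consumers read the `AtomicBool` without taking the lock.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Tracks active operator video sessions and mirrors "any active?" into
/// the shared `ai_locked` flag read by other components.
pub struct VideoSessions {
    active: Mutex<usize>,
    ai_locked: Arc<AtomicBool>,
}

impl VideoSessions {
    pub fn new(ai_locked: Arc<AtomicBool>) -> Self {
        Self { active: Mutex::new(0), ai_locked }
    }

    /// Call when an operator client starts consuming the video path.
    pub fn session_started(&self) {
        let mut active = self.active.lock().unwrap();
        *active += 1;
        self.ai_locked.store(true, Ordering::SeqCst);
    }

    /// Call when an operator client stops or disconnects.
    pub fn session_ended(&self) {
        let mut active = self.active.lock().unwrap();
        *active = active.saturating_sub(1);
        if *active == 0 {
            self.ai_locked.store(false, Ordering::SeqCst);
        }
    }
}
```

A reader such as `frame_ingest` would hold a clone of the same `Arc<AtomicBool>` and call `load(Ordering::SeqCst)` per frame.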
## Scope
### Included
- Both modes (rtsp_forward + bytes_inline).
- ai_locked toggle wiring.
- Session-tracking (active client count gating ai_locked).
### Excluded
- RTSP server stream itself (it's owned by the camera; we just forward the URL).
- `frame_ingest` reading the flag (task 18 owns that).
- Snapshot/diff for MapObjects (task 38).
## Acceptance Criteria
**AC-1: rtsp_forward mode emits URL only**
Given `video_path = "rtsp_forward"` and a client subscribes
When the session starts
Then the client receives the configured RTSP URL in the session-start message; no bytes are streamed by this component.
**AC-2: bytes_inline forwards encoded frames**
Given `video_path = "bytes_inline"` and a client subscribes
When `frame_ingest` publishes 100 frames
Then the client receives all 100 (modulo bounded-queue drops handled by task 36).
**AC-3: ai_locked toggles on session start/stop**
Given no operator session is active (`ai_locked = false`)
When the first client subscribes to the video stream
Then `ai_locked` flips to `true`; when all clients disconnect, `ai_locked` flips back to `false`.
## Non-Functional Requirements
**Performance**
- bytes_inline: frame copy cost ≤2 ms p99 per frame on Jetson Orin Nano.
- AI-lock toggle latency: ≤50 ms from subscribe → flag flip.
## Runtime Completeness
- **Named capability**: operator video path (dual mode) + ai_locked coordination.
- **Production code that must exist**: both modes; real ai_locked atomic wired to consumers.
- **Unacceptable substitutes**: rtsp_forward that doesn't actually emit the URL (or bytes_inline that doesn't read frame_ingest) is unacceptable.
@@ -0,0 +1,62 @@
# Pre-Flight MapObjects Snapshot + In-Flight Diffs + Reconnect Resync
**Task**: AZ-677_telemetry_stream_mapobjects_snapshot
**Name**: MapObjects bundle: pre-flight snapshot + in-flight diff stream + reconnect re-snapshot
**Description**: Emit a full `MapObjectsBundle` snapshot on operator client connect/reconnect, then stream diff messages as the store appends new observations / ignored items. On client reconnect after disconnect, emit a fresh snapshot rather than trying to replay diffs.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-675_telemetry_stream_grpc_server, AZ-667_mapobjects_store_hydrate_and_pending
**Component**: telemetry_stream
**Tracker**: AZ-677
**Epic**: AZ-637
## Problem
The operator views the live map state. Sending the entire `MapObjectsBundle` on every change is wasteful, but streaming diffs without a baseline forces the operator to recover from missing state on reconnect. The pattern: snapshot on connect, then diffs while connected. On disconnect-then-reconnect, treat as fresh client → re-snapshot. No best-effort gap-filling.
## Outcome
- On client subscribe to `MapObjectsBundle` topic: read current store state via `MapObjectsStore::snapshot()`; emit one `MapObjectsBundleSnapshot` message.
- During the session: subscribe to the store's append log (pending_observations + pending_ignored streams); emit `MapObjectsDiff { added: [...], moved: [...], removed_candidates: [...], ignored: [...] }` messages.
- On client disconnect: drop the subscriber.
- On reconnect: treat as new subscribe; emit a fresh snapshot. NO diff replay.
- Health: `mapobjects_snapshot_bytes`, `mapobjects_diff_count`, `mapobjects_resnap_count`.
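
The snapshot-then-diff pattern, including the "reconnect means fresh snapshot" rule, is sketched below in a heavily simplified form. MapObjects are stood in for by `u32` ids, the diff carries only `added`, and `Store`, `ClientSession`, and `BundleMsg` are hypothetical types; the real messages are `MapObjectsBundleSnapshot` and `MapObjectsDiff` with the additional fields listed above.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BundleMsg {
    /// Full state, sent once per (re)subscribe.
    Snapshot(Vec<u32>),
    /// Incremental change, sent only to currently connected sessions.
    Diff { added: Vec<u32> },
}

/// Stand-in for the MapObjects store; ids replace real objects.
pub struct Store {
    objects: Vec<u32>,
}

/// Per-client state; `outbox` holds the messages queued for that client.
pub struct ClientSession {
    pub outbox: Vec<BundleMsg>,
}

impl ClientSession {
    /// Every subscribe, first or repeated, starts from a full snapshot.
    /// No diffs from before this call are ever replayed.
    pub fn subscribe(store: &Store) -> Self {
        Self { outbox: vec![BundleMsg::Snapshot(store.objects.clone())] }
    }
}

impl Store {
    pub fn new(objects: Vec<u32>) -> Self {
        Self { objects }
    }

    /// Append new objects and push one diff to each live session.
    pub fn append(&mut self, ids: &[u32], live: &mut [ClientSession]) {
        self.objects.extend_from_slice(ids);
        for session in live.iter_mut() {
            session.outbox.push(BundleMsg::Diff { added: ids.to_vec() });
        }
    }
}
```

Dropping a `ClientSession` models a disconnect: appends made while no session exists produce no diffs, and the next `subscribe` picks them up through the snapshot instead.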
## Scope
### Included
- Snapshot emission on subscribe.
- Diff stream from store append log.
- Re-snapshot on reconnect.
### Excluded
- Store implementation (task 28).
- Per-client subscriber state machine (task 36).
## Acceptance Criteria
**AC-1: First subscribe receives snapshot**
Given a store with 50 MapObjects + 10 IgnoredItems hydrated
When a client subscribes to the MapObjectsBundle topic
Then it receives exactly one `MapObjectsBundleSnapshot` containing 50 + 10 entries.
**AC-2: In-flight changes emit diffs**
Given a connected client
When 3 new observations and 1 ignored item are appended to the store
Then the client receives one or more `MapObjectsDiff` messages whose combined contents = `{added: 3, ignored: 1}`.
**AC-3: Reconnect re-snapshots**
Given a client disconnected mid-session and the store grew by 5 entries while disconnected
When the client reconnects
Then the client receives a fresh `MapObjectsBundleSnapshot` reflecting the current state; NO diff replay.
## Non-Functional Requirements
**Performance**
- Snapshot serialization: ≤200 ms p99 for ≤10 000 MapObjects.
- Diff fan-out: ≤2 ms p99 per append.
## Runtime Completeness
- **Named capability**: snapshot + diff transport for MapObjects.
- **Production code that must exist**: real snapshot emission; real diff streaming; real re-snapshot on reconnect.
- **Unacceptable substitutes**: emitting full snapshots on every change (bandwidth) or replaying diffs across reconnect (consistency hazard) are both unacceptable.
@@ -0,0 +1,82 @@
# Operator Command Auth: Signature + Replay-Protection + Session Validation
**Task**: AZ-678_operator_bridge_command_auth
**Name**: Validate operator command signature, replay-protection sequence, and session token before dispatch
**Description**: Every incoming operator command MUST pass three checks before any business logic runs: (1) authentication signature over `(session_token, sequence_number, payload)`, (2) replay-protection sequence number is monotonically increasing per session, (3) session token is currently valid. Modem-link encryption is not sufficient (per `architecture.md §5` + Q9).
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-675_telemetry_stream_grpc_server
**Component**: operator_bridge
**Tracker**: AZ-678
**Epic**: AZ-628
## Problem
Operator commands shape the drone's behaviour (target-follow start, mission resume, safety-override). A hostile injection on the operator-link could otherwise direct the UAV to a forbidden target or release a safety. The authoritative principle ("operator commands authenticated, signed, replay-protected, session-bound") is committed in `architecture.md §5`; the exact scheme is open (`§8 Q9`) — this task implements the validator layer in a scheme-agnostic shape and a default HMAC-SHA256 scheme as the placeholder concrete implementation, behind a trait so Q9 resolution can swap it cleanly.
## Outcome
- Trait `OperatorCommandValidator { fn validate(&self, cmd: &SignedCommand) -> Result<ValidatedCommand, AuthError> }` in `shared::contracts::operator_auth`.
- Default impl `HmacOperatorValidator` using HMAC-SHA256 over `(session_token || sequence_number || payload_bytes)`.
- Per-session replay-protection: highest-seen sequence number stored in-process; rejection on equal-or-less.
- Session token validity check against an in-process session registry (sessions added on Ground Station auth; expired after `session_ttl_secs`).
- All rejections counted by reason: `signature_invalid`, `replay_detected`, `session_unknown`, `session_expired`.
- Health: `auth_rejections_total{reason}`; sustained signature failures → health red.
## Scope
### Included
- Validator trait + HMAC default impl.
- Per-session sequence-number tracker.
- Session registry (in-process, in-memory).
- Rejection-reason metrics.
### Excluded
- The transport (Ground Station → operator_bridge); this task validates already-deserialized `SignedCommand`s.
- Session creation handshake from Ground Station (handled at session establishment; this task only consumes sessions).
- Dispatch to `scan_controller` / `mission_executor` (tasks 41 + 42).
## Acceptance Criteria
**AC-1: Valid signed command passes**
Given a `SignedCommand` with correct HMAC over `(token, seq, payload)` and a known session with `seq > last_seen`
When `validate(cmd)` runs
Then it returns `Ok(ValidatedCommand)`; `last_seen` advances to `seq`.
**AC-2: Invalid signature rejected**
Given a `SignedCommand` whose HMAC does not match the computed value
When `validate(cmd)` runs
Then it returns `Err(AuthError::SignatureInvalid)`; `auth_rejections_total{reason: signature_invalid}` increments; sequence number is NOT advanced.
**AC-3: Replay detected**
Given a session with `last_seen = 10`
When a command arrives with `sequence_number = 10` (or `< 10`)
Then it returns `Err(AuthError::ReplayDetected)`; `auth_rejections_total{reason: replay_detected}` increments.
**AC-4: Unknown / expired session rejected**
Given a command bearing a session token not in the registry (or whose TTL has elapsed)
When `validate(cmd)` runs
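Then it returns `Err(AuthError::SessionUnknown)` or `Err(AuthError::SessionExpired)` respectively; the matching `auth_rejections_total{reason}` counter (`session_unknown` or `session_expired`) increments.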
**AC-5: Sustained signature failures escalate health**
Given `auth_rejections_total{reason: signature_invalid}` exceeds `signature_failure_red_threshold` per minute (config)
When health is sampled
Then health returns red and surfaces the failure rate.
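The check sequence described by AC-1 to AC-4 can be sketched as below. This is a minimal illustration under stated assumptions, not the task's implementation: it takes `&mut self` and an explicit `now` rather than the `&self` trait method (a real validator would need interior mutability), the signature check is an injected closure standing in for the HMAC scheme, and `SessionValidator`, `add_session`, and the `SignedCommand` field names are hypothetical. The order of checks (session, then signature, then sequence) is also an assumption.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    SignatureInvalid,
    ReplayDetected,
    SessionUnknown,
    SessionExpired,
}

// Field names are illustrative; the canonical model lives in data_model.md.
pub struct SignedCommand {
    pub session_token: String,
    pub sequence_number: u64,
    pub payload: Vec<u8>,
    pub signature: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ValidatedCommand {
    pub sequence_number: u64,
    pub payload: Vec<u8>,
}

struct Session {
    expires_at: Instant,
    last_seen: Option<u64>, // None until the first command is accepted
}

// Hypothetical validator; `verify_signature` stands in for the real scheme.
pub struct SessionValidator<F: Fn(&SignedCommand) -> bool> {
    sessions: HashMap<String, Session>,
    verify_signature: F,
}

impl<F: Fn(&SignedCommand) -> bool> SessionValidator<F> {
    pub fn new(verify_signature: F) -> Self {
        SessionValidator { sessions: HashMap::new(), verify_signature }
    }

    pub fn add_session(&mut self, token: &str, now: Instant, ttl: Duration) {
        let session = Session { expires_at: now + ttl, last_seen: None };
        self.sessions.insert(token.to_string(), session);
    }

    pub fn validate(
        &mut self,
        cmd: &SignedCommand,
        now: Instant,
    ) -> Result<ValidatedCommand, AuthError> {
        let session = self
            .sessions
            .get_mut(&cmd.session_token)
            .ok_or(AuthError::SessionUnknown)?;
        if now >= session.expires_at {
            return Err(AuthError::SessionExpired);
        }
        if !(self.verify_signature)(cmd) {
            // The sequence tracker is untouched on a bad signature (AC-2).
            return Err(AuthError::SignatureInvalid);
        }
        if let Some(last) = session.last_seen {
            if cmd.sequence_number <= last {
                return Err(AuthError::ReplayDetected); // equal-or-less (AC-3)
            }
        }
        session.last_seen = Some(cmd.sequence_number);
        Ok(ValidatedCommand {
            sequence_number: cmd.sequence_number,
            payload: cmd.payload.clone(),
        })
    }
}
```

Because `last_seen` lives inside each `Session`, a sequence number accepted in one session has no effect on another, which is the per-session property the task asks for.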
## Non-Functional Requirements
**Performance**
- `validate` ≤1 ms p99.
**Security**
- Reject-then-log; never log the raw payload of a rejected command at info level (size-cap + redact).
- No timing oracle — use constant-time HMAC compare.
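The idea behind a constant-time compare can be illustrated with the sketch below. It is only an illustration of the principle: `constant_time_eq` is a hypothetical helper, an optimizing compiler is free to transform the loop, and production code would normally rely on a vetted cryptographic crate instead of a hand-written comparison.

```rust
/// Compares two byte slices without stopping at the first mismatch.
/// The early return on a length mismatch reveals only the length, which is
/// not secret for fixed-size tags such as the 32-byte HMAC-SHA256 output.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences so every byte pair is always visited.
        diff |= x ^ y;
    }
    diff == 0
}
```

A plain `a == b` on slices may return at the first differing byte, which is what lets an attacker probe a tag one byte at a time.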
## Contract
- Canonical typed model: `data_model.md §OperatorCommand`, `§SignedCommand`. Open Q9 (`architecture.md §8`) is acknowledged; this task ships a working HMAC default and a trait so resolution can swap.
## Runtime Completeness
- **Named capability**: operator command authentication + replay-protection + session binding.
- **Production code that must exist**: real HMAC validator; real per-session monotonic counter; real session registry; constant-time compare.
- **Unacceptable substitutes**: "accept all signed commands as valid" or sequence number tracked globally (cross-session replay) are unacceptable.
@@ -0,0 +1,66 @@
# POI Surface to Operator + Deadline Carriage
**Task**: AZ-679_operator_bridge_poi_surface
**Name**: POI surface event format + deadline carriage + push via telemetry_stream
**Description**: Translate `POI` events from `scan_controller` into the wire format defined in `architecture.md §7.10 Drone ⇄ Operator Sync Message Format`. Carry the operator-decision deadline computed by `scan_controller`. Push via `telemetry_stream`.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-675_telemetry_stream_grpc_server
**Component**: operator_bridge
**Tracker**: AZ-679
**Epic**: AZ-628
## Problem
The operator UI consumes `POI` events with all the evidence required to decide (Tier-1 evidence summary, Tier-2 path/concealment scores when present, optional VLM status, photo metadata, deadline). The wire format is fixed by `architecture.md §7.10`; this task implements the in→out mapping and the deadline carriage (computed upstream by `scan_controller`'s confidence-scaled window).
## Outcome
- `PoiSurfaceMapper::map(poi: &Poi) -> OperatorPoiEvent` produces the wire-format message.
- Fields populated: `poi_id`, `mgrs`, `class_group`, `confidence`, `vlm_status`, `tier2_evidence_summary`, `photo_metadata`, `deadline_unix_ms`.
- On `scan_controller` dequeue (cap rotation / age-out / completion), emit a `PoiDequeued { poi_id, reason }` event.
- Health: `pois_surfaced_per_min`.
## Scope
### Included
- `Poi` → wire format mapping.
- Deadline carriage (already computed upstream).
- Dequeue event emission.
### Excluded
- POI queue + rate-cap (lives in `scan_controller` task 44).
- Operator command dispatch (task 41).
- Transport (lives in `telemetry_stream`).
## Acceptance Criteria
**AC-1: All required fields populated**
Given a `Poi` with Tier-1 + Tier-2 + VLM evidence
When `map(poi)` runs
Then the resulting `OperatorPoiEvent` has every required field per `architecture.md §7.10` non-empty (or explicit null with `vlm_status` reflecting state).
**AC-2: VLM-disabled case carries explicit status**
Given a `Poi` whose VLM evidence is `status: Disabled`
When `map(poi)` runs
Then `OperatorPoiEvent.vlm_status = "disabled"` and `vlm_label = null`; operator can render without inferring absence.
**AC-3: Dequeue emits event**
Given a surfaced POI with id X
When `scan_controller` rotates the queue and X is dequeued
Then a `PoiDequeued { poi_id: X, reason }` event is emitted to `telemetry_stream`.
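A minimal sketch of the mapping in AC-1 and AC-2 follows, assuming simplified stand-in types. The field set is reduced (`mgrs`, `class_group`, and `photo_metadata` are omitted for brevity), `map_poi` and `wire_name` are hypothetical names, and every wire string other than `"disabled"` is an assumption; the authoritative format is `architecture.md §7.10`. Dropping the label whenever the status is not `Ok` is likewise an assumed policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

impl VlmStatus {
    // Only "disabled" is given by the spec; the other strings are assumed.
    pub fn wire_name(self) -> &'static str {
        match self {
            VlmStatus::Ok => "ok",
            VlmStatus::Inconclusive => "inconclusive",
            VlmStatus::Timeout => "timeout",
            VlmStatus::SchemaInvalid => "schema_invalid",
            VlmStatus::IpcError => "ipc_error",
            VlmStatus::Disabled => "disabled",
        }
    }
}

// Reduced stand-in for the canonical `Poi` model.
pub struct Poi {
    pub poi_id: u64,
    pub confidence: f32,
    pub vlm_status: VlmStatus,
    pub vlm_label: Option<String>,
    pub tier2_evidence_summary: Option<String>,
    pub deadline_unix_ms: u64,
}

#[derive(Debug, PartialEq)]
pub struct OperatorPoiEvent {
    pub poi_id: u64,
    pub confidence: f32,
    pub vlm_status: &'static str,
    pub vlm_label: Option<String>,
    pub tier2_evidence_summary: Option<String>,
    pub deadline_unix_ms: u64,
}

pub fn map_poi(poi: &Poi) -> OperatorPoiEvent {
    // The label is carried only when the VLM produced a usable answer, so
    // the operator UI never has to infer absence from a missing field.
    let vlm_label = match poi.vlm_status {
        VlmStatus::Ok => poi.vlm_label.clone(),
        VlmStatus::Inconclusive
        | VlmStatus::Timeout
        | VlmStatus::SchemaInvalid
        | VlmStatus::IpcError
        | VlmStatus::Disabled => None,
    };
    OperatorPoiEvent {
        poi_id: poi.poi_id,
        confidence: poi.confidence,
        vlm_status: poi.vlm_status.wire_name(),
        vlm_label,
        // Tier-2 evidence is forwarded whenever the POI has it.
        tier2_evidence_summary: poi.tier2_evidence_summary.clone(),
        // The deadline is computed upstream and carried unchanged.
        deadline_unix_ms: poi.deadline_unix_ms,
    }
}
```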
## Non-Functional Requirements
**Performance**
- POI surface mapping: ≤1 ms p99.
- POI surface → operator visible: ≤1 s under normal modem (end-to-end with `telemetry_stream`).
## Contract
- Canonical typed model: `data_model.md §POI`. Wire format: `architecture.md §7.10`.
## Runtime Completeness
- **Named capability**: POI surface in canonical wire format.
- **Production code that must exist**: real mapper; real dequeue emission.
- **Unacceptable substitutes**: dropping `tier2_evidence_summary` because "it's optional" is unacceptable when the evidence exists in the POI.
@@ -0,0 +1,78 @@
# Operator Command Dispatch (confirm / decline / target-follow start/release) + Idempotency
**Task**: AZ-680_operator_bridge_command_dispatch
**Name**: Dispatch validated operator commands to scan_controller + per-command-id idempotency
**Description**: After `validate()` (task 39), dispatch the command to `scan_controller`: confirm → `(target_mgrs, target_class)`, decline → `(MGRS, class_group)` for IgnoredItem append, target-follow start/release → state transition. Per-command-id idempotency cache (60 s window) so re-transmits get the cached result rather than double-acting.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-678_operator_bridge_command_auth
**Component**: operator_bridge
**Tracker**: AZ-680
**Epic**: AZ-628
## Problem
Operator commands can be re-transmitted over a lossy modem link; the autopilot must not double-act. Each command carries a unique `command_id`; results are cached for 60 s. The dispatch step validates the command is for a known POI (or a target-follow session, BIT report, or safety-override scope) before forwarding to `scan_controller`.
## Outcome
- `CommandDispatcher::dispatch(validated_cmd) -> CommandAck` routes the command into `scan_controller`'s API and returns an ack.
- Idempotency cache `command_id → CommandAck` with 60 s TTL; cache-hit returns the cached ack and does NOT re-dispatch.
- POI-id-bound commands validated against currently surfaced POI set; unknown POI id → `Ack::Error { reason: unknown_poi_id }`.
- Deadline expiration handled: command past deadline → `Ack::Error { reason: expired }`.
- Health: `commands_in_flight`, `decision_latency_p50/p99`.
## Scope
### Included
- Dispatch switch on command kind (`Confirm | Decline | TargetFollowStart | TargetFollowRelease`).
- Idempotency cache with TTL.
- POI id validity check.
- Deadline check.
### Excluded
- Auth/replay (task 39).
- BIT-degraded ack + safety-override (task 42; separate command kinds with separate flow).
- POI queue mechanics (task 44).
## Acceptance Criteria
**AC-1: Confirm forwards target hint**
Given a validated `Confirm { command_id, poi_id }` for a currently-surfaced POI
When `dispatch(cmd)` runs
Then `scan_controller::on_confirm(target_mgrs, target_class)` is invoked exactly once; ack `Ok` returned; cache populated.
**AC-2: Re-transmit returns cached ack**
Given the same `command_id` re-arrives within 60 s of the first dispatch
When `dispatch(cmd)` runs
Then the cached ack is returned; `scan_controller::on_confirm` is NOT invoked again.
**AC-3: Unknown POI id rejected**
Given a `Confirm` with `poi_id` not in the surfaced set
When `dispatch(cmd)` runs
Then `Ack::Error { reason: unknown_poi_id }` returned; nothing dispatched.
**AC-4: Expired POI rejected**
Given a `Confirm` whose deadline has passed
When `dispatch(cmd)` runs
Then `Ack::Error { reason: expired }` returned; nothing dispatched.
**AC-5: Decline appends IgnoredItem via scan_controller**
Given a validated `Decline { command_id, poi_id }`
When `dispatch(cmd)` runs
Then `scan_controller::on_decline(mgrs, class_group)` invoked exactly once.
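The idempotency behaviour in AC-1 and AC-2 can be sketched with a small TTL cache. This is an assumption-laden illustration: `IdempotencyCache` and `get_or_dispatch` are hypothetical names, `CommandAck` is reduced to two variants, and the dispatch step is an injected closure standing in for the call into `scan_controller`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CommandAck {
    Ok,
    Error { reason: &'static str },
}

// Hypothetical cache keyed by command_id; entries expire after `ttl`.
pub struct IdempotencyCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, CommandAck)>,
}

impl IdempotencyCache {
    pub fn new(ttl: Duration) -> Self {
        IdempotencyCache { ttl, entries: HashMap::new() }
    }

    /// Returns the cached ack for `command_id` if it is still fresh;
    /// otherwise runs `dispatch` once and caches its result.
    pub fn get_or_dispatch(
        &mut self,
        command_id: &str,
        now: Instant,
        dispatch: impl FnOnce() -> CommandAck,
    ) -> CommandAck {
        let ttl = self.ttl;
        // Drop entries whose TTL has elapsed before looking anything up.
        self.entries
            .retain(|_, entry| now.saturating_duration_since(entry.0) < ttl);
        if let Some(entry) = self.entries.get(command_id) {
            return entry.1.clone(); // cache hit: no re-dispatch
        }
        let ack = dispatch();
        self.entries.insert(command_id.to_string(), (now, ack.clone()));
        ack
    }
}
```

Passing the dispatch step as a closure keeps the "not invoked again" guarantee structural: on a cache hit the closure is simply never called.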
## Non-Functional Requirements
**Performance**
- Dispatch overhead: ≤2 ms p99.
- Operator command → autopilot effect: ≤1 s under normal modem (end-to-end).
## Contract
- Canonical typed model: `data_model.md §OperatorCommand`, `§POI`.
## Runtime Completeness
- **Named capability**: operator command dispatch + idempotency.
- **Production code that must exist**: real dispatch switch; real TTL cache; real validity + deadline checks.
- **Unacceptable substitutes**: no idempotency cache (the operator UI WILL retransmit on flaky modem) is unacceptable.
@@ -0,0 +1,77 @@
# BIT-DEGRADED Acknowledgement + Safety-Override Command Path
**Task**: AZ-681_operator_bridge_safety_and_bit_ack
**Name**: Forward signed BIT-degraded ack + safety-override commands to mission_executor
**Description**: Forward signed BIT-degraded acknowledgement commands (F9) and signed safety-override commands (F10: lost-link / battery suppression) to `mission_executor`. Severity check: operator MAY ack a DEGRADED but MUST NEVER ack a FAIL.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-678_operator_bridge_command_auth, AZ-650_mission_executor_bit_f9, AZ-652_mission_executor_safety_and_resume
**Component**: operator_bridge
**Tracker**: AZ-681
**Epic**: AZ-628
## Problem
Two operator-command kinds bypass `scan_controller` and go straight to `mission_executor`: BIT-DEGRADED acknowledgement (gating takeoff after a non-FAIL BIT result), and safety-override (suppressing a lost-link RTL or battery-RTL trigger, scoped + bounded by `architecture.md` F10). Both MUST be signed (already enforced by task 39). The BIT severity check must reject any attempt to ack a FAIL.
## Outcome
- `SafetyCommandDispatcher::dispatch(validated_cmd) -> CommandAck` routes:
  - `BitDegradedAck { bit_report_id }` → `mission_executor::on_bit_degraded_ack(bit_report_id)`; rejected if report severity = FAIL.
  - `SafetyOverride { scope, duration }` → `mission_executor::on_safety_override(scope, duration)`.
- Severity check on BIT ack: looks up the originating `BitReport`; if `severity = Fail` returns `Ack::Error { reason: cannot_acknowledge_fail }` without dispatching.
- Idempotency cache (shared with task 41).
- Health: `safety_overrides_active`, `bit_ack_rejections_total{reason}`.
## Scope
### Included
- Dispatch switch for `BitDegradedAck` and `SafetyOverride`.
- Severity check against BIT report cache.
- Audit log entry for every safety command (signed, redacted payload).
### Excluded
- Auth (task 39).
- POI commands (task 41).
- The BIT machinery itself (lives in `mission_executor` task 11/AZ-650).
- The safety-override enforcement / scope-clamping (lives in `mission_executor` task 13/AZ-652).
## Acceptance Criteria
**AC-1: BIT-DEGRADED ack succeeds**
Given a validated `BitDegradedAck { bit_report_id: X }` where the BIT report severity = `Degraded`
When `dispatch(cmd)` runs
Then `mission_executor::on_bit_degraded_ack(X)` invoked exactly once; ack `Ok` returned.
**AC-2: BIT-FAIL ack rejected**
Given a validated `BitDegradedAck { bit_report_id: X }` where the BIT report severity = `Fail`
When `dispatch(cmd)` runs
Then `Ack::Error { reason: cannot_acknowledge_fail }` returned; `mission_executor::on_bit_degraded_ack` is NOT invoked; `bit_ack_rejections_total{reason: cannot_acknowledge_fail}` increments.
**AC-3: Safety-override forwards with scope + duration**
Given a validated `SafetyOverride { scope: BatteryRtl, duration_secs: 60 }`
When `dispatch(cmd)` runs
Then `mission_executor::on_safety_override(BatteryRtl, 60)` invoked exactly once; ack `Ok` returned; an audit log entry exists (operator id, scope, duration, ts).
**AC-4: Audit log redacts secrets**
Given any dispatched safety command
When the audit log entry is written
Then the entry contains the command id, timestamp, operator id, scope/duration — but NEVER the raw signature bytes or session token.
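The severity gate from AC-1 and AC-2 can be sketched as below, assuming a simple in-memory report cache. `BitAckSink`, `RecordingSink`, and `dispatch_bit_ack` are hypothetical names, `BitSeverity` is reduced to the two severities this task mentions, and the `unknown_bit_report` reason for a missing report is an assumption that the spec does not cover.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BitSeverity {
    Degraded,
    Fail,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CommandAck {
    Ok,
    Error { reason: &'static str },
}

// Stand-in for the mission_executor entry point.
pub trait BitAckSink {
    fn on_bit_degraded_ack(&mut self, bit_report_id: u64);
}

pub fn dispatch_bit_ack(
    reports: &HashMap<u64, BitSeverity>,
    bit_report_id: u64,
    sink: &mut dyn BitAckSink,
) -> CommandAck {
    match reports.get(&bit_report_id) {
        Some(BitSeverity::Degraded) => {
            sink.on_bit_degraded_ack(bit_report_id);
            CommandAck::Ok
        }
        // A FAIL report is never forwarded, valid signature or not.
        Some(BitSeverity::Fail) => CommandAck::Error { reason: "cannot_acknowledge_fail" },
        // Assumed behaviour for an id that is not in the cache.
        None => CommandAck::Error { reason: "unknown_bit_report" },
    }
}

// Test double that records forwarded acks.
#[derive(Default)]
pub struct RecordingSink {
    pub acked: Vec<u64>,
}

impl BitAckSink for RecordingSink {
    fn on_bit_degraded_ack(&mut self, bit_report_id: u64) {
        self.acked.push(bit_report_id);
    }
}
```

Matching on every severity without a wildcard arm means a severity added later has to be handled explicitly before the code compiles.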
## Non-Functional Requirements
**Performance**
- Dispatch overhead: ≤2 ms p99.
**Security**
- Severity check is mandatory; no path bypasses it.
- Audit log entry is the only persistent trace of safety commands; redaction is mandatory.
## Contract
- Canonical typed model: `data_model.md §OperatorCommand`, `§BitReport`.
## Runtime Completeness
- **Named capability**: BIT-degraded ack + safety-override dispatch with severity gate + audit log.
- **Production code that must exist**: real severity check; real audit log writer (file or structured logger).
- **Unacceptable substitutes**: accepting a FAIL ack "because the signature was valid" defeats the whole F9 gate and is unacceptable.
@@ -0,0 +1,73 @@
# Typed State Machine: ZoomedOut / ZoomedIn / TargetFollow
**Task**: AZ-682_scan_controller_state_machine
**Name**: Typed enum state machine with explicit transitions and no ad-hoc booleans
**Description**: Define the typed state enum (`ZoomedOut | ZoomedIn { roi, hold_started_at } | TargetFollow { target_id, started_at }`) and all explicit transitions. Tick latency budget ≤10 ms p99. Frame-rate floor monitor suppresses `ZoomedOut → ZoomedIn` when sustained FPS < 10.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-649_mission_executor_telemetry_forwarding
**Component**: scan_controller
**Tracker**: AZ-682
**Epic**: AZ-635
## Problem
`scan_controller` is the system's brain. State variables and transitions must be exhaustive and typed: ad-hoc booleans drift apart and create bugs that are hard to reach in testing. The frame-rate-floor guard (`sustained FPS < 10 → suppress zoom-in`) prevents the controller from entering a state where Tier 2 + VLM saturate the budget while Tier 1 starves.
## Outcome
- `ScanState` enum covering every state variant; transitions modeled as functions returning a new `ScanState`.
- `tick` function: pure transition step taking a single `TickInput` (`DetectionBatch | MovementCandidate | Tier2Evidence | VlmAssessment | OperatorEvent | TelemetrySample | MissionState | MapObjectsSyncState`) and returning `(new_state, emitted_actions)`.
- Frame-rate floor monitor: rolling FPS window from `frame_ingest` health pulses; below 10 fps sustained over `floor_window_secs`, suppress `ZoomedOut → ZoomedIn` transitions and surface health → yellow.
- Health: `state`, `tick_latency_p99`, `last_state_change_ts`, `fps_floor_active`.
## Scope
### Included
- State enum + transition functions.
- Tick function (pure, single-input).
- Frame-rate floor monitor.
- Restart starts in `ZoomedOut` with empty queue (per `description.md §5`).
### Excluded
- POI queue + rate-cap + decision window (task 44).
- Evidence ladder + zoom-in candidate handling (task 45).
- MapObjects dispatch + degraded-sync handling (task 46).
- Gimbal command issuance + component-health gates (task 47).
## Acceptance Criteria
**AC-1: Tick matches the state enum exhaustively**
Given any `ScanState` variant and any `TickInput`
When `tick(state, input)` runs
Then it returns `(new_state, actions)`, and the transition logic contains no `_ => …` catch-all, so adding a new variant forces a compile error at every match site.
**AC-2: Transition catalogue is complete**
Given the architecture catalogue of allowed transitions (per `system-flows.md §F4`)
When transitions are enumerated in the code
Then every `(from_state, trigger) → to_state` from the spec is covered; spec-disallowed transitions are typed-impossible OR explicitly rejected with a recorded reason.
**AC-3: Frame-rate floor suppresses zoom-in**
Given sustained FPS < 10 over the `floor_window_secs` window
When a tick that would otherwise transition `ZoomedOut → ZoomedIn` runs
Then the transition is suppressed; state remains `ZoomedOut`; `fps_floor_active = true`; health → yellow.
**AC-4: Tick latency budget**
Given a stream of 1000 sequential ticks
When measured
Then `tick_latency_p99 ≤ 10 ms`.
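AC-1 to AC-3 can be illustrated with a reduced state machine. The sketch below is not the transition catalogue from `system-flows.md §F4`: the `TickInput` variants, the `Action` enum, the integer `roi` / timestamp types, and the specific transitions are all placeholders chosen for illustration, and the frame-rate floor is passed in as a precomputed flag.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: u32, hold_started_at: u64 },
    TargetFollow { target_id: u32, started_at: u64 },
}

// Placeholder inputs; the real tick consumes DetectionBatch, OperatorEvent, etc.
pub enum TickInput {
    ZoomInCandidate { roi: u32, now_ms: u64 },
    HoldComplete,
    TargetFollowStart { target_id: u32, now_ms: u64 },
    TargetFollowRelease,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    RequestTier2 { roi: u32 },
    ZoomInSuppressed,
    Rejected,
}

/// Pure transition step: same state and input always give the same output.
pub fn tick(state: ScanState, input: TickInput, fps_floor_active: bool) -> (ScanState, Vec<Action>) {
    match (state, input) {
        (ScanState::ZoomedOut, TickInput::ZoomInCandidate { roi, now_ms }) => {
            if fps_floor_active {
                // Frame-rate floor: stay zoomed out and record the suppression.
                (ScanState::ZoomedOut, vec![Action::ZoomInSuppressed])
            } else {
                (
                    ScanState::ZoomedIn { roi, hold_started_at: now_ms },
                    vec![Action::RequestTier2 { roi }],
                )
            }
        }
        (ScanState::ZoomedIn { .. }, TickInput::HoldComplete) => (ScanState::ZoomedOut, vec![]),
        (ScanState::ZoomedIn { .. }, TickInput::TargetFollowStart { target_id, now_ms }) => {
            (ScanState::TargetFollow { target_id, started_at: now_ms }, vec![])
        }
        (ScanState::TargetFollow { .. }, TickInput::TargetFollowRelease) => {
            (ScanState::ZoomedOut, vec![])
        }
        // Every remaining pair is listed explicitly and rejected; there is
        // no `_` arm, so a new variant breaks the build until it is handled.
        (
            s @ ScanState::ZoomedOut,
            TickInput::HoldComplete
            | TickInput::TargetFollowStart { .. }
            | TickInput::TargetFollowRelease,
        )
        | (
            s @ ScanState::ZoomedIn { .. },
            TickInput::ZoomInCandidate { .. } | TickInput::TargetFollowRelease,
        )
        | (
            s @ ScanState::TargetFollow { .. },
            TickInput::ZoomInCandidate { .. }
            | TickInput::HoldComplete
            | TickInput::TargetFollowStart { .. },
        ) => (s, vec![Action::Rejected]),
    }
}
```

Because `tick` takes the state by value and returns the next one, there is no hidden mutable state, which is what makes the determinism requirement straightforward to test.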
## Non-Functional Requirements
**Performance**
- Tick latency: ≤10 ms p99.
- State transition is deterministic (same inputs in same order → same outputs).
## Contract
- Canonical typed model: `data_model.md §POI`, `§MapObject`, `§IgnoredItem`. State variants per `description.md §1`.
## Runtime Completeness
- **Named capability**: deterministic typed state machine for scan_controller.
- **Production code that must exist**: real state enum; real transitions; real frame-rate floor monitor.
- **Unacceptable substitutes**: `is_zoomed_in: bool` instead of a sum type is unacceptable per `description.md §4` ("no ad-hoc booleans").
@@ -0,0 +1,80 @@
# POI Queue: Priority Ordering + 5/min Rate Cap + Confidence-Scaled Decision Window
**Task**: AZ-683_scan_controller_poi_queue_and_window
**Name**: POI queue ordered by confidence × proximity × age + 5/min hard cap + decision-window mapping
**Description**: Maintain the POI queue ordered by `confidence × proximity_to_current_camera × age_factor`. Hard-cap output to ≤5 POIs/min. Map confidence to operator-decision deadline: 40% → 30 s, 100% → 120 s, linear; below 40% the POI is NOT surfaced. Timeout → forget. Decline → IgnoredItem append (via dispatch in task 46).
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-682_scan_controller_state_machine
**Component**: scan_controller
**Tracker**: AZ-683
**Epic**: AZ-635
## Problem
The 5-POI/min cap is a product-level invariant (operator cognitive load). The decision window is confidence-scaled because the operator deserves more time on ambiguous targets and less on obvious ones. The priority ordering keeps the most actionable POI in front of the operator first. Confidence < 40 % means "don't surface" — silent suppression, not a low-priority enqueue.
## Outcome
- `PoiQueue` with ordered insertion by `priority = f(confidence, proximity, age)`.
- Hard cap: rolling 60-s window tracks POIs surfaced; new POI surface request blocked if cap would be exceeded; blocked POI stays in queue for later.
- `decision_window(confidence)` returns `Option<Duration>`: `Some(d)` per the linear 40 % → 30 s / 100 % → 120 s mapping, `None` for confidence < 40 %.
- Timeout → `forget` (POI removed from queue, no IgnoredItem).
- Decline command from `operator_bridge` → emit `MapObjectsAction::AppendIgnored(mgrs, class_group)` (dispatched by task 46).
- Health: `pois_in_queue`, `pois_per_min`.
## Scope
### Included
- Priority-ordered queue.
- Rate cap with rolling-window enforcement.
- Decision-window mapping.
- Timeout → forget.
### Excluded
- State machine (task 43).
- Evidence ladder (task 45).
- MapObjects dispatch (task 46).
- Gimbal issuance (task 47).
- Operator command dispatch (`operator_bridge` task 41).
## Acceptance Criteria
**AC-1: Priority ordering correct**
Given 3 POIs with `(confidence, proximity, age) = (0.9, 0.5, 0), (0.6, 0.9, 0), (0.7, 0.6, 60)`
When `next()` is called repeatedly
Then the order respects `confidence × proximity × age_factor`.
**AC-2: 5/min hard cap**
Given 10 POIs above 40 % confidence in 30 s
When the cap window is queried
Then at most 5 are surfaced; the rest remain queued until window rolls.
**AC-3: Decision window linear mapping**
Given confidences `[0.40, 0.70, 1.00]`
When `decision_window(c)` is called for each
Then the returned `Duration` is `[30 s, 75 s, 120 s]` (linear).
**AC-4: Sub-40 % not surfaced**
Given a POI with confidence 0.39
When considered for surface
Then `decision_window` returns `None`; the POI is NOT surfaced.
**AC-5: Timeout forgets**
Given a surfaced POI whose deadline expires with no operator action
When timeout fires
Then the POI is removed from the queue; no IgnoredItem is created.
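The linear mapping in AC-3 and AC-4 works out to `window = 30 s + (c - 0.40) / 0.60 × 90 s`, so `c = 0.70` gives `30 + 0.5 × 90 = 75 s`. A minimal sketch follows, assuming confidence is an `f64` in `[0, 1]`; clamping values above 1.0, treating NaN as not surfaceable, and rounding to whole milliseconds are assumptions of this sketch.

```rust
use std::time::Duration;

const MIN_CONFIDENCE: f64 = 0.40;
const MIN_WINDOW_SECS: f64 = 30.0;
const MAX_WINDOW_SECS: f64 = 120.0;

/// Maps confidence to the operator-decision window.
/// Returns None when the POI must not be surfaced at all.
pub fn decision_window(confidence: f64) -> Option<Duration> {
    if confidence.is_nan() || confidence < MIN_CONFIDENCE {
        return None;
    }
    let c = confidence.min(1.0); // clamp so the window never exceeds 120 s
    let fraction = (c - MIN_CONFIDENCE) / (1.0 - MIN_CONFIDENCE);
    let secs = MIN_WINDOW_SECS + fraction * (MAX_WINDOW_SECS - MIN_WINDOW_SECS);
    // Round to whole milliseconds to absorb floating-point noise.
    Some(Duration::from_millis((secs * 1000.0).round() as u64))
}
```

Returning `Option<Duration>` makes "do not surface" a distinct outcome that callers have to handle, instead of encoding it as a zero-length window.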
## Non-Functional Requirements
**Performance**
- Queue insertion: ≤1 ms p99.
- Cap-window check: O(window_size) where window_size is small (5).
**Product**
- 5/min hard cap is a non-negotiable invariant from `description.md §8`.
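One way to enforce the cap with a rolling window is sketched below. `SurfaceRateCap` and `try_surface` are hypothetical names, and treating a surface that is exactly 60 s old as outside the window is an assumption. A caller that gets `false` would leave the POI in the queue and retry later.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Hypothetical rolling-window limiter for surfaced POIs.
pub struct SurfaceRateCap {
    max_per_window: usize,
    window: Duration,
    surfaced_at: VecDeque<Instant>, // oldest first
}

impl SurfaceRateCap {
    pub fn new(max_per_window: usize, window: Duration) -> Self {
        SurfaceRateCap { max_per_window, window, surfaced_at: VecDeque::new() }
    }

    /// Returns true and records the surface if the cap allows it at `now`;
    /// returns false (recording nothing) if the cap would be exceeded.
    pub fn try_surface(&mut self, now: Instant) -> bool {
        // Evict timestamps that have rolled out of the window.
        while let Some(&oldest) = self.surfaced_at.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                self.surfaced_at.pop_front();
            } else {
                break;
            }
        }
        if self.surfaced_at.len() < self.max_per_window {
            self.surfaced_at.push_back(now);
            true
        } else {
            false
        }
    }
}
```

The deque never holds more than `max_per_window` timestamps, so the check stays within the small bounded cost the performance note describes.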
## Runtime Completeness
- **Named capability**: priority POI queue + 5/min rate cap + confidence-scaled deadline.
- **Production code that must exist**: real priority math; real cap enforcement; real linear mapping.
- **Unacceptable substitutes**: ignoring the 5/min cap "for testing" in production is unacceptable.
@@ -0,0 +1,88 @@
# Evidence Ladder + Zoom-In Candidate Handling
**Task**: AZ-684_scan_controller_evidence_ladder
**Name**: Evidence ladder Tier1 → Tier2 → optional VLM + source-zoom-band candidate dispatch
**Description**: When entering `ZoomedIn { roi }`, dispatch the ROI to `semantic_analyzer` (Tier 2) and, if `vlm_enabled` and the concealment score warrants it, to `vlm_client`. Combine evidence into the surfaced POI. Handle `MovementCandidate { source_zoom_band }`: if `source_zoom_band = ZoomedIn` and the candidate is inside the current ROI → bump confidence; if outside the ROI but inside the FOV → enqueue a candidate-POI; only interrupt the current hold if the candidate has higher priority.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-682_scan_controller_state_machine, AZ-683_scan_controller_poi_queue_and_window, AZ-660_detection_client_grpc_stream, AZ-671_semantic_analyzer_action_policy, AZ-672_vlm_client_provider_trait
**Component**: scan_controller
**Tracker**: AZ-684
**Epic**: AZ-635
## Problem
The system's brain combines three evidence sources at different cadences and costs: Tier 1 (per frame), Tier 2 (per zoom-in hold), Tier 3 / VLM (per endpoint hold, optional). The ladder is gated: VLM only fires when Tier 2 says "concealment warrants further inspection" AND `vlm_enabled`. Zoom-in MovementCandidates have priority semantics: a candidate in the current ROI bumps the current hold's confidence; outside the ROI it enqueues a candidate-POI; only a higher-priority candidate interrupts the current hold (per `description.md §4`).
## Outcome
- On `ZoomedOut → ZoomedIn { roi }` transition: emit `Tier2EvidenceRequest { roi }` to `semantic_analyzer`.
- On `Tier2Evidence` return: if `recommended_next_action ∈ {HoldEndpoint}` AND `vlm_enabled` AND concealment ≥ `vlm_invoke_threshold` → emit `VlmAssess { roi_crop, prompt }`; otherwise skip VLM (use Tier 1+2 evidence alone).
- Combine evidence into the surfaced POI (Tier 1 + Tier 2 evidence summary + VLM status).
- On `MovementCandidate { source_zoom_band: ZoomedIn }` while in `ZoomedIn { roi }`:
  - If candidate position ∈ current ROI → bump current-POI confidence; do NOT enqueue separately.
  - If candidate position ∉ current ROI but ∈ current zoomed FOV → enqueue as candidate-POI.
  - Interrupt current hold only if `candidate_priority > current_hold_priority`.
- `vlm_status` carried through to operator: `Ok | Inconclusive | Timeout | SchemaInvalid | IpcError | Disabled`.
## Scope
### Included
- Evidence dispatch on state transitions.
- VLM gating logic.
- Zoom-in candidate handling rules.
- Evidence-combination → POI.
### Excluded
- The state machine itself (task 43).
- Queue + cap (task 44).
- MapObjects dispatch (task 46).
- Gimbal issuance (task 47).
- The Tier-2 / VLM components (separate epics).
## Acceptance Criteria
**AC-1: ZoomedIn entry dispatches Tier 2**
Given the state transitions from `ZoomedOut` to `ZoomedIn { roi: R }`
When the transition completes
Then exactly one `Tier2EvidenceRequest { roi: R }` is dispatched.
**AC-2: VLM invoked when warranted + enabled**
Given `Tier2Evidence { recommended_next_action: HoldEndpoint, concealment: 0.85 }` arrives, `vlm_enabled = true`, `vlm_invoke_threshold = 0.7`
When the evidence is processed
Then exactly one `VlmAssess { roi_crop, prompt }` is dispatched.
**AC-3: VLM skipped when disabled**
Given the same evidence, but `vlm_enabled = false` (or VLM trait is `DisabledVlmProvider`)
When the evidence is processed
Then NO `VlmAssess` is dispatched; the surfaced POI carries `vlm_status: Disabled`.
**AC-4: In-ROI candidate bumps confidence**
Given `ZoomedIn { roi: R }` with current-POI confidence 0.6, and `MovementCandidate { source_zoom_band: ZoomedIn, position ∈ R }`
When the candidate is processed
Then the current POI's confidence is bumped (per the config rule); no new candidate-POI is enqueued.
**AC-5: Out-of-ROI in-FOV candidate enqueues**
Given the same state, and a candidate at position ∉ R but ∈ current FOV
When processed
Then a new candidate-POI is enqueued with `source_zoom_band: ZoomedIn` priority weighting.
**AC-6: Lower-priority candidate does NOT interrupt**
Given a current hold with priority 0.8 and a candidate with priority 0.5
When the candidate is processed
Then the current hold is NOT interrupted; the candidate is enqueued for later.
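The gating and routing rules in AC-2 to AC-6 reduce to two small pure functions, sketched here under simplifying assumptions. Geometry is collapsed to precomputed `in_roi` / `in_fov` flags, `NextAction::Other` stands in for whatever other actions Tier 2 can recommend, and `CandidateRoute` with its `OutsideFov` variant is hypothetical (the source does not say what happens to a candidate outside the FOV).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextAction {
    HoldEndpoint,
    Other, // placeholder for the remaining Tier-2 recommendations
}

pub struct Tier2Evidence {
    pub recommended_next_action: NextAction,
    pub concealment: f64,
}

/// VLM fires only when enabled, recommended by Tier 2, and above threshold.
pub fn should_invoke_vlm(ev: &Tier2Evidence, vlm_enabled: bool, vlm_invoke_threshold: f64) -> bool {
    vlm_enabled
        && ev.recommended_next_action == NextAction::HoldEndpoint
        && ev.concealment >= vlm_invoke_threshold
}

#[derive(Debug, PartialEq, Eq)]
pub enum CandidateRoute {
    BumpCurrentConfidence,
    Enqueue,
    EnqueueAndInterrupt,
    OutsideFov, // not covered by the spec; left to the caller
}

/// Routes a zoomed-in MovementCandidate while a hold is in progress.
pub fn route_candidate(
    in_roi: bool,
    in_fov: bool,
    candidate_priority: f64,
    current_hold_priority: f64,
) -> CandidateRoute {
    if in_roi {
        // Inside the current ROI: strengthen the current POI, no new entry.
        CandidateRoute::BumpCurrentConfidence
    } else if !in_fov {
        CandidateRoute::OutsideFov
    } else if candidate_priority > current_hold_priority {
        CandidateRoute::EnqueueAndInterrupt
    } else {
        CandidateRoute::Enqueue
    }
}
```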
## Non-Functional Requirements
**Performance**
- Evidence-ladder dispatch latency: ≤5 ms p99 from trigger to emit.
- Tier 2 + VLM budgets enforced upstream (those components own their timeouts).
## Contract
- Canonical typed models: `data_model.md §POI`, `§MovementCandidate`, `§Tier2Evidence`, `§VlmAssessment`.
## Runtime Completeness
- **Named capability**: evidence ladder Tier 1 → Tier 2 → optional VLM with zoom-in candidate routing.
- **Production code that must exist**: real dispatch logic; real VLM gating; real candidate-routing rules.
- **Unacceptable substitutes**: always running VLM regardless of `vlm_enabled` or concealment score (wastes Tier-3 budget) is unacceptable.
@@ -0,0 +1,82 @@
# MapObjects Dispatch + IgnoredItem Suppression + Degraded-Sync Behaviour
**Task**: AZ-685_scan_controller_mapobjects_dispatch
**Name**: Dispatch new/moved/existing/removed-candidate to mapobjects_store + IgnoredItem suppression + degraded-sync POI policy
**Description**: For each new detection or movement candidate: compute H3 cell, ask `mapobjects_store` to classify, only surface non-existing entries. Suppress new POIs whose `(MGRS, class_group)` matches an existing `IgnoredItem`. When `mapobjects_store::sync_state = degraded`, suppress diff classifications (don't corrupt the central log) but continue to surface POIs on Tier-1 + movement evidence alone.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-682_scan_controller_state_machine, AZ-684_scan_controller_evidence_ladder, AZ-665_mapobjects_store_h3_classify, AZ-666_mapobjects_store_ignored_and_pass_sweep, AZ-667_mapobjects_store_hydrate_and_pending
**Component**: scan_controller
**Tracker**: AZ-685
**Epic**: AZ-635
## Problem
The classification step decides whether a candidate becomes a POI or is silently absorbed as an existing MapObject. The IgnoredItem suppression keeps the operator from re-seeing rejections from earlier in the mission (or earlier missions hydrated from the central store). Degraded sync is the special case: the local store is out of sync with the canonical central store, so classification could corrupt the central log on next push — in that case suppress classifications but keep surfacing POIs to the operator on direct evidence.
## Outcome
- On every new candidate (Tier-1 detection OR movement candidate): emit `MapObjectsClassifyRequest { detection }`; consume the returned classification (`new | moved | existing | ignored`).
- Surface only `new | moved` to the POI queue; `existing` silently absorbs into the existing MapObject (bump observation_count via mapobjects_store); `ignored` is suppressed.
- On operator decline: emit `MapObjectsAction::AppendIgnored(mgrs, class_group)`.
- On `mapobjects_store::sync_state = degraded`: skip the classify step entirely; surface POIs on Tier-1 + movement evidence alone; suppress `AppendIgnored` writes (they'd dirty the central log on next push); set health → red.
- On `sync_state` recovery → resume classify dispatch.
## Scope
### Included
- Classify request / response routing.
- Ignored suppression at surface time.
- Decline → AppendIgnored emission.
- Degraded-sync POI policy (Tier-1 + movement only).
### Excluded
- The store itself (tasks 26-28).
- POI queue (task 44).
- Evidence ladder (task 45).
## Acceptance Criteria
**AC-1: Existing classification absorbs silently**
Given a detection that `mapobjects_store` classifies as `existing`
When dispatched
Then no POI is surfaced; the existing MapObject's observation_count is bumped (via mapobjects_store).
**AC-2: New classification surfaces**
Given a detection that classifies as `new`
When dispatched
Then a POI is enqueued (subject to queue + cap rules).
**AC-3: Ignored suppression at surface**
Given a candidate whose `(MGRS, class_group)` is in the IgnoredItem set
When dispatched
Then the classify step returns `ignored`; no POI is surfaced.
**AC-4: Decline appends IgnoredItem**
Given an operator decline command (from `operator_bridge` task 41) for POI X
When processed
Then `MapObjectsAction::AppendIgnored(X.mgrs, X.class_group)` is emitted exactly once.
**AC-5: Degraded sync suppresses classify**
Given `mapobjects_store::sync_state = degraded`
When new detections arrive
Then no classify requests are dispatched; POIs are surfaced on Tier-1 + movement evidence alone; health → red.
**AC-6: Sync recovery resumes classify**
Given sync_state transitions degraded → synced
When new detections arrive
Then classify dispatch resumes; health returns from red.
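The policy in AC-1 to AC-6 can be summarised as three small decision functions. The sketch below assumes simplified enums; `CandidateStep`, `on_candidate`, `surfaces_poi`, and `emits_append_ignored` are hypothetical names used only for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncState {
    Synced,
    Degraded,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    New,
    Moved,
    Existing,
    Ignored,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CandidateStep {
    RequestClassify,
    SurfaceOnDirectEvidence, // Tier-1 + movement evidence only
}

/// First decision for every new candidate: classify, or bypass the store.
pub fn on_candidate(sync: SyncState) -> CandidateStep {
    match sync {
        SyncState::Synced => CandidateStep::RequestClassify,
        SyncState::Degraded => CandidateStep::SurfaceOnDirectEvidence,
    }
}

/// Only new or moved objects reach the POI queue.
pub fn surfaces_poi(classification: Classification) -> bool {
    match classification {
        Classification::New | Classification::Moved => true,
        Classification::Existing | Classification::Ignored => false,
    }
}

/// AppendIgnored writes are suppressed while the local store is degraded.
pub fn emits_append_ignored(sync: SyncState) -> bool {
    match sync {
        SyncState::Synced => true,
        SyncState::Degraded => false,
    }
}
```

Keeping the sync check as the first branch means the classify request is never constructed under degraded sync, instead of being built and then discarded.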
## Non-Functional Requirements
**Performance**
- Classify dispatch latency: ≤2 ms p99.
## Contract
- Canonical typed models: `data_model.md §MapObject`, `§IgnoredItem`, `§POI`.
## Runtime Completeness
- **Named capability**: classify-then-dispatch + ignored suppression + degraded-sync policy.
- **Production code that must exist**: real classify routing; real ignored suppression; real degraded-sync gate.
- **Unacceptable substitutes**: continuing to dispatch classify under degraded sync (corrupts central log) is unacceptable.
@@ -0,0 +1,84 @@
# Gimbal Command Issuance + Mission Hint Emission + Component-Health Gates
**Task**: AZ-686_scan_controller_gimbal_issuance
**Name**: Emit GimbalCommand per state tick + middle-waypoint hint on confirm + react to component-health degradations
**Description**: Translate state-machine transitions into `GimbalCommand`s (yaw / pitch / zoom): in `ZoomedOut` issue sweep commands; in `ZoomedIn { roi }` issue smooth-pan plan steps; in `TargetFollow` issue centre-on-target commands. On operator confirm, hand a middle-waypoint hint (`MissionAction::MiddleWaypointHint`) to `mission_executor`. React to component-health degradations per `description.md §6 Failure Modes` (gimbal not ready → stay in state + alert; detection_client red → continue sweep, no Tier-1 POIs; etc.).
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-682_scan_controller_state_machine, AZ-683_scan_controller_poi_queue_and_window, AZ-684_scan_controller_evidence_ladder, AZ-654_gimbal_zoom_out_sweep, AZ-655_gimbal_smooth_pan_plan, AZ-656_gimbal_centre_on_target, AZ-648_mission_executor_state_machine
**Component**: scan_controller
**Tracker**: AZ-686
**Epic**: AZ-635
## Problem
The state machine, queue, and evidence ladder produce DECISIONS; this task produces ACTIONS. Each state maps to exactly one kind of gimbal command: `ZoomedOut` = sweep, `ZoomedIn` = smooth-pan plan from Tier 2, `TargetFollow` = centre-on-target. Component-health failures must produce explicit fallback behaviour (per `description.md §6`); the controller never silently drops a scan step.
## Outcome
- On `ZoomedOut` tick: emit `GimbalCommand::SweepStep` consistent with `gimbal_controller`'s sweep primitive (task 15).
- On `ZoomedIn { roi }` tick: emit the next pan-plan step from `Tier2Evidence.pan_plan` (or hold if plan is exhausted).
- On `TargetFollow { target_id }` tick: emit `GimbalCommand::CentreOnTarget { target_id }`.
- On operator confirm: emit `MissionAction::MiddleWaypointHint { target_mgrs, target_class }` to `mission_executor`.
- On `gimbal_controller` health red: stay in current state; surface alert; do NOT advance ticks that depend on gimbal acknowledgement.
- On `detection_client` health red: continue `ZoomedOut`; emit no Tier-1 POIs; movement candidates still flow.
- On `movement_detector` health red: continue in the current state; movement candidates are no longer enqueued.
- On `semantic_analyzer` health red: skip Tier 2; surface POIs with Tier-1-only evidence; flag in operator overlay.
- On `operator_bridge` disconnected: pause POI surfacing; continue `ZoomedOut`; resume on reconnect.
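The per-state contract and the health fallbacks listed above can be sketched as one pure `tick` function. This is an illustrative sketch under stated assumptions: only `GimbalCommand::SweepStep` and `GimbalCommand::CentreOnTarget { target_id }` are named in this spec, while `PanPlanStep`, `Hold`, `PanStep`, `Roi`, `ComponentHealth`, `TickOutput`, and the `u64` target id are hypothetical. State transitions stay with the state-machine task and are not modelled here.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Health { Green, Yellow, Red }

/// Hypothetical region of interest in normalised image coordinates.
#[derive(Clone, Debug, PartialEq)]
pub struct Roi { pub x: f32, pub y: f32, pub w: f32, pub h: f32 }

#[derive(Clone, Debug, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi },
    TargetFollow { target_id: u64 },
}

/// Hypothetical shape of one step of `Tier2Evidence.pan_plan`.
#[derive(Clone, Debug, PartialEq)]
pub struct PanStep { pub yaw_deg: f32, pub pitch_deg: f32 }

#[derive(Clone, Debug, PartialEq)]
pub enum GimbalCommand {
    SweepStep,
    PanPlanStep(PanStep), // hypothetical variant name
    Hold,                 // hypothetical: pan plan exhausted
    CentreOnTarget { target_id: u64 },
}

#[derive(Clone, Copy, Debug)]
pub struct ComponentHealth {
    pub gimbal_controller: Health,
    pub detection_client: Health,
    pub movement_detector: Health,
    pub semantic_analyzer: Health,
    pub operator_connected: bool,
}

#[derive(Clone, Debug, PartialEq)]
pub struct TickOutput {
    pub command: Option<GimbalCommand>,
    pub alert: Option<&'static str>,
    pub own_health: Health,
    pub surface_tier1: bool,    // false while detection_client is red
    pub enqueue_movement: bool, // false while movement_detector is red
    pub run_tier2: bool,        // false while semantic_analyzer is red
    pub surface_pois: bool,     // false while operator_bridge is disconnected
}

pub fn tick(state: &ScanState, pan_plan: &mut VecDeque<PanStep>, h: &ComponentHealth) -> TickOutput {
    let gimbal_red = h.gimbal_controller == Health::Red;
    // Gimbal red: emit nothing and leave the pan plan untouched, so no step is lost.
    let command = if gimbal_red {
        None
    } else {
        Some(match state {
            ScanState::ZoomedOut => GimbalCommand::SweepStep,
            ScanState::ZoomedIn { .. } => pan_plan
                .pop_front()
                .map(GimbalCommand::PanPlanStep)
                .unwrap_or(GimbalCommand::Hold),
            ScanState::TargetFollow { target_id } => {
                GimbalCommand::CentreOnTarget { target_id: *target_id }
            }
        })
    };
    TickOutput {
        command,
        alert: if gimbal_red { Some("gimbal_controller not ready") } else { None },
        own_health: if gimbal_red { Health::Yellow } else { Health::Green },
        surface_tier1: h.detection_client != Health::Red,
        enqueue_movement: h.movement_detector != Health::Red,
        run_tier2: h.semantic_analyzer != Health::Red,
        surface_pois: h.operator_connected,
    }
}
```

Returning the fallback decisions as plain flags keeps the table explicit: each health input maps to exactly one output field, so a degraded component cannot silently change an unrelated behaviour.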
## Scope
### Included
- Per-state gimbal command emission.
- Middle-waypoint hint emission.
- Component-health-driven fallback behaviour table.
### Excluded
- State machine itself (task 43).
- Queue + cap (task 44).
- Evidence ladder (task 45).
- MapObjects dispatch (task 46).
- Gimbal primitives (tasks 15-17).
- Mission executor APIs (tasks 9-13).
## Acceptance Criteria
**AC-1: Per-state gimbal command**
Given current state `ZoomedOut | ZoomedIn { roi } | TargetFollow { target_id }`
When `tick` produces an action
Then the emitted `GimbalCommand` matches the per-state contract (sweep / pan-plan step / centre-on-target).
**AC-2: Operator confirm emits mission hint**
Given an operator confirm for POI X
When processed
Then `MissionAction::MiddleWaypointHint { target_mgrs: X.mgrs, target_class: X.class_group }` is emitted exactly once.
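One way to satisfy the exactly-once clause in AC-2 is to remember which POIs have already been confirmed. The sketch below assumes that a repeated confirm for the same POI must not re-emit the hint; `ConfirmHandler`, the `Poi` field types, and the numeric POI id are hypothetical, and only the `MissionAction::MiddleWaypointHint` field names come from this spec.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the `POI` model needed for the hint.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Poi {
    pub id: u64,
    pub mgrs: String,
    pub class_group: String,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum MissionAction {
    MiddleWaypointHint { target_mgrs: String, target_class: String },
}

/// Tracks confirmed POI ids so each confirm yields at most one hint.
#[derive(Default)]
pub struct ConfirmHandler {
    confirmed: HashSet<u64>,
}

impl ConfirmHandler {
    /// Returns the hint on the first confirm for a POI, `None` on repeats.
    pub fn on_confirm(&mut self, poi: &Poi) -> Option<MissionAction> {
        // `insert` returns false when the id was already present.
        if !self.confirmed.insert(poi.id) {
            return None;
        }
        Some(MissionAction::MiddleWaypointHint {
            target_mgrs: poi.mgrs.clone(),
            target_class: poi.class_group.clone(),
        })
    }
}
```

If the operator link can redeliver a confirm after a reconnect, this kind of idempotence keeps `mission_executor` from receiving a duplicate waypoint hint.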
**AC-3: Gimbal red stays in state**
Given current state `ZoomedOut` and `gimbal_controller` health = red
When `tick` runs
Then no new gimbal command is emitted; state is unchanged; an alert is surfaced; `scan_controller` health → yellow.
**AC-4: Detection red continues sweep without Tier-1**
Given current state `ZoomedOut` and `detection_client` health = red
When `tick` runs
Then the sweep continues (gimbal commands are still emitted); Tier-1 POIs are NOT surfaced; movement-candidate POIs still surface.
**AC-5: Semantic red surfaces Tier-1-only POIs**
Given `ZoomedIn { roi }` and `semantic_analyzer` health = red
When the zoom-in hold completes
Then any surfaced POI carries Tier-1-only evidence; the operator overlay shows the degraded flag.
## Non-Functional Requirements
**Performance**
- Gimbal command emit overhead: ≤2 ms p99.
- Zoom-out → zoom-in transition: ≤2 s including physical zoom (per `description.md §8`).
## Contract
- Canonical typed models: `data_model.md §GimbalCommand`, `§POI`.
## Runtime Completeness
- **Named capability**: per-state gimbal issuance + mission hint + health-driven fallback.
- **Production code that must exist**: real per-state command synthesis; real health-fallback table.
- **Unacceptable substitutes**: silently advancing the state while the gimbal is red, so that issued commands are dropped.