[AZ-666] [AZ-673] [AZ-648] ignored set + UDS VLM + mission FSM batch 5

AZ-666 mapobjects_store:
- internal/ignored.rs (HashSet<(mgrs, class_group)> for O(1) suppression)
- internal/passes.rs (per-region PassTracker with observed-id set and
  end-of-pass removed-candidate sweep)
- Classification::Ignored wired into classify; apply_decline +
  is_ignored + pass_start + end_of_pass on MapObjectsStoreHandle
- new tests/ignored_and_sweep.rs (3 AC + 2 supplementary)

AZ-673 vlm_client:
- internal/peer_cred.rs (Linux SO_PEERCRED via libc getsockopt;
  PeerCredOutcome::SkippedNonLinux on macOS dev hosts per
  description.md §8)
- internal/prompt.rs (pre-send ROI size + format + prompt
  non-emptiness validation)
- internal/wire.rs (length-prefixed JSON envelope with base64 ROI)
- internal/uds_client.rs (tokio UnixStream client; bounded
  reconnect; hard-stop on peer-cred mismatch; per-request deadline)
- VlmClient with both eager (open/connect) and lazy (new) ctor
- workspace Cargo.toml: base64 + libc as workspace deps

AZ-648 mission_executor:
- internal/types.rs (Variant, MissionState, TransitionKey,
  Telemetry, TransitionEvent, StepOutcome)
- internal/driver.rs (MissionDriver trait + DriverError +
  DriverAction)
- internal/fsm.rs (variant-agnostic Transition + FsmCore + step_one
  with per-transition retry budget keyed by TransitionKey)
- internal/multirotor.rs + internal/fixed_wing.rs (typed transition
  tables; multirotor has Armed/TakeOff, fixed-wing parks in
  WaitAuto for operator AUTO)
- public API: MissionExecutor::run spawns the FSM task and returns
  a clone-safe MissionExecutorHandle (state, health, subscribe,
  paused_reason, retry_count)
- new tests/state_machine.rs (AC-1..AC-4 via ScriptedDriver fake;
  SITL conformance lands with AZ-649 telemetry forwarding)

Workspace: cargo fmt + clippy -D warnings clean; full
cargo test --workspace --all-features green (1 ignored = AZ-665
perf gate). Tasks moved todo/ → done/, autodev state set to batch
6 selection.

Refs: _docs/03_implementation/batch_05_cycle1_report.md
Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-19 16:54:00 +03:00
parent 69c0629350
commit b5cc0c321c
30 changed files with 3343 additions and 111 deletions
@@ -1,82 +0,0 @@
# Mission Executor State Machine (Both Variants)
**Task**: AZ-648_mission_executor_state_machine
**Name**: Variant-aware mission state machine
**Description**: Typed state machine for both multirotor and fixed-wing variants. Transitions are explicit and fully enumerated; bounded retry per transition with explicit max-retry. No infinite retry. State is in-process only; restart re-runs from `DISCONNECTED`.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-641_mavlink_transport_and_heartbeat, AZ-642_mavlink_codec, AZ-643_mavlink_ack_demux_and_signing
**Component**: mission_executor
**Tracker**: AZ-648
**Epic**: AZ-636
## Problem
`mission_executor` drives the airframe through a typed state machine. The flow differs per variant (multirotor vs fixed-wing); both variants share the same transition discipline and observability surface. Every transition has a bounded retry budget — on cap exhaustion health flips to red and the failure surfaces via `operator_bridge`. **No infinite retry** is permitted (per `architecture.md §5`).
## Outcome
- A typed `MissionState` enum encodes:
- Multirotor: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
- Fixed-wing: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
- `MissionExecutor::tick(now, telemetry)` advances the state machine; each transition is gated by an explicit guard.
- Per-transition retry counter + last-failure reason; on cap exhaustion the machine pauses and health → red.
- Health surface: current state, `state_duration_ms`, `transition_failures_by_state`, retry counts.
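The two state graphs listed above can be sketched as one typed successor table. This is a minimal illustration, not the code in `internal/multirotor.rs` or `internal/fixed_wing.rs`: `next_state` and `path` are hypothetical helpers, and guards and retries are omitted.

```rust
/// Mission states named in the task description (one enum for both variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum MissionState {
    Disconnected,
    Connected,
    HealthOk,
    BitOk,
    Armed,
    TakeOff,
    MissionUploaded,
    WaitAuto,
    FlyMission,
    Land,
    PostFlightSync,
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Variant {
    Multirotor,
    FixedWing,
}

/// Hypothetical helper: the single legal successor of `state` for `variant`,
/// or `None` when the state is terminal or not part of that variant's graph.
pub fn next_state(variant: Variant, state: MissionState) -> Option<MissionState> {
    use MissionState::*;
    match (variant, state) {
        (_, Disconnected) => Some(Connected),
        (_, Connected) => Some(HealthOk),
        (_, HealthOk) => Some(BitOk),
        (Variant::Multirotor, BitOk) => Some(Armed),
        (Variant::Multirotor, Armed) => Some(TakeOff),
        (Variant::Multirotor, TakeOff) => Some(MissionUploaded),
        (Variant::Multirotor, MissionUploaded) => Some(FlyMission),
        (Variant::FixedWing, BitOk) => Some(MissionUploaded),
        (Variant::FixedWing, MissionUploaded) => Some(WaitAuto),
        (Variant::FixedWing, WaitAuto) => Some(FlyMission),
        (_, FlyMission) => Some(Land),
        (_, Land) => Some(PostFlightSync),
        (_, PostFlightSync) => Some(Done),
        // `Done` is terminal; `Armed`/`TakeOff` are outside the fixed-wing
        // graph and `WaitAuto` is outside the multirotor graph.
        _ => None,
    }
}

/// Walks the graph from `Disconnected` and returns every state visited.
pub fn path(variant: Variant) -> Vec<MissionState> {
    let mut out = vec![MissionState::Disconnected];
    while let Some(next) = next_state(variant, *out.last().unwrap()) {
        out.push(next);
    }
    out
}
```

Because every `(variant, state)` pair resolves through one `match`, a state that does not belong to a variant has no successor instead of falling through an if-else cascade.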
## Scope
### Included
- Both variant state graphs.
- Bounded retry per transition (configurable; default 3 attempts).
- `Variant` enum (`Multirotor`, `FixedWing`) wired from startup config.
- State-transition events published on an output channel for `scan_controller` and `telemetry_stream`.
- Mission re-upload sequence (`MISSION_CLEAR_ALL` → upload waypoints → `MISSION_SET_CURRENT`) — invoked from `MISSION_UPLOADED` entry guards.
### Excluded
- BIT (F9) — separate task 11.
- Lost-link failsafe ladder (F10) — separate task 12.
- Geofence + battery enforcement — separate task 13.
- Middle-waypoint re-upload logic (separate task 13); only the base mission upload is exercised here.
- Post-flight push trigger — separate task 13.
## Acceptance Criteria
**AC-1: Happy-path multirotor flow against SITL**
Given a multirotor SITL + `mavlink_layer` healthy + a valid in-memory mission
When `mission_executor::run()` is started
Then it reaches `DONE` traversing the multirotor state graph; transitions are observable as events; mission progress reaches all waypoints.
**AC-2: Happy-path fixed-wing flow against SITL**
Given a fixed-wing SITL + the operator's GCS sets AUTO mode externally
When `mission_executor::run()` is started
Then it traverses the fixed-wing graph (no `ARMED`/`TAKE_OFF`; `WAIT_AUTO` waits for the AUTO transition) and reaches `DONE`.
**AC-3: Bounded retry on mission-upload rejection**
Given SITL is configured to reject the mission upload (negative `MISSION_ACK`) on the first attempt and accept it on the second
When the executor reaches `MISSION_UPLOADED`
Then the retry counter increments to 1, the second attempt succeeds, and the machine proceeds.
**AC-4: Cap exhaustion flips health to red**
Given SITL is configured to reject the mission upload (negative `MISSION_ACK`) on all 3 default attempts
When the executor reaches `MISSION_UPLOADED`
Then the machine pauses, health → red, and the failure is observable on the output channel; no transition past `MISSION_UPLOADED`.
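The retry behaviour in AC-3 and AC-4 can be sketched as a small per-transition budget. This is a minimal illustration under assumed names (`RetryBudget`, `AttemptOutcome`); it is not the `FsmCore` or `StepOutcome` code from this commit, and it leaves out the health flag and event channel.

```rust
/// Result of one attempt at a transition (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AttemptOutcome {
    /// The transition succeeded and the machine may move on.
    Advanced,
    /// The attempt failed but the budget still allows another try.
    Retrying { retry_count: u32 },
    /// The cap is exhausted: the machine pauses with the last failure reason.
    Paused { reason: String },
}

/// Counts failed attempts for a single transition against a fixed cap.
#[derive(Debug)]
pub struct RetryBudget {
    max_attempts: u32,
    failed_attempts: u32,
}

impl RetryBudget {
    pub fn new(max_attempts: u32) -> Self {
        Self { max_attempts, failed_attempts: 0 }
    }

    /// Records the result of one attempt and decides what happens next.
    pub fn record(&mut self, result: Result<(), String>) -> AttemptOutcome {
        match result {
            Ok(()) => AttemptOutcome::Advanced,
            Err(reason) => {
                self.failed_attempts += 1;
                if self.failed_attempts >= self.max_attempts {
                    // No further attempts: never loop forever.
                    AttemptOutcome::Paused { reason }
                } else {
                    AttemptOutcome::Retrying { retry_count: self.failed_attempts }
                }
            }
        }
    }
}
```

With the default cap of 3, one rejection followed by a success yields `Retrying { retry_count: 1 }` and then `Advanced`, while three rejections in a row end in `Paused`.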
## Non-Functional Requirements
**Performance**
- Mission-upload retry budget: configurable; default 3 attempts.
- State-machine tick: ≤10 ms p99.
**Reliability**
- No infinite retry anywhere.
## Constraints
- `mavlink_layer::send_command` is the only path to the airframe.
- Variant is fixed at startup; no runtime swap.
## Runtime Completeness
- **Named capability**: variant-aware state machine + mission upload via MAVLink.
- **Production code that must exist**: explicit transition guards; real retry counters; real mission-upload sequence.
- **Allowed external stubs**: ArduPilot SITL is the conformance target (both `arducopter` and `arduplane`).
- **Unacceptable substitutes**: a generic "if-else cascade" instead of typed state transitions is not acceptable.
@@ -1,64 +0,0 @@
# IgnoredItem Set + End-of-Pass Sweep
**Task**: AZ-666_mapobjects_store_ignored_and_pass_sweep
**Name**: IgnoredItem set + end-of-pass removed-candidate sweep
**Description**: `IgnoredItem` set keyed by `(MGRS, class_group)`. `is_ignored(MGRS, class_group)` suppression query. End-of-pass sweep: after a region's pass ends, return objects in the region that were not re-observed as `removed_candidate`s.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-665_mapobjects_store_h3_classify
**Component**: mapobjects_store
**Tracker**: AZ-666
**Epic**: AZ-633
## Problem
When the operator declines a POI, the (MGRS, class_group) pair is added to the `IgnoredItem` set; subsequent detections matching the pair are suppressed BEFORE they reach the queue. Separately, when a scan pass over a region ends (signal from `scan_controller` / `mission_executor`), MapObjects that were known in the region but NOT re-observed during the pass should be flagged `removed_candidate` — the operator (not the system) decides actual removal.
## Outcome
- `IgnoredSet::append(item: IgnoredItem)` stores the entry.
- `is_ignored(mgrs, class_group) -> bool` answers in O(1).
- `MapObjectsStore::end_of_pass(region_bbox) -> Vec<RemovedCandidate>` returns objects in the region that were NOT re-observed since the pass started.
- Per-region pass tracker (start_ts, observed_ids) maintained.
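A minimal sketch of the suppression set described above, assuming the MGRS reference is held as a plain `String` and the class group is a small enum. Both representations are illustrative assumptions; the task states that the real groups are read from config.

```rust
use std::collections::HashSet;

/// Hypothetical class groups, named after the examples in this task.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ClassGroup {
    MilitaryVehicle,
    ConcealedPosition,
    MovementCandidate,
}

/// One operator decline, keyed by MGRS cell and class group.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct IgnoredItem {
    pub mgrs: String, // assumption: MGRS reference kept as a plain string
    pub class_group: ClassGroup,
}

#[derive(Debug, Default)]
pub struct IgnoredSet {
    items: HashSet<(String, ClassGroup)>,
}

impl IgnoredSet {
    /// Stores the pair; returns `false` if it was already present.
    pub fn append(&mut self, item: IgnoredItem) -> bool {
        self.items.insert((item.mgrs, item.class_group))
    }

    /// Average-case O(1) hash lookup. Building the owned key allocates a
    /// short string; a borrowed-key scheme could avoid that.
    pub fn is_ignored(&self, mgrs: &str, class_group: ClassGroup) -> bool {
        self.items.contains(&(mgrs.to_owned(), class_group))
    }
}
```

Keying on the pair rather than on the MGRS cell alone means a decline for one class group does not suppress detections of another group in the same cell.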
## Scope
### Included
- `IgnoredSet` backed by a `HashSet` keyed by `(MGRS, class_group)`.
- Class-group resolution (read group from config; e.g. `military_vehicle_group`, `concealed_position_group`, `movement_candidate`).
- Per-region pass tracker.
- End-of-pass sweep query.
### Excluded
- H3 classify (task 26).
- Pre-flight hydrate (task 28).
- Persistence (task 29).
- Append to `pending_observations` / `pending_ignored` (task 28).
## Acceptance Criteria
**AC-1: Ignored item suppresses subsequent detections**
Given `append(IgnoredItem { mgrs: M1, class_group: G })`
When `is_ignored(M1, G)` is called
Then it returns `true`; calls for other pairs return `false`.
**AC-2: End-of-pass returns un-observed objects**
Given a store with MapObjects at `M1, M2, M3` in region `R`
When the pass starts at `t0`, only `M1` is re-observed, and `end_of_pass(R)` is called at `t1`
Then it returns `[M2, M3]` as `RemovedCandidate`s.
**AC-3: End-of-pass excludes ignored**
Given `M2` was un-observed AND `is_ignored(M2.mgrs, M2.class_group) == true`
When `end_of_pass(R)` is called
Then `M2` is NOT in the returned list (ignored objects are not surfaced as removed-candidates).
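AC-2 and AC-3 together describe a set difference with a filter, sketched below. The `MapObject` fields, the `u64` ids, and passing the ignore check as a closure are assumptions made to keep the example self-contained; region lookup by bounding box is left out, so the caller supplies the objects already known in the region.

```rust
use std::collections::HashSet;

/// Hypothetical stored object: an id plus the MGRS cell it was seen in.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MapObject {
    pub id: u64,
    pub mgrs: String,
}

/// Tracks which object ids were re-observed during one pass over a region.
#[derive(Debug, Default)]
pub struct PassTracker {
    observed_ids: HashSet<u64>,
}

impl PassTracker {
    /// Starts a pass with an empty observed-id set.
    pub fn pass_start() -> Self {
        Self::default()
    }

    /// Marks an object as re-observed during the current pass.
    pub fn observe(&mut self, id: u64) {
        self.observed_ids.insert(id);
    }

    /// Returns ids of known objects that were not re-observed and are not
    /// ignored, in the order they appear in `known_in_region`.
    pub fn end_of_pass<F>(&self, known_in_region: &[MapObject], is_ignored: F) -> Vec<u64>
    where
        F: Fn(&MapObject) -> bool,
    {
        known_in_region
            .iter()
            .filter(|&o| !self.observed_ids.contains(&o.id))
            .filter(|&o| !is_ignored(o))
            .map(|o| o.id)
            .collect()
    }
}
```

The sweep only reports candidates; as the Problem section states, the operator decides whether an object is actually removed.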
## Non-Functional Requirements
**Performance**
- `is_ignored` p99 ≤1 ms.
- `end_of_pass` p99 ≤50 ms for a 30 km × 30 km region with ≤1 000 known objects.
## Runtime Completeness
- **Named capability**: IgnoredItem suppression + end-of-pass sweep.
- **Production code that must exist**: real HashSet + real per-region pass tracker.
- **Unacceptable substitutes**: re-querying the store for every detection without an `IgnoredSet` cache is unacceptable (latency violation).
@@ -1,72 +0,0 @@
# NanoLLM UDS Client + Peer-Cred Check + Pre-Send Validation
**Task**: AZ-673_vlm_client_nanollm_ipc
**Name**: Unix-domain socket client to NanoLLM + peer-cred check + ROI pre-send validation
**Description**: Maintain the Unix-domain-socket connection to the NanoLLM process. Perform a peer-credential check on connect (where supported). Validate ROI payload (size, format) BEFORE sending across the IPC channel. No network egress — UDS only.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-672_vlm_client_provider_trait
**Component**: vlm_client
**Tracker**: AZ-673
**Epic**: AZ-631
## Problem
VLM runs as a local NanoLLM/VILA1.5-3B process. The link is a Unix-domain socket: no network egress, ever. The connection MUST be peer-credential-checked on connect (Linux `SO_PEERCRED`) to confirm the peer process runs under the expected UID / GID; failure is a hard error requiring operator intervention, not a silent retry. ROI payloads MUST be validated for size + format BEFORE crossing the socket, so no IPC time is spent on a payload already known to be too large.
## Outcome
- `NanoLlmClient::connect(socket_path) -> Result<Self, ConnectError>` opens a UDS connection and performs `SO_PEERCRED` check; mismatch returns `Err(PeerCredMismatch)`.
- `NanoLlmClient::assess(roi_crop, prompt) -> VlmAssessment` validates the ROI before sending, sends a single request, and waits at most 5 s for the one response.
- Bounded reconnect on transport loss; on peer-cred failure NO reconnect happens (operator intervention required).
- Health surface: `vlm_latency_p50/p99`, `errors_by_kind`, `peer_cred_check_pass_rate`.
## Scope
### Included
- UDS client (`tokio::net::UnixStream`).
- `SO_PEERCRED` check (Linux; on macOS dev hosts, log a warning and proceed for development purposes only — production target is Jetson Linux).
- Pre-send size + format validation.
- Reconnect state machine (bounded).
- Bounded request deadline.
### Excluded
- VlmAssessment schema validation (task 35).
- Provider trait wiring (task 33).
## Acceptance Criteria
**AC-1: Happy path against fixture NanoLLM**
Given a fixture NanoLLM process listening on a UDS path with correct peer-cred
When `connect` is called and then `assess(roi, "is this concealed?")` is called
Then `connect` returns Ok; `assess` returns `VlmAssessment { status: Ok, label, confidence, .. }` within 5 s.
**AC-2: Peer-cred mismatch hard-fails connect**
Given a fixture peer with wrong UID
When `connect` is called
Then it returns `Err(PeerCredMismatch)`; subsequent connect attempts are blocked until config-driven intervention (no automatic retry); health → red.
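The difference between a bounded reconnect and the peer-cred hard stop can be sketched as a small decision policy. All names here (`ReconnectPolicy`, `ReconnectDecision`, this `ConnectError` shape) are hypothetical; the sketch covers only the decision logic and performs no socket I/O or `SO_PEERCRED` call.

```rust
/// Why a connect attempt failed (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConnectError {
    /// The socket was lost or could not be opened.
    Transport(String),
    /// The peer's credentials did not match the expected ones.
    PeerCredMismatch,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReconnectDecision {
    /// Try again; `attempt` counts reconnects used so far.
    Retry { attempt: u32 },
    /// Transport retries are exhausted.
    GiveUp,
    /// Peer-cred mismatch: no automatic retry until an operator intervenes.
    HardStop,
}

#[derive(Debug)]
pub struct ReconnectPolicy {
    max_reconnects: u32,
    used: u32,
    peer_cred_latched: bool,
}

impl ReconnectPolicy {
    pub fn new(max_reconnects: u32) -> Self {
        Self { max_reconnects, used: 0, peer_cred_latched: false }
    }

    /// Decides what to do after a failed connect attempt.
    pub fn on_failure(&mut self, err: &ConnectError) -> ReconnectDecision {
        if self.peer_cred_latched {
            return ReconnectDecision::HardStop;
        }
        match err {
            ConnectError::PeerCredMismatch => {
                // Latch: later failures of any kind stay blocked.
                self.peer_cred_latched = true;
                ReconnectDecision::HardStop
            }
            ConnectError::Transport(_) if self.used < self.max_reconnects => {
                self.used += 1;
                ReconnectDecision::Retry { attempt: self.used }
            }
            ConnectError::Transport(_) => ReconnectDecision::GiveUp,
        }
    }

    /// Resets the transport budget after a successful connect; the
    /// peer-cred latch is deliberately left untouched.
    pub fn on_connected(&mut self) {
        self.used = 0;
    }
}
```

Keeping the latch separate from the retry counter means a later success or transport error cannot silently clear a peer-cred failure.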
**AC-3: Oversize ROI rejected pre-send**
Given an ROI larger than `max_roi_bytes`
When `assess(...)` is called
Then it returns `VlmAssessment { status: SchemaInvalid, .. }` synchronously without writing to the socket.
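A minimal sketch of the pre-send check in AC-3, assuming the accepted ROI formats are JPEG and PNG and that format is detected from magic bytes. Neither assumption comes from this task, and `validate_pre_send` and `PreSendError` are hypothetical names; mapping an error to `VlmAssessment { status: SchemaInvalid, .. }` would happen in the caller.

```rust
/// Assumed set of accepted ROI encodings (not specified by the task).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoiFormat {
    Jpeg,
    Png,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PreSendError {
    EmptyRoi,
    RoiTooLarge { len: usize, max: usize },
    UnknownFormat,
    EmptyPrompt,
}

/// Detects the format from the leading magic bytes.
pub fn sniff_format(roi: &[u8]) -> Option<RoiFormat> {
    const JPEG_MAGIC: [u8; 3] = [0xFF, 0xD8, 0xFF];
    const PNG_MAGIC: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if roi.starts_with(&JPEG_MAGIC) {
        Some(RoiFormat::Jpeg)
    } else if roi.starts_with(&PNG_MAGIC) {
        Some(RoiFormat::Png)
    } else {
        None
    }
}

/// Runs every check before any byte is written to the socket.
pub fn validate_pre_send(
    roi: &[u8],
    prompt: &str,
    max_roi_bytes: usize,
) -> Result<RoiFormat, PreSendError> {
    if roi.is_empty() {
        return Err(PreSendError::EmptyRoi);
    }
    // Size is checked first so an oversize payload is rejected cheaply.
    if roi.len() > max_roi_bytes {
        return Err(PreSendError::RoiTooLarge { len: roi.len(), max: max_roi_bytes });
    }
    if prompt.trim().is_empty() {
        return Err(PreSendError::EmptyPrompt);
    }
    sniff_format(roi).ok_or(PreSendError::UnknownFormat)
}
```

Because the function is synchronous and touches no I/O, a rejection costs no IPC time.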
**AC-4: Response timeout returns explicit status**
Given a fixture NanoLLM that never responds within 5 s
When `assess(...)` is called
Then it returns `VlmAssessment { status: Timeout, .. }` after at most 5 s; subsequent requests are not blocked.
## Non-Functional Requirements
**Performance**
- Per-ROI latency: ≤5 s p99 (per `description.md §8`).
**Reliability**
- No network egress (hard rule — UDS only).
- Peer-cred mismatch never silently retried.
## Runtime Completeness
- **Named capability**: NanoLLM/VILA1.5-3B IPC over UDS + peer-cred enforcement.
- **Production code that must exist**: real UDS connection; real `SO_PEERCRED`; real pre-send validation.
- **Allowed external stubs**: a Python NanoLLM stub script in tests that echoes a canned response.
- **Unacceptable substitutes**: TCP to localhost instead of UDS is unacceptable (violates the no-network-egress rule).