Mirror of https://github.com/azaion/autopilot.git, synced 2026-06-21 15:11:09 +00:00
[AZ-666] [AZ-673] [AZ-648] ignored set + UDS VLM + mission FSM batch 5
ci/woodpecker/push/build-arm Pipeline failed
AZ-666 mapobjects_store:
- internal/ignored.rs (HashSet<(mgrs, class_group)> for O(1) suppression)
- internal/passes.rs (per-region PassTracker with observed-id set and end-of-pass removed-candidate sweep)
- Classification::Ignored wired into classify; apply_decline + is_ignored + pass_start + end_of_pass on MapObjectsStoreHandle
- new tests/ignored_and_sweep.rs (3 AC + 2 supplementary)

AZ-673 vlm_client:
- internal/peer_cred.rs (Linux SO_PEERCRED via libc getsockopt; PeerCredOutcome::SkippedNonLinux on macOS dev hosts per description.md §8)
- internal/prompt.rs (pre-send ROI size + format + prompt non-emptiness validation)
- internal/wire.rs (length-prefixed JSON envelope with base64 ROI)
- internal/uds_client.rs (tokio UnixStream client; bounded reconnect; hard-stop on peer-cred mismatch; per-request deadline)
- VlmClient with both eager (open/connect) and lazy (new) ctor
- workspace Cargo.toml: base64 + libc as workspace deps

AZ-648 mission_executor:
- internal/types.rs (Variant, MissionState, TransitionKey, Telemetry, TransitionEvent, StepOutcome)
- internal/driver.rs (MissionDriver trait + DriverError + DriverAction)
- internal/fsm.rs (variant-agnostic Transition + FsmCore + step_one with per-transition retry budget keyed by TransitionKey)
- internal/multirotor.rs + internal/fixed_wing.rs (typed transition tables; multirotor has Armed/TakeOff, fixed-wing parks in WaitAuto for operator AUTO)
- public API: MissionExecutor::run spawns the FSM task and returns a clone-safe MissionExecutorHandle (state, health, subscribe, paused_reason, retry_count)
- new tests/state_machine.rs (AC-1..AC-4 via ScriptedDriver fake; SITL conformance lands with AZ-649 telemetry forwarding)

Workspace: cargo fmt + clippy -D warnings clean; full cargo test --workspace --all-features green (1 ignored = AZ-665 perf gate). Tasks moved todo/ → done/, autodev state set to batch 6 selection.

Refs: _docs/03_implementation/batch_05_cycle1_report.md

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,82 @@
# Mission Executor State Machine (Both Variants)

**Task**: AZ-648_mission_executor_state_machine
**Name**: Variant-aware mission state machine
**Description**: Typed state machine for both multirotor and fixed-wing variants. Transitions are explicit and fully enumerated; bounded retry per transition with explicit max-retry. No infinite retry. State is in-process only; restart re-runs from `DISCONNECTED`.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-641_mavlink_transport_and_heartbeat, AZ-642_mavlink_codec, AZ-643_mavlink_ack_demux_and_signing
**Component**: mission_executor
**Tracker**: AZ-648
**Epic**: AZ-636

## Problem

`mission_executor` drives the airframe through a typed state machine. The flow differs per variant (multirotor vs fixed-wing); both variants share the same transition discipline and observability surface. Every transition has a bounded retry budget; on cap exhaustion, health flips to red and the failure surfaces via `operator_bridge`. **No infinite retry** is permitted (per `architecture.md §5`).

## Outcome

- A typed `MissionState` enum encodes:
  - Multirotor: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
  - Fixed-wing: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
- `MissionExecutor::tick(now, telemetry)` advances the state machine; each transition is gated by an explicit guard.
- Per-transition retry counter + last-failure reason; on cap exhaustion the machine pauses and health → red.
- Health surface: current state, `state_duration_ms`, `transition_failures_by_state`, retry counts.
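
The two state graphs above can be sketched as a typed successor function. This is a minimal illustration, not the component's actual code: `next_state` and `happy_path` are hypothetical helper names, guards and retry handling are left out, and only the happy-path edges listed in the Outcome are modelled.

```rust
/// Airframe variant, fixed at startup.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Variant {
    Multirotor,
    FixedWing,
}

/// Union of the states used by both variant graphs.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MissionState {
    Disconnected,
    Connected,
    HealthOk,
    BitOk,
    Armed,
    TakeOff,
    MissionUploaded,
    WaitAuto,
    FlyMission,
    Land,
    PostFlightSync,
    Done,
}

/// Successor of `state` on the happy path for `variant` (hypothetical helper).
/// Returns `None` for `Done` and for states outside the variant's graph.
pub fn next_state(variant: Variant, state: MissionState) -> Option<MissionState> {
    use MissionState::*;
    use Variant::*;
    match (variant, state) {
        (_, Disconnected) => Some(Connected),
        (_, Connected) => Some(HealthOk),
        (_, HealthOk) => Some(BitOk),
        (Multirotor, BitOk) => Some(Armed),
        (Multirotor, Armed) => Some(TakeOff),
        (Multirotor, TakeOff) => Some(MissionUploaded),
        (Multirotor, MissionUploaded) => Some(FlyMission),
        (FixedWing, BitOk) => Some(MissionUploaded),
        (FixedWing, MissionUploaded) => Some(WaitAuto),
        (FixedWing, WaitAuto) => Some(FlyMission),
        (_, FlyMission) => Some(Land),
        (_, Land) => Some(PostFlightSync),
        (_, PostFlightSync) => Some(Done),
        // Done is terminal; Armed/TakeOff are not part of the fixed-wing graph
        // and WaitAuto is not part of the multirotor graph.
        (_, Done) | (FixedWing, Armed) | (FixedWing, TakeOff) | (Multirotor, WaitAuto) => None,
    }
}

/// Walks the graph from `Disconnected` until no successor exists.
pub fn happy_path(variant: Variant) -> Vec<MissionState> {
    let mut state = MissionState::Disconnected;
    let mut path = vec![state];
    while let Some(next) = next_state(variant, state) {
        path.push(next);
        state = next;
    }
    path
}
```

Because the `match` is exhaustive over `(Variant, MissionState)`, adding a state without deciding its successor for both variants fails to compile, which is the property the "typed transitions, not an if-else cascade" requirement is after.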

## Scope

### Included
- Both variant state graphs.
- Bounded retry per transition (configurable; default 3 attempts).
- `Variant` enum (`Multirotor`, `FixedWing`) wired from startup config.
- State-transition events published on an output channel for `scan_controller` and `telemetry_stream`.
- Mission re-upload sequence (`MISSION_CLEAR_ALL` → upload waypoints → `MISSION_SET_CURRENT`), invoked from `MISSION_UPLOADED` entry guards.
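
The ordering of the re-upload sequence can be sketched against a fake link. Everything here is an assumption made for illustration: `MissionLink`, `UploadStep`, `reupload_mission`, and `RecordingLink` are hypothetical names, errors are plain strings, and setting the current item to sequence 0 is a guess at the intended start index. In the real component each step would go through `mavlink_layer::send_command`.

```rust
/// One step of the re-upload sequence (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UploadStep {
    ClearAll,
    Waypoint(u16),
    SetCurrent(u16),
}

/// Hypothetical abstraction over the path to the airframe.
pub trait MissionLink {
    fn send(&mut self, step: UploadStep) -> Result<(), String>;
}

/// MISSION_CLEAR_ALL → upload waypoints → MISSION_SET_CURRENT.
/// Stops at the first rejected step so the caller can count one failed attempt.
pub fn reupload_mission<L: MissionLink>(link: &mut L, waypoint_count: u16) -> Result<(), String> {
    if waypoint_count == 0 {
        return Err("mission has no waypoints".to_string());
    }
    link.send(UploadStep::ClearAll)?;
    for seq in 0..waypoint_count {
        link.send(UploadStep::Waypoint(seq))?;
    }
    // Assumption: the mission starts at sequence 0.
    link.send(UploadStep::SetCurrent(0))
}

/// Test fake that records accepted steps and can reject one chosen step.
#[derive(Default)]
pub struct RecordingLink {
    pub sent: Vec<UploadStep>,
    pub reject: Option<UploadStep>,
}

impl MissionLink for RecordingLink {
    fn send(&mut self, step: UploadStep) -> Result<(), String> {
        if self.reject == Some(step) {
            return Err(format!("rejected: {:?}", step));
        }
        self.sent.push(step);
        Ok(())
    }
}
```

Returning on the first error keeps the sequence atomic from the state machine's point of view: a rejected step is one failed attempt at the `MISSION_UPLOADED` transition, and the next attempt starts again from the clear.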

### Excluded
- BIT (F9): separate task 11.
- Lost-link failsafe ladder (F10): separate task 12.
- Geofence + battery enforcement: separate task 13.
- Middle-waypoint re-upload: the logic is separate task 13; the upload sequence is exercised here only for the base mission upload.
- Post-flight push trigger: separate task 13.

## Acceptance Criteria

**AC-1: Happy-path multirotor flow against SITL**
Given a multirotor SITL + `mavlink_layer` healthy + a valid in-memory mission
When `mission_executor::run()` is started
Then it reaches `DONE` traversing the multirotor state graph; transitions are observable as events; mission progress reaches all waypoints.

**AC-2: Happy-path fixed-wing flow against SITL**
Given a fixed-wing SITL + the operator's GCS sets AUTO mode externally
When `mission_executor::run()` is started
Then it traverses the fixed-wing graph (no `ARMED`/`TAKE_OFF`; `WAIT_AUTO` waits for the AUTO transition) and reaches `DONE`.

**AC-3: Bounded retry on mission-upload rejection**
Given SITL is configured to reject `MISSION_ACK` for the first attempt and accept the second
When the executor reaches `MISSION_UPLOADED`
Then the retry counter increments to 1, the second attempt succeeds, and the machine proceeds.

**AC-4: Cap exhaustion flips health to red**
Given SITL is configured to reject `MISSION_ACK` for all 3 default attempts
When the executor reaches `MISSION_UPLOADED`
Then the machine pauses, health → red, and the failure is observable on the output channel; no transition past `MISSION_UPLOADED`.
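
AC-3 and AC-4 both come down to a small per-transition retry budget. The sketch below shows one way such a budget might behave; `RetryBudget`, `Health`, and the `Retrying`/`Paused` variants are hypothetical, failure reasons are plain strings, and "3 attempts" is read as three failed attempts in total before the machine pauses.

```rust
/// Default attempt cap per transition ("default 3 attempts" in the spec).
pub const DEFAULT_MAX_ATTEMPTS: u32 = 3;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Health {
    Green,
    Red,
}

/// Result of feeding one attempt into the budget (hypothetical shape).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum StepOutcome {
    Advanced,
    Retrying { retry_count: u32 },
    Paused { reason: String },
}

#[derive(Debug)]
pub struct RetryBudget {
    max_attempts: u32,
    failures: u32,
    last_failure: Option<String>,
}

impl RetryBudget {
    pub fn new(max_attempts: u32) -> Self {
        Self { max_attempts, failures: 0, last_failure: None }
    }

    pub fn retry_count(&self) -> u32 {
        self.failures
    }

    pub fn is_exhausted(&self) -> bool {
        self.failures >= self.max_attempts
    }

    pub fn health(&self) -> Health {
        if self.is_exhausted() { Health::Red } else { Health::Green }
    }

    /// Records the result of one attempt at the guarded transition.
    pub fn record(&mut self, attempt: Result<(), String>) -> StepOutcome {
        // Once the cap is hit the machine stays paused; there is no further retry.
        if self.is_exhausted() {
            let reason = self.last_failure.clone().unwrap_or_default();
            return StepOutcome::Paused { reason };
        }
        match attempt {
            Ok(()) => StepOutcome::Advanced,
            Err(reason) => {
                self.failures += 1;
                self.last_failure = Some(reason.clone());
                if self.is_exhausted() {
                    StepOutcome::Paused { reason }
                } else {
                    StepOutcome::Retrying { retry_count: self.failures }
                }
            }
        }
    }
}
```

With the default cap, one rejection followed by an accept yields `Retrying { retry_count: 1 }` and then `Advanced` (AC-3), while three rejections end in `Paused` with `health()` reporting `Red` (AC-4).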

## Non-Functional Requirements

**Performance**
- Mission-upload retry budget: configurable; default 3 attempts.
- State-machine tick: ≤10 ms p99.

**Reliability**
- No infinite retry anywhere.

## Constraints

- `mavlink_layer::send_command` is the only path to the airframe.
- Variant is fixed at startup; no runtime swap.

## Runtime Completeness

- **Named capability**: variant-aware state machine + mission upload via MAVLink.
- **Production code that must exist**: explicit transition guards; real retry counters; real mission-upload sequence.
- **Allowed external stubs**: ArduPilot SITL is the conformance target (both `arducopter` and `arduplane`).
- **Unacceptable substitutes**: a generic "if-else cascade" instead of typed state transitions is not acceptable.
@@ -0,0 +1,64 @@
# IgnoredItem Set + End-of-Pass Sweep

**Task**: AZ-666_mapobjects_store_ignored_and_pass_sweep
**Name**: IgnoredItem set + end-of-pass removed-candidate sweep
**Description**: `IgnoredItem` set keyed by `(MGRS, class_group)`. `is_ignored(MGRS, class_group)` suppression query. End-of-pass sweep: after a region's pass ends, return objects in the region that were not re-observed as `removed_candidate`s.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-665_mapobjects_store_h3_classify
**Component**: mapobjects_store
**Tracker**: AZ-666
**Epic**: AZ-633

## Problem

When the operator declines a POI, the (MGRS, class_group) pair is added to the `IgnoredItem` set; subsequent detections matching the pair are suppressed BEFORE they reach the queue. Separately, when a scan pass over a region ends (signal from `scan_controller` / `mission_executor`), MapObjects that were known in the region but NOT re-observed during the pass should be flagged `removed_candidate`; the operator (not the system) decides actual removal.

## Outcome

- `IgnoredSet::append(item: IgnoredItem)` stores the entry.
- `is_ignored(mgrs, class_group) -> bool` answers in O(1).
- `MapObjectsStore::end_of_pass(region_bbox) -> Vec<RemovedCandidate>` returns objects in the region that were NOT re-observed since the pass started.
- Per-region pass tracker (start_ts, observed_ids) maintained.
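
A minimal sketch of the suppression set, assuming MGRS references and class groups are represented as plain strings (the real types are not shown in this spec):

```rust
use std::collections::HashSet;

/// Operator-declined pair; field types are assumptions for illustration.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct IgnoredItem {
    pub mgrs: String,
    pub class_group: String,
}

/// Set keyed by `(MGRS, class_group)`.
#[derive(Default)]
pub struct IgnoredSet {
    items: HashSet<(String, String)>,
}

impl IgnoredSet {
    /// Stores the entry; appending the same pair twice is a no-op.
    pub fn append(&mut self, item: IgnoredItem) {
        self.items.insert((item.mgrs, item.class_group));
    }

    /// Average-case O(1) hash lookup on the exact pair.
    pub fn is_ignored(&self, mgrs: &str, class_group: &str) -> bool {
        self.items
            .contains(&(mgrs.to_string(), class_group.to_string()))
    }
}
```

The lookup here allocates two short strings per query to build the key. That keeps the sketch simple; a production version would more likely use copyable or interned key types to stay well inside the 1 ms p99 budget.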

## Scope

### Included
- `IgnoredSet` backed by a `HashSet` keyed by `(MGRS, class_group)`.
- Class-group resolution (read group from config; e.g. `military_vehicle_group`, `concealed_position_group`, `movement_candidate`).
- Per-region pass tracker.
- End-of-pass sweep query.

### Excluded
- H3 classify (task 26).
- Pre-flight hydrate (task 28).
- Persistence (task 29).
- Append to `pending_observations` / `pending_ignored` (task 28).

## Acceptance Criteria

**AC-1: Ignored item suppresses subsequent detections**
Given `append(IgnoredItem { mgrs: M1, class_group: G })`
When `is_ignored(M1, G)` is called
Then it returns `true`; calls for other pairs return `false`.

**AC-2: End-of-pass returns un-observed objects**
Given a store with MapObjects at `M1, M2, M3` in region `R`
When the pass starts at `t0`, only `M1` is re-observed, and `end_of_pass(R)` is called at `t1`
Then it returns `[M2, M3]` as `RemovedCandidate`s.

**AC-3: End-of-pass excludes ignored**
Given `M2` was un-observed AND `is_ignored(M2.mgrs, M2.class_group) == true`
When `end_of_pass(R)` is called
Then `M2` is NOT in the returned list (ignored objects are not surfaced as removed-candidates).
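
AC-2 and AC-3 together describe a filter over the known objects of a region. The sketch below is one possible shape under simplifying assumptions: regions are opaque numeric ids rather than bounding boxes, objects carry a `u64` id, the ignored set is passed in as a plain `HashSet`, and the function returns object ids instead of a `RemovedCandidate` type. All names other than `PassTracker` and `end_of_pass` are hypothetical.

```rust
use std::collections::HashSet;

/// Known object in the store; fields are assumptions for illustration.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MapObject {
    pub id: u64,
    pub mgrs: String,
    pub class_group: String,
    pub region: u32,
}

/// Per-region pass state: start timestamp plus ids re-observed during the pass.
pub struct PassTracker {
    pub start_ts_ms: u64,
    observed_ids: HashSet<u64>,
}

impl PassTracker {
    pub fn start(start_ts_ms: u64) -> Self {
        Self { start_ts_ms, observed_ids: HashSet::new() }
    }

    /// Marks an object as re-observed in the current pass.
    pub fn observe(&mut self, id: u64) {
        self.observed_ids.insert(id);
    }
}

/// Returns ids of objects in `region` that were not re-observed during the pass
/// and are not suppressed by the ignored set, in store order.
pub fn end_of_pass(
    objects: &[MapObject],
    region: u32,
    tracker: &PassTracker,
    ignored: &HashSet<(String, String)>,
) -> Vec<u64> {
    objects
        .iter()
        .filter(|o| o.region == region)
        .filter(|o| !tracker.observed_ids.contains(&o.id))
        .filter(|o| !ignored.contains(&(o.mgrs.clone(), o.class_group.clone())))
        .map(|o| o.id)
        .collect()
}
```

The sweep only reports candidates; it never deletes anything, which matches the rule that the operator decides actual removal.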

## Non-Functional Requirements

**Performance**
- `is_ignored` p99 ≤1 ms.
- `end_of_pass` p99 ≤50 ms for a 30 km × 30 km region with ≤1 000 known objects.

## Runtime Completeness

- **Named capability**: IgnoredItem suppression + end-of-pass sweep.
- **Production code that must exist**: real HashSet + real per-region pass tracker.
- **Unacceptable substitutes**: re-querying the store for every detection without an `IgnoredSet` cache is unacceptable (latency violation).
@@ -0,0 +1,72 @@
# NanoLLM UDS Client + Peer-Cred Check + Pre-Send Validation

**Task**: AZ-673_vlm_client_nanollm_ipc
**Name**: Unix-domain socket client to NanoLLM + peer-cred check + ROI pre-send validation
**Description**: Maintain the Unix-domain-socket connection to the NanoLLM process. Perform a peer-credential check on connect (where supported). Validate ROI payload (size, format) BEFORE sending across the IPC channel. No network egress: UDS only.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-672_vlm_client_provider_trait
**Component**: vlm_client
**Tracker**: AZ-673
**Epic**: AZ-631

## Problem

VLM runs as a local NanoLLM/VILA1.5-3B process. The link is a Unix-domain socket: no network egress, ever. The connection MUST be peer-credential-checked on connect (Linux `SO_PEERCRED`) to confirm the peer process belongs to the expected UID / GID; failure is a hard error requiring operator intervention, not a silent retry. ROI payloads MUST be validated for size + format BEFORE crossing the socket, so no IPC time is spent on a payload that is already known to be too big.

## Outcome

- `NanoLlmClient::connect(socket_path) -> Result<Self, ConnectError>` opens a UDS connection and performs the `SO_PEERCRED` check; mismatch returns `Err(PeerCredMismatch)`.
- `NanoLlmClient::assess(roi_crop, prompt) -> VlmAssessment` validates the ROI pre-send, sends a single request, waits at most 5 s for one response, and returns a `VlmAssessment`.
- Bounded reconnect on transport loss; on peer-cred failure NO reconnect happens (operator intervention required).
- Health surface: `vlm_latency_p50/p99`, `errors_by_kind`, `peer_cred_check_pass_rate`.

## Scope

### Included
- UDS client (`tokio::net::UnixStream`).
- `SO_PEERCRED` check (Linux; on macOS dev hosts, log a warning and proceed for development purposes only; the production target is Jetson Linux).
- Pre-send size + format validation.
- Reconnect state machine (bounded).
- Bounded request deadline.

### Excluded
- VlmAssessment schema validation (task 35).
- Provider trait wiring (task 33).

## Acceptance Criteria

**AC-1: Happy path against fixture NanoLLM**
Given a fixture NanoLLM process listening on a UDS path with correct peer-cred
When `connect` is called and then `assess(roi, "is this concealed?")` is called
Then `connect` returns Ok; `assess` returns `VlmAssessment { status: Ok, label, confidence, .. }` within 5 s.

**AC-2: Peer-cred mismatch hard-fails connect**
Given a fixture peer with wrong UID
When `connect` is called
Then it returns `Err(PeerCredMismatch)`; subsequent connect attempts are blocked until config-driven intervention (no automatic retry); health → red.
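
The hard-stop behaviour in AC-2 can be sketched as a pure decision plus a latch, leaving the actual `SO_PEERCRED` read out. Assumptions: `PeerCred`, `ConnectGate`, `check_peer_cred`, `operator_reset`, and the `Verified`, `Mismatch`, and `Blocked` variants are hypothetical names; only `PeerCredOutcome::SkippedNonLinux` and `PeerCredMismatch` appear in this commit. `observed` is `None` on platforms where the credential read is unavailable.

```rust
/// Credentials reported for the socket peer (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PeerCred {
    pub uid: u32,
    pub gid: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PeerCredOutcome {
    Verified,
    Mismatch,
    /// Dev hosts without SO_PEERCRED: warn and proceed.
    SkippedNonLinux,
}

/// Compares observed credentials against the configured expectation.
pub fn check_peer_cred(expected: PeerCred, observed: Option<PeerCred>) -> PeerCredOutcome {
    match observed {
        None => PeerCredOutcome::SkippedNonLinux,
        Some(peer) if peer == expected => PeerCredOutcome::Verified,
        Some(_) => PeerCredOutcome::Mismatch,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConnectError {
    PeerCredMismatch,
    /// A previous mismatch latched the gate; operator intervention is needed.
    Blocked,
}

/// Latch that blocks every connect attempt after a mismatch.
#[derive(Default)]
pub struct ConnectGate {
    hard_stopped: bool,
}

impl ConnectGate {
    pub fn admit(&mut self, outcome: PeerCredOutcome) -> Result<(), ConnectError> {
        if self.hard_stopped {
            return Err(ConnectError::Blocked);
        }
        match outcome {
            PeerCredOutcome::Mismatch => {
                self.hard_stopped = true;
                Err(ConnectError::PeerCredMismatch)
            }
            PeerCredOutcome::Verified | PeerCredOutcome::SkippedNonLinux => Ok(()),
        }
    }

    /// Stand-in for the config-driven intervention that clears the latch.
    pub fn operator_reset(&mut self) {
        self.hard_stopped = false;
    }
}
```

Keeping the latch separate from the bounded-reconnect logic makes it hard for a transport-loss retry path to reconnect by accident after a credential failure.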

**AC-3: Oversize ROI rejected pre-send**
Given an ROI larger than `max_roi_bytes`
When `assess(...)` is called
Then it returns `VlmAssessment { status: SchemaInvalid, .. }` synchronously without writing to the socket.
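
A pre-send check along these lines might look as follows. It is a sketch under assumptions: `PreSendError` and `validate_request` are hypothetical names, the accepted formats (JPEG and PNG, detected by magic bytes) are a guess since the spec does not list them, and mapping a failure to `VlmAssessment { status: SchemaInvalid, .. }` is left to the caller.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PreSendError {
    EmptyRoi,
    RoiTooLarge { len: usize, max: usize },
    UnsupportedFormat,
    EmptyPrompt,
}

// Leading bytes of the two formats assumed acceptable in this sketch.
const JPEG_MAGIC: [u8; 3] = [0xFF, 0xD8, 0xFF];
const PNG_MAGIC: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Runs every check before any byte is written to the socket.
pub fn validate_request(roi: &[u8], prompt: &str, max_roi_bytes: usize) -> Result<(), PreSendError> {
    if roi.is_empty() {
        return Err(PreSendError::EmptyRoi);
    }
    if roi.len() > max_roi_bytes {
        return Err(PreSendError::RoiTooLarge { len: roi.len(), max: max_roi_bytes });
    }
    if !(roi.starts_with(&JPEG_MAGIC) || roi.starts_with(&PNG_MAGIC)) {
        return Err(PreSendError::UnsupportedFormat);
    }
    if prompt.trim().is_empty() {
        return Err(PreSendError::EmptyPrompt);
    }
    Ok(())
}
```

The function is synchronous and touches no I/O, so an oversize ROI is rejected without the client opening or writing to the socket at all.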

**AC-4: Response timeout returns explicit status**
Given a fixture NanoLLM that never responds within 5 s
When `assess(...)` is called
Then it returns `VlmAssessment { status: Timeout, .. }` after at most 5 s; subsequent requests are not blocked.

## Non-Functional Requirements

**Performance**
- Per-ROI latency: ≤5 s p99 (per `description.md §8`).

**Reliability**
- No network egress (hard rule: UDS only).
- Peer-cred mismatch never silently retried.

## Runtime Completeness

- **Named capability**: NanoLLM/VILA1.5-3B IPC over UDS + peer-cred enforcement.
- **Production code that must exist**: real UDS connection; real `SO_PEERCRED`; real pre-send validation.
- **Allowed external stubs**: a Python NanoLLM stub script in tests that echoes a canned response.
- **Unacceptable substitutes**: TCP to localhost instead of UDS is unacceptable (violates the no-network-egress rule).