Mirror of https://github.com/azaion/autopilot.git, synced 2026-06-22 03:01:10 +00:00
[AZ-641] [AZ-642] [AZ-644] mavlink transport + codec + mission pull
Lands the second batch under epic AZ-626's implementation plan.
mavlink_layer (AZ-641 + AZ-642):
- Hand-rolled MAVLink v2 codec covering the §7.7 surface: HEARTBEAT,
SYS_STATUS, SET_MODE, ATTITUDE, GLOBAL_POSITION_INT, MISSION_* (8),
COMMAND_LONG, COMMAND_ACK, EXTENDED_SYS_STATE, STATUSTEXT (17 total).
- Streaming decoder reassembles frames from arbitrarily sized byte chunks,
drops malformed frames, counts every parse error by kind
(crc/truncated/unknown_id/seq_gap), and surfaces sequence gaps without
hard-failing the link.
- Encoder tracks the per-link tx_seq counter and applies the MAVLink v2
trailing-zero payload truncation rule.
- UDP and POSIX-serial transports behind a single async Transport trait;
the run loop owns transport open with bounded exponential backoff
(2 s serial / 5 s UDP cap) and a tokio::select! per-link read+write
loop.
- 1 Hz outbound HEARTBEAT scheduler + inbound-heartbeat watchdog that
fires LinkUp / LinkLost on a broadcast channel and feeds health detail
(connected, last_heartbeat_age_ms, signing_enabled, parse_errors).
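For illustration, the framing half of the streaming decoder could look like the std-only Rust sketch below. It assumes MAVLink v2 frames only: start byte 0xFD, with bit 0x01 of `incompat_flags` announcing a trailing 13-byte signature. `Framer` and `push` are hypothetical names, not the crate's API. The sketch only cuts complete frames out of arbitrarily sized chunks and leaves CRC and message-ID checks to a later stage.

```rust
/// MAVLink v2 start-of-frame marker.
const STX_V2: u8 = 0xFD;
/// STX + len + incompat_flags + compat_flags + seq + sysid + compid + 3-byte msgid.
const HEADER_LEN: usize = 10;
const CRC_LEN: usize = 2;
const SIGNATURE_LEN: usize = 13;
/// incompat_flags bit announcing a trailing signature block.
const INCOMPAT_FLAG_SIGNED: u8 = 0x01;

/// Hypothetical framer: accumulates bytes and cuts out complete v2 frames.
#[derive(Default)]
struct Framer {
    buf: Vec<u8>,
}

impl Framer {
    /// Feed one read's worth of bytes; returns every frame that is now complete.
    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.buf.extend_from_slice(chunk);
        let mut frames = Vec::new();
        loop {
            // Resynchronise: discard everything before the next start marker.
            match self.buf.iter().position(|&b| b == STX_V2) {
                Some(0) => {}
                Some(i) => {
                    self.buf.drain(..i);
                }
                None => {
                    self.buf.clear();
                    break;
                }
            }
            // The fixed header carries the payload length; wait until it is here.
            if self.buf.len() < HEADER_LEN {
                break;
            }
            let payload_len = self.buf[1] as usize;
            let sig_len = if self.buf[2] & INCOMPAT_FLAG_SIGNED != 0 { SIGNATURE_LEN } else { 0 };
            let total = HEADER_LEN + payload_len + CRC_LEN + sig_len;
            // Partial frame: keep the bytes and wait for the next chunk.
            if self.buf.len() < total {
                break;
            }
            frames.push(self.buf.drain(..total).collect());
        }
        frames
    }
}
```

A false 0xFD inside line noise can make this sketch hand a bogus span to the validator. A production decoder would typically drop only that start byte on a CRC failure and rescan from the next byte. That recovery step is omitted here.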
mission_client (AZ-644):
- HTTPS GET /missions/{id} over rustls (no OpenSSL on the airframe).
- Bundled JSON Schema (crates/shared/contracts/mission-schema.json,
draft-07, additionalProperties:false) validates every response;
schema-invalid bodies surface as FetchError::SchemaInvalid with a
1 KiB sample of the raw body for offline analysis.
- Transient failures (timeout, 5xx, 429) retry with bounded exponential
backoff up to MissionClientOptions.max_attempts (default 5); permanent
failures (any other 4xx, malformed URL) abort immediately.
- Health surface mirrors AC-1's contract: last_fetch_ts,
fetch_errors_total, schema_version, connection_state.
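The retry policy above can be sketched without any HTTP dependency. `Failure`, `FetchError`, `is_transient` and `fetch_with_retry` are hypothetical stand-ins, not the crate's real types. The request is abstracted as a closure and the sleep is injected, so the backoff schedule can be checked without waiting. Treating connect/DNS failures as transient follows the task spec. The base delay and cap are illustrative, because the commit does not state them.

```rust
use std::time::Duration;

/// Hypothetical outcome of one failed HTTP attempt.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Failure {
    Timeout,
    /// DNS or TCP/TLS connect failure (listed as transient in the task spec).
    Connect,
    Http(u16),
    MalformedUrl,
}

#[derive(Debug, PartialEq, Eq)]
enum FetchError {
    Permanent(Failure),
    MaxRetriesExceeded { attempts: u32, last: Failure },
}

/// Timeouts, connect failures, 5xx and 429 are retried; everything else aborts.
fn is_transient(f: &Failure) -> bool {
    match f {
        Failure::Timeout | Failure::Connect => true,
        Failure::Http(429) => true,
        Failure::Http(code) => (500..600).contains(code),
        Failure::MalformedUrl => false,
    }
}

/// Calls `attempt(n)` for n = 1..=max_attempts, sleeping with capped
/// exponential backoff between transient failures.
fn fetch_with_retry<T>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut attempt: impl FnMut(u32) -> Result<T, Failure>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, FetchError> {
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(f) if !is_transient(&f) => return Err(FetchError::Permanent(f)),
            Err(f) if n >= max_attempts => {
                return Err(FetchError::MaxRetriesExceeded { attempts: n, last: f })
            }
            Err(_) => {
                // base * 2^(n-1), clamped to cap; the shift is bounded to avoid overflow.
                let factor = 1u32 << (n - 1).min(16);
                sleep(base.saturating_mul(factor).min(cap));
                n += 1;
            }
        }
    }
}
```

With `max_attempts = 5`, a run of five transient failures produces four sleeps and then `MaxRetriesExceeded`. A 404 aborts after the first attempt.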
Caught and fixed before commit (NOT a code-review finding — caught by
the unit test that hand-computed CRC("123456789")): the hand-rolled
X.25 CRC accumulator was operating in u16 throughout. The MAVLink C
reference declares `tmp` as uint8_t, which silently truncates the
shifted-in bits. Round-trip tests passed (encoder and decoder shared
the bug); a real MAVLink peer would have rejected every frame. Fixed
by mirroring the C reference: `let mut tmp: u8 = …; tmp ^= tmp.wrapping_shl(4);`.
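Added a regression test asserting CRC("123456789") == 0x6F91 against
pymavlink's reference value (NOT 0x29B1, which is the CRC-16/CCITT-FALSE
check value: MAVLink's accumulator is the bit-reflected CCITT CRC with init
0xFFFF and no final XOR, i.e. CRC-16/MCRF4XX, computed a byte at a time).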
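The fix and the check value can be reproduced in a few lines. The sketch below uses hypothetical function names. It mirrors the C reference's 8-bit `tmp` and compares the result against a textbook bit-at-a-time reflected CRC-16: polynomial 0x8408 (0x1021 reflected), init 0xFFFF, no final XOR. Both give 0x6F91 for "123456789". This indicates that the byte-wise accumulator is an optimised form of that reflected CRC rather than a different algorithm. In a real frame, the CRC also absorbs the per-message CRC_EXTRA byte after the payload.

```rust
/// One step of MAVLink's checksum accumulator, mirroring the C reference:
/// `tmp` is 8 bits wide, so bits shifted past bit 7 are discarded.
fn crc_accumulate(byte: u8, crc: u16) -> u16 {
    let mut tmp: u8 = byte ^ (crc & 0x00FF) as u8;
    tmp ^= tmp << 4; // high nibble of the shift is dropped, as with uint8_t in C
    let tmp = tmp as u16;
    (crc >> 8) ^ (tmp << 8) ^ (tmp << 3) ^ (tmp >> 4)
}

/// Checksum over a byte slice starting from MAVLink's init value 0xFFFF.
fn mavlink_crc(data: &[u8]) -> u16 {
    data.iter().fold(0xFFFF, |crc, &b| crc_accumulate(b, crc))
}

/// Bit-at-a-time reflected CRC-16 (poly 0x8408, init 0xFFFF, no final XOR),
/// i.e. the CRC-16/MCRF4XX parameter set.
fn crc16_mcrf4xx_bitwise(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &b in data {
        crc ^= b as u16;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x8408 } else { crc >> 1 };
        }
    }
    crc
}
```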
AC verification (full detail in
_docs/03_implementation/batch_02_cycle1_report.md):
AZ-641: AC-1 + AC-3 + AC-4 verified via UDP loopback integration tests;
AC-2 (serial) requires a socat pty pair and runs in the SITL/CI
tier (test exists as #[ignore]-marked stub).
AZ-642: AC-1 + AC-2 + AC-3 verified via exhaustive codec round-trip and
decoder negative-path tests; AC-4 (SITL round-trip) requires
ArduPilot SITL. With the CRC fix above the codec should now be
wire-correct; the sitl-conformance Woodpecker stage will confirm it.
AZ-644: all four ACs verified via wiremock-driven integration tests.
Workspace gates green:
- cargo check --workspace clean
- cargo check --workspace --no-default-features clean
- cargo fmt --all -- --check clean
- cargo clippy --workspace --all-targets -- -D warnings clean
- cargo test --workspace pass (1 expected ignore)
Layering invariants from module-layout.md hold: mavlink_layer and
mission_client are Layer 2 actors importing only `shared`; no sibling
Layer-2 imports; MavlinkHandle implements shared::contracts::MavlinkSink.
Jira: AZ-641, AZ-642, AZ-644 transitioned To Do → In Progress at batch
start; the matching In Testing transitions follow this commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -1,80 +0,0 @@
# MAVLink Transport and Heartbeat

**Task**: AZ-641_mavlink_transport_and_heartbeat
**Name**: MAVLink transport + heartbeat
**Description**: Single connection abstraction (UDP or serial, picked at startup), 1 Hz outgoing HEARTBEAT, bounded reconnect on transport loss, autopilot-heartbeat timeout detection.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure
**Component**: mavlink_layer
**Tracker**: AZ-641
**Epic**: AZ-637

## Problem

`mavlink_layer` needs a single, stable connection abstraction to the airframe autopilot (ArduPilot / PX4). The connection is either UDP or serial — picked once at startup from the connection URI (`udp://...` or `serial:///dev/...`); no runtime URI swap. The link must self-heal on transport loss with bounded backoff and surface link health to the rest of the system without silent failure.

## Outcome

- A `MavlinkConnection` opens once at startup and reconnects automatically on transport loss with bounded exponential backoff (≤2 s on serial / ≤5 s on UDP).
- A 1 Hz outgoing `HEARTBEAT` keeps the autopilot's GCS-link path alive.
- Autopilot heartbeats received on the inbound stream are timestamped; a configurable wall-clock timeout flips link state to `lost` and surfaces it via `health()` and a typed signal consumed by `mission_executor`.
- Health surface includes `connected`, `last_heartbeat_age_ms`, `signing_enabled`.

## Scope

### Included

- Connection-URI parser (`udp://host:port` and `serial:///dev/...`).
- UDP socket and serial port concrete transports behind a single `Transport` trait.
- Bounded exponential backoff on transport-open failure and on read failure.
- 1 Hz outgoing `HEARTBEAT` timer.
- Inbound heartbeat timestamping + wall-clock timeout → `link_lost` signal.
- `ComponentHealth` surface fields above.

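A std-only sketch of the connection-URI parser, assuming only the two forms named above. `Endpoint`, `UriError` and `parse_connection_uri` are hypothetical names. Bracketed IPv6 literals and serial settings such as baud rate are deliberately not handled, because the spec does not say where baud rate comes from.

```rust
/// Hypothetical parsed form of the startup connection URI.
#[derive(Debug, PartialEq, Eq)]
enum Endpoint {
    Udp { host: String, port: u16 },
    Serial { path: String },
}

#[derive(Debug, PartialEq, Eq)]
enum UriError {
    UnsupportedScheme,
    MissingHost,
    MissingPort,
    BadPort,
    BadDevicePath,
}

/// Parses `udp://host:port` or `serial:///dev/...` once at startup.
fn parse_connection_uri(uri: &str) -> Result<Endpoint, UriError> {
    if let Some(rest) = uri.strip_prefix("udp://") {
        // Split at the last ':' so the port is always the final component.
        let (host, port) = rest.rsplit_once(':').ok_or(UriError::MissingPort)?;
        if host.is_empty() {
            return Err(UriError::MissingHost);
        }
        let port: u16 = port.parse().map_err(|_| UriError::BadPort)?;
        Ok(Endpoint::Udp { host: host.to_string(), port })
    } else if let Some(path) = uri.strip_prefix("serial://") {
        // `serial:///dev/ttyACM0` leaves an absolute device path after the scheme.
        if !path.starts_with("/dev/") || path.len() <= "/dev/".len() {
            return Err(UriError::BadDevicePath);
        }
        Ok(Endpoint::Serial { path: path.to_string() })
    } else {
        Err(UriError::UnsupportedScheme)
    }
}
```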
### Excluded

- Message encoding / decoding (separate task 03).
- Command-ack demux and retry (separate task 04).
- MAVLink-2 signing (separate task 04; only the `signing_enabled` flag is plumbed here).

## Acceptance Criteria

**AC-1: UDP connection opens and survives drop**
Given a configured `udp://127.0.0.1:14550` endpoint
When the autopilot is not listening at process start
Then `MavlinkLayer::run()` retries with exponential backoff up to its cap and reports `connected = false` via `health()`; when the autopilot becomes reachable, the link reconnects within ≤5 s.

**AC-2: Serial connection opens and survives drop**
Given a configured `serial:///dev/pts/N` endpoint backed by a `socat` pair (or equivalent)
When the peer end is closed and reopened
Then `mavlink_layer` reconnects within ≤2 s and resumes heartbeat emission.

**AC-3: Heartbeat emitted at 1 Hz**
Given a healthy link
When the connection is open for 10 s
Then the peer observes 10 ± 1 outbound `HEARTBEAT` frames (9 to 11 inclusive).

**AC-4: Autopilot heartbeat loss flips link state**
Given a healthy link on which the peer has been sending heartbeats
When the peer stops sending heartbeats
Then within the configured timeout (default 3 s) `health()` reports `link_lost = true` and a typed `LinkLost` signal is emitted on the public output channel.

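The inbound-heartbeat watchdog reduces to a small state machine. In this sketch, all names are hypothetical. Time is passed in explicitly, so the logic is testable without sleeping. Elapsed time is measured with the monotonic `std::time::Instant`. That choice is an assumption, made because a `SystemTime` reading can jump when the clock is stepped. The 1 Hz tick and the broadcast channel that would carry `LinkUp` / `LinkLost` are left out.

```rust
use std::time::{Duration, Instant};

/// Hypothetical link events; real code would publish them on a broadcast channel.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkEvent {
    LinkUp,
    LinkLost,
}

/// Inbound-heartbeat watchdog driven by explicit timestamps.
struct HeartbeatWatchdog {
    timeout: Duration,
    last_heartbeat: Option<Instant>,
    link_up: bool,
}

impl HeartbeatWatchdog {
    fn new(timeout: Duration) -> Self {
        Self { timeout, last_heartbeat: None, link_up: false }
    }

    /// Record a peer HEARTBEAT; reports LinkUp only on the down-to-up transition.
    fn on_heartbeat(&mut self, now: Instant) -> Option<LinkEvent> {
        self.last_heartbeat = Some(now);
        if self.link_up {
            None
        } else {
            self.link_up = true;
            Some(LinkEvent::LinkUp)
        }
    }

    /// Call periodically (e.g. from the 1 Hz tick); reports LinkLost once per outage.
    fn poll(&mut self, now: Instant) -> Option<LinkEvent> {
        match self.last_heartbeat {
            Some(last) if self.link_up && now.saturating_duration_since(last) >= self.timeout => {
                self.link_up = false;
                Some(LinkEvent::LinkLost)
            }
            _ => None,
        }
    }

    /// Value for the `last_heartbeat_age_ms` health field (None before the first heartbeat).
    fn last_heartbeat_age_ms(&self, now: Instant) -> Option<u128> {
        self.last_heartbeat.map(|last| now.saturating_duration_since(last).as_millis())
    }
}
```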
## Non-Functional Requirements

**Performance**
- Reconnect latency: ≤2 s serial, ≤5 s UDP.
- Heartbeat cadence: 1 Hz, i.e. a 1000 ms period ± 50 ms.

**Reliability**
- No infinite retry — bounded backoff cap is configurable (default 30 s).
- Transport-open failure is reported as red in `health()`; it is never silently absorbed.

## Constraints

- Hand-rolled — no third-party MAVLink SDK (per `architecture.md §5`).
- Single connection per process; no runtime URI swap.

## Runtime Completeness

- **Named capability**: MAVLink emission (HEARTBEAT) and link liveness.
- **Production code that must exist**: real UDP socket and real serial port transports.
- **Allowed external stubs**: in CI / integration tests, the peer end may be `socat` for serial or a loopback UDP listener.
- **Unacceptable substitutes**: a "fake transport" that swallows writes and synthesises heartbeats is not allowed in production code — only as a test double under `#[cfg(test)]`.

@@ -1,79 +0,0 @@
# MAVLink Message Codec (§7.7 Surface)

**Task**: AZ-642_mavlink_codec
**Name**: MAVLink v2 encode/decode for the §7.7 surface
**Description**: Encode and decode the ~10–15 MAVLink v2 messages this codebase needs (the §7.7 surface only) with strict validation.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure
**Component**: mavlink_layer
**Tracker**: AZ-642
**Epic**: AZ-637

## Problem

`autopilot` speaks a deliberately narrow MAVLink command surface (per `architecture.md §7.7`: ~10–15 messages). Adding messages outside that list requires explicit design review. A hand-rolled MAVLink v2 codec must encode outbound messages with correct sequence numbers, system / component IDs, and (when enabled) signing. It must also decode inbound messages with strict validation, rejecting malformed frames, unknown IDs, and signing failures.

## Outcome

- Outbound encoder produces wire-correct MAVLink v2 frames for the message surface in §7.7, with per-link sequence numbers that increment by one and wrap from 255 to 0.
- Inbound decoder parses the same surface and rejects malformed frames and unknown message IDs; sequence-number gaps are detected and logged, but the frame is still accepted (not hard-failed).
- Decoded messages are exposed as a typed `MavlinkMessage` enum (one variant per supported message kind) on the inbound channel.
- Per-message-kind parse error counters are exposed via `health()`.

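Sequence-gap detection only needs wrapping 8-bit arithmetic. The tracker below is a hypothetical sketch for a single stream; a real decoder would presumably keep one per `(sysid, compid)` pair. It never rejects a frame. It only reports how many frames appear to be missing and counts gap events.

```rust
/// Hypothetical tracker for the 8-bit MAVLink sequence number of one stream.
#[derive(Default)]
struct SeqTracker {
    last: Option<u8>,
    /// One increment per gap event (would feed a `seq_gap` counter).
    gap_events: u64,
}

impl SeqTracker {
    /// Returns how many frames appear to be missing before `seq`.
    /// The frame itself is always accepted: gaps are logged, not fatal.
    fn observe(&mut self, seq: u8) -> u8 {
        let missing = match self.last {
            // wrapping_sub makes 255 -> 0 count as consecutive.
            Some(prev) => seq.wrapping_sub(prev).wrapping_sub(1),
            None => 0,
        };
        if missing > 0 {
            self.gap_events += 1;
        }
        self.last = Some(seq);
        missing
    }
}
```

With an 8-bit counter, a duplicated frame cannot be told apart from a gap of 255 frames. The reported count is therefore a logging heuristic rather than an exact loss figure.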
## Scope

### Included

- Encode + decode for `HEARTBEAT` (bidir), `COMMAND_LONG` outbound subset (arm/disarm, takeoff, set-mode, change-speed, change-alt, land, RTL), `COMMAND_ACK` inbound, `MISSION_COUNT`, `MISSION_REQUEST_INT`, `MISSION_ITEM_INT`, `MISSION_ACK`, `MISSION_SET_CURRENT`, `MISSION_CURRENT`, `MISSION_ITEM_REACHED`, `MISSION_CLEAR_ALL`, `GLOBAL_POSITION_INT`, `ATTITUDE`, `SYS_STATUS`, `EXTENDED_SYS_STATE`, `STATUSTEXT`, `SET_MODE`.
- Per-link outbound `tx_seq` counter with wrap-around handling.
- Strict size + CRC validation; reject malformed frames.
- Unknown message IDs counted and dropped (not hard-failed).
- Sequence-number gap detection (logged, not fatal).

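A compact sketch of the frame-level encode/validate path. It covers the wrapping `tx_seq`, v2 trailing-zero truncation, strict length and CRC checks, and unknown-ID drops. It handles unsigned frames only. `Encoder`, `decode`, `ParseError` and `msg_info` are hypothetical names. The metadata table lists only HEARTBEAT (9-byte payload, CRC_EXTRA 50); the remaining §7.7 entries would come from the MAVLink message definitions. The decoder checks the message ID before the CRC because an unknown ID has no CRC_EXTRA to validate against.

```rust
const STX_V2: u8 = 0xFD;
/// STX, len, incompat_flags, compat_flags, seq, sysid, compid, 3-byte msgid.
const HEADER_LEN: usize = 10;
const MSG_ID_HEARTBEAT: u32 = 0;

/// One step of the MAVLink checksum, with an 8-bit `tmp` as in the C reference.
fn crc_accumulate(byte: u8, crc: u16) -> u16 {
    let mut tmp: u8 = byte ^ (crc & 0x00FF) as u8;
    tmp ^= tmp << 4;
    let tmp = tmp as u16;
    (crc >> 8) ^ (tmp << 8) ^ (tmp << 3) ^ (tmp >> 4)
}

/// CRC over everything after STX (header + payload), then the message's CRC_EXTRA.
fn frame_crc(bytes: &[u8], crc_extra: u8) -> u16 {
    let crc = bytes.iter().fold(0xFFFF, |crc, &b| crc_accumulate(b, crc));
    crc_accumulate(crc_extra, crc)
}

/// (full payload length, CRC_EXTRA) per supported ID; only HEARTBEAT filled in.
fn msg_info(msg_id: u32) -> Option<(usize, u8)> {
    match msg_id {
        MSG_ID_HEARTBEAT => Some((9, 50)),
        _ => None,
    }
}

struct Encoder {
    sysid: u8,
    compid: u8,
    tx_seq: u8,
}

impl Encoder {
    /// Builds an unsigned v2 frame; panics on IDs outside the supported table.
    fn encode(&mut self, msg_id: u32, payload: &[u8]) -> Vec<u8> {
        let (_, crc_extra) = msg_info(msg_id).expect("message outside the supported surface");
        // v2 payload truncation: strip trailing zeros but keep at least one byte.
        let mut len = payload.len();
        while len > 1 && payload[len - 1] == 0 {
            len -= 1;
        }
        // incompat_flags = 0 and compat_flags = 0: no signing.
        let mut f = vec![STX_V2, len as u8, 0, 0, self.tx_seq, self.sysid, self.compid];
        f.extend_from_slice(&msg_id.to_le_bytes()[..3]);
        f.extend_from_slice(&payload[..len]);
        let crc = frame_crc(&f[1..], crc_extra);
        f.extend_from_slice(&crc.to_le_bytes());
        self.tx_seq = self.tx_seq.wrapping_add(1); // 255 wraps to 0
        f
    }
}

#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    Truncated,
    Crc,
    UnknownId(u32),
}

/// Validates one frame starting at STX; returns (seq, msg_id, payload zero-extended).
fn decode(frame: &[u8]) -> Result<(u8, u32, Vec<u8>), ParseError> {
    if frame.len() < HEADER_LEN + 2 {
        return Err(ParseError::Truncated);
    }
    let len = frame[1] as usize;
    let body_end = HEADER_LEN + len;
    if frame.len() < body_end + 2 {
        return Err(ParseError::Truncated);
    }
    let msg_id = u32::from_le_bytes([frame[7], frame[8], frame[9], 0]);
    let (full_len, crc_extra) = msg_info(msg_id).ok_or(ParseError::UnknownId(msg_id))?;
    let received = u16::from_le_bytes([frame[body_end], frame[body_end + 1]]);
    if frame_crc(&frame[1..body_end], crc_extra) != received {
        return Err(ParseError::Crc);
    }
    let mut payload = frame[HEADER_LEN..body_end].to_vec();
    payload.resize(full_len.max(len), 0); // undo trailing-zero truncation
    Ok((frame[4], msg_id, payload))
}
```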
### Excluded

- Transport and reconnect (task 02).
- Heartbeat scheduling (task 02).
- Ack demultiplexing to callers (task 04).
- MAVLink-2 signing (task 04).
- Any message not in the §7.7 surface — adding new messages requires design review.

## Acceptance Criteria

**AC-1: Round-trip every supported message**
Given the encoder produces a frame for each message kind in the §7.7 surface with deterministic field values
When the same frame is fed back through the decoder
Then the typed `MavlinkMessage` matches the original fields and `parse_errors_total` does not increment.

**AC-2: Malformed frame is rejected**
Given a byte buffer with a truncated payload or a wrong CRC
When the decoder consumes it
Then the frame is dropped, `parse_errors_total{kind="crc" | "truncated"}` increments by 1, and the codec continues processing subsequent bytes.

**AC-3: Unknown message ID is counted, not fatal**
Given an inbound frame with a message ID outside the §7.7 surface
When the decoder consumes it
Then the frame is dropped, `parse_errors_total{kind="unknown_id"}` increments by 1, and decoding continues.

**AC-4: SITL round-trip**
Given an ArduPilot SITL instance configured for `udp://127.0.0.1:14550`
When `mavlink_layer` emits a `COMMAND_LONG` for `MAV_CMD_NAV_RETURN_TO_LAUNCH`
Then SITL receives the command and replies with a matching `COMMAND_ACK`; the decoder emits a `MavlinkMessage::CommandAck` with `result = MAV_RESULT_ACCEPTED`.

## Non-Functional Requirements

**Performance**
- Per-message encode + decode round-trip: ≤50 ms p99 on a healthy link (per `description.md §8`).

**Reliability**
- No silent acceptance of malformed frames or of frames whose signature does not verify.

## Constraints

- Hand-rolled — no third-party MAVLink SDK.
- Adding any message outside the §7.7 surface requires an explicit design review noted in the PR description.

## Runtime Completeness

- **Named capability**: MAVLink v2 wire-correct encode/decode for the §7.7 command surface.
- **Production code that must exist**: real byte-level encoder + decoder; CRC computation; sequence number handling.
- **Allowed external stubs**: ArduPilot SITL is the conformance reference for the SITL round-trip AC.
- **Unacceptable substitutes**: a JSON or human-readable "MAVLink-like" envelope is not acceptable — the wire format must be MAVLink v2.

@@ -1,81 +0,0 @@
# Mission Pull + Schema Validation

**Task**: AZ-644_mission_client_pull_and_schema
**Name**: HTTPS mission fetch + schema validation
**Description**: HTTPS REST client to the external `missions` API, mission fetch by `mission_id` on startup, validate the response against the shared `mission-schema`, bounded retry on transient connection loss.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure
**Component**: mission_client
**Tracker**: AZ-644
**Epic**: AZ-638

## Problem

`autopilot` does not own the missions database — it fetches the mission by ID from the external `missions` API at startup. The response must validate against the shared `mission-schema`; on schema-invalid the mission MUST be rejected (no silent downcast). On transient connectivity failure, fetch is retried with bounded exponential backoff; on exceeding the cap, the mission start is refused and health flips to red.

## Outcome

- `MissionClient::fetch(mission_id) -> Result<Mission, FetchError>` performs an HTTPS GET against the configured `missions` endpoint, validates the response against the bundled `mission-schema` (schema version recorded), and returns a typed `Mission` (`{ waypoints, geofences, return_point, mission_id, schema_version }`).
- Transient failures (timeout, 5xx, DNS) are retried with bounded exponential backoff; max attempts configurable (default 5).
- On schema mismatch the call returns `Err(SchemaInvalid)` with a size-capped sample of the raw response for offline analysis.
- Health surface includes `last_fetch_ts`, `fetch_errors_total`, `schema_version`, `connection_state`.

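The size-capped raw-body sample could be produced as shown below. The 1 KiB cap comes from the commit message; `SAMPLE_CAP_BYTES` and `body_sample` are hypothetical names. Decoding is lossy so that a non-UTF-8 body, or a multi-byte character split at the cap, still yields a loggable string.

```rust
/// Cap for the raw-body sample attached to a schema failure (1 KiB).
const SAMPLE_CAP_BYTES: usize = 1024;

/// Returns at most `cap` bytes of `body` as text suitable for logging.
fn body_sample(body: &[u8], cap: usize) -> String {
    let end = body.len().min(cap);
    // Lossy decoding: invalid UTF-8, including a multi-byte character cut
    // at the cap, becomes U+FFFD instead of failing.
    let mut sample = String::from_utf8_lossy(&body[..end]).into_owned();
    if body.len() > cap {
        sample.push_str(&format!(" [truncated {} bytes]", body.len() - cap));
    }
    sample
}
```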
## Scope

### Included

- HTTPS client (`reqwest` or `hyper` — pick the one already pinned in `shared`).
- Auth header plumb-through (concrete scheme deferred to `../_docs/02_missions.md`; passed as opaque `Authorization` header).
- Schema validation against `mission-schema` (bundled in `shared/contracts/`).
- Bounded exponential backoff.

### Excluded

- Middle-waypoint POST (task 06).
- MapObjects pre-flight pull (task 07).
- MapObjects post-flight push and durable queue (task 08).

## Acceptance Criteria

**AC-1: Happy path fetch**
Given a fixture `missions` API that returns a schema-valid mission JSON for `mission_id = M1`
When `MissionClient::fetch("M1")` is called
Then it returns `Ok(Mission { ... })` and `health()` reports `last_fetch_ts` updated, `connection_state = "ok"`.

**AC-2: Schema-invalid is rejected**
Given a fixture `missions` API that returns a valid HTTP 200 but the JSON body is missing a required field
When `MissionClient::fetch("M1")` is called
Then it returns `Err(SchemaInvalid)` and `health()` records the failure; a size-capped excerpt of the raw response is logged.

**AC-3: Transient failure retries within budget**
Given the missions API returns `503` for the first two attempts and `200` on the third
When `MissionClient::fetch("M1")` is called
Then it returns `Ok` after the third attempt; backoff is observed between attempts.

**AC-4: Cap exhaustion refuses start**
Given the missions API is unreachable for all 5 attempts (the default `max_attempts`)
When `MissionClient::fetch("M1")` is called
Then it returns `Err(MaxRetriesExceeded)` and `health()` is red.

## Non-Functional Requirements

**Performance**
- Startup fetch completes in ≤5 s on healthy connectivity.

**Reliability**
- No silent downcast on schema mismatch.
- No infinite retry — bounded backoff cap is configurable.

## Constraints

- Mission schema is shared with the external `missions` repo; the schema file lives in `shared/contracts/mission-schema.json` (bundled at build time).

## Contract

- `mission-schema.json` is the authoritative wire contract. Owner: `../_docs/02_missions.md`. Bundled copy in `shared/contracts/mission-schema.json`.
- Canonical typed model: `data_model.md §MissionItem`, `§MissionWaypoint`, `§Geofence`.

## Runtime Completeness

- **Named capability**: HTTPS REST to the external `missions` API + JSON Schema validation.
- **Production code that must exist**: real HTTPS request; real JSON Schema validator (e.g. `jsonschema` crate).
- **Allowed external stubs**: in tests, the missions API can be a local `wiremock`/`mockito` server.
- **Unacceptable substitutes**: skipping schema validation in production "for speed" is not acceptable; validation is a safety boundary.