autopilot/crates/mavlink_layer/src/internal/retry.rs
Oleksandr Bezdieniezhnykh 740bf37d76 [AZ-641] [AZ-642] [AZ-644] mavlink transport + codec + mission pull
Lands the second batch under epic AZ-626's implementation plan.

mavlink_layer (AZ-641 + AZ-642):
- Hand-rolled MAVLink v2 codec covering the §7.7 surface: HEARTBEAT,
  SYS_STATUS, SET_MODE, ATTITUDE, GLOBAL_POSITION_INT, MISSION_* (7),
  COMMAND_LONG, COMMAND_ACK, EXTENDED_SYS_STATE, STATUSTEXT (17 total).
- Streaming decoder demuxes arbitrary-sized byte arrivals, drops malformed
  frames with typed parse-error counters (crc/truncated/unknown_id/seq_gap),
  and surfaces sequence gaps without hard-failing the link.
- Encoder tracks the per-link tx_seq counter and applies the MAVLink v2
  trailing-zero payload truncation rule.
- UDP and POSIX-serial transports behind a single async Transport trait;
  the run loop owns transport open with bounded exponential backoff
  (2 s serial / 5 s UDP cap) and a tokio::select! per-link read+write
  loop.
- 1 Hz outbound HEARTBEAT scheduler + inbound-heartbeat watchdog that
  fires LinkUp / LinkLost on a broadcast channel and feeds health detail
  (connected, last_heartbeat_age_ms, signing_enabled, parse_errors).
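
A minimal sketch of the trailing-zero truncation rule mentioned in the encoder bullet above, not the crate's actual code. The helper names `truncate_trailing_zeros` and `zero_extend` are hypothetical. The sketch assumes the MAVLink v2 behaviour that the first payload byte is always sent, even when the whole payload is zero, and that the receiver pads the payload back to full length with zeros.

```rust
/// Returns the payload as it would go on the wire under the MAVLink v2
/// truncation rule: trailing zero bytes are dropped, but at least one byte
/// is kept even when every byte is zero.
/// (Hypothetical helper; the name is not taken from mavlink_layer.)
pub fn truncate_trailing_zeros(payload: &[u8]) -> &[u8] {
    // Index just past the last non-zero byte, or 0 if every byte is zero.
    let end = payload
        .iter()
        .rposition(|&b| b != 0)
        .map_or(0, |i| i + 1);
    // Keep at least one byte, unless the payload was empty to begin with.
    let keep = end.max(1).min(payload.len());
    &payload[..keep]
}

/// Receiver side: pad a truncated payload back to the message's full length.
/// (Hypothetical helper; `full_len` would come from the message definition.)
pub fn zero_extend(wire: &[u8], full_len: usize) -> Vec<u8> {
    let mut out = wire.to_vec();
    if out.len() < full_len {
        out.resize(full_len, 0);
    }
    out
}
```

Because the checksum is computed over the bytes actually sent, truncation has to happen before the CRC step in any encoder built this way.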

mission_client (AZ-644):
- HTTPS GET /missions/{id} over rustls (no OpenSSL on the airframe).
- Bundled JSON Schema (crates/shared/contracts/mission-schema.json,
  draft-07, additionalProperties:false) validates every response;
  schema-invalid bodies surface as FetchError::SchemaInvalid with a
  1 KiB sample of the raw body for offline analysis.
- Transient failures (timeout, 5xx, 429) retry with bounded exponential
  backoff up to MissionClientOptions.max_attempts (default 5); permanent
  failures (4xx, malformed URL) abort immediately.
- Health surface mirrors AC-1's contract: last_fetch_ts,
  fetch_errors_total, schema_version, connection_state.
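
The transient/permanent split described above can be sketched as a small classifier plus a bounded retry driver. This is an illustration only: `Attempt`, `Disposition`, `classify`, and `fetch_with_retry` are hypothetical names, the real error type is not shown in this commit, and treating every status not named in the message (for example 3xx) as permanent is an assumption. The backoff sleep is left out to keep the sketch synchronous.

```rust
/// Hypothetical outcome of a single fetch attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Attempt {
    Status(u16),
    Timeout,
    MalformedUrl,
}

/// What the retry loop should do with an outcome.
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    Done,
    Retry,
    Abort,
}

/// Timeout, 429, and 5xx are transient; 2xx is success; everything else
/// (other 4xx, malformed URL, and, by assumption, any remaining status)
/// is permanent.
pub fn classify(a: Attempt) -> Disposition {
    match a {
        Attempt::Status(200..=299) => Disposition::Done,
        Attempt::Timeout | Attempt::Status(429) | Attempt::Status(500..=599) => {
            Disposition::Retry
        }
        _ => Disposition::Abort,
    }
}

/// Calls `fetch` until it succeeds, fails permanently, or `max_attempts`
/// tries have been made (at least one try is always made). Returns the
/// last outcome and the number of attempts used.
pub fn fetch_with_retry(
    max_attempts: u32,
    mut fetch: impl FnMut(u32) -> Attempt,
) -> (Attempt, u32) {
    let mut used = 0;
    loop {
        let outcome = fetch(used);
        used += 1;
        match classify(outcome) {
            // Real code would sleep for a backoff delay before retrying.
            Disposition::Retry if used < max_attempts => continue,
            _ => return (outcome, used),
        }
    }
}
```

Note that 429 has to be matched before the general 4xx case, since it is the one client-error status the message lists as transient.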

Caught and fixed before commit (NOT a code-review finding; caught by
the unit test that hand-computed CRC("123456789")): the hand-rolled
X.25 CRC accumulator was operating in u16 throughout. The MAVLink C
reference declares `tmp` as uint8_t, so `tmp << 4` silently discards
the bits shifted past bit 7; in u16 those bits survived and corrupted
the result. Round-trip tests passed (encoder and decoder shared the
bug); a real MAVLink peer would have rejected every frame. Fixed by
mirroring the C reference: `let mut tmp: u8 = …; tmp ^= tmp.wrapping_shl(4);`.
Added a regression test asserting CRC("123456789") == 0x6F91 against
pymavlink's reference value (NOT the textbook 0x29B1, which belongs to
the MSB-first CRC-16/CCITT-FALSE; MAVLink uses the bit-reflected
variant with init 0xFFFF and no final XOR, i.e. CRC-16/MCRF4XX).
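
For reference, a sketch of the accumulator in the shape of the MAVLink C reference, with `tmp` held as `u8`. The function names `crc_accumulate` and `crc_calculate` follow the C reference's naming and are assumptions about this crate, whose actual code is not shown here.

```rust
/// Folds one byte into the running checksum.
pub fn crc_accumulate(byte: u8, crc: u16) -> u16 {
    // `tmp` must be u8: the left shift is meant to drop the high nibble.
    let mut tmp: u8 = byte ^ (crc & 0x00FF) as u8;
    tmp ^= tmp << 4;
    let t = u16::from(tmp);
    (crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4)
}

/// Checksum of a whole buffer, starting from the 0xFFFF seed and with no
/// final XOR.
pub fn crc_calculate(data: &[u8]) -> u16 {
    data.iter().fold(0xFFFF, |crc, &b| crc_accumulate(b, crc))
}
```

With this shape, `crc_calculate(b"123456789")` yields 0x6F91, the value the regression test pins.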

AC verification (full detail in
_docs/03_implementation/batch_02_cycle1_report.md):

AZ-641: AC-1 + AC-3 + AC-4 verified via UDP loopback integration tests;
        AC-2 (serial) requires a socat pty pair and runs in the SITL/CI
        tier (test exists as #[ignore]-marked stub).
AZ-642: AC-1 + AC-2 + AC-3 verified via exhaustive codec round-trip and
        decoder negative-path tests; AC-4 (SITL round-trip) requires
        ArduPilot SITL — the CRC fix above means the codec is now
        wire-correct, ready for the sitl-conformance Woodpecker stage.
AZ-644: all four ACs verified via wiremock-driven integration tests.

Workspace gates green:
- cargo check --workspace                                clean
- cargo check --workspace --no-default-features          clean
- cargo fmt --all -- --check                             clean
- cargo clippy --workspace --all-targets -- -D warnings  clean
- cargo test --workspace                                 pass (1 expected ignore)

Layering invariants from module-layout.md hold: mavlink_layer and
mission_client are Layer 2 actors importing only `shared`; no sibling
Layer-2 imports; MavlinkHandle implements shared::contracts::MavlinkSink.

Jira: AZ-641, AZ-642, AZ-644 transitioned To Do → In Progress at batch
start; the matching In Testing transitions follow this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 12:29:49 +03:00


//! Bounded exponential backoff helper used by the transport reconnect loop.
//!
//! Caller pattern:
//! ```text
//! let mut backoff = ExponentialBackoff::new(base, cap);
//! loop {
//!     match try_open().await {
//!         Ok(t) => break t,
//!         Err(e) => {
//!             tracing::warn!(error = %e, "open failed");
//!             tokio::time::sleep(backoff.next_delay()).await;
//!         }
//!     }
//! }
//! ```
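
use std::time::Duration;

/// Exponential backoff state: `base * 2^attempt`, clamped to `cap`.
#[derive(Debug, Clone)]
pub struct ExponentialBackoff {
    base: Duration,
    cap: Duration,
    attempt: u32,
}

impl ExponentialBackoff {
    /// Creates a backoff that starts at `base` and never exceeds `cap`.
    ///
    /// # Panics
    /// Panics if `base` is zero or if `cap < base`.
    pub fn new(base: Duration, cap: Duration) -> Self {
        assert!(base > Duration::ZERO, "backoff base must be positive");
        assert!(cap >= base, "backoff cap must be >= base");
        Self {
            base,
            cap,
            attempt: 0,
        }
    }

    /// The next delay to sleep for. Doubles each call, capped at `cap`.
    pub fn next_delay(&mut self) -> Duration {
        // Clamp the exponent so `1u32 << exp` stays a valid shift.
        let exp = self.attempt.min(31);
        let delay = self
            .base
            .checked_mul(1u32 << exp)
            // Overflow means the delay is far past the cap, so use the cap.
            .unwrap_or(self.cap)
            .min(self.cap);
        self.attempt = self.attempt.saturating_add(1);
        delay
    }

    /// Restarts the schedule so the next delay is `base` again.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }

    /// Number of delays handed out since construction or the last `reset`.
    pub fn attempts(&self) -> u32 {
        self.attempt
    }
}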

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn doubles_until_cap() {
        // Arrange
        let mut b = ExponentialBackoff::new(Duration::from_millis(100), Duration::from_secs(2));

        // Act / Assert
        assert_eq!(b.next_delay(), Duration::from_millis(100));
        assert_eq!(b.next_delay(), Duration::from_millis(200));
        assert_eq!(b.next_delay(), Duration::from_millis(400));
        assert_eq!(b.next_delay(), Duration::from_millis(800));
        assert_eq!(b.next_delay(), Duration::from_millis(1600));
        assert_eq!(b.next_delay(), Duration::from_secs(2)); // capped
        assert_eq!(b.next_delay(), Duration::from_secs(2)); // still capped
    }

    #[test]
    fn reset_returns_to_base() {
        // Arrange
        let mut b = ExponentialBackoff::new(Duration::from_millis(50), Duration::from_secs(1));
        let _ = b.next_delay();
        let _ = b.next_delay();

        // Act
        b.reset();

        // Assert
        assert_eq!(b.next_delay(), Duration::from_millis(50));
    }
}