[AZ-651] [AZ-668] lost-link failsafe ladder + mapobjects persistence (batch 7)

AZ-651 (mission_executor lost-link ladder):
- LostLinkLadder pure-logic state machine (LinkOk -> LinkDegraded ->
  LinkLost, with an optional LinkLostInFollow grace before LinkLost,
  plus a MavlinkLost branch). Configurable thresholds via
  LostLinkConfig.
- LostLinkCommandIssuer trait + MavlinkCommandIssuer production impl
  emitting MAV_CMD_NAV_RETURN_TO_LAUNCH via MavlinkHandle::send_command.
- LostLinkDriver task wires the ladder to operator-link watch, MAVLink
  LinkEvent broadcast, and optional target-follow signal. On RTL,
  driver calls the issuer THEN MissionExecutorHandle::failsafe_trigger.
- failsafe_trigger(LinkLost | LinkLostInFollow) short-circuits FlyMission
  -> Land via direct FSM state mutation + TransitionEvent emission;
  Paused state is intentionally NOT overridden.
- Tests: 4/4 ACs locally green (degraded-no-rtl; lost-fires-once;
  follow-grace; mavlink-loss-no-rtl) plus driver + FSM integration.
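
The rung selection described above can be sketched in isolation. This is
a minimal, std-only illustration, not the crate's code: `Rung`,
`Thresholds`, `classify`, and `defaults` are hypothetical names, and the
follow-grace elapsed time is passed in as a parameter, whereas the real
ladder tracks it internally from the first tick that observes the loss.
The threshold values are the defaults named in this commit (5 s, 30 s,
30 s).

```rust
use std::time::Duration;

/// Hypothetical, simplified mirror of the ladder rungs
/// (not the crate's `LadderState`; the MAVLink branch is omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rung {
    Ok,
    Degraded,
    LostInFollow,
    Lost,
}

/// Hypothetical threshold bundle for the sketch.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub degraded_after: Duration,
    pub lost_after: Duration,
    pub follow_grace: Duration,
}

/// Default values as stated in the commit: 5 s, 30 s, 30 s.
pub fn defaults() -> Thresholds {
    Thresholds {
        degraded_after: Duration::from_secs(5),
        lost_after: Duration::from_secs(30),
        follow_grace: Duration::from_secs(30),
    }
}

/// Classify the operator link from how long it has been silent.
/// `in_follow_for` is how long the follow grace has been running;
/// it is only consulted when `follow_active` is true.
pub fn classify(
    silent_for: Duration,
    follow_active: bool,
    in_follow_for: Duration,
    t: Thresholds,
) -> Rung {
    if silent_for < t.degraded_after {
        Rung::Ok
    } else if silent_for < t.lost_after {
        Rung::Degraded
    } else if follow_active && in_follow_for < t.follow_grace {
        Rung::LostInFollow
    } else {
        Rung::Lost
    }
}
```

With these defaults, a link that has been silent for 10 s classifies as
`Degraded`, and one silent for 31 s classifies as `Lost` unless
target-follow is active and its grace has not yet run out.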

AZ-668 (mapobjects_store persistence):
- Snapshot serializable shape + Store::{to_snapshot,from_snapshot}
  round trip.
- MapObjectsPersistence async trait + JsonSnapshotEngine default impl
  (write to .tmp, sync_all, atomic rename, best-effort parent fsync).
- PersistenceError::{Corrupt, SchemaMismatch} surfaces explicit errors
  on bad blob; PersistenceMetrics tracks last_snapshot_ts,
  snapshot_size_bytes, snapshot_errors_total.
- MapObjectsStore::from_snapshot factory for crash recovery from the
  composition root.
- Tests: 4/4 ACs locally green (round-trip; atomic rename ignores
  partial .tmp; crash recovery preserves pending; corruption returns
  explicit error) plus schema-mismatch + metrics smoke checks.
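
The write sequence listed for JsonSnapshotEngine (write to .tmp,
sync_all, atomic rename, best-effort parent fsync) can be sketched with
the standard library alone. This is an illustration under assumptions,
not the engine's actual code: `write_snapshot_atomically` and `tmp_path`
are hypothetical names, the sketch is synchronous rather than async, and
the `.tmp` suffix placement is a guess.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: sibling temp path, e.g. `store.json` -> `store.json.tmp`.
fn tmp_path(target: &Path) -> PathBuf {
    let mut name = target.as_os_str().to_owned();
    name.push(".tmp");
    PathBuf::from(name)
}

/// Write `bytes` to `target` so that a reader sees either the previous
/// file or the complete new one, never a partially written file.
pub fn write_snapshot_atomically(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = tmp_path(target);

    // 1. Write the whole payload to the temp file and flush it to disk.
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?;
    }

    // 2. Replace the target. Both paths share a directory, so the
    //    rename stays on one filesystem.
    fs::rename(&tmp, target)?;

    // 3. Best-effort: fsync the parent directory so the rename itself
    //    is durable. Failures are ignored on purpose; not every
    //    platform allows opening a directory this way.
    if let Some(parent) = target.parent() {
        if let Ok(dir) = File::open(parent) {
            let _ = dir.sync_all();
        }
    }
    Ok(())
}
```

Because a loader following this scheme only ever opens `target`, a
leftover or half-written `.tmp` from a crash between steps 1 and 2 is
simply ignored and overwritten by the next snapshot.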

Quality gates:
- cargo fmt: clean.
- cargo clippy -p mission_executor -p mapobjects_store --tests: 0 warns.
- cargo test --workspace: all green.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-19 18:59:28 +03:00
parent 23366a5c6d
commit 2bcd4a8059
16 changed files with 1940 additions and 8 deletions
@@ -0,0 +1,579 @@
//! AZ-651 — lost-link failsafe ladder.
//!
//! Two distinct link concerns are tracked:
//!
//! 1. **Operator modem link** (Ground-Station ↔ autopilot). This is the
//! link the ladder watches. Its state climbs:
//! `LinkOk` → `LinkDegraded` (5–30 s) → `LinkLost` (>30 s) →
//! (optionally) `LinkLostInFollow` when target-follow is active, with
//! a configurable 30 s grace before promotion to `LinkLost`.
//!
//! 2. **MAVLink link** (autopilot ↔ ArduPilot). This one is owned by
//! `mavlink_layer`'s heartbeat watchdog. When *it* fires `LinkLost`,
//! the airframe runs its OWN built-in failsafe — autopilot must NOT
//! issue `MAV_CMD_NAV_RETURN_TO_LAUNCH` itself. The ladder records the
//! state (`MavlinkLost`) and surfaces it to health, but never emits
//! an RTL trigger while the MAVLink link is down.
//!
//! The ladder is **pure logic**: `tick(input)` is deterministic, with
//! the clock supplied through `LadderInput::now`. Wiring (subscribe to
//! MAVLink `LinkEvent`s, drive ticks on a 100 ms schedule, call
//! `MavlinkHandle::send_command`, set the executor's failsafe flag)
//! lives in [`LostLinkDriver::spawn`].
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use tokio::sync::{broadcast, watch, Mutex};
use tokio::task::JoinHandle;
use tokio::time::Instant;
use mavlink_layer::{CommandLong, LinkEvent, MavlinkHandle, SendCommandError};
use shared::error::AutopilotError;
use crate::FailsafeKind;
use crate::MissionExecutorHandle;
/// MAVLink `MAV_CMD_NAV_RETURN_TO_LAUNCH` command id.
pub const MAV_CMD_NAV_RETURN_TO_LAUNCH: u16 = 20;
/// Default operator-link thresholds and tick cadence per AZ-651 §Outcome.
#[derive(Debug, Clone, Copy)]
pub struct LostLinkConfig {
/// Time-since-last-operator-heartbeat after which the ladder moves
/// from `LinkOk` to `LinkDegraded`. Default 5 s.
pub degraded_after: Duration,
/// Time-since-last-operator-heartbeat after which the ladder moves
/// from `LinkDegraded` to `LinkLost` (or `LinkLostInFollow` if
/// target-follow is active). Default 30 s.
pub lost_after: Duration,
/// Additional grace before `LinkLostInFollow` is promoted to
/// `LinkLost` (and RTL fires). Default 30 s — operators commonly
/// have brief connectivity drops mid-follow.
pub follow_grace: Duration,
/// Driver tick cadence. Default 100 ms. This is the interval between
/// ticks, not the cost of one: the tick itself runs in microseconds,
/// well under the AZ-651 NFR budget of ≤5 ms per tick.
pub tick_interval: Duration,
}
impl Default for LostLinkConfig {
fn default() -> Self {
Self {
degraded_after: Duration::from_secs(5),
lost_after: Duration::from_secs(30),
follow_grace: Duration::from_secs(30),
tick_interval: Duration::from_millis(100),
}
}
}
/// Where the ladder currently sits.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
pub enum LadderState {
/// Operator-link heartbeats are arriving within `degraded_after`.
LinkOk,
/// Operator-link heartbeats have been absent for `degraded_after`
/// but less than `lost_after`. Health yellow; no command issued.
LinkDegraded,
/// Operator-link absent past `lost_after`, target-follow inactive.
/// On entry, the driver issues RTL exactly once and flips the
/// executor's failsafe flag.
LinkLost,
/// Operator-link absent past `lost_after` AND target-follow is
/// active. Stay here for `follow_grace`, then promote to `LinkLost`.
LinkLostInFollow,
/// The MAVLink link to ArduPilot is down. Airframe handles its own
/// failsafe; autopilot NEVER issues RTL itself in this state. The
/// ladder still tracks operator-link state internally — once
/// MAVLink recovers, the operator-link ladder picks up where it
/// left off.
MavlinkLost,
}
/// Per-tick input to the ladder. Externalising every signal keeps the
/// logic pure and deterministic; tests construct these directly.
#[derive(Debug, Clone, Copy)]
pub struct LadderInput {
pub now: Instant,
pub op_link_up: bool,
pub mavlink_link_up: bool,
pub target_follow_active: bool,
}
/// Per-tick output. `rtl_should_fire` is the actionable bit — when
/// `true`, the caller must issue exactly one `MAV_CMD_NAV_RETURN_TO_LAUNCH`
/// and flip the executor's failsafe flag. `previous_state` is exposed
/// (rather than reconstructed) so consumers don't have to track it.
#[derive(Debug, Clone, Copy)]
pub struct LadderOutput {
pub previous_state: LadderState,
pub state: LadderState,
pub state_changed: bool,
pub rtl_should_fire: bool,
}
/// Broadcast event emitted on state transitions and RTL trigger. Lets
/// `operator_bridge` / `telemetry_stream` surface failsafe state to the
/// operator UI without polling.
#[derive(Debug, Clone, Copy)]
#[non_exhaustive]
pub enum LadderEvent {
StateChanged {
from: LadderState,
to: LadderState,
},
RtlIssued {
rtl_count: u64,
},
RtlSendFailed {
rtl_count: u64,
},
}
/// Pure ladder logic. Stateful only across ticks; one `LostLinkLadder`
/// per autopilot instance.
#[derive(Debug)]
pub struct LostLinkLadder {
config: LostLinkConfig,
state: LadderState,
/// `Some(t)` while operator link has been down since `t`.
op_link_down_since: Option<Instant>,
/// `Some(t)` while we have been in `LinkLostInFollow` since `t`.
follow_lost_since: Option<Instant>,
/// Count of RTL triggers since construction. Exposed for health.
rtl_count: u64,
/// `Some(t)` when the ladder last changed state; `None` until the
/// first transition. Read via [`LostLinkLadder::time_in_state`].
state_entered_at: Option<Instant>,
}
impl LostLinkLadder {
pub fn new(config: LostLinkConfig) -> Self {
Self {
config,
state: LadderState::LinkOk,
op_link_down_since: None,
follow_lost_since: None,
rtl_count: 0,
state_entered_at: None,
}
}
pub fn state(&self) -> LadderState {
self.state
}
pub fn rtl_count(&self) -> u64 {
self.rtl_count
}
/// How long has the ladder been in its current state? `None` if the
/// ladder has never advanced past its initial `LinkOk`.
pub fn time_in_state(&self, now: Instant) -> Option<Duration> {
self.state_entered_at
.map(|t| now.saturating_duration_since(t))
}
/// Advance the ladder by one tick. Returns the actionable outcome.
/// Caller is responsible for honouring `rtl_should_fire`.
pub fn tick(&mut self, input: LadderInput) -> LadderOutput {
let prev = self.state;
// MAVLink down dominates — airframe handles its own failsafe.
// Track operator-link state internally so when MAVLink recovers
// we resume the right rung of the ladder, but never fire RTL.
if !input.mavlink_link_up {
self.advance_op_link_tracking(input);
self.set_state(LadderState::MavlinkLost, input.now, prev);
return LadderOutput {
previous_state: prev,
state: LadderState::MavlinkLost,
state_changed: prev != LadderState::MavlinkLost,
rtl_should_fire: false,
};
}
// MAVLink is up. Pure operator-link ladder.
let new_state = self.compute_op_link_state(input);
let entering_lost = new_state == LadderState::LinkLost && prev != LadderState::LinkLost;
let rtl_should_fire = entering_lost;
if rtl_should_fire {
self.rtl_count += 1;
}
self.set_state(new_state, input.now, prev);
LadderOutput {
previous_state: prev,
state: new_state,
state_changed: prev != new_state,
rtl_should_fire,
}
}
/// Update `op_link_down_since` / `follow_lost_since` from the
/// current input WITHOUT promoting the visible state. Used while
/// the ladder is masked by `MavlinkLost`.
fn advance_op_link_tracking(&mut self, input: LadderInput) {
if input.op_link_up {
self.op_link_down_since = None;
self.follow_lost_since = None;
} else if self.op_link_down_since.is_none() {
self.op_link_down_since = Some(input.now);
}
}
fn compute_op_link_state(&mut self, input: LadderInput) -> LadderState {
if input.op_link_up {
self.op_link_down_since = None;
self.follow_lost_since = None;
return LadderState::LinkOk;
}
let down_since = *self.op_link_down_since.get_or_insert(input.now);
let elapsed = input.now.saturating_duration_since(down_since);
if elapsed < self.config.degraded_after {
// Still within the initial OK window. Keep `down_since`
// sticky so a short blip doesn't reset the clock.
LadderState::LinkOk
} else if elapsed < self.config.lost_after {
self.follow_lost_since = None;
LadderState::LinkDegraded
} else if input.target_follow_active {
let follow_since = *self.follow_lost_since.get_or_insert(input.now);
if input.now.saturating_duration_since(follow_since) < self.config.follow_grace {
LadderState::LinkLostInFollow
} else {
LadderState::LinkLost
}
} else {
self.follow_lost_since = None;
LadderState::LinkLost
}
}
fn set_state(&mut self, new_state: LadderState, now: Instant, prev: LadderState) {
if prev != new_state {
self.state_entered_at = Some(now);
}
self.state = new_state;
}
}
// ============================================================================
// Driver — owns the ladder and wires it to MAVLink + the executor.
// ============================================================================
/// Pluggable command issuer. Production wires this to
/// [`MavlinkCommandIssuer`] which calls
/// `MavlinkHandle::send_command(MAV_CMD_NAV_RETURN_TO_LAUNCH)`. Tests
/// supply a spy implementation so RTL invocations can be counted
/// without spinning up a real MAVLink loopback.
///
/// The trait deliberately stays narrow (`issue_rtl` only) — adding more
/// commands here would couple every failsafe to one trait, and
/// AZ-652 / AZ-650 each own their own command surface.
#[async_trait]
pub trait LostLinkCommandIssuer: Send + Sync {
async fn issue_rtl(&self) -> Result<(), AutopilotError>;
}
/// Production `LostLinkCommandIssuer` backed by `mavlink_layer`.
#[derive(Debug, Clone)]
pub struct MavlinkCommandIssuer {
pub handle: MavlinkHandle,
pub target_system: u8,
pub target_component: u8,
/// Optional override for the `send_command` deadline (default uses
/// `MavlinkLayerOptions::command_ack_deadline`).
pub ack_deadline: Option<Duration>,
}
impl MavlinkCommandIssuer {
pub fn new(handle: MavlinkHandle, target_system: u8, target_component: u8) -> Self {
Self {
handle,
target_system,
target_component,
ack_deadline: None,
}
}
}
#[async_trait]
impl LostLinkCommandIssuer for MavlinkCommandIssuer {
async fn issue_rtl(&self) -> Result<(), AutopilotError> {
let cmd = CommandLong {
param1: 0.0,
param2: 0.0,
param3: 0.0,
param4: 0.0,
param5: 0.0,
param6: 0.0,
param7: 0.0,
command: MAV_CMD_NAV_RETURN_TO_LAUNCH,
target_system: self.target_system,
target_component: self.target_component,
confirmation: 0,
};
self.handle
.send_command(cmd, self.ack_deadline)
.await
.map(|_ack| ())
.map_err(|e| match e {
SendCommandError::Timeout(d) => {
AutopilotError::Internal(format!("RTL command ack timeout after {d:?}"))
}
SendCommandError::Duplicate(id) => {
AutopilotError::Internal(format!("RTL command duplicate in flight (id={id})"))
}
SendCommandError::ChannelClosed(reason) => {
AutopilotError::Internal(format!("RTL command channel closed: {reason}"))
}
})
}
}
/// Public read-side handle for the lost-link ladder. Cloneable; safe
/// to share across `operator_bridge` / `telemetry_stream` / health.
#[derive(Debug, Clone)]
pub struct LostLinkLadderHandle {
inner: Arc<Mutex<LostLinkLadder>>,
events_tx: broadcast::Sender<LadderEvent>,
}
impl LostLinkLadderHandle {
pub async fn state(&self) -> LadderState {
self.inner.lock().await.state()
}
pub async fn rtl_count(&self) -> u64 {
self.inner.lock().await.rtl_count()
}
pub fn subscribe(&self) -> broadcast::Receiver<LadderEvent> {
self.events_tx.subscribe()
}
}
/// Driver — owns the ladder and ticks it from external signals.
///
/// Construct with [`LostLinkDriver::new`] then call
/// [`LostLinkDriver::spawn`]. The returned [`LostLinkLadderHandle`] is
/// read-only; events can be subscribed to via the handle.
pub struct LostLinkDriver<C: LostLinkCommandIssuer + 'static> {
config: LostLinkConfig,
command_issuer: Arc<C>,
executor: MissionExecutorHandle,
/// Operator-link state — `true` means heartbeats arriving. Updated
/// externally by `operator_bridge` / `telemetry_stream` wiring.
op_link_rx: watch::Receiver<bool>,
/// Most-recent MAVLink link event. Used to flip the
/// `mavlink_link_up` flag fed into the ladder.
mavlink_events_rx: broadcast::Receiver<LinkEvent>,
/// Optional override of "now" — for tests. Production passes
/// `None`, which makes the driver use `tokio::time::Instant::now`.
now_source: Option<Arc<dyn Fn() -> Instant + Send + Sync>>,
/// Optional target-follow signal. `None` means follow-grace is
/// never engaged (the case for current autopilot — AZ-684 will
/// wire `scan_controller`'s target-follow state in later).
target_follow_rx: Option<watch::Receiver<bool>>,
/// Initial assumption for MAVLink link state. Production hands in
/// `false` (link is initially down until the first inbound
/// heartbeat arrives); the driver flips this to `true` on
/// `LinkEvent::LinkUp`.
initial_mavlink_up: bool,
}
impl<C: LostLinkCommandIssuer + 'static> LostLinkDriver<C> {
pub fn new(
config: LostLinkConfig,
command_issuer: Arc<C>,
executor: MissionExecutorHandle,
op_link_rx: watch::Receiver<bool>,
mavlink_events_rx: broadcast::Receiver<LinkEvent>,
) -> Self {
Self {
config,
command_issuer,
executor,
op_link_rx,
mavlink_events_rx,
now_source: None,
target_follow_rx: None,
initial_mavlink_up: false,
}
}
/// Provide a target-follow watch channel. When the watched value
/// is `true`, the ladder engages the `LinkLostInFollow` grace.
pub fn with_target_follow(mut self, rx: watch::Receiver<bool>) -> Self {
self.target_follow_rx = Some(rx);
self
}
/// Treat the MAVLink link as up from the start (skip waiting for
/// the first `LinkUp` event). Useful in tests where the MAVLink
/// peer is presumed healthy.
pub fn with_initial_mavlink_up(mut self, up: bool) -> Self {
self.initial_mavlink_up = up;
self
}
/// Override the clock — only used in tests. Production omits this.
pub fn with_now_source(
mut self,
f: Arc<dyn Fn() -> Instant + Send + Sync>,
) -> Self {
self.now_source = Some(f);
self
}
/// Spawn the driver task. Returns a read-side handle plus the
/// background task's join handle.
pub fn spawn(self, mut shutdown: watch::Receiver<bool>) -> (LostLinkLadderHandle, JoinHandle<()>) {
let (events_tx, _events_rx) = broadcast::channel::<LadderEvent>(64);
let ladder = Arc::new(Mutex::new(LostLinkLadder::new(self.config)));
let handle = LostLinkLadderHandle {
inner: ladder.clone(),
events_tx: events_tx.clone(),
};
let LostLinkDriver {
config,
command_issuer,
executor,
mut op_link_rx,
mut mavlink_events_rx,
now_source,
target_follow_rx,
initial_mavlink_up,
} = self;
let mut tf_rx = target_follow_rx;
let mut mavlink_link_up = initial_mavlink_up;
let join = tokio::spawn(async move {
let mut ticker = tokio::time::interval(config.tick_interval);
ticker.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);
loop {
tokio::select! {
biased;
_ = shutdown.changed() => {
tracing::info!("lost_link driver shutdown");
return;
}
Ok(ev) = mavlink_events_rx.recv() => {
match ev {
LinkEvent::LinkUp => mavlink_link_up = true,
LinkEvent::LinkLost => mavlink_link_up = false,
}
}
_ = ticker.tick() => {
let now = match &now_source {
Some(f) => (f)(),
None => Instant::now(),
};
let op_link_up = *op_link_rx.borrow_and_update();
let target_follow_active = tf_rx
.as_mut()
.map(|rx| *rx.borrow_and_update())
.unwrap_or(false);
let output = {
let mut guard = ladder.lock().await;
guard.tick(LadderInput {
now,
op_link_up,
mavlink_link_up,
target_follow_active,
})
};
if output.state_changed {
let _ = events_tx.send(LadderEvent::StateChanged {
from: output.previous_state,
to: output.state,
});
}
if output.rtl_should_fire {
// Only this task mutates the ladder, so one read of the
// counter is valid for the log line and both events.
let rtl_count = ladder.lock().await.rtl_count();
tracing::warn!(
rtl_count,
"lost_link: operator link lost; issuing RTL"
);
match command_issuer.issue_rtl().await {
Ok(()) => {
let _ = events_tx.send(LadderEvent::RtlIssued { rtl_count });
}
Err(e) => {
tracing::error!(error=%e, "lost_link RTL command failed");
let _ = events_tx.send(LadderEvent::RtlSendFailed { rtl_count });
}
}
if let Err(e) =
executor.failsafe_trigger(FailsafeKind::LinkLost).await
{
tracing::error!(error=%e, "lost_link: executor failsafe_trigger failed");
}
}
}
}
}
});
(handle, join)
}
}
#[cfg(test)]
mod tests {
use super::*;
fn make_config() -> LostLinkConfig {
LostLinkConfig {
degraded_after: Duration::from_millis(50),
lost_after: Duration::from_millis(150),
follow_grace: Duration::from_millis(100),
tick_interval: Duration::from_millis(10),
}
}
#[test]
fn empty_state_starts_link_ok() {
// Arrange
let l = LostLinkLadder::new(make_config());
// Assert
assert_eq!(l.state(), LadderState::LinkOk);
assert_eq!(l.rtl_count(), 0);
}
#[test]
fn mavlink_lost_short_circuits_rtl() {
// Arrange — op-link is down for plenty long enough to trigger RTL
let mut l = LostLinkLadder::new(make_config());
let t0 = Instant::now();
// Act — but MAVLink is down too. Should never fire RTL.
for ms in (0..500).step_by(10) {
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(ms),
op_link_up: false,
mavlink_link_up: false,
target_follow_active: false,
});
assert!(!out.rtl_should_fire, "rtl fired at t={ms}");
}
// Assert
assert_eq!(l.state(), LadderState::MavlinkLost);
assert_eq!(l.rtl_count(), 0);
}
}
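The fire-once rule that `tick` applies (RTL is requested only on the transition into `LinkLost`) can be isolated as a small edge detector. This is a hedged sketch, not the crate's code: `State` and `FireOnce` are hypothetical names, and the state set is reduced to three values for brevity.

```rust
/// Hypothetical, reduced state set (not the crate's `LadderState`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Ok,
    Lost,
    MavlinkLost,
}

/// Remembers the previous state and reports RTL only on the
/// transition *into* `Lost`.
#[derive(Debug)]
pub struct FireOnce {
    prev: State,
    rtl_count: u64,
}

impl FireOnce {
    pub fn new() -> Self {
        Self { prev: State::Ok, rtl_count: 0 }
    }

    /// Returns `true` exactly when `next` is `Lost` and the previous
    /// observed state was not `Lost`.
    pub fn observe(&mut self, next: State) -> bool {
        let fire = next == State::Lost && self.prev != State::Lost;
        if fire {
            self.rtl_count += 1;
        }
        self.prev = next;
        fire
    }

    pub fn rtl_count(&self) -> u64 {
        self.rtl_count
    }
}
```

Because the comparison is against the immediately preceding state, repeated observations of `Lost` stay silent, while an interlude in another state (for example `MavlinkLost`) re-arms the trigger, so re-entering `Lost` afterwards fires again.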
@@ -3,6 +3,7 @@
pub mod driver;
pub mod fixed_wing;
pub mod fsm;
pub mod lost_link;
pub mod multirotor;
pub mod telemetry;
pub mod types;
+55 -4
@@ -33,6 +33,11 @@ use shared::models::mission::{Coordinate, MissionItem, MissionWaypoint};
mod internal;
pub use internal::driver::{DriverError, MissionDriver};
pub use internal::lost_link::{
LadderEvent, LadderInput, LadderOutput, LadderState, LostLinkCommandIssuer, LostLinkConfig,
LostLinkDriver, LostLinkLadder, LostLinkLadderHandle, MavlinkCommandIssuer,
MAV_CMD_NAV_RETURN_TO_LAUNCH,
};
pub use internal::telemetry::{
Consumer, DropCountingReceiver, MavlinkProjection, TelemetryForwarder,
};
@@ -244,10 +249,56 @@ impl MissionExecutorHandle {
))
}
-pub async fn failsafe_trigger(&self, _kind: FailsafeKind) -> Result<()> {
-Err(AutopilotError::NotImplemented(
-"mission_executor::failsafe_trigger (AZ-651)",
-))
/// Apply a failsafe response immediately.
///
/// AZ-651 implements the link-loss family: `LinkLost` and
/// `LinkLostInFollow` both cause the FSM to short-circuit from
/// `FlyMission` to `Land` (and the lost-link driver issues
/// `MAV_CMD_NAV_RETURN_TO_LAUNCH` separately so the airframe also
/// returns home — the FSM transition reflects the autopilot's
/// internal accounting). Other states are NOT overridden: if the
/// FSM is still in `Disconnected` / `Armed` / `TakeOff` /
/// `MissionUploaded`, the airframe failsafe is the right authority
/// and we let it handle the abort.
///
/// Battery and geofence failsafes (`BatteryRtl`, `BatteryHardFloor`,
/// `GeofenceInclusion`, `GeofenceExclusion`) land in AZ-652 with
/// their own state-aware overrides; calling this method with one
/// of those kinds returns `NotImplemented` for now.
///
/// Calling this while the FSM is already `Paused` is a no-op (we
/// do not clobber the existing pause).
pub async fn failsafe_trigger(&self, kind: FailsafeKind) -> Result<()> {
match kind {
FailsafeKind::LinkLost | FailsafeKind::LinkLostInFollow => {
let mut core = self.core.lock().await;
if core.state == MissionState::FlyMission {
let from = core.state;
core.state = MissionState::Land;
let _ = self.events_tx.send(TransitionEvent {
variant: core.variant,
from,
to: MissionState::Land,
at: chrono::Utc::now(),
retry_count: 0,
});
}
// Other states (incl. Paused) — leave alone. The
// airframe's own failsafe (or whatever paused us) is
// authoritative.
Ok(())
}
FailsafeKind::LinkDegraded => {
// Degraded is yellow-health-only; no transition needed.
Ok(())
}
FailsafeKind::BatteryRtl
| FailsafeKind::BatteryHardFloor
| FailsafeKind::GeofenceInclusion
| FailsafeKind::GeofenceExclusion => Err(AutopilotError::NotImplemented(
"mission_executor::failsafe_trigger: battery/geofence land in AZ-652",
)),
}
}
/// Pre-AZ-648 helper kept for callers that only need to validate a
@@ -0,0 +1,473 @@
//! AZ-651 acceptance criteria — lost-link failsafe ladder.
//!
//! AC-1, AC-3, AC-4 are exercised purely against the public
//! `LostLinkLadder` API (deterministic ticks driven by an explicit
//! `Instant`).
//!
//! AC-2 has two halves:
//! - **Pure ladder**: RTL fires exactly once when `LinkOk → LinkLost`
//! happens; subsequent ticks in `LinkLost` do not re-fire. Tested
//! against the ladder directly.
//! - **Integration**: the executor's FSM transitions from
//! `FlyMission` to `Land` when `failsafe_trigger(LinkLost)` is
//! called. Tested via a real `MissionExecutor` and a spy
//! `LostLinkCommandIssuer`.
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant as StdInstant};
use async_trait::async_trait;
use mission_executor::{
DriverError, FailsafeKind, LadderInput, LadderState, LostLinkCommandIssuer, LostLinkConfig,
LostLinkDriver, LostLinkLadder, MissionDriver, MissionExecutor, MissionExecutorConfig,
MissionExecutorHandle, MissionState, Telemetry,
};
use shared::error::AutopilotError;
use shared::models::mission::MissionWaypoint;
use tokio::sync::{broadcast, watch};
use tokio::time::Instant;
// =============================================================================
// Pure ladder tests (AC-1, AC-2 fire-once half, AC-3, AC-4, MAVLink recovery)
// =============================================================================
/// Compact config so the tests don't have to wait real wall-clock time.
/// degraded_after = 50 ms, lost_after = 150 ms, follow_grace = 100 ms.
fn fast_config() -> LostLinkConfig {
LostLinkConfig {
degraded_after: Duration::from_millis(50),
lost_after: Duration::from_millis(150),
follow_grace: Duration::from_millis(100),
tick_interval: Duration::from_millis(10),
}
}
/// AC-1 — operator-link degraded then recovers; no RTL.
#[test]
fn ac1_degraded_then_recovers_no_rtl() {
// Arrange
let mut l = LostLinkLadder::new(fast_config());
let t0 = Instant::now();
let out = l.tick(LadderInput {
now: t0,
op_link_up: true,
mavlink_link_up: true,
target_follow_active: false,
});
assert_eq!(out.state, LadderState::LinkOk);
// Act — op-link drops; tick at +70 ms (past degraded_after=50 ms)
l.tick(LadderInput {
now: t0 + Duration::from_millis(10),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: false,
});
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(70),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: false,
});
assert_eq!(out.state, LadderState::LinkDegraded);
assert!(!out.rtl_should_fire);
// Act — op-link recovers before lost_after fires
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(100),
op_link_up: true,
mavlink_link_up: true,
target_follow_active: false,
});
// Assert
assert_eq!(out.state, LadderState::LinkOk);
assert!(out.state_changed);
assert!(!out.rtl_should_fire);
assert_eq!(l.rtl_count(), 0);
}
/// AC-2 (ladder half) — operator-link lost triggers RTL exactly once.
#[test]
fn ac2_operator_link_lost_triggers_rtl_exactly_once() {
// Arrange
let mut l = LostLinkLadder::new(fast_config());
let t0 = Instant::now();
l.tick(LadderInput {
now: t0,
op_link_up: true,
mavlink_link_up: true,
target_follow_active: false,
});
// Act — op-link drops at +10 ms; tick at +170 ms so the down
// duration (160 ms) exceeds lost_after (150 ms).
l.tick(LadderInput {
now: t0 + Duration::from_millis(10),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: false,
});
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(170),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: false,
});
// Assert — entered LinkLost; RTL fires
assert_eq!(out.state, LadderState::LinkLost);
assert!(out.state_changed);
assert!(out.rtl_should_fire);
assert_eq!(l.rtl_count(), 1);
// Act — keep ticking while still in LinkLost; RTL must NOT re-fire
for ms in [180, 200, 300, 500, 1000] {
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(ms),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: false,
});
assert_eq!(out.state, LadderState::LinkLost);
assert!(!out.rtl_should_fire, "rtl re-fired at +{ms} ms");
}
assert_eq!(l.rtl_count(), 1);
}
/// AC-3 — `LinkLostInFollow` grace then RTL.
#[test]
fn ac3_lost_in_follow_grace_then_rtl() {
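// Arrange: degraded=50 ms, lost=150 ms, follow_grace=100 ms, so RTL
// fires no earlier than 250 ms after the link drops. The grace clock
// starts on the first tick that observes the loss (+170 ms below).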
let mut l = LostLinkLadder::new(fast_config());
let t0 = Instant::now();
l.tick(LadderInput {
now: t0,
op_link_up: true,
mavlink_link_up: true,
target_follow_active: true,
});
// Act — drop op-link at +10 ms; at +170 ms we'd be LinkLost without
// target-follow, but the follow grace engages instead.
l.tick(LadderInput {
now: t0 + Duration::from_millis(10),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: true,
});
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(170),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: true,
});
// Assert — engaged the follow grace
assert_eq!(out.state, LadderState::LinkLostInFollow);
assert!(!out.rtl_should_fire);
assert_eq!(l.rtl_count(), 0);
// Act — still inside grace
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(230),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: true,
});
assert_eq!(out.state, LadderState::LinkLostInFollow);
assert!(!out.rtl_should_fire);
assert_eq!(l.rtl_count(), 0);
// Act — grace expires (grace started at +170 ms; +100 ms = +270 ms)
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(280),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: true,
});
// Assert — promoted to LinkLost; RTL fires once now
assert_eq!(out.state, LadderState::LinkLost);
assert!(out.state_changed);
assert!(out.rtl_should_fire);
assert_eq!(l.rtl_count(), 1);
}
/// AC-4 — MAVLink loss does NOT trigger autopilot-side RTL.
#[test]
fn ac4_mavlink_loss_does_not_trigger_autopilot_rtl() {
// Arrange
let mut l = LostLinkLadder::new(fast_config());
let t0 = Instant::now();
// Act — op-link down AND mavlink down for far longer than lost_after
let mut last_state = LadderState::LinkOk;
for ms in (0..1000).step_by(10) {
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(ms),
op_link_up: false,
mavlink_link_up: false,
target_follow_active: false,
});
// Assert — never fire while mavlink is down
assert!(!out.rtl_should_fire, "rtl fired at +{ms} ms with mavlink down");
last_state = out.state;
}
// Assert
assert_eq!(last_state, LadderState::MavlinkLost);
assert_eq!(l.rtl_count(), 0);
}
/// Supplementary — MAVLink recovers while op-link is still down past
/// lost_after; the ladder resumes the op-link rung and fires RTL once.
#[test]
fn mavlink_recovery_resumes_operator_ladder() {
// Arrange
let mut l = LostLinkLadder::new(fast_config());
let t0 = Instant::now();
l.tick(LadderInput {
now: t0,
op_link_up: true,
mavlink_link_up: true,
target_follow_active: false,
});
// Act — both links go down at +10 ms; run long enough to exceed lost_after
for ms in (10..300).step_by(10) {
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(ms),
op_link_up: false,
mavlink_link_up: false,
target_follow_active: false,
});
assert!(!out.rtl_should_fire);
assert_eq!(out.state, LadderState::MavlinkLost);
}
// Act — mavlink recovers; op-link still down. The internal
// op_link_down_since clock has been ticking since +10 ms, so
// elapsed = 300 ms > lost_after (150 ms) → LinkLost on next tick.
let out = l.tick(LadderInput {
now: t0 + Duration::from_millis(310),
op_link_up: false,
mavlink_link_up: true,
target_follow_active: false,
});
// Assert
assert_eq!(out.previous_state, LadderState::MavlinkLost);
assert_eq!(out.state, LadderState::LinkLost);
assert!(out.rtl_should_fire);
assert_eq!(l.rtl_count(), 1);
}
// =============================================================================
// Integration — driver issues RTL once + FSM transitions FlyMission → Land
// =============================================================================
/// Spy `LostLinkCommandIssuer` that counts RTL invocations.
#[derive(Debug, Default)]
struct SpyCommandIssuer {
rtl_count: AtomicU64,
}
#[async_trait]
impl LostLinkCommandIssuer for SpyCommandIssuer {
async fn issue_rtl(&self) -> Result<(), AutopilotError> {
self.rtl_count.fetch_add(1, Ordering::SeqCst);
Ok(())
}
}
impl SpyCommandIssuer {
fn count(&self) -> u64 {
self.rtl_count.load(Ordering::SeqCst)
}
}
/// Auto-completing `MissionDriver` — every action returns `Ok(())` so
/// the FSM can race through Disconnected → FlyMission once telemetry
/// guards open.
struct AutoDriver {
arm_calls: AtomicU32,
takeoff_calls: AtomicU32,
upload_calls: AtomicU32,
set_auto_calls: AtomicU32,
post_flight_calls: AtomicU32,
}
impl AutoDriver {
fn new() -> Arc<Self> {
Arc::new(Self {
arm_calls: AtomicU32::new(0),
takeoff_calls: AtomicU32::new(0),
upload_calls: AtomicU32::new(0),
set_auto_calls: AtomicU32::new(0),
post_flight_calls: AtomicU32::new(0),
})
}
}
#[async_trait]
impl MissionDriver for AutoDriver {
async fn arm(&self) -> Result<(), DriverError> {
self.arm_calls.fetch_add(1, Ordering::SeqCst);
Ok(())
}
async fn takeoff(&self, _altitude_m: f32) -> Result<(), DriverError> {
self.takeoff_calls.fetch_add(1, Ordering::SeqCst);
Ok(())
}
async fn upload_mission(&self, _items: &[MissionWaypoint]) -> Result<(), DriverError> {
self.upload_calls.fetch_add(1, Ordering::SeqCst);
Ok(())
}
async fn set_auto_mode(&self) -> Result<(), DriverError> {
self.set_auto_calls.fetch_add(1, Ordering::SeqCst);
Ok(())
}
async fn post_flight_sync(&self) -> Result<(), DriverError> {
self.post_flight_calls.fetch_add(1, Ordering::SeqCst);
Ok(())
}
}
/// Drive the executor through telemetry until it reaches `FlyMission`.
/// Polls in real time every 5 ms and panics if the state is not reached
/// within 2 s; with the short tick interval from `fast_executor_config`
/// the test normally finishes in well under a second.
async fn drive_to_fly_mission(
    handle: &MissionExecutorHandle,
    tel_tx: &watch::Sender<Telemetry>,
) {
    // mission_reached_final stays false so the FSM idles in FlyMission.
    let t = Telemetry {
        link_up: true,
        health_ok: true,
        bit_ok: true,
        armed: true,
        takeoff_complete: true,
        flight_mode_auto: true,
        ..Telemetry::default()
    };
    tel_tx.send(t).unwrap();
    let deadline = StdInstant::now() + Duration::from_secs(2);
    loop {
        if matches!(handle.state().await, MissionState::FlyMission) {
            return;
        }
        if StdInstant::now() >= deadline {
            panic!(
                "FSM never reached FlyMission within 2 s (current state: {:?})",
                handle.state().await
            );
        }
        tokio::time::sleep(Duration::from_millis(5)).await;
    }
}

/// Multirotor executor config with a shortened tick for fast tests.
fn fast_executor_config() -> MissionExecutorConfig {
    let mut cfg = MissionExecutorConfig::multirotor(10.0);
    // A 2 ms tick keeps the test fast (~14 ms for 7 transitions).
    cfg.tick_interval = Duration::from_millis(2);
    cfg
}
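// The polling loops in this file each repeat the same deadline
// bookkeeping (`StdInstant::now() + timeout`, then a `>=` check). The
// helper below is one possible way to factor that out. It is only a
// sketch: `PollDeadline` is a hypothetical name, it is not part of the
// crates under test, and nothing in this file calls it yet (hence the
// `allow(dead_code)` attributes).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct PollDeadline {
    at: std::time::Instant,
}

#[allow(dead_code)]
impl PollDeadline {
    /// Start a deadline that expires `timeout` from now.
    fn after(timeout: std::time::Duration) -> Self {
        Self {
            at: std::time::Instant::now() + timeout,
        }
    }

    /// True once the deadline has been reached or passed.
    fn expired(&self) -> bool {
        std::time::Instant::now() >= self.at
    }
}
// Assumed usage inside one of the async polling loops:
//     let deadline = PollDeadline::after(Duration::from_secs(2));
//     while spy.count() < 1 {
//         assert!(!deadline.expired(), "RTL never fired within 2 s");
//         tokio::time::sleep(Duration::from_millis(5)).await;
//     }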
/// AC-2 (integration half): `failsafe_trigger(LinkLost)` while the
/// FSM is in `FlyMission` transitions it to `Land`.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn ac2_integration_failsafe_trigger_transitions_fly_to_land() {
    // Arrange
    let exec = MissionExecutor::new(fast_executor_config());
    let (tel_tx, tel_rx) = watch::channel(Telemetry::default());
    let (handle, fsm_join) = exec.run(AutoDriver::new(), vec![], tel_rx);
    drive_to_fly_mission(&handle, &tel_tx).await;
    assert_eq!(handle.state().await, MissionState::FlyMission);

    // Act
    handle
        .failsafe_trigger(FailsafeKind::LinkLost)
        .await
        .expect("failsafe_trigger should succeed");

    // Assert: the FSM transitioned to Land
    assert_eq!(handle.state().await, MissionState::Land);

    // Cleanup
    fsm_join.abort();
}
/// AC-2 (driver half): the lost-link driver wires the spy command
/// issuer to the executor. An operator-link drop causes:
/// - `issue_rtl` to be called exactly once
/// - the FSM to transition from `FlyMission` to `Land`
/// - no re-fired RTL on subsequent ticks
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn ac2_driver_issues_rtl_once_and_transitions_fsm() {
    // Arrange: bring the FSM to FlyMission
    let exec = MissionExecutor::new(fast_executor_config());
    let (tel_tx, tel_rx) = watch::channel(Telemetry::default());
    let (handle, fsm_join) = exec.run(AutoDriver::new(), vec![], tel_rx);
    drive_to_fly_mission(&handle, &tel_tx).await;
    assert_eq!(handle.state().await, MissionState::FlyMission);

    // Arrange: spawn the lost-link driver with fast thresholds
    let spy = Arc::new(SpyCommandIssuer::default());
    let (op_tx, op_rx) = watch::channel(true);
    let (mavlink_events_tx, mavlink_events_rx) =
        broadcast::channel::<mavlink_layer::LinkEvent>(8);
    let (shutdown_tx, shutdown_rx) = watch::channel(false);
    let driver = LostLinkDriver::new(
        fast_config(),
        spy.clone(),
        handle.clone(),
        op_rx,
        mavlink_events_rx,
    )
    .with_initial_mavlink_up(true);
    let (ladder_handle, ladder_join) = driver.spawn(shutdown_rx);

    // Act: drop the operator link
    op_tx.send(false).unwrap();

    // Wait for RTL to fire (lost_after = 150 ms, plus tick-interval slack)
    let deadline = StdInstant::now() + Duration::from_secs(2);
    loop {
        if spy.count() >= 1 {
            break;
        }
        if StdInstant::now() >= deadline {
            panic!("RTL never fired within 2 s; ladder state={:?}", ladder_handle.state().await);
        }
        tokio::time::sleep(Duration::from_millis(5)).await;
    }

    // Assert: exactly one RTL issued and the ladder reports LinkLost
    assert_eq!(spy.count(), 1);
    assert_eq!(ladder_handle.rtl_count().await, 1);
    assert_eq!(ladder_handle.state().await, LadderState::LinkLost);

    // The executor failsafe_trigger happens after the spy is called,
    // so give the driver loop a moment to propagate to the FSM.
    let deadline = StdInstant::now() + Duration::from_secs(1);
    loop {
        if matches!(handle.state().await, MissionState::Land) {
            break;
        }
        if StdInstant::now() >= deadline {
            panic!(
                "FSM never transitioned to Land within 1 s (state: {:?})",
                handle.state().await
            );
        }
        tokio::time::sleep(Duration::from_millis(5)).await;
    }
    assert_eq!(handle.state().await, MissionState::Land);

    // Keep ticking for 300 ms: RTL must NOT re-fire
    tokio::time::sleep(Duration::from_millis(300)).await;
    assert_eq!(spy.count(), 1);

    // Cleanup
    shutdown_tx.send(true).unwrap();
    let _ = ladder_join.await;
    fsm_join.abort();
    // The broadcast sender is released only now, after the driver has
    // shut down, so the driver never sees ChannelClosed and tears down
    // early.
    drop(mavlink_events_tx);
}