autopilot/_docs/03_implementation/batch_12_cycle1_report.md
Oleksandr Bezdieniezhnykh 745ab806f1
[AZ-657] [AZ-682] frame_ingest RTSP lifecycle + scan_controller FSM (batch 12)
AZ-657 (frame_ingest): RTSP session lifecycle FSM with bounded
exponential backoff (1 s → 30 s cap), AI-lock plumbed through a
watch::Sender and stamped on every emitted Frame, and SPS/PPS
hard-fail via OpenError::UnsupportedProfile. The actual RTSP wire
client is abstracted behind an RtspTransport trait so AZ-658 can
pin retina/FFmpeg alongside the decoder; the lifecycle FSM itself
is production code today. tokio::select! around every transport
call so a hung open/read cannot wedge graceful shutdown. 10 unit +
5 integration tests cover happy path, bounded reconnect, stream-
drop reopen, hard-fail no-retry, and AI-lock toggle.

AZ-682 (scan_controller): typed ScanState (ZoomedOut / ZoomedIn /
TargetFollow) with a complete pure transition catalogue, every
(state, trigger) → next_state from description.md §1/§4/§5 covered;
spec-disallowed combos return TransitionOutcome.accepted = false
with RejectReason::UnsupportedTransition (loud, not silent). Frame-
rate floor monitor with hysteresis suppresses ZoomedOut → ZoomedIn
while sustained FPS < 10 fps per description.md §5/§6. Rolling
100-sample tick-latency window surfaces p99; health goes yellow
above the 10 ms budget. 18 unit + 5 integration tests cover the
catalogue, fps-floor activate/clear, and tick-latency budget.

Cumulative review (batches 10-12): all OPEN findings carried
forward without regressions. See
_docs/03_implementation/batch_12_cycle1_report.md §6.

Notes: pre-existing dead-code error in autopilot::Runtime::
vlm_provider_name (origin batch 4) blocks workspace -D warnings
clippy. Recorded in _docs/_process_leftovers/ — not in batch 12
scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 08:17:27 +03:00


Batch 12 / Cycle 1 — Implementation Report

Date: 2026-05-20
Tasks: AZ-657, AZ-682
Verdict: PASS_WITH_WARNINGS (a pre-existing autopilot lint error predates this batch; see Findings §A1)

1. Scope

| Ticket | Title | Crate | Complexity |
|---|---|---|---|
| AZ-657 | frame_ingest RTSP session + reconnect + AI-lock | frame_ingest | 3 |
| AZ-682 | scan_controller typed state machine + fps-floor monitor | scan_controller | 5 |

2. Approach

AZ-657 — RTSP session lifecycle

Per the task spec, the production deliverable is the session lifecycle FSM + bounded reconnect + AI-lock plumbing. The actual RTSP wire client (retina / FFmpeg / GStreamer binding) is pinned in AZ-658 alongside the H.264 decoder, because the codec choice is what pins the client. To deliver real production code today without prematurely committing to a binding, the lifecycle is abstracted over an RtspTransport trait, the same pattern AZ-653 uses for the A40 UDP wire.
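To make the seam concrete, here is a minimal sketch of the trait-plus-fake shape. It is a synchronous simplification: the crate's real RtspTransport is async (the batch adds async-trait), and the method names (open, next_packet), the Step script type, the ScriptedTransport fake, and the drain_session helper are hypothetical illustrations, not the batch's actual API.

```rust
use std::collections::VecDeque;

// Hypothetical, synchronous simplification of the RtspTransport seam.
// The real trait is async; names and fields here are illustrative only.

/// Minimal stand-in for the RtspPacket envelope.
#[derive(Debug, Clone, PartialEq)]
pub struct RtspPacket {
    pub pts_ns: u64,
    pub payload: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum OpenError {
    /// Transient failure: the lifecycle may retry with backoff.
    Unreachable,
    /// SPS/PPS describe a profile the decoder cannot handle: hard-fail, no retry.
    UnsupportedProfile,
}

#[derive(Debug, Clone, PartialEq)]
pub enum StreamError {
    Dropped,
}

/// The wire seam: production impls talk to a camera, test impls replay a script.
pub trait RtspTransport {
    fn open(&mut self, url: &str) -> Result<(), OpenError>;
    fn next_packet(&mut self) -> Result<RtspPacket, StreamError>;
}

/// One scripted step for the in-memory fake.
#[derive(Debug, Clone)]
pub enum Step {
    Open(Result<(), OpenError>),
    Packet(RtspPacket),
    Drop,
}

/// Deterministic fake: replays its script strictly in order.
pub struct ScriptedTransport {
    script: VecDeque<Step>,
}

impl ScriptedTransport {
    pub fn new(steps: Vec<Step>) -> Self {
        Self { script: steps.into() }
    }
}

impl RtspTransport for ScriptedTransport {
    fn open(&mut self, _url: &str) -> Result<(), OpenError> {
        match self.script.pop_front() {
            Some(Step::Open(result)) => result,
            other => panic!("script expected an Open step, got {other:?}"),
        }
    }

    fn next_packet(&mut self) -> Result<RtspPacket, StreamError> {
        match self.script.pop_front() {
            Some(Step::Packet(p)) => Ok(p),
            // An exhausted script behaves like a dropped stream.
            Some(Step::Drop) | None => Err(StreamError::Dropped),
            Some(other) => panic!("script expected a Packet/Drop step, got {other:?}"),
        }
    }
}

/// Generic over the trait: never knows which impl it is driving.
pub fn drain_session<T: RtspTransport>(t: &mut T, url: &str) -> Result<usize, OpenError> {
    t.open(url)?;
    let mut packets = 0;
    while t.next_packet().is_ok() {
        packets += 1;
    }
    Ok(packets)
}
```

Because drain_session is generic over the trait, the same code path can run against the scripted fake in tests and against a real binding once AZ-658 lands.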

What this batch ships in production:

  • RtspSessionConfig, OpenError (incl. UnsupportedProfile for the AC-3 SPS/PPS hard-fail), StreamError, RtspTransport trait, RtspPacket envelope. (internal/rtsp_client.rs)
  • SessionState FSM (Closed | Connecting { attempt } | Streaming | Failing { attempt }), pure transition(state, trigger, backoff), BackoffPolicy (1 s → 30 s cap per description.md §6), LifecycleStats. (internal/lifecycle.rs)
  • FrameIngest::run(transport, config) — the actor that drives the lifecycle: opens via the transport, races every transport call against a shutdown signal via tokio::select! (so a hung transport cannot wedge graceful exit), pulls packets, stamps Frame.ai_locked from the supervisor watch::Sender<bool>, broadcasts. (src/lib.rs)
  • FrameIngestHandle — public surface: subscribe(), set_ai_lock, session_state, session_state_stream, reopens_total, shutdown, health (Disabled/Yellow/Red mapped per description.md §6).
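The lifecycle bullets describe a pure transition(state, trigger, backoff) function and a BackoffPolicy running from 1 s to a 30 s cap. A minimal sketch of that shape follows. The trigger names (Start, OpenFailed, BackoffElapsed, ...) and the reconnect semantics (a dropped stream reopens immediately; UnsupportedProfile closes without retry) are assumptions for illustration, not copied from internal/lifecycle.rs.

```rust
use std::time::Duration;

// Sketch of a pure session-lifecycle FSM. Trigger names and exact reconnect
// semantics are assumptions, not the crate's definitions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Closed,
    Connecting { attempt: u32 },
    Streaming,
    Failing { attempt: u32 },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    Start,
    OpenSucceeded,
    /// Transient open failure: retried with backoff.
    OpenFailed,
    /// SPS/PPS hard-fail: never retried.
    UnsupportedProfile,
    StreamDropped,
    BackoffElapsed,
    Shutdown,
}

/// Exponential backoff: initial * 2^(attempt - 1), never above cap.
#[derive(Debug, Clone, Copy)]
pub struct BackoffPolicy {
    pub initial: Duration,
    pub cap: Duration,
}

impl BackoffPolicy {
    pub fn delay(&self, attempt: u32) -> Duration {
        let exp = attempt.saturating_sub(1).min(31); // keep the shift in range
        self.initial
            .checked_mul(1u32 << exp)
            .map_or(self.cap, |d| d.min(self.cap))
    }
}

/// Pure transition: returns the next state and, when entering Failing,
/// how long the actor should wait before feeding back BackoffElapsed.
pub fn transition(
    state: SessionState,
    trigger: Trigger,
    backoff: &BackoffPolicy,
) -> (SessionState, Option<Duration>) {
    use SessionState::*;
    match (state, trigger) {
        (_, Trigger::Shutdown) | (_, Trigger::UnsupportedProfile) => (Closed, None),
        (Closed, Trigger::Start) => (Connecting { attempt: 1 }, None),
        (Connecting { .. }, Trigger::OpenSucceeded) => (Streaming, None),
        (Connecting { attempt }, Trigger::OpenFailed) => {
            (Failing { attempt }, Some(backoff.delay(attempt)))
        }
        (Failing { attempt }, Trigger::BackoffElapsed) => {
            (Connecting { attempt: attempt.saturating_add(1) }, None)
        }
        // A healthy stream that drops reopens immediately with a fresh counter.
        (Streaming, Trigger::StreamDropped) => (Connecting { attempt: 1 }, None),
        // Any other pair leaves the state unchanged.
        (other, _) => (other, None),
    }
}
```

Returning the wait as a value, rather than sleeping inside transition, is what keeps the function pure: the actor owns the timer (and can race it against shutdown), while unit tests step through the states without waiting.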

What is deferred to AZ-658 (the seam is already scaffolded as the RtspTransport trait):

  • The real client binding (retina or FFmpeg-rs).
  • The H.264/265 decoder that turns RtspPacket payloads into pixel buffers.
  • Real-camera + MediaMTX integration tests gated behind a --features live-rtsp flag.

AZ-682 — Scan controller state machine

Per the task spec, scope is the typed FSM + frame-rate floor + tick observability. The POI queue (AZ-683), evidence ladder (AZ-684), mapobjects dispatch (AZ-685), and gimbal issuance (AZ-686) are intentionally left to follow-up tickets. The FSM here is the substrate those tickets build on.

What this batch ships:

  • ScanState { ZoomedOut | ZoomedIn { roi, hold_started_at_ns } | TargetFollow { target_id, started_at_ns } } — typed, exhaustive, lives in internal/state_machine/mod.rs.
  • Trigger catalogue — PoiSelected | RoiRejected | RoiHoldTimeout | TargetConfirmed | TargetLost | OperatorReleaseFollow | OperatorAbort. Every (state, trigger) → next_state from description.md §1/§4/§5 is enumerated; spec-disallowed pairs return TransitionOutcome { accepted: false, reject_reason: UnsupportedTransition } instead of silently no-opping.
  • transition(state, trigger, ctx) — pure function in internal/state_machine/transitions.rs, unit-testable without spinning up the actor.
  • FrameRateGuard — rolling window of frame arrivals, hysteresis band [fps_floor, fps_clear) to dampen oscillation, 1-second window. Gates ZoomedOut → ZoomedIn per description.md §5/§6/§8. (internal/frame_rate_guard.rs)
  • ScanController / ScanControllerHandle: an async-safe wrapper around a tokio::Mutex<Inner> holding the state, the FPS guard, a rolling latency window (100 samples ≈ 10 s at 10 Hz), and transition counters. Records per-call latency on submit_trigger and tick; health() reports yellow when the fps floor is active or tick p99 exceeds 10 ms.
  • OperatorCommand → Trigger mapping for the kinds that don't need POI queue context (MissionAbort → OperatorAbort, ReleaseTargetFollow → OperatorReleaseFollow); the rest deliberately return NotImplemented(AZ-683/AZ-684) so the wiring failure is loud.
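The hysteresis band [fps_floor, fps_clear) means the guard latches on below the floor and only releases at or above the clear threshold, so a camera hovering around 10 fps does not make the ZoomedOut → ZoomedIn gate flap. The sketch below assumes monotonic nanosecond timestamps, a one-window warm-up before the first decision, and test values of 10 (floor) and 12 (clear); the report does not state fps_clear, and none of these names are taken from frame_rate_guard.rs.

```rust
use std::collections::VecDeque;

const WINDOW_NS: u64 = 1_000_000_000; // 1-second rolling window

/// Frame-rate floor with hysteresis: activates when fps < fps_floor and only
/// clears once fps >= fps_clear; rates inside the band keep the last decision.
pub struct FrameRateGuard {
    fps_floor: f64,
    fps_clear: f64,
    started_at_ns: u64,
    arrivals_ns: VecDeque<u64>,
    floor_active: bool,
}

impl FrameRateGuard {
    pub fn new(fps_floor: f64, fps_clear: f64, started_at_ns: u64) -> Self {
        assert!(fps_clear > fps_floor, "hysteresis band must be non-empty");
        Self {
            fps_floor,
            fps_clear,
            started_at_ns,
            arrivals_ns: VecDeque::new(),
            floor_active: false,
        }
    }

    /// Record a frame arrival (monotonic ns) and re-evaluate the floor.
    pub fn on_frame(&mut self, now_ns: u64) {
        self.arrivals_ns.push_back(now_ns);
        self.evaluate(now_ns);
    }

    /// Also callable from a periodic tick, so a stalled stream still trips the floor.
    pub fn evaluate(&mut self, now_ns: u64) {
        // Evict arrivals that have aged out of the window.
        while let Some(&t) = self.arrivals_ns.front() {
            if now_ns.saturating_sub(t) >= WINDOW_NS {
                self.arrivals_ns.pop_front();
            } else {
                break;
            }
        }
        // Warm-up: make no decision until one full window has elapsed.
        if now_ns.saturating_sub(self.started_at_ns) < WINDOW_NS {
            return;
        }
        // Arrivals inside the last 1 s are the current frames per second.
        let fps = self.arrivals_ns.len() as f64;
        if self.floor_active {
            if fps >= self.fps_clear {
                self.floor_active = false;
            }
        } else if fps < self.fps_floor {
            self.floor_active = true;
        }
    }

    /// True while ZoomedOut -> ZoomedIn should be suppressed.
    pub fn floor_active(&self) -> bool {
        self.floor_active
    }
}
```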

3. Files touched

AZ-657

  • crates/frame_ingest/Cargo.toml — added async-trait, thiserror, bytes, serde.
  • crates/frame_ingest/src/lib.rs — full rewrite (lifecycle loop, handle, health).
  • crates/frame_ingest/src/internal/mod.rs — new.
  • crates/frame_ingest/src/internal/rtsp_client.rs — new.
  • crates/frame_ingest/src/internal/lifecycle.rs — new.
  • crates/frame_ingest/tests/rtsp_lifecycle.rs: new (5 ACs, plus a fake transport driven by an explicit script controller).

AZ-682

  • crates/scan_controller/src/lib.rs — full rewrite (handle, metrics, health, operator-cmd mapping).
  • crates/scan_controller/src/internal/mod.rs — new.
  • crates/scan_controller/src/internal/state_machine/mod.rs — new (ScanState + Trigger + TransitionOutcome + RejectReason).
  • crates/scan_controller/src/internal/state_machine/transitions.rs — new (pure transition function + 7 unit tests).
  • crates/scan_controller/src/internal/frame_rate_guard.rs — new (FPS monitor + hysteresis + 6 unit tests).
  • crates/scan_controller/tests/state_machine.rs — new (5 ACs).

4. Test results

| Crate | Unit | Integration | Total |
|---|---|---|---|
| frame_ingest | 10 | 5 | 15 |
| scan_controller | 18 | 5 | 23 |

Workspace (cargo test --workspace): 280+ tests pass, 1 ignored. The ignored test is the pre-existing flaky mission_executor::state_machine::ac3_bounded_retry_then_success documented in batch 8; it still passes in isolation, fails intermittently under load, and is unchanged by this batch.

Clippy: cargo clippy -p frame_ingest -p scan_controller --all-targets -- -D warnings is clean. Workspace-wide clippy hits one pre-existing dead-code error in autopilot/src/runtime.rs (see Findings §A1).

5. Findings (this batch)

A1. Pre-existing dead-code error in autopilot::Runtime::vlm_provider_name

Severity: High (blocks the workspace -D warnings clippy gate)
Category: Maintenance
Origin: Batch 4 (commit 69c0629, [AZ-643] [AZ-665] [AZ-672] mavlink+mapobjects+vlm batch 4); predates this batch.

Runtime::vlm_provider_name is only called from #[cfg(test)] code in the same file, so compiling the autopilot binary target without the test cfg flags it as dead code, which -D warnings turns into an error. It was not introduced by AZ-657 or AZ-682: this was confirmed by stashing this batch and running clippy against the batch-11 HEAD.

Per coderule.mdc "Pre-existing lint errors should only be fixed if they're in the modified area" → not fixed here. Recorded as a leftover for a follow-up sweep:

→ See _docs/_process_leftovers/2026-05-20_autopilot_clippy.md.

A2. AZ-682 Inner fields surfaced via new metrics() API

Severity: Low (the fields would otherwise have been flagged as dead code by clippy)
Resolution: Added pub async fn metrics() -> ScanMetrics, returning transitions_total, rejected_total, last_state_change_ns, and tick_latency_p99_us. These fields are now publicly observable, matching the health surface documented in description.md §3, so no warning is deferred.
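The tick_latency_p99_us field and the yellow-health rule can be sketched with a bounded sample window. This assumes a nearest-rank percentile over the last 100 samples and a two-level Health enum (Green/Yellow); the report specifies neither the percentile method nor the enum's variants, so LatencyWindow, Health, and health() here are hypothetical.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Tick-latency budget (10 ms) expressed in microseconds.
const TICK_BUDGET_US: u64 = 10_000;

/// Bounded window of the most recent latency samples (oldest evicted first).
pub struct LatencyWindow {
    capacity: usize,
    samples_us: VecDeque<u64>,
}

impl LatencyWindow {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        Self { capacity, samples_us: VecDeque::with_capacity(capacity) }
    }

    pub fn record(&mut self, latency: Duration) {
        if self.samples_us.len() == self.capacity {
            self.samples_us.pop_front();
        }
        self.samples_us.push_back(latency.as_micros() as u64);
    }

    /// Nearest-rank p99: the ceil(0.99 * n)-th smallest sample.
    pub fn p99_us(&self) -> Option<u64> {
        if self.samples_us.is_empty() {
            return None;
        }
        let mut sorted: Vec<u64> = self.samples_us.iter().copied().collect();
        sorted.sort_unstable();
        let rank = (sorted.len() * 99 + 99) / 100; // integer ceil(99n / 100), 1-based
        Some(sorted[rank - 1])
    }

    pub fn len(&self) -> usize {
        self.samples_us.len()
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Green,
    Yellow,
}

/// Yellow when the fps floor is active or tick p99 exceeds the 10 ms budget.
pub fn health(p99_us: Option<u64>, fps_floor_active: bool) -> Health {
    if fps_floor_active || matches!(p99_us, Some(p) if p > TICK_BUDGET_US) {
        Health::Yellow
    } else {
        Health::Green
    }
}
```

With 100 samples, nearest-rank p99 is the 99th smallest value, so a single slow outlier does not turn health yellow, but two or more do.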

A3. Spec drift: module-layout.md is now out of date for frame_ingest and scan_controller

Severity: Low
Category: Architecture
Detail: module-layout.md already lists the right internal paths for both components, but it does not yet enumerate by stable name the files that now exist in gimbal_controller, frame_ingest, and scan_controller. The gimbal_controller files from batches 10-11 (sweep.rs, smooth_pan.rs, centre_on_target.rs, transport.rs) are still pending; this batch adds lifecycle.rs, rtsp_client.rs, state_machine/{mod,transitions}.rs, and frame_rate_guard.rs.

Cumulative leftover with batches 10-11 — same item, deferred to the documentation sync sweep.

A4. Spec drift: data_model.md §PanPlan still missing (carried from batch 11)

Severity: Low
Category: Architecture
Detail: Carried from batch 11. PanPlan / PanGoal exist in crates/shared/src/models/gimbal.rs but are not enumerated in data_model.md. Unchanged by this batch.

6. Cumulative code review — batches 10, 11, 12

The autodev cadence is "cumulative code review every 3 batches". Inputs: batch 10 (AZ-653 A40 UDP transport), batch 11 (AZ-654/655/656 sweep/smooth_pan/centre_on_target + MonoClock fix), batch 12 (AZ-657 RTSP lifecycle + AZ-682 scan FSM).

Cumulative findings

| ID | Severity | Category | Status |
|---|---|---|---|
| C1 | Medium | Maintainability | OPEN: duplicated SendCommandError mapping in gimbal_controller (batches 9-10) |
| C2 | Low | Style | OPEN: MavlinkCommandIssuer naming inconsistency (batch 9) |
| C3 | Low | Architecture | OPEN: module-layout.md drift: gimbal_controller/internal/transport.rs, sweep.rs, smooth_pan.rs, centre_on_target.rs, frame_ingest/internal/{lifecycle,rtsp_client}.rs, scan_controller/internal/{state_machine,frame_rate_guard}.rs |
| C4 | Low | Architecture | OPEN: data_model.md §PanPlan definition still missing (batch 11) |
| C5 | High | Maintenance | OPEN: pre-existing autopilot/runtime.rs::vlm_provider_name dead-code error blocking workspace -D warnings clippy (batch 4 origin) |

Cross-batch positive observations

  • Pattern consistency: AZ-653 (A40Transport trait), AZ-655 (PlanExecutor taking real Instant clock), AZ-657 (RtspTransport trait) all follow the same "trait + real impl + fake-for-tests" pattern. This is starting to look like a workspace idiom worth documenting in coderule.mdc — candidate rule: "wire I/O behind a trait; production impl talks to real hardware; test impl is in-memory / deterministic; bound the trait in one place to keep the abstraction thin".
  • MonoClock adoption: AZ-653's flawed use of SystemTime::now() (a wall clock, which is not guaranteed to be monotonic) was caught by AZ-656 (batch 11) and fixed. AZ-657 and AZ-682 both use shared::clock::MonoClock from the start, so the bug does not recur.
  • Error-typing discipline: AZ-657's OpenError::UnsupportedProfile and AZ-682's RejectReason::UnsupportedTransition both use the typed refusal pattern instead of silent no-op or panic. Good practice that's now consistent across the brain (scan_controller) and the perception edge (frame_ingest).
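The typed-refusal pattern is easiest to see in a cut-down version of the AZ-682 transition function. The (state, trigger) pairs below are a plausible subset assumed for illustration, not the catalogue from description.md; ScanState carries a plain roi_id / target_id instead of the real roi and timestamp fields, and RejectReason::FpsFloorActive and TransitionCtx are hypothetical stand-ins for however the fps floor actually suppresses zoom-in.

```rust
// Cut-down, hypothetical version of the scan FSM showing typed refusal.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi_id: u32 },
    TargetFollow { target_id: u32 },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    PoiSelected { roi_id: u32 },
    RoiRejected,
    RoiHoldTimeout,
    TargetConfirmed { target_id: u32 },
    TargetLost,
    OperatorReleaseFollow,
    OperatorAbort,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RejectReason {
    UnsupportedTransition,
    FpsFloorActive, // hypothetical variant
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TransitionOutcome {
    pub next: ScanState,
    pub accepted: bool,
    pub reject_reason: Option<RejectReason>,
}

pub struct TransitionCtx {
    pub fps_floor_active: bool,
}

fn accept(next: ScanState) -> TransitionOutcome {
    TransitionOutcome { next, accepted: true, reject_reason: None }
}

fn reject(state: ScanState, reason: RejectReason) -> TransitionOutcome {
    TransitionOutcome { next: state, accepted: false, reject_reason: Some(reason) }
}

pub fn transition(state: ScanState, trigger: Trigger, ctx: &TransitionCtx) -> TransitionOutcome {
    use ScanState::{TargetFollow, ZoomedIn, ZoomedOut};
    match (state, trigger) {
        // Operator abort is honoured from every state.
        (_, Trigger::OperatorAbort) => accept(ZoomedOut),
        // Low frame rate: refuse to zoom in, with a reason the caller can log.
        (ZoomedOut, Trigger::PoiSelected { .. }) if ctx.fps_floor_active => {
            reject(state, RejectReason::FpsFloorActive)
        }
        (ZoomedOut, Trigger::PoiSelected { roi_id }) => accept(ZoomedIn { roi_id }),
        (ZoomedIn { .. }, Trigger::RoiRejected | Trigger::RoiHoldTimeout) => accept(ZoomedOut),
        (ZoomedIn { .. }, Trigger::TargetConfirmed { target_id }) => {
            accept(TargetFollow { target_id })
        }
        (TargetFollow { .. }, Trigger::TargetLost | Trigger::OperatorReleaseFollow) => {
            accept(ZoomedOut)
        }
        // Every other pair is refused loudly instead of silently keeping the state.
        _ => reject(state, RejectReason::UnsupportedTransition),
    }
}
```

Callers can count and log rejected outcomes (the rejected_total metric is one natural consumer) without the FSM ever panicking or quietly dropping a trigger.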

Cumulative recommendation

None of C1-C5 blocks batch 12. C5 is the most pressing and is recorded as a non-user-input leftover for the next autodev tick. C3 and C4 are documentation-sync items that should land before the next architecture review.

7. Next-batch candidates

The natural follow-on to batch 12 is:

  • AZ-658 — frame_ingest decoder (the H.264 decode that turns RtspPacket.payload into a real Frame.pixels buffer). Needs the retina/ffmpeg pin decision.
  • AZ-683 — scan_controller POI queue + ≤5/min cap + operator-decision window. Uses the AZ-682 FSM as the substrate.
  • AZ-659 — frame_ingest publisher (slow-consumer drop policy).