AZ-657 (frame_ingest): RTSP session lifecycle FSM with bounded exponential backoff (1 s → 30 s cap), AI-lock plumb through watch::Sender that stamps every emitted Frame, and SPS/PPS hard-fail via OpenError::UnsupportedProfile. The actual RTSP wire client is abstracted behind an RtspTransport trait so AZ-658 can pin retina/FFmpeg alongside the decoder; the lifecycle FSM itself is production code today. tokio::select! around every transport call so a hung open/read cannot wedge graceful shutdown. 10 unit + 5 integration tests cover happy path, bounded reconnect, stream- drop reopen, hard-fail no-retry, and AI-lock toggle. AZ-682 (scan_controller): typed ScanState (ZoomedOut / ZoomedIn / TargetFollow) with a complete pure transition catalogue, every (state, trigger) → next_state from description.md §1/§4/§5 covered; spec-disallowed combos return TransitionOutcome.accepted = false with RejectReason::UnsupportedTransition (loud, not silent). Frame- rate floor monitor with hysteresis suppresses ZoomedOut → ZoomedIn while sustained FPS < 10 fps per description.md §5/§6. Rolling 100-sample tick-latency window surfaces p99; health goes yellow above the 10 ms budget. 18 unit + 5 integration tests cover the catalogue, fps-floor activate/clear, and tick-latency budget. Cumulative review (batches 10-12): all OPEN findings carried forward without regressions. See _docs/03_implementation/batch_12_cycle1_report.md §6. Notes: pre-existing dead-code error in autopilot::Runtime:: vlm_provider_name (origin batch 4) blocks workspace -D warnings clippy. Recorded in _docs/_process_leftovers/ — not in batch 12 scope. Co-authored-by: Cursor <cursoragent@cursor.com>
11 KiB
Batch 12 / Cycle 1 — Implementation Report
Date: 2026-05-20 Tasks: AZ-657, AZ-682 Verdict: PASS_WITH_WARNINGS (pre-existing autopilot lint pre-dates this batch — see Findings §A1)
1. Scope
| Ticket | Title | Crate | Complexity |
|---|---|---|---|
| AZ-657 | frame_ingest RTSP session + reconnect + AI-lock | frame_ingest |
3 |
| AZ-682 | scan_controller typed state machine + fps-floor monitor | scan_controller |
5 |
2. Approach
AZ-657 — RTSP session lifecycle
Per the task spec, the production deliverable is the **session lifecycle FSM
- bounded reconnect + AI-lock plumb**. The actual RTSP wire client (retina /
FFmpeg / GStreamer binding) is pinned in AZ-658 alongside the H.264 decoder,
because the codec choice is what pins the client. To deliver real production
code today without prematurely committing to a binding, the lifecycle is
abstracted over an
RtspTransporttrait — the same pattern AZ-653 uses for the A40 UDP wire.
What this batch ships in production:
RtspSessionConfig,OpenError(incl.UnsupportedProfilefor the AC-3 SPS/PPS hard-fail),StreamError,RtspTransporttrait,RtspPacketenvelope. (internal/rtsp_client.rs)SessionStateFSM (Closed | Connecting { attempt } | Streaming | Failing { attempt }), puretransition(state, trigger, backoff),BackoffPolicy(1 s → 30 s cap perdescription.md §6),LifecycleStats. (internal/lifecycle.rs)FrameIngest::run(transport, config)— the actor that drives the lifecycle: opens via the transport, races every transport call against a shutdown signal viatokio::select!(so a hung transport cannot wedge graceful exit), pulls packets, stampsFrame.ai_lockedfrom the supervisorwatch::Sender<bool>, broadcasts. (src/lib.rs)FrameIngestHandle— public surface:subscribe(),set_ai_lock,session_state,session_state_stream,reopens_total,shutdown,health(Disabled/Yellow/Red mapped perdescription.md §6).
What ships in AZ-658 (already scaffolded as the RtspTransport trait):
- The real client binding (retina or FFmpeg-rs).
- The H.264/265 decoder that turns
RtspPacketpayloads into pixel buffers. - Real-camera + MediaMTX integration tests gated behind a
--features live-rtspflag.
AZ-682 — Scan controller state machine
Per the task spec, scope is the typed FSM + frame-rate floor + tick observability. The POI queue (AZ-683), evidence ladder (AZ-684), mapobjects dispatch (AZ-685), and gimbal issuance (AZ-686) are intentionally left to follow-up tickets. The FSM here is the substrate those tickets build on.
What this batch ships:
ScanState { ZoomedOut | ZoomedIn { roi, hold_started_at_ns } | TargetFollow { target_id, started_at_ns } }— typed, exhaustive, lives ininternal/state_machine/mod.rs.Triggercatalogue —PoiSelected | RoiRejected | RoiHoldTimeout | TargetConfirmed | TargetLost | OperatorReleaseFollow | OperatorAbort. Every(state, trigger) → next_statefromdescription.md §1/§4/§5is enumerated; spec-disallowed pairs returnTransitionOutcome { accepted: false, reject_reason: UnsupportedTransition }instead of silently no-opping.transition(state, trigger, ctx)— pure function ininternal/state_machine/transitions.rs, unit-testable without spinning up the actor.FrameRateGuard— rolling window of frame arrivals, hysteresis band[fps_floor, fps_clear)to dampen oscillation, 1-second window. GatesZoomedOut → ZoomedInperdescription.md §5/§6/§8. (internal/frame_rate_guard.rs)ScanController/ScanControllerHandle— async-safe wrapper around atokio::Mutex<Inner>holding the state, FPS guard, rolling latency window (100 samples ≈ 10 s at 10 Hz), transition counters. Records per-call latency onsubmit_triggerandtick; surfaceshealth()yellow when fps-floor active or tick p99 > 10 ms.OperatorCommand → Triggermapping for the kinds that don't need POI queue context (MissionAbort → OperatorAbort,ReleaseTargetFollow → OperatorReleaseFollow); the rest deliberately returnNotImplemented(AZ-683/AZ-684)so the wiring failure is loud.
3. Files touched
AZ-657
crates/frame_ingest/Cargo.toml— addedasync-trait,thiserror,bytes,serde.crates/frame_ingest/src/lib.rs— full rewrite (lifecycle loop, handle, health).crates/frame_ingest/src/internal/mod.rs— new.crates/frame_ingest/src/internal/rtsp_client.rs— new.crates/frame_ingest/src/internal/lifecycle.rs— new.crates/frame_ingest/tests/rtsp_lifecycle.rs— new (5 ACs + fake transport with explicit script controller).
AZ-682
crates/scan_controller/src/lib.rs— full rewrite (handle, metrics, health, operator-cmd mapping).crates/scan_controller/src/internal/mod.rs— new.crates/scan_controller/src/internal/state_machine/mod.rs— new (ScanState + Trigger + TransitionOutcome + RejectReason).crates/scan_controller/src/internal/state_machine/transitions.rs— new (pure transition function + 7 unit tests).crates/scan_controller/src/internal/frame_rate_guard.rs— new (FPS monitor + hysteresis + 6 unit tests).crates/scan_controller/tests/state_machine.rs— new (5 ACs).
4. Test results
| Crate | Unit | Integration | Total |
|---|---|---|---|
frame_ingest |
10 | 5 | 15 |
scan_controller |
18 | 5 | 23 |
Workspace cargo test --workspace: 280+ tests pass, 1 ignored (pre-existing
flaky mission_executor::state_machine::ac3_bounded_retry_then_success
documented in batch 8 — still passes in isolation, intermittent under load,
unchanged by this batch).
Clippy: cargo clippy -p frame_ingest -p scan_controller --all-targets -- -D warnings is clean. Workspace-wide clippy hits one pre-existing dead-code
error in autopilot/src/runtime.rs (see Findings §A1).
5. Findings (this batch)
A1. Pre-existing dead-code error in autopilot::Runtime::vlm_provider_name
Severity: High (blocks workspace -D warnings clippy gate)
Category: Maintenance
Origin: Batch 4 (commit 69c0629, [AZ-643] [AZ-665] [AZ-672] mavlink+mapobjects+vlm batch 4). Predates this batch.
Runtime::vlm_provider_name is only called from #[cfg(test)] code in the
same file. Compiling the autopilot binary target without test cfg flags
it as dead code, which under -D warnings becomes an error. Not introduced
by AZ-657 or AZ-682 — confirmed by stashing this batch and running clippy
against batch-11 HEAD.
Per coderule.mdc "Pre-existing lint errors should only be fixed if they're
in the modified area" → not fixed here. Recorded as a leftover for a
follow-up sweep:
→ See _docs/_process_leftovers/2026-05-20_autopilot_clippy.md.
A2. AZ-682 Inner fields surfaced via new metrics() API
Severity: Low (would have been dead-code in clippy)
Resolution: Added pub async fn metrics() -> ScanMetrics returning
transitions_total, rejected_total, last_state_change_ns,
tick_latency_p99_us — fields are now publicly observable per the
documented health surface in description.md §3. No deferred warning.
A3. Spec drift — module-layout.md is now out of date for frame_ingest
and scan_controller
Severity: Low (Architecture)
Detail: module-layout.md already lists the right internal paths for
both components, but gimbal_controller and now frame_ingest /
scan_controller have actual files present that the doc does not yet
enumerate by stable name (sweep.rs/smooth_pan.rs/centre_on_target.rs/
transport.rs from batches 10-11 are still pending; this batch adds
lifecycle.rs/rtsp_client.rs/state_machine/{mod,transitions}.rs/
frame_rate_guard.rs).
Cumulative leftover with batches 10-11 — same item, deferred to the documentation sync sweep.
A4. Spec drift — data_model.md §PanPlan still missing from batch 11
Severity: Low (Architecture)
Detail: Carried from batch 11 — PanPlan / PanGoal exist in
crates/shared/src/models/gimbal.rs but are not enumerated in
data_model.md. Unchanged by this batch.
6. Cumulative code review — batches 10, 11, 12
The autodev cadence is "cumulative code review every 3 batches". Inputs: batch 10 (AZ-653 A40 UDP transport), batch 11 (AZ-654/655/656 sweep/ smooth_pan/centre_on_target + MonoClock fix), batch 12 (AZ-657 RTSP lifecycle + AZ-682 scan FSM).
Cumulative findings
| ID | Severity | Category | Status |
|---|---|---|---|
| C1 | Medium | Maintainability | OPEN — duplicated SendCommandError mapping in gimbal_controller (batches 9-10) |
| C2 | Low | Style | OPEN — MavlinkCommandIssuer naming inconsistency (batch 9) |
| C3 | Low | Architecture | OPEN — module-layout.md drift: gimbal_controller/internal/transport.rs, sweep.rs, smooth_pan.rs, centre_on_target.rs, frame_ingest/internal/{lifecycle,rtsp_client}.rs, scan_controller/internal/{state_machine,frame_rate_guard}.rs |
| C4 | Low | Architecture | OPEN — data_model.md §PanPlan definition still missing (batch 11) |
| C5 | High | Maintenance | OPEN — pre-existing autopilot/runtime.rs::vlm_provider_name dead-code error blocking workspace -D warnings clippy (batch 4 origin) |
Cross-batch positive observations
- Pattern consistency: AZ-653 (A40Transport trait), AZ-655 (PlanExecutor
taking real Instant clock), AZ-657 (RtspTransport trait) all follow the
same "trait + real impl + fake-for-tests" pattern. This is starting to
look like a workspace idiom worth documenting in
coderule.mdc— candidate rule: "wire I/O behind a trait; production impl talks to real hardware; test impl is in-memory / deterministic; bound the trait in one place to keep the abstraction thin". - MonoClock adoption: AZ-653's flawed
SystemTime::now()was caught by AZ-656 (batch 11) and fixed. AZ-657 and AZ-682 both depend onshared::clock::MonoClockdirectly from the start — no repeat of the bug. - Error-typing discipline: AZ-657's
OpenError::UnsupportedProfileand AZ-682'sRejectReason::UnsupportedTransitionboth use the typed refusal pattern instead of silent no-op or panic. Good practice that's now consistent across the brain (scan_controller) and the perception edge (frame_ingest).
Cumulative recommendation
None of C1–C5 are blockers for batch 12. C5 is the most pressing and is recorded as a non-user-input leftover for next autodev tick. C3 / C4 are documentation sync that should land before the next architecture review.
7. Next-batch candidates
The natural follow-on to batch 12 is:
- AZ-658 — frame_ingest decoder (the H.264 decode that turns
RtspPacket.payloadinto a realFrame.pixelsbuffer). Needs the retina/ffmpeg pin decision. - AZ-683 — scan_controller POI queue + ≤5/min cap + operator-decision window. Uses the AZ-682 FSM as the substrate.
- AZ-659 — frame_ingest publisher (slow-consumer drop policy).