[AZ-657] [AZ-682] frame_ingest RTSP lifecycle + scan_controller FSM (batch 12)
ci/woodpecker/push/build-arm Pipeline failed

AZ-657 (frame_ingest): RTSP session lifecycle FSM with bounded
exponential backoff (1 s → 30 s cap), AI-lock plumb through
watch::Sender that stamps every emitted Frame, and SPS/PPS
hard-fail via OpenError::UnsupportedProfile. The actual RTSP wire
client is abstracted behind an RtspTransport trait so AZ-658 can
pin retina/FFmpeg alongside the decoder; the lifecycle FSM itself
is production code today. tokio::select! around every transport
call so a hung open/read cannot wedge graceful shutdown. 10 unit +
5 integration tests cover happy path, bounded reconnect, stream-
drop reopen, hard-fail no-retry, and AI-lock toggle.

AZ-682 (scan_controller): typed ScanState (ZoomedOut / ZoomedIn /
TargetFollow) with a complete pure transition catalogue, every
(state, trigger) → next_state from description.md §1/§4/§5 covered;
spec-disallowed combos return TransitionOutcome.accepted = false
with RejectReason::UnsupportedTransition (loud, not silent). Frame-
rate floor monitor with hysteresis suppresses ZoomedOut → ZoomedIn
while sustained FPS < 10 fps per description.md §5/§6. Rolling
100-sample tick-latency window surfaces p99; health goes yellow
above the 10 ms budget. 18 unit + 5 integration tests cover the
catalogue, fps-floor activate/clear, and tick-latency budget.

Cumulative review (batches 10-12): all OPEN findings carried
forward without regressions. See
_docs/03_implementation/batch_12_cycle1_report.md §6.

Notes: pre-existing dead-code error in autopilot::Runtime::
vlm_provider_name (origin batch 4) blocks workspace -D warnings
clippy. Recorded in _docs/_process_leftovers/ — not in batch 12
scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-20 08:17:27 +03:00
parent 4c63829ccd
commit 745ab806f1
18 changed files with 2600 additions and 51 deletions
@@ -0,0 +1,230 @@
# Batch 12 / Cycle 1 — Implementation Report
**Date**: 2026-05-20
**Tasks**: AZ-657, AZ-682
**Verdict**: PASS_WITH_WARNINGS (pre-existing autopilot lint pre-dates this
batch — see Findings §A1)
## 1. Scope
| Ticket | Title | Crate | Complexity |
|---|---|---|---|
| AZ-657 | frame_ingest RTSP session + reconnect + AI-lock | `frame_ingest` | 3 |
| AZ-682 | scan_controller typed state machine + fps-floor monitor | `scan_controller` | 5 |
## 2. Approach
### AZ-657 — RTSP session lifecycle
Per the task spec, the *production deliverable* is the **session lifecycle FSM
+ bounded reconnect + AI-lock plumb**. The actual RTSP wire client (retina /
FFmpeg / GStreamer binding) is pinned in AZ-658 alongside the H.264 decoder,
because the codec choice is what pins the client. To deliver real production
code today without prematurely committing to a binding, the lifecycle is
abstracted over an `RtspTransport` trait — the same pattern AZ-653 uses for
the A40 UDP wire.
**What this batch ships in production**:
- `RtspSessionConfig`, `OpenError` (incl. `UnsupportedProfile` for the AC-3
SPS/PPS hard-fail), `StreamError`, `RtspTransport` trait, `RtspPacket`
envelope. (`internal/rtsp_client.rs`)
- `SessionState` FSM (`Closed | Connecting { attempt } | Streaming |
Failing { attempt }`), pure `transition(state, trigger, backoff)`,
`BackoffPolicy` (1 s → 30 s cap per `description.md §6`),
`LifecycleStats`. (`internal/lifecycle.rs`)
- `FrameIngest::run(transport, config)` — the actor that drives the
lifecycle: opens via the transport, races every transport call against a
shutdown signal via `tokio::select!` (so a hung transport cannot wedge
graceful exit), pulls packets, stamps `Frame.ai_locked` from the
supervisor `watch::Sender<bool>`, broadcasts. (`src/lib.rs`)
- `FrameIngestHandle` — public surface: `subscribe()`, `set_ai_lock`,
`session_state`, `session_state_stream`, `reopens_total`, `shutdown`,
`health` (Disabled/Yellow/Red mapped per `description.md §6`).
**What ships in AZ-658** (already scaffolded as the `RtspTransport` trait):
- The real client binding (retina or FFmpeg-rs).
- The H.264/265 decoder that turns `RtspPacket` payloads into pixel buffers.
- Real-camera + MediaMTX integration tests gated behind a `--features
live-rtsp` flag.
### AZ-682 — Scan controller state machine
Per the task spec, scope is the **typed FSM + frame-rate floor + tick
observability**. The POI queue (AZ-683), evidence ladder (AZ-684), mapobjects
dispatch (AZ-685), and gimbal issuance (AZ-686) are intentionally left to
follow-up tickets. The FSM here is the substrate those tickets build on.
**What this batch ships**:
- `ScanState { ZoomedOut | ZoomedIn { roi, hold_started_at_ns } |
TargetFollow { target_id, started_at_ns } }` — typed, exhaustive, lives
in `internal/state_machine/mod.rs`.
- `Trigger` catalogue — `PoiSelected | RoiRejected | RoiHoldTimeout |
TargetConfirmed | TargetLost | OperatorReleaseFollow | OperatorAbort`.
Every `(state, trigger) → next_state` from `description.md §1/§4/§5` is
enumerated; spec-disallowed pairs return
`TransitionOutcome { accepted: false, reject_reason:
UnsupportedTransition }` instead of silently no-opping.
- `transition(state, trigger, ctx)` — pure function in
`internal/state_machine/transitions.rs`, unit-testable without spinning
up the actor.
- `FrameRateGuard` — rolling window of frame arrivals, hysteresis band
`[fps_floor, fps_clear)` to dampen oscillation, 1-second window.
Gates `ZoomedOut → ZoomedIn` per `description.md §5/§6/§8`.
(`internal/frame_rate_guard.rs`)
- `ScanController` / `ScanControllerHandle` — async-safe wrapper around a
`tokio::Mutex<Inner>` holding the state, FPS guard, rolling latency
window (100 samples ≈ 10 s at 10 Hz), transition counters. Records
per-call latency on `submit_trigger` and `tick`; surfaces `health()`
yellow when fps-floor active or tick p99 > 10 ms.
- `OperatorCommand → Trigger` mapping for the kinds that don't need POI
queue context (`MissionAbort → OperatorAbort`,
`ReleaseTargetFollow → OperatorReleaseFollow`); the rest deliberately
return `NotImplemented(AZ-683/AZ-684)` so the wiring failure is loud.
## 3. Files touched
### AZ-657
- `crates/frame_ingest/Cargo.toml` — added `async-trait`, `thiserror`,
`bytes`, `serde`.
- `crates/frame_ingest/src/lib.rs` — full rewrite (lifecycle loop,
handle, health).
- `crates/frame_ingest/src/internal/mod.rs` — new.
- `crates/frame_ingest/src/internal/rtsp_client.rs` — new.
- `crates/frame_ingest/src/internal/lifecycle.rs` — new.
- `crates/frame_ingest/tests/rtsp_lifecycle.rs` — new (5 ACs + fake
transport with explicit script controller).
### AZ-682
- `crates/scan_controller/src/lib.rs` — full rewrite (handle, metrics,
health, operator-cmd mapping).
- `crates/scan_controller/src/internal/mod.rs` — new.
- `crates/scan_controller/src/internal/state_machine/mod.rs` — new
(ScanState + Trigger + TransitionOutcome + RejectReason).
- `crates/scan_controller/src/internal/state_machine/transitions.rs` —
new (pure transition function + 7 unit tests).
- `crates/scan_controller/src/internal/frame_rate_guard.rs` — new (FPS
monitor + hysteresis + 6 unit tests).
- `crates/scan_controller/tests/state_machine.rs` — new (5 ACs).
## 4. Test results
| Crate | Unit | Integration | Total |
|---|---|---|---|
| `frame_ingest` | 10 | 5 | 15 |
| `scan_controller` | 18 | 5 | 23 |
Workspace `cargo test --workspace`: 280+ tests pass, 1 ignored (pre-existing
flaky `mission_executor::state_machine::ac3_bounded_retry_then_success`
documented in batch 8 — still passes in isolation, intermittent under load,
unchanged by this batch).
Clippy: `cargo clippy -p frame_ingest -p scan_controller --all-targets --
-D warnings` is clean. Workspace-wide clippy hits one pre-existing dead-code
error in `autopilot/src/runtime.rs` (see Findings §A1).
## 5. Findings (this batch)
### A1. Pre-existing dead-code error in `autopilot::Runtime::vlm_provider_name`
**Severity**: High (blocks workspace `-D warnings` clippy gate)
**Category**: Maintenance
**Origin**: Batch 4 (commit 69c0629, `[AZ-643] [AZ-665] [AZ-672]
mavlink+mapobjects+vlm batch 4`). Predates this batch.
`Runtime::vlm_provider_name` is only called from `#[cfg(test)]` code in the
same file. Compiling the `autopilot` binary target without test cfg flags
it as dead code, which under `-D warnings` becomes an error. Not introduced
by AZ-657 or AZ-682 — confirmed by stashing this batch and running clippy
against batch-11 HEAD.
Per `coderule.mdc` "Pre-existing lint errors should only be fixed if they're
in the modified area" → not fixed here. Recorded as a leftover for a
follow-up sweep:
→ See `_docs/_process_leftovers/2026-05-20_autopilot_clippy.md`.
### A2. AZ-682 `Inner` fields surfaced via new `metrics()` API
**Severity**: Low (would have been dead-code in clippy)
**Resolution**: Added `pub async fn metrics() -> ScanMetrics` returning
`transitions_total`, `rejected_total`, `last_state_change_ns`,
`tick_latency_p99_us` — fields are now publicly observable per the
documented health surface in `description.md §3`. No deferred warning.
### A3. Spec drift — `module-layout.md` is now out of date for `frame_ingest`
and `scan_controller`
**Severity**: Low (Architecture)
**Detail**: `module-layout.md` already lists the right internal paths for
both components, but `gimbal_controller` and now `frame_ingest` /
`scan_controller` have actual files present that the doc does not yet
enumerate by stable name (sweep.rs/smooth_pan.rs/centre_on_target.rs/
transport.rs from batches 10-11 are still pending; this batch adds
lifecycle.rs/rtsp_client.rs/state_machine/{mod,transitions}.rs/
frame_rate_guard.rs).
Cumulative leftover with batches 10-11 — same item, deferred to the
documentation sync sweep.
### A4. Spec drift — `data_model.md §PanPlan` still missing from batch 11
**Severity**: Low (Architecture)
**Detail**: Carried from batch 11 — `PanPlan` / `PanGoal` exist in
`crates/shared/src/models/gimbal.rs` but are not enumerated in
`data_model.md`. Unchanged by this batch.
## 6. Cumulative code review — batches 10, 11, 12
The autodev cadence is "cumulative code review every 3 batches". Inputs:
batch 10 (AZ-653 A40 UDP transport), batch 11 (AZ-654/655/656 sweep/
smooth_pan/centre_on_target + MonoClock fix), batch 12 (AZ-657 RTSP
lifecycle + AZ-682 scan FSM).
### Cumulative findings
| ID | Severity | Category | Status |
|---|---|---|---|
| C1 | Medium | Maintainability | OPEN — duplicated `SendCommandError` mapping in `gimbal_controller` (batches 9-10) |
| C2 | Low | Style | OPEN — `MavlinkCommandIssuer` naming inconsistency (batch 9) |
| C3 | Low | Architecture | OPEN — `module-layout.md` drift: `gimbal_controller/internal/transport.rs`, `sweep.rs`, `smooth_pan.rs`, `centre_on_target.rs`, `frame_ingest/internal/{lifecycle,rtsp_client}.rs`, `scan_controller/internal/{state_machine,frame_rate_guard}.rs` |
| C4 | Low | Architecture | OPEN — `data_model.md §PanPlan` definition still missing (batch 11) |
| C5 | High | Maintenance | OPEN — pre-existing `autopilot/runtime.rs::vlm_provider_name` dead-code error blocking workspace `-D warnings` clippy (batch 4 origin) |
### Cross-batch positive observations
- **Pattern consistency**: AZ-653 (A40Transport trait), AZ-655 (PlanExecutor
taking real Instant clock), AZ-657 (RtspTransport trait) all follow the
same "trait + real impl + fake-for-tests" pattern. This is starting to
look like a workspace idiom worth documenting in `coderule.mdc` —
candidate rule: "wire I/O behind a trait; production impl talks to real
hardware; test impl is in-memory / deterministic; bound the trait in
one place to keep the abstraction thin".
- **MonoClock adoption**: AZ-653's flawed `SystemTime::now()` was caught
by AZ-656 (batch 11) and fixed. AZ-657 and AZ-682 both depend on
`shared::clock::MonoClock` directly from the start — no repeat of the
bug.
- **Error-typing discipline**: AZ-657's `OpenError::UnsupportedProfile`
and AZ-682's `RejectReason::UnsupportedTransition` both use the typed
refusal pattern instead of silent no-op or panic. Good practice that's
now consistent across the brain (scan_controller) and the perception
edge (frame_ingest).
### Cumulative recommendation
None of C1C5 are blockers for batch 12. C5 is the most pressing and is
recorded as a non-user-input leftover for next autodev tick. C3 / C4 are
documentation sync that should land before the next architecture review.
## 7. Next-batch candidates
The natural follow-on to batch 12 is:
- **AZ-658** — frame_ingest decoder (the H.264 decode that turns
`RtspPacket.payload` into a real `Frame.pixels` buffer). Needs the
retina/ffmpeg pin decision.
- **AZ-683** — scan_controller POI queue + ≤5/min cap + operator-decision
window. Uses the AZ-682 FSM as the substrate.
- **AZ-659** — frame_ingest publisher (slow-consumer drop policy).
+2 -2
View File
@@ -6,9 +6,9 @@ step: 7
name: Implement
status: in_progress
sub_step:
phase: 17
phase: 20
name: tracker-update-in-testing
detail: "batch 11 (AZ-654/655/656) committed; awaiting In Testing + push"
detail: "batch 12 (AZ-657 + AZ-682) implemented and reviewed; awaiting commit + In Testing"
retry_count: 0
cycle: 1
tracker: jira
@@ -0,0 +1,59 @@
# Leftover — autopilot dead-code clippy gate
- **Timestamp**: 2026-05-20T05:30:00Z
- **Source**: discovered during batch 12 (`AZ-657` + `AZ-682`)
- **Origin**: commit `69c0629` — `[AZ-643] [AZ-665] [AZ-672]
mavlink+mapobjects+vlm batch 4`
- **Blocked operation**: `cargo clippy --workspace --all-targets --
-D warnings`
## Symptom
```
error: method `vlm_provider_name` is never used
--> crates/autopilot/src/runtime.rs:84:12
|
58 | impl Runtime {
| ------------ method in this implementation
...
84 | pub fn vlm_provider_name(&self) -> &'static str {
| ^^^^^^^^^^^^^^^^^
|
= note: `-D dead-code` implied by `-D warnings`
```
`Runtime::vlm_provider_name` is only called from `#[cfg(test)]` code in the
same file (`runtime.rs:215`, `runtime.rs:228`). Compiling the `autopilot`
binary target without test cfg flags it as dead code; under `-D warnings`
this is an error.
## Why not fixed in batch 12
Per `.cursor/rules/coderule.mdc`:
> Pre-existing lint errors should only be fixed if they're in the modified
> area.
The autopilot crate is outside the AZ-657 / AZ-682 scope (which touch
`frame_ingest` and `scan_controller` only). Fixing this would expand scope
and obscure the batch-12 diff. The lint must be cleared before the next
CI gate that enforces workspace `-D warnings`.
## Recommended fix
Pick the smallest of:
1. `#[cfg(test)]` on the method (it's only called from tests).
2. `#[allow(dead_code)]` on the method.
3. Add a real (non-test) caller — e.g. expose it through the `/health`
JSON so the field becomes load-bearing.
Option (3) is preferred because it surfaces a useful field; (1) is the
narrowest change.
## Replay
This leftover requires no Jira write — it is a code-quality gate. Replay
on the next autodev tick by either folding (3) into a relevant batch
(any batch that touches `autopilot/src/runtime.rs` or the health surface)
or opening a small standalone Maintenance ticket.