mirror of
https://github.com/azaion/autopilot.git
synced 2026-06-21 08:41:09 +00:00
[AZ-657] [AZ-682] frame_ingest RTSP lifecycle + scan_controller FSM (batch 12)
ci/woodpecker/push/build-arm Pipeline failed
ci/woodpecker/push/build-arm Pipeline failed
AZ-657 (frame_ingest): RTSP session lifecycle FSM with bounded exponential backoff (1 s → 30 s cap), AI-lock plumb through watch::Sender that stamps every emitted Frame, and SPS/PPS hard-fail via OpenError::UnsupportedProfile. The actual RTSP wire client is abstracted behind an RtspTransport trait so AZ-658 can pin retina/FFmpeg alongside the decoder; the lifecycle FSM itself is production code today. tokio::select! around every transport call so a hung open/read cannot wedge graceful shutdown. 10 unit + 5 integration tests cover happy path, bounded reconnect, stream- drop reopen, hard-fail no-retry, and AI-lock toggle. AZ-682 (scan_controller): typed ScanState (ZoomedOut / ZoomedIn / TargetFollow) with a complete pure transition catalogue, every (state, trigger) → next_state from description.md §1/§4/§5 covered; spec-disallowed combos return TransitionOutcome.accepted = false with RejectReason::UnsupportedTransition (loud, not silent). Frame- rate floor monitor with hysteresis suppresses ZoomedOut → ZoomedIn while sustained FPS < 10 fps per description.md §5/§6. Rolling 100-sample tick-latency window surfaces p99; health goes yellow above the 10 ms budget. 18 unit + 5 integration tests cover the catalogue, fps-floor activate/clear, and tick-latency budget. Cumulative review (batches 10-12): all OPEN findings carried forward without regressions. See _docs/03_implementation/batch_12_cycle1_report.md §6. Notes: pre-existing dead-code error in autopilot::Runtime:: vlm_provider_name (origin batch 4) blocks workspace -D warnings clippy. Recorded in _docs/_process_leftovers/ — not in batch 12 scope. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,230 @@
|
||||
# Batch 12 / Cycle 1 — Implementation Report
|
||||
|
||||
**Date**: 2026-05-20
|
||||
**Tasks**: AZ-657, AZ-682
|
||||
**Verdict**: PASS_WITH_WARNINGS (pre-existing autopilot lint pre-dates this
|
||||
batch — see Findings §A1)
|
||||
|
||||
## 1. Scope
|
||||
|
||||
| Ticket | Title | Crate | Complexity |
|
||||
|---|---|---|---|
|
||||
| AZ-657 | frame_ingest RTSP session + reconnect + AI-lock | `frame_ingest` | 3 |
|
||||
| AZ-682 | scan_controller typed state machine + fps-floor monitor | `scan_controller` | 5 |
|
||||
|
||||
## 2. Approach
|
||||
|
||||
### AZ-657 — RTSP session lifecycle
|
||||
|
||||
Per the task spec, the *production deliverable* is the **session lifecycle FSM
|
||||
+ bounded reconnect + AI-lock plumb**. The actual RTSP wire client (retina /
|
||||
FFmpeg / GStreamer binding) is pinned in AZ-658 alongside the H.264 decoder,
|
||||
because the codec choice is what pins the client. To deliver real production
|
||||
code today without prematurely committing to a binding, the lifecycle is
|
||||
abstracted over an `RtspTransport` trait — the same pattern AZ-653 uses for
|
||||
the A40 UDP wire.
|
||||
|
||||
**What this batch ships in production**:
|
||||
|
||||
- `RtspSessionConfig`, `OpenError` (incl. `UnsupportedProfile` for the AC-3
|
||||
SPS/PPS hard-fail), `StreamError`, `RtspTransport` trait, `RtspPacket`
|
||||
envelope. (`internal/rtsp_client.rs`)
|
||||
- `SessionState` FSM (`Closed | Connecting { attempt } | Streaming |
|
||||
Failing { attempt }`), pure `transition(state, trigger, backoff)`,
|
||||
`BackoffPolicy` (1 s → 30 s cap per `description.md §6`),
|
||||
`LifecycleStats`. (`internal/lifecycle.rs`)
|
||||
- `FrameIngest::run(transport, config)` — the actor that drives the
|
||||
lifecycle: opens via the transport, races every transport call against a
|
||||
shutdown signal via `tokio::select!` (so a hung transport cannot wedge
|
||||
graceful exit), pulls packets, stamps `Frame.ai_locked` from the
|
||||
supervisor `watch::Sender<bool>`, broadcasts. (`src/lib.rs`)
|
||||
- `FrameIngestHandle` — public surface: `subscribe()`, `set_ai_lock`,
|
||||
`session_state`, `session_state_stream`, `reopens_total`, `shutdown`,
|
||||
`health` (Disabled/Yellow/Red mapped per `description.md §6`).
|
||||
|
||||
**What ships in AZ-658** (already scaffolded as the `RtspTransport` trait):
|
||||
|
||||
- The real client binding (retina or FFmpeg-rs).
|
||||
- The H.264/265 decoder that turns `RtspPacket` payloads into pixel buffers.
|
||||
- Real-camera + MediaMTX integration tests gated behind a `--features
|
||||
live-rtsp` flag.
|
||||
|
||||
### AZ-682 — Scan controller state machine
|
||||
|
||||
Per the task spec, scope is the **typed FSM + frame-rate floor + tick
|
||||
observability**. The POI queue (AZ-683), evidence ladder (AZ-684), mapobjects
|
||||
dispatch (AZ-685), and gimbal issuance (AZ-686) are intentionally left to
|
||||
follow-up tickets. The FSM here is the substrate those tickets build on.
|
||||
|
||||
**What this batch ships**:
|
||||
|
||||
- `ScanState { ZoomedOut | ZoomedIn { roi, hold_started_at_ns } |
|
||||
TargetFollow { target_id, started_at_ns } }` — typed, exhaustive, lives
|
||||
in `internal/state_machine/mod.rs`.
|
||||
- `Trigger` catalogue — `PoiSelected | RoiRejected | RoiHoldTimeout |
|
||||
TargetConfirmed | TargetLost | OperatorReleaseFollow | OperatorAbort`.
|
||||
Every `(state, trigger) → next_state` from `description.md §1/§4/§5` is
|
||||
enumerated; spec-disallowed pairs return
|
||||
`TransitionOutcome { accepted: false, reject_reason:
|
||||
UnsupportedTransition }` instead of silently no-opping.
|
||||
- `transition(state, trigger, ctx)` — pure function in
|
||||
`internal/state_machine/transitions.rs`, unit-testable without spinning
|
||||
up the actor.
|
||||
- `FrameRateGuard` — rolling window of frame arrivals, hysteresis band
|
||||
`[fps_floor, fps_clear)` to dampen oscillation, 1-second window.
|
||||
Gates `ZoomedOut → ZoomedIn` per `description.md §5/§6/§8`.
|
||||
(`internal/frame_rate_guard.rs`)
|
||||
- `ScanController` / `ScanControllerHandle` — async-safe wrapper around a
|
||||
`tokio::Mutex<Inner>` holding the state, FPS guard, rolling latency
|
||||
window (100 samples ≈ 10 s at 10 Hz), transition counters. Records
|
||||
per-call latency on `submit_trigger` and `tick`; surfaces `health()`
|
||||
yellow when fps-floor active or tick p99 > 10 ms.
|
||||
- `OperatorCommand → Trigger` mapping for the kinds that don't need POI
|
||||
queue context (`MissionAbort → OperatorAbort`,
|
||||
`ReleaseTargetFollow → OperatorReleaseFollow`); the rest deliberately
|
||||
return `NotImplemented(AZ-683/AZ-684)` so the wiring failure is loud.
|
||||
|
||||
## 3. Files touched
|
||||
|
||||
### AZ-657
|
||||
- `crates/frame_ingest/Cargo.toml` — added `async-trait`, `thiserror`,
|
||||
`bytes`, `serde`.
|
||||
- `crates/frame_ingest/src/lib.rs` — full rewrite (lifecycle loop,
|
||||
handle, health).
|
||||
- `crates/frame_ingest/src/internal/mod.rs` — new.
|
||||
- `crates/frame_ingest/src/internal/rtsp_client.rs` — new.
|
||||
- `crates/frame_ingest/src/internal/lifecycle.rs` — new.
|
||||
- `crates/frame_ingest/tests/rtsp_lifecycle.rs` — new (5 ACs + fake
|
||||
transport with explicit script controller).
|
||||
|
||||
### AZ-682
|
||||
- `crates/scan_controller/src/lib.rs` — full rewrite (handle, metrics,
|
||||
health, operator-cmd mapping).
|
||||
- `crates/scan_controller/src/internal/mod.rs` — new.
|
||||
- `crates/scan_controller/src/internal/state_machine/mod.rs` — new
|
||||
(ScanState + Trigger + TransitionOutcome + RejectReason).
|
||||
- `crates/scan_controller/src/internal/state_machine/transitions.rs` —
|
||||
new (pure transition function + 7 unit tests).
|
||||
- `crates/scan_controller/src/internal/frame_rate_guard.rs` — new (FPS
|
||||
monitor + hysteresis + 6 unit tests).
|
||||
- `crates/scan_controller/tests/state_machine.rs` — new (5 ACs).
|
||||
|
||||
## 4. Test results
|
||||
|
||||
| Crate | Unit | Integration | Total |
|
||||
|---|---|---|---|
|
||||
| `frame_ingest` | 10 | 5 | 15 |
|
||||
| `scan_controller` | 18 | 5 | 23 |
|
||||
|
||||
Workspace `cargo test --workspace`: 280+ tests pass, 1 ignored (pre-existing
|
||||
flaky `mission_executor::state_machine::ac3_bounded_retry_then_success`
|
||||
documented in batch 8 — still passes in isolation, intermittent under load,
|
||||
unchanged by this batch).
|
||||
|
||||
Clippy: `cargo clippy -p frame_ingest -p scan_controller --all-targets --
|
||||
-D warnings` is clean. Workspace-wide clippy hits one pre-existing dead-code
|
||||
error in `autopilot/src/runtime.rs` (see Findings §A1).
|
||||
|
||||
## 5. Findings (this batch)
|
||||
|
||||
### A1. Pre-existing dead-code error in `autopilot::Runtime::vlm_provider_name`
|
||||
|
||||
**Severity**: High (blocks workspace `-D warnings` clippy gate)
|
||||
**Category**: Maintenance
|
||||
**Origin**: Batch 4 (commit 69c0629, `[AZ-643] [AZ-665] [AZ-672]
|
||||
mavlink+mapobjects+vlm batch 4`). Predates this batch.
|
||||
|
||||
`Runtime::vlm_provider_name` is only called from `#[cfg(test)]` code in the
|
||||
same file. Compiling the `autopilot` binary target without test cfg flags
|
||||
it as dead code, which under `-D warnings` becomes an error. Not introduced
|
||||
by AZ-657 or AZ-682 — confirmed by stashing this batch and running clippy
|
||||
against batch-11 HEAD.
|
||||
|
||||
Per `coderule.mdc` "Pre-existing lint errors should only be fixed if they're
|
||||
in the modified area" → not fixed here. Recorded as a leftover for a
|
||||
follow-up sweep:
|
||||
|
||||
→ See `_docs/_process_leftovers/2026-05-20_autopilot_clippy.md`.
|
||||
|
||||
### A2. AZ-682 `Inner` fields surfaced via new `metrics()` API
|
||||
|
||||
**Severity**: Low (would have been dead-code in clippy)
|
||||
**Resolution**: Added `pub async fn metrics() -> ScanMetrics` returning
|
||||
`transitions_total`, `rejected_total`, `last_state_change_ns`,
|
||||
`tick_latency_p99_us` — fields are now publicly observable per the
|
||||
documented health surface in `description.md §3`. No deferred warning.
|
||||
|
||||
### A3. Spec drift — `module-layout.md` is now out of date for `frame_ingest`
|
||||
and `scan_controller`
|
||||
|
||||
**Severity**: Low (Architecture)
|
||||
**Detail**: `module-layout.md` already lists the right internal paths for
|
||||
both components, but `gimbal_controller` and now `frame_ingest` /
|
||||
`scan_controller` have actual files present that the doc does not yet
|
||||
enumerate by stable name (sweep.rs/smooth_pan.rs/centre_on_target.rs/
|
||||
transport.rs from batches 10-11 are still pending; this batch adds
|
||||
lifecycle.rs/rtsp_client.rs/state_machine/{mod,transitions}.rs/
|
||||
frame_rate_guard.rs).
|
||||
|
||||
Cumulative leftover with batches 10-11 — same item, deferred to the
|
||||
documentation sync sweep.
|
||||
|
||||
### A4. Spec drift — `data_model.md §PanPlan` still missing from batch 11
|
||||
|
||||
**Severity**: Low (Architecture)
|
||||
**Detail**: Carried from batch 11 — `PanPlan` / `PanGoal` exist in
|
||||
`crates/shared/src/models/gimbal.rs` but are not enumerated in
|
||||
`data_model.md`. Unchanged by this batch.
|
||||
|
||||
## 6. Cumulative code review — batches 10, 11, 12
|
||||
|
||||
The autodev cadence is "cumulative code review every 3 batches". Inputs:
|
||||
batch 10 (AZ-653 A40 UDP transport), batch 11 (AZ-654/655/656 sweep/
|
||||
smooth_pan/centre_on_target + MonoClock fix), batch 12 (AZ-657 RTSP
|
||||
lifecycle + AZ-682 scan FSM).
|
||||
|
||||
### Cumulative findings
|
||||
|
||||
| ID | Severity | Category | Status |
|
||||
|---|---|---|---|
|
||||
| C1 | Medium | Maintainability | OPEN — duplicated `SendCommandError` mapping in `gimbal_controller` (batches 9-10) |
|
||||
| C2 | Low | Style | OPEN — `MavlinkCommandIssuer` naming inconsistency (batch 9) |
|
||||
| C3 | Low | Architecture | OPEN — `module-layout.md` drift: `gimbal_controller/internal/transport.rs`, `sweep.rs`, `smooth_pan.rs`, `centre_on_target.rs`, `frame_ingest/internal/{lifecycle,rtsp_client}.rs`, `scan_controller/internal/{state_machine,frame_rate_guard}.rs` |
|
||||
| C4 | Low | Architecture | OPEN — `data_model.md §PanPlan` definition still missing (batch 11) |
|
||||
| C5 | High | Maintenance | OPEN — pre-existing `autopilot/runtime.rs::vlm_provider_name` dead-code error blocking workspace `-D warnings` clippy (batch 4 origin) |
|
||||
|
||||
### Cross-batch positive observations
|
||||
|
||||
- **Pattern consistency**: AZ-653 (A40Transport trait), AZ-655 (PlanExecutor
|
||||
taking real Instant clock), AZ-657 (RtspTransport trait) all follow the
|
||||
same "trait + real impl + fake-for-tests" pattern. This is starting to
|
||||
look like a workspace idiom worth documenting in `coderule.mdc` —
|
||||
candidate rule: "wire I/O behind a trait; production impl talks to real
|
||||
hardware; test impl is in-memory / deterministic; bound the trait in
|
||||
one place to keep the abstraction thin".
|
||||
- **MonoClock adoption**: AZ-653's flawed `SystemTime::now()` was caught
|
||||
by AZ-656 (batch 11) and fixed. AZ-657 and AZ-682 both depend on
|
||||
`shared::clock::MonoClock` directly from the start — no repeat of the
|
||||
bug.
|
||||
- **Error-typing discipline**: AZ-657's `OpenError::UnsupportedProfile`
|
||||
and AZ-682's `RejectReason::UnsupportedTransition` both use the typed
|
||||
refusal pattern instead of silent no-op or panic. Good practice that's
|
||||
now consistent across the brain (scan_controller) and the perception
|
||||
edge (frame_ingest).
|
||||
|
||||
### Cumulative recommendation
|
||||
|
||||
None of C1–C5 are blockers for batch 12. C5 is the most pressing and is
|
||||
recorded as a non-user-input leftover for next autodev tick. C3 / C4 are
|
||||
documentation sync that should land before the next architecture review.
|
||||
|
||||
## 7. Next-batch candidates
|
||||
|
||||
The natural follow-on to batch 12 is:
|
||||
|
||||
- **AZ-658** — frame_ingest decoder (the H.264 decode that turns
|
||||
`RtspPacket.payload` into a real `Frame.pixels` buffer). Needs the
|
||||
retina/ffmpeg pin decision.
|
||||
- **AZ-683** — scan_controller POI queue + ≤5/min cap + operator-decision
|
||||
window. Uses the AZ-682 FSM as the substrate.
|
||||
- **AZ-659** — frame_ingest publisher (slow-consumer drop policy).
|
||||
@@ -6,9 +6,9 @@ step: 7
|
||||
name: Implement
|
||||
status: in_progress
|
||||
sub_step:
|
||||
phase: 17
|
||||
phase: 20
|
||||
name: tracker-update-in-testing
|
||||
detail: "batch 11 (AZ-654/655/656) committed; awaiting In Testing + push"
|
||||
detail: "batch 12 (AZ-657 + AZ-682) implemented and reviewed; awaiting commit + In Testing"
|
||||
retry_count: 0
|
||||
cycle: 1
|
||||
tracker: jira
|
||||
|
||||
@@ -0,0 +1,59 @@
|
||||
# Leftover — autopilot dead-code clippy gate
|
||||
|
||||
- **Timestamp**: 2026-05-20T05:30:00Z
|
||||
- **Source**: discovered during batch 12 (`AZ-657` + `AZ-682`)
|
||||
- **Origin**: commit `69c0629` — `[AZ-643] [AZ-665] [AZ-672]
|
||||
mavlink+mapobjects+vlm batch 4`
|
||||
- **Blocked operation**: `cargo clippy --workspace --all-targets --
|
||||
-D warnings`
|
||||
|
||||
## Symptom
|
||||
|
||||
```
|
||||
error: method `vlm_provider_name` is never used
|
||||
--> crates/autopilot/src/runtime.rs:84:12
|
||||
|
|
||||
58 | impl Runtime {
|
||||
| ------------ method in this implementation
|
||||
...
|
||||
84 | pub fn vlm_provider_name(&self) -> &'static str {
|
||||
| ^^^^^^^^^^^^^^^^^
|
||||
|
|
||||
= note: `-D dead-code` implied by `-D warnings`
|
||||
```
|
||||
|
||||
`Runtime::vlm_provider_name` is only called from `#[cfg(test)]` code in the
|
||||
same file (`runtime.rs:215`, `runtime.rs:228`). Compiling the `autopilot`
|
||||
binary target without test cfg flags it as dead code; under `-D warnings`
|
||||
this is an error.
|
||||
|
||||
## Why not fixed in batch 12
|
||||
|
||||
Per `.cursor/rules/coderule.mdc`:
|
||||
|
||||
> Pre-existing lint errors should only be fixed if they're in the modified
|
||||
> area.
|
||||
|
||||
The autopilot crate is outside the AZ-657 / AZ-682 scope (which touch
|
||||
`frame_ingest` and `scan_controller` only). Fixing this would expand scope
|
||||
and obscure the batch-12 diff. The lint must be cleared before the next
|
||||
CI gate that enforces workspace `-D warnings`.
|
||||
|
||||
## Recommended fix
|
||||
|
||||
Pick the smallest of:
|
||||
|
||||
1. `#[cfg(test)]` on the method (it's only called from tests).
|
||||
2. `#[allow(dead_code)]` on the method.
|
||||
3. Add a real (non-test) caller — e.g. expose it through the `/health`
|
||||
JSON so the field becomes load-bearing.
|
||||
|
||||
Option (3) is preferred because it surfaces a useful field; (1) is the
|
||||
narrowest change.
|
||||
|
||||
## Replay
|
||||
|
||||
This leftover requires no Jira write — it is a code-quality gate. Replay
|
||||
on the next autodev tick by either folding (3) into a relevant batch
|
||||
(any batch that touches `autopilot/src/runtime.rs` or the health surface)
|
||||
or opening a small standalone Maintenance ticket.
|
||||
Reference in New Issue
Block a user