[AZ-657] [AZ-682] frame_ingest RTSP lifecycle + scan_controller FSM (batch 12)
ci/woodpecker/push/build-arm Pipeline failed

AZ-657 (frame_ingest): RTSP session lifecycle FSM with bounded
exponential backoff (1 s → 30 s cap), AI-lock plumb through
watch::Sender that stamps every emitted Frame, and SPS/PPS
hard-fail via OpenError::UnsupportedProfile. The actual RTSP wire
client is abstracted behind an RtspTransport trait so AZ-658 can
pin retina/FFmpeg alongside the decoder; the lifecycle FSM itself
is production code today. Every transport call is wrapped in
tokio::select! so a hung open/read cannot wedge graceful shutdown.
10 unit + 5 integration tests cover happy path, bounded reconnect,
stream-drop reopen, hard-fail no-retry, and AI-lock toggle.

AZ-682 (scan_controller): typed ScanState (ZoomedOut / ZoomedIn /
TargetFollow) with a complete, pure transition catalogue: every
(state, trigger) → next_state from description.md §1/§4/§5 is covered;
spec-disallowed combos return TransitionOutcome.accepted = false
with RejectReason::UnsupportedTransition (loud, not silent). A
frame-rate floor monitor with hysteresis suppresses ZoomedOut →
ZoomedIn while sustained FPS < 10, per description.md §5/§6. Rolling
100-sample tick-latency window surfaces p99; health goes yellow
above the 10 ms budget. 18 unit + 5 integration tests cover the
catalogue, fps-floor activate/clear, and tick-latency budget.

Cumulative review (batches 10-12): all OPEN findings carried
forward without regressions. See
_docs/03_implementation/batch_12_cycle1_report.md §6.

Notes: pre-existing dead-code error in autopilot::Runtime::
vlm_provider_name (origin batch 4) blocks workspace -D warnings
clippy. Recorded in _docs/_process_leftovers/ — not in batch 12
scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-20 08:17:27 +03:00
parent 4c63829ccd
commit 745ab806f1
18 changed files with 2600 additions and 51 deletions
@@ -0,0 +1,71 @@
# RTSP Session + Reconnect + AI-Lock Signal
**Task**: AZ-657_frame_ingest_rtsp_session
**Name**: RTSP session lifecycle + bounded reconnect + AI-lock plumb-through
**Description**: Open the RTSP session to the ViewPro A40, recover from transient connection loss with bounded exponential backoff (1 s → 30 s cap), and plumb through the `bringCameraDown`/`bringCameraUp` AI-lock signal so downstream consumers can skip detection.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure
**Component**: frame_ingest
**Tracker**: AZ-657
**Epic**: AZ-627
## Problem
The RTSP session is the foundation of the perception pipeline. It must (a) open against the camera at startup, (b) recover from drops with bounded backoff (no infinite retry), and (c) carry the `ai_locked` flag through to every emitted `Frame` so that downstream consumers (`detection_client`, `movement_detector`) know to skip detection while the local supervisor is asserting an RC-takeover lock.
## Outcome
- `RtspSession::open(config) -> Result<Self, OpenError>` opens with TCP or UDP transport per camera config.
- On stream loss the session reopens with exponential backoff `1 s → 2 s → 4 s ...` capped at 30 s.
- A subscription to `bringCameraDown` / `bringCameraUp` toggles `ai_locked` on every subsequently emitted frame.
- Health surface: `reopens_total`, `last_frame_age_ms`, `session_state ∈ {closed, connecting, streaming, failing}`, `ai_locked`.
- Camera output-format mismatch (unexpected SPS/PPS) hard-fails at session open with an explicit error; never silently picks a wrong decode path.
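
The reconnect schedule in the outcome list can be sketched as a small helper. This is a minimal illustration, not the code from this commit: the `Backoff` type and its method names are hypothetical, and only the `1 s → 2 s → 4 s ...` doubling and the 30 s cap come from the text above.

```rust
use std::time::Duration;

/// Bounded exponential backoff: base, 2x base, 4x base, ... capped at `cap`.
/// (Hypothetical helper; names are illustrative.)
#[derive(Debug, Clone)]
pub struct Backoff {
    base: Duration,
    cap: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, cap: Duration) -> Self {
        Self { base, cap, attempt: 0 }
    }

    /// Delay to wait before the next reopen attempt; doubles on each call.
    pub fn next_delay(&mut self) -> Duration {
        // 2^attempt, saturating so a very long outage cannot overflow.
        let factor = 2u32.checked_pow(self.attempt).unwrap_or(u32::MAX);
        self.attempt = self.attempt.saturating_add(1);
        self.base.saturating_mul(factor).min(self.cap)
    }

    /// Call once frames are flowing again so the next drop starts at `base`.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }
}
```

With a 1 s base and a 30 s cap, successive calls to `next_delay` yield 1, 2, 4, 8, 16, 30, 30 seconds. Resetting after a successful reopen keeps a later, unrelated drop from inheriting the long delay.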
## Scope
### Included
- RTSP client (FFmpeg / GStreamer binding or pure-Rust client — pick what `shared` pins).
- Backoff state machine.
- AI-lock signal source subscription (the supervisor channel is implementation-defined; the local supervisor signals over a unix-domain socket per `architecture.md`).
- Session state surface.
### Excluded
- Frame decoding (task 19).
- Multi-consumer publisher (task 20).
## Acceptance Criteria
**AC-1: Open against ViewPro A40 (fixture)**
Given a fixture RTSP server (e.g. `MediaMTX`) replaying a sample stream
When `RtspSession::open(...)` is called
Then it returns `Ok` within 2 s and `session_state = "streaming"`.
**AC-2: Reconnect on drop**
Given a healthy session for 5 s
When the fixture RTSP server is killed and restarted
Then the session reopens within 5 s and `reopens_total` increments by 1.
**AC-3: SPS/PPS mismatch hard-fails**
Given a fixture stream that announces an unsupported codec profile
When `RtspSession::open(...)` is called
Then it returns `Err(UnsupportedProfile { details })`; no silent decode-path selection.
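
One way the hard-fail in AC-3 might look is sketched below. It assumes the camera stream is H.264, which the task text does not state, and the allow-list of profiles is purely illustrative. `check_sps` and `SUPPORTED_PROFILES` are hypothetical names; only `OpenError` and `UnsupportedProfile { details }` come from the spec.

```rust
/// Error returned when a session cannot be opened. Only the variant named
/// in AC-3 is shown here.
#[derive(Debug, PartialEq, Eq)]
pub enum OpenError {
    UnsupportedProfile { details: String },
}

/// H.264 `profile_idc` values accepted by this sketch (assumed allow-list):
/// 66 = Baseline, 77 = Main, 100 = High.
const SUPPORTED_PROFILES: [u8; 3] = [66, 77, 100];

/// Inspects a raw H.264 SPS NAL unit (without start code) and rejects
/// anything outside the allow-list instead of guessing a decode path.
/// Returns the accepted `profile_idc` on success.
pub fn check_sps(sps: &[u8]) -> Result<u8, OpenError> {
    // Byte 0 is the NAL header (low 5 bits = nal_unit_type, 7 = SPS),
    // byte 1 is profile_idc, byte 3 is level_idc.
    if sps.len() < 4 {
        return Err(OpenError::UnsupportedProfile {
            details: format!("SPS too short: {} bytes", sps.len()),
        });
    }
    let nal_type = sps[0] & 0x1f;
    if nal_type != 7 {
        return Err(OpenError::UnsupportedProfile {
            details: format!("expected SPS (NAL type 7), got type {}", nal_type),
        });
    }
    let profile_idc = sps[1];
    if !SUPPORTED_PROFILES.contains(&profile_idc) {
        return Err(OpenError::UnsupportedProfile {
            details: format!("profile_idc {} is not supported", profile_idc),
        });
    }
    Ok(profile_idc)
}
```

The point of the sketch is the shape of the failure: every unexpected input maps to an explicit `Err` carrying details, and no branch falls back to a default decoder.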
**AC-4: AI-lock toggles ai_locked flag**
Given a healthy session emitting frames
When `bringCameraDown` is asserted
Then subsequent emitted frames have `ai_locked = true`; when `bringCameraUp` is asserted, they revert to `false`.
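
The AC-4 behaviour can be illustrated with a std-only sketch. The commit message describes the real plumb-through as going through a `watch::Sender`; the `Arc<AtomicBool>` below is a simplified stand-in, and `AiLock`, `FrameStamper`, and the `seq` field are hypothetical names introduced for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Minimal stand-in for an emitted frame; only the flag matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Frame {
    pub seq: u64,
    pub ai_locked: bool,
}

/// Shared lock state, written by the supervisor subscription and read by
/// the session when it emits a frame. Starts unlocked.
#[derive(Clone, Default)]
pub struct AiLock(Arc<AtomicBool>);

impl AiLock {
    pub fn bring_camera_down(&self) {
        self.0.store(true, Ordering::Release);
    }
    pub fn bring_camera_up(&self) {
        self.0.store(false, Ordering::Release);
    }
    pub fn is_locked(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Stamps the current lock state onto every frame at emission time.
pub struct FrameStamper {
    lock: AiLock,
    next_seq: u64,
}

impl FrameStamper {
    pub fn new(lock: AiLock) -> Self {
        Self { lock, next_seq: 0 }
    }

    pub fn emit(&mut self) -> Frame {
        let frame = Frame { seq: self.next_seq, ai_locked: self.lock.is_locked() };
        self.next_seq += 1;
        frame
    }
}
```

Reading the flag at emission time, rather than caching it per session, is what makes the toggle apply to every subsequently emitted frame.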
## Non-Functional Requirements
**Performance**
- Reconnect latency: ≤5 s from camera availability (per `description.md §8`).
**Reliability**
- Bounded backoff cap configurable; no infinite retry.
## Runtime Completeness
- **Named capability**: RTSP transport against ViewPro A40 + AI-lock signal plumb.
- **Production code that must exist**: real RTSP session; real AI-lock subscription.
- **Allowed external stubs**: `MediaMTX` or `live555-test` as fixture in dev/CI.
- **Unacceptable substitutes**: bypassing AI-lock entirely is unacceptable — it is a safety boundary.
@@ -0,0 +1,73 @@
# Typed State Machine: ZoomedOut / ZoomedIn / TargetFollow
**Task**: AZ-682_scan_controller_state_machine
**Name**: Typed enum state machine with explicit transitions and no ad-hoc booleans
**Description**: Define the typed state enum (`ZoomedOut | ZoomedIn { roi, hold_started_at } | TargetFollow { target_id, started_at }`) and all explicit transitions. Tick latency budget ≤10 ms p99. Frame-rate floor monitor suppresses `ZoomedOut → ZoomedIn` when sustained FPS < 10.
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-649_mission_executor_telemetry_forwarding
**Component**: scan_controller
**Tracker**: AZ-682
**Epic**: AZ-635
## Problem
`scan_controller` is the system's brain. State variables and transitions must be exhaustive and typed; ad-hoc booleans drift out of sync and allow contradictory states that should be unrepresentable. The frame-rate-floor guard (`sustained FPS < 10 → suppress zoom-in`) prevents the controller from entering a state where Tier 2 + VLM saturate the budget while Tier 1 starves.
## Outcome
- `ScanState` enum covering every state; transitions modeled as functions returning a new `ScanState`.
- `tick` function: pure transition step taking one input (`DetectionBatch | MovementCandidate | Tier2Evidence | VlmAssessment | OperatorEvent | TelemetrySample | MissionState | MapObjectsSyncState`) and returning `(new_state, emitted_actions)`.
- Frame-rate floor monitor: rolling FPS window from `frame_ingest` health pulses; below 10 fps sustained over `floor_window_secs`, suppress `ZoomedOut → ZoomedIn` transitions and surface health → yellow.
- Health: `state`, `tick_latency_p99`, `last_state_change_ts`, `fps_floor_active`.
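
A typed transition function along these lines might look like the sketch below. The three `ScanState` variants and their field names come from the task description, and `TransitionOutcome.accepted` and `RejectReason::UnsupportedTransition` come from the commit message. Everything else is an assumption: the `Trigger` enum, the field types, and the edges chosen are placeholders, not the catalogue from `description.md`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Roi { pub x: f32, pub y: f32, pub w: f32, pub h: f32 }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: u64 },
    TargetFollow { target_id: u32, started_at: u64 },
}

/// Hypothetical triggers, standing in for the real tick inputs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Trigger {
    ZoomInCandidate { roi: Roi },
    TargetConfirmed { target_id: u32 },
    HoldExpired,
    TargetLost,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RejectReason { UnsupportedTransition }

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TransitionOutcome {
    pub next: ScanState,
    pub accepted: bool,
    pub reason: Option<RejectReason>,
}

/// Pure transition step: same (state, trigger, now) always gives the same
/// outcome. There is no `_ =>` arm, so adding a variant breaks the build.
pub fn transition(state: ScanState, trigger: Trigger, now: u64) -> TransitionOutcome {
    use ScanState::*;
    use Trigger::*;
    let accept = |next| TransitionOutcome { next, accepted: true, reason: None };
    // A rejected transition keeps the current state and records why.
    let reject = || TransitionOutcome {
        next: state,
        accepted: false,
        reason: Some(RejectReason::UnsupportedTransition),
    };
    match (state, trigger) {
        (ZoomedOut, ZoomInCandidate { roi }) => accept(ZoomedIn { roi, hold_started_at: now }),
        (ZoomedOut, TargetConfirmed { .. } | HoldExpired | TargetLost) => reject(),
        (ZoomedIn { .. }, TargetConfirmed { target_id }) => {
            accept(TargetFollow { target_id, started_at: now })
        }
        (ZoomedIn { .. }, HoldExpired) => accept(ZoomedOut),
        (ZoomedIn { .. }, ZoomInCandidate { .. } | TargetLost) => reject(),
        (TargetFollow { .. }, TargetLost) => accept(ZoomedOut),
        (TargetFollow { .. }, ZoomInCandidate { .. } | TargetConfirmed { .. } | HoldExpired) => {
            reject()
        }
    }
}
```

Passing `now` in as an argument, instead of reading a clock inside the function, keeps the step deterministic and easy to test.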
## Scope
### Included
- State enum + transition functions.
- Tick function (pure, single-input).
- Frame-rate floor monitor.
- Restart starts in `ZoomedOut` with empty queue (per `description.md §5`).
### Excluded
- POI queue + rate-cap + decision window (task 44).
- Evidence ladder + zoom-in candidate handling (task 45).
- MapObjects dispatch (task 46).
- Gimbal command issuance + degraded-sync handling (task 47).
## Acceptance Criteria
**AC-1: State enum is exhausted by tick**
Given any `ScanState` variant and any `TickInput`
When `tick(state, input)` runs
Then it returns `(new_state, actions)`, and the transition logic contains no `_ => …` catch-all, so adding a new variant forces compile errors wherever it is not handled.
**AC-2: Transition catalogue is complete**
Given the architecture catalogue of allowed transitions (per `system-flows.md §F4`)
When transitions are enumerated in the code
Then every `(from_state, trigger) → to_state` from the spec is covered; spec-disallowed transitions are typed-impossible OR explicitly rejected with a recorded reason.
**AC-3: Frame-rate floor suppresses zoom-in**
Given sustained FPS < 10 over the `floor_window_secs` window
When a tick that would otherwise transition `ZoomedOut → ZoomedIn` runs
Then the transition is suppressed; state remains `ZoomedOut`; `fps_floor_active = true`; health → yellow.
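
The floor guard in AC-3 could be sketched as follows. The 10 fps floor and the `floor_window_secs` window come from the spec, and the commit message mentions hysteresis. The separate clear threshold, the dead band between the two thresholds, and all names (`FpsFloorMonitor`, `observe`, `allows_zoom_in`) are assumptions made for illustration.

```rust
/// Activates when FPS stays below `floor_fps` for `window_secs`, and clears
/// only after FPS stays at or above `clear_fps` for `window_secs`.
pub struct FpsFloorMonitor {
    floor_fps: f64,
    clear_fps: f64,
    window_secs: f64,
    below_since: Option<f64>,
    above_since: Option<f64>,
    active: bool,
}

impl FpsFloorMonitor {
    pub fn new(floor_fps: f64, clear_fps: f64, window_secs: f64) -> Self {
        Self {
            floor_fps,
            clear_fps,
            window_secs,
            below_since: None,
            above_since: None,
            active: false,
        }
    }

    /// Feed one health pulse (timestamp in seconds, measured FPS).
    /// Returns whether the floor is active after this sample.
    pub fn observe(&mut self, now_secs: f64, fps: f64) -> bool {
        if fps < self.floor_fps {
            self.above_since = None;
            let since = *self.below_since.get_or_insert(now_secs);
            if now_secs - since >= self.window_secs {
                self.active = true;
            }
        } else if fps >= self.clear_fps {
            self.below_since = None;
            let since = *self.above_since.get_or_insert(now_secs);
            if now_secs - since >= self.window_secs {
                self.active = false;
            }
        } else {
            // Dead band between floor and clear: both streaks are broken,
            // and the current active/inactive state is kept.
            self.below_since = None;
            self.above_since = None;
        }
        self.active
    }

    /// The controller would consult this before `ZoomedOut → ZoomedIn`.
    pub fn allows_zoom_in(&self) -> bool {
        !self.active
    }
}
```

With a 10 fps floor, a 12 fps clear threshold, and a 3 s window, samples of 8, 9, and 7 fps at t = 0, 1, and 3 s activate the floor; it then stays active through an 11 fps sample and clears only after 3 s at or above 12 fps.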
**AC-4: Tick latency budget**
Given a stream of 1000 sequential ticks
When measured
Then `tick_latency_p99 ≤ 10 ms`.
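
A rolling p99 over recent ticks can be sketched as below. The 100-sample window size is taken from the commit message and the 10 ms budget from this AC. The `TickLatencyWindow` type, its methods, and the nearest-rank percentile definition are assumptions chosen for the example.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Fixed-size window of the most recent tick latencies.
pub struct TickLatencyWindow {
    samples: VecDeque<Duration>,
    capacity: usize,
}

impl TickLatencyWindow {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record one tick latency, evicting the oldest sample when full.
    pub fn record(&mut self, latency: Duration) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(latency);
    }

    /// Nearest-rank p99 over the current window; `None` while empty.
    pub fn p99(&self) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted: Vec<Duration> = self.samples.iter().copied().collect();
        sorted.sort_unstable();
        // Nearest rank: the ceil(0.99 * n)-th smallest sample (1-based).
        let rank = (sorted.len() * 99 + 99) / 100;
        Some(sorted[rank - 1])
    }

    /// True while p99 is within budget (an empty window counts as healthy).
    pub fn within_budget(&self, budget: Duration) -> bool {
        self.p99().map_or(true, |p| p <= budget)
    }
}
```

With 100 samples, nearest-rank p99 is the 99th smallest value, so a single outlier in the window does not push health to yellow but two do.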
## Non-Functional Requirements
**Performance**
- Tick latency: ≤10 ms p99.
- State transition is deterministic (same inputs in same order → same outputs).
## Contract
- Canonical typed model: `data_model.md §POI`, `§MapObject`, `§IgnoredItem`. State variants per `description.md §1`.
## Runtime Completeness
- **Named capability**: deterministic typed state machine for scan_controller.
- **Production code that must exist**: real state enum; real transitions; real frame-rate floor monitor.
- **Unacceptable substitutes**: `is_zoomed_in: bool` instead of a sum type is unacceptable per `description.md §4` ("no ad-hoc booleans").