[AZ-676] [AZ-677] [AZ-678] [AZ-679] telemetry+operator foundation

Batch 15 ships the four foundation tickets sitting on top of AZ-675
(gRPC server) and AZ-667 (mapobjects_store hydrate):

* AZ-676: telemetry_stream video path (rtsp_forward + bytes_inline)
  with ai_locked atomic + session counter, SubscribeVideo RPC.
* AZ-677: MapObjects snapshot-on-subscribe + diff broadcast +
  reconnect-resync (StartThen stream-prepend pattern).
* AZ-678: HmacOperatorValidator with per-session monotonic seq,
  in-process session registry + TTL, constant-time HMAC compare,
  rejection-reason counters, sliding 60 s sig-failure red-health gate.
  Trait OperatorCommandValidator in shared::contracts::operator_auth.
* AZ-679: PoiSurfaceMapper produces OperatorPoiEvent per architecture
  §7.10; PoiDequeued events on rotate/age-out/complete; pushed via
  new TelemetrySink::push_operator_event extension on Topic::OperatorEvent.

Cross-task wiring: TelemetrySink trait extended with
push_operator_event; OperatorBridge gets optional builder methods
with_telemetry_sink / with_validator (composition root wires in
AZ-680). Workspace deps: hmac = "0.12"; per-crate adds bytes,
serde_json, parking_lot, chrono, uuid, sha2, thiserror.

Tests: 14/14 ACs verified locally (3 + 3 + 5 + 3 across AZ-676..AZ-679) plus
6 supporting unit tests + 7 integration tests + 2 shared serde
roundtrips. cargo clippy clean on touched crates. Cumulative
review for batches 13-15 produced; verdict PASS_WITH_WARNINGS
(0 Critical, 0 High, 1 Medium, 4 Low — all carry-overs or
deferred-producer notes for AZ-680/AZ-684).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-20 16:18:40 +03:00
parent 0eb09eec2d
commit ccf929af69
29 changed files with 3495 additions and 68 deletions
@@ -1,65 +0,0 @@
# Video Path Selection (Forward RTSP vs Encoded Bytes) + AI-Lock Coordination
**Task**: AZ-676_telemetry_stream_video_path
**Name**: Operator-bound video path: forward RTSP URL OR carry encoded bytes; coordinate with frame_ingest ai_locked signal
**Description**: Two delivery modes for the operator video path (config-driven): (1) forward the RTSP URL to the operator (most common), (2) carry encoded bytes over the operator gRPC stream. Coordinate with `frame_ingest`'s `ai_locked` signal so AI inference is suppressed only while operator-led control occupies the frame budget.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-657_frame_ingest_rtsp_session, AZ-675_telemetry_stream_grpc_server
**Component**: telemetry_stream
**Tracker**: AZ-676
**Epic**: AZ-637
## Problem
The operator sees the camera feed. Two modes are supported because some operator stacks attach to the RTSP source directly (lower onboard cost, recommended default), and others need bytes carried over the same operator-link channel (no separate RTSP socket to the operator).
When the operator-bound feed is active (either mode), `ai_locked` MUST be `true` so Tier-1 inference does not run on the same frames the operator is actively driving. The mechanism is a shared `Arc<AtomicBool>` (or equivalent) toggled by `telemetry_stream`'s session start/stop and read by `frame_ingest` (task 18) and `detection_client` (task 21).
## Outcome
- Config flag `video_path = "rtsp_forward" | "bytes_inline"`; default `rtsp_forward`.
- `rtsp_forward`: emit the canonical RTSP URL as part of the session-start telemetry.
- `bytes_inline`: take frames from `frame_ingest`'s broadcast channel and forward bytes to subscribed operator clients.
- `ai_locked` shared flag plumbed at startup; flipped to `true` while at least one operator session is consuming the video path, `false` otherwise.
- Health surface: `video_path_mode`, `ai_locked_state`, `bytes_inline_drops_total`.
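The session-count gating of `ai_locked` can be sketched as follows. This is a minimal, std-only Rust sketch, and `VideoSessionTracker` / `VideoSessionGuard` are hypothetical names rather than the shipped types. The count lives behind a mutex so that a last-disconnect cannot clear the flag after a concurrent first-subscribe has set it. Consumers such as `frame_ingest` only ever read the atomic.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, PoisonError};

/// Hypothetical tracker: counts operator video sessions and drives `ai_locked`.
pub struct VideoSessionTracker {
    // Count and flag writes are serialized by this mutex, so a last-disconnect
    // cannot clear the flag after a concurrent first-subscribe has set it.
    active_sessions: Mutex<usize>,
    // Shared with readers (frame_ingest, detection_client), which only load it.
    ai_locked: Arc<AtomicBool>,
}

impl VideoSessionTracker {
    pub fn new(ai_locked: Arc<AtomicBool>) -> Arc<Self> {
        Arc::new(Self { active_sessions: Mutex::new(0), ai_locked })
    }

    /// Call on video subscribe; the session stays counted until the guard drops.
    pub fn start_session(self: &Arc<Self>) -> VideoSessionGuard {
        let mut n = self.active_sessions.lock().unwrap_or_else(PoisonError::into_inner);
        *n += 1;
        if *n == 1 {
            // First operator session: suppress Tier-1 inference.
            self.ai_locked.store(true, Ordering::Release);
        }
        VideoSessionGuard { tracker: Arc::clone(self) }
    }

    pub fn active_sessions(&self) -> usize {
        *self.active_sessions.lock().unwrap_or_else(PoisonError::into_inner)
    }
}

/// RAII guard: dropping it (client disconnect, stream end) ends the session.
pub struct VideoSessionGuard {
    tracker: Arc<VideoSessionTracker>,
}

impl Drop for VideoSessionGuard {
    fn drop(&mut self) {
        let mut n = self.tracker.active_sessions.lock().unwrap_or_else(PoisonError::into_inner);
        *n -= 1;
        if *n == 0 {
            // Last operator session gone: release the AI lock.
            self.tracker.ai_locked.store(false, Ordering::Release);
        }
    }
}
```

Tying the session to a guard means any exit path that drops the gRPC stream also decrements the count, without relying on an explicit "stop" call.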
## Scope
### Included
- Both modes (rtsp_forward + bytes_inline).
- ai_locked toggle wiring.
- Session-tracking (active client count gating ai_locked).
### Excluded
- RTSP server stream itself (it's owned by the camera; we just forward the URL).
- `frame_ingest` reading the flag (task 18 owns that).
- Snapshot/diff for MapObjects (task 38).
## Acceptance Criteria
**AC-1: rtsp_forward mode emits URL only**
Given `video_path = "rtsp_forward"` and a client subscribes
When the session starts
Then the client receives the configured RTSP URL in the session-start message; no bytes are streamed by this component.
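The config flag and the AC-1 behaviour can be sketched as below. The sketch parses `video_path` (defaulting to `rtsp_forward`) and builds the session-start message accordingly. `SessionStart` and `session_start` are hypothetical. Leaving `rtsp_url` empty in `bytes_inline` mode is an assumption, since the spec only says what `rtsp_forward` emits.

```rust
use std::str::FromStr;

/// Config flag `video_path`; `rtsp_forward` is the default per the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VideoPath {
    #[default]
    RtspForward,
    BytesInline,
}

impl FromStr for VideoPath {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "rtsp_forward" => Ok(Self::RtspForward),
            "bytes_inline" => Ok(Self::BytesInline),
            other => Err(format!("unknown video_path {other:?}")),
        }
    }
}

/// Hypothetical session-start payload sent to a newly subscribed operator client.
#[derive(Debug, PartialEq)]
pub struct SessionStart {
    pub video_path_mode: &'static str,
    pub rtsp_url: Option<String>,
}

pub fn session_start(mode: VideoPath, configured_rtsp_url: &str) -> SessionStart {
    match mode {
        // rtsp_forward: emit the URL only; this component streams no bytes.
        VideoPath::RtspForward => SessionStart {
            video_path_mode: "rtsp_forward",
            rtsp_url: Some(configured_rtsp_url.to_owned()),
        },
        // bytes_inline: frames follow on the stream itself (assumed: no URL sent).
        VideoPath::BytesInline => SessionStart { video_path_mode: "bytes_inline", rtsp_url: None },
    }
}
```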
**AC-2: bytes_inline forwards encoded frames**
Given `video_path = "bytes_inline"` and a client subscribes
When `frame_ingest` publishes 100 frames
Then the client receives all 100 (modulo bounded-queue drops handled by task 36).
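The `bytes_inline` fan-out and the `bytes_inline_drops_total` counter can be sketched with std bounded channels. In this sketch each client gets a bounded channel. A full queue drops the frame and increments the counter, and a disconnected client is pruned. Frames are shared as `Arc<[u8]>`, so fan-out clones a pointer rather than copying the encoded bytes. `BytesInlineForwarder` is hypothetical, and the actual queue policy belongs to task 36.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

/// One encoded frame, shared by all subscribers without copying the bytes.
pub type EncodedFrame = Arc<[u8]>;

#[derive(Default)]
pub struct BytesInlineForwarder {
    clients: Vec<SyncSender<EncodedFrame>>,
    drops_total: AtomicU64, // surfaced as bytes_inline_drops_total
}

impl BytesInlineForwarder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register an operator client with a bounded queue (use depth >= 1;
    /// depth 0 would make every send a rendezvous and drop almost everything).
    pub fn subscribe(&mut self, queue_depth: usize) -> Receiver<EncodedFrame> {
        let (tx, rx) = sync_channel(queue_depth);
        self.clients.push(tx);
        rx
    }

    /// Forward one frame taken from frame_ingest's broadcast to every client.
    pub fn forward(&mut self, frame: EncodedFrame) {
        let drops = &self.drops_total;
        self.clients.retain(|tx| match tx.try_send(Arc::clone(&frame)) {
            Ok(()) => true,
            // Slow client: drop this frame for it, keep the client.
            Err(TrySendError::Full(_)) => {
                drops.fetch_add(1, Ordering::Relaxed);
                true
            }
            // Client went away: stop forwarding to it.
            Err(TrySendError::Disconnected(_)) => false,
        });
    }

    pub fn drops_total(&self) -> u64 {
        self.drops_total.load(Ordering::Relaxed)
    }

    pub fn client_count(&self) -> usize {
        self.clients.len()
    }
}
```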
**AC-3: ai_locked toggles on session start/stop**
Given no operator session is active (`ai_locked = false`)
When the first client subscribes to the video stream
Then `ai_locked` flips to `true`; when all clients disconnect, `ai_locked` flips back to `false`.
## Non-Functional Requirements
**Performance**
- bytes_inline: frame copy cost ≤2 ms p99 per frame on Jetson Orin Nano.
- AI-lock toggle latency: ≤50 ms from subscribe → flag flip.
## Runtime Completeness
- **Named capability**: operator video path (dual mode) + ai_locked coordination.
- **Production code that must exist**: both modes; real ai_locked atomic wired to consumers.
- **Unacceptable substitutes**: an rtsp_forward mode that doesn't actually emit the URL, or a bytes_inline mode that doesn't read from frame_ingest, is unacceptable.
@@ -1,62 +0,0 @@
# Pre-Flight MapObjects Snapshot + In-Flight Diffs + Reconnect Resync
**Task**: AZ-677_telemetry_stream_mapobjects_snapshot
**Name**: MapObjects bundle: pre-flight snapshot + in-flight diff stream + reconnect re-snapshot
**Description**: Emit a full `MapObjectsBundle` snapshot on operator client connect/reconnect, then stream diff messages as the store appends new observations / ignored items. On client reconnect after disconnect, emit a fresh snapshot rather than trying to replay diffs.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-675_telemetry_stream_grpc_server, AZ-667_mapobjects_store_hydrate_and_pending
**Component**: telemetry_stream
**Tracker**: AZ-677
**Epic**: AZ-637
## Problem
The operator views the live map state. Sending the entire `MapObjectsBundle` on every change is wasteful, but streaming diffs without a baseline forces the operator to recover from missing state on reconnect. The pattern: snapshot on connect, then diffs while connected. On disconnect-then-reconnect, treat as fresh client → re-snapshot. No best-effort gap-filling.
## Outcome
- On client subscribe to `MapObjectsBundle` topic: read current store state via `MapObjectsStore::snapshot()`; emit one `MapObjectsBundleSnapshot` message.
- During the session: subscribe to the store's append log (pending_observations + pending_ignored streams); emit `MapObjectsDiff { added: [...], moved: [...], removed_candidates: [...], ignored: [...] }` messages.
- On client disconnect: drop the subscriber.
- On reconnect: treat as new subscribe; emit a fresh snapshot. NO diff replay.
- Health: `mapobjects_snapshot_bytes`, `mapobjects_diff_count`, `mapobjects_resnap_count`.
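The snapshot-then-diff flow can be sketched in std-only Rust. For brevity it collapses `MapObjectsStore` and its append log into one hypothetical `MapObjectsHub`, and it only models `added`/`ignored` (not `moved`/`removed_candidates`). The snapshot is queued and the subscriber registered under the same lock, so no append can fall between them. This is the same ordering guarantee the commit calls the StartThen stream-prepend pattern. A reconnect is just another `subscribe`, so it gets a fresh snapshot and no diff replay.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub enum StoreEntry { Observation(u64), Ignored(u64) }

#[derive(Debug, Clone, PartialEq, Default)]
pub struct MapObjectsDiff { pub added: Vec<u64>, pub ignored: Vec<u64> }

#[derive(Debug)]
pub enum MapObjectsMessage { Snapshot(Vec<StoreEntry>), Diff(MapObjectsDiff) }

#[derive(Default)]
struct HubState {
    entries: Vec<StoreEntry>,
    subscribers: Vec<Sender<MapObjectsMessage>>,
}

/// Hypothetical stand-in for the store plus its append-log fan-out.
#[derive(Default)]
pub struct MapObjectsHub { state: Mutex<HubState> }

impl MapObjectsHub {
    /// Every subscribe, including a reconnect, starts with a full snapshot.
    pub fn subscribe(&self) -> Receiver<MapObjectsMessage> {
        let (tx, rx) = channel();
        let mut state = self.state.lock().unwrap();
        // Snapshot queued first and sender registered under the same lock:
        // no append can land between them (no gap, no duplicate).
        let _ = tx.send(MapObjectsMessage::Snapshot(state.entries.clone()));
        state.subscribers.push(tx);
        rx
    }

    /// Apply an append to the store and broadcast it as a diff.
    pub fn append(&self, diff: MapObjectsDiff) {
        let mut state = self.state.lock().unwrap();
        state.entries.extend(diff.added.iter().map(|&id| StoreEntry::Observation(id)));
        state.entries.extend(diff.ignored.iter().map(|&id| StoreEntry::Ignored(id)));
        // A failed send means the client disconnected: drop it. It must
        // resubscribe, which yields a fresh snapshot; diffs are never replayed.
        state.subscribers.retain(|tx| tx.send(MapObjectsMessage::Diff(diff.clone())).is_ok());
    }
}
```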
## Scope
### Included
- Snapshot emission on subscribe.
- Diff stream from store append log.
- Re-snapshot on reconnect.
### Excluded
- Store implementation (task 28).
- Per-client subscriber state machine (task 36).
## Acceptance Criteria
**AC-1: First subscribe receives snapshot**
Given a store with 50 MapObjects + 10 IgnoredItems hydrated
When a client subscribes to the MapObjectsBundle topic
Then it receives exactly one `MapObjectsBundleSnapshot` containing 50 + 10 entries.
**AC-2: In-flight changes emit diffs**
Given a connected client
When 3 new observations and 1 ignored item are appended to the store
Then the client receives one or more `MapObjectsDiff` messages whose combined contents = `{added: 3, ignored: 1}`.
**AC-3: Reconnect re-snapshots**
Given a client disconnected mid-session and the store grew by 5 entries while disconnected
When the client reconnects
Then the client receives a fresh `MapObjectsBundleSnapshot` reflecting the current state; NO diff replay.
## Non-Functional Requirements
**Performance**
- Snapshot serialization: ≤200 ms p99 for ≤10 000 MapObjects.
- Diff fan-out: ≤2 ms p99 per append.
## Runtime Completeness
- **Named capability**: snapshot + diff transport for MapObjects.
- **Production code that must exist**: real snapshot emission; real diff streaming; real re-snapshot on reconnect.
- **Unacceptable substitutes**: emitting full snapshots on every change (bandwidth) or replaying diffs across reconnect (consistency hazard) are both unacceptable.
@@ -1,82 +0,0 @@
# Operator Command Auth: Signature + Replay-Protection + Session Validation
**Task**: AZ-678_operator_bridge_command_auth
**Name**: Validate operator command signature, replay-protection sequence, and session token before dispatch
**Description**: Every incoming operator command MUST pass three checks before any business logic runs: (1) authentication signature over `(session_token, sequence_number, payload)`, (2) replay-protection sequence number is monotonically increasing per session, (3) session token is currently valid. Modem-link encryption is not sufficient (per `architecture.md §5` + Q9).
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-675_telemetry_stream_grpc_server
**Component**: operator_bridge
**Tracker**: AZ-678
**Epic**: AZ-628
## Problem
Operator commands shape the drone's behaviour (target-follow start, mission resume, safety-override). A hostile command injected on the operator link could otherwise direct the UAV to a forbidden target or release a safety. The authoritative principle ("operator commands authenticated, signed, replay-protected, session-bound") is committed in `architecture.md §5`, but the exact scheme is still open (`§8 Q9`). This task therefore implements the validator layer in a scheme-agnostic shape. A default HMAC-SHA256 scheme serves as the placeholder concrete implementation, behind a trait so the Q9 resolution can swap it in cleanly.
## Outcome
- Trait `OperatorCommandValidator { fn validate(&self, cmd: &SignedCommand) -> Result<ValidatedCommand, AuthError> }` in `shared::contracts::operator_auth`.
- Default impl `HmacOperatorValidator` using SHA-256 HMAC over `(session_token || sequence_number || payload_bytes)`.
- Per-session replay-protection: the highest-seen sequence number is stored in-process; any command whose sequence number is equal to or less than it is rejected.
- Session token validity check against an in-process session registry (sessions added on Ground Station auth; expired after `session_ttl_secs`).
- All rejections counted by reason: `signature_invalid`, `replay_detected`, `session_unknown`, `session_expired`.
- Health: `auth_rejections_total{reason}`; sustained signature failures → health red.
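The three checks and their ordering can be sketched in std-only Rust. This is not the shipped `HmacOperatorValidator`. The MAC is injected through a hypothetical `MacFn` seam so the sketch needs no crates. In production it would be HMAC-SHA256, and the RustCrypto `hmac` crate's `Mac::verify_slice` provides a constant-time tag comparison. The big-endian `u64` encoding of the sequence number in the signed bytes is also an assumption. The checks run in this order: session lookup, TTL, signature, then replay. Only a fully valid command advances `last_seen`, and every rejection is counted by reason.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical seam: production plugs in HMAC-SHA256 here.
pub type MacFn = fn(key: &[u8], msg: &[u8]) -> Vec<u8>;

pub struct SignedCommand { pub session_token: String, pub sequence_number: u64, pub payload: Vec<u8>, pub signature: Vec<u8> }

#[derive(Debug)]
pub struct ValidatedCommand { pub session_token: String, pub sequence_number: u64, pub payload: Vec<u8> }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthError { SignatureInvalid, ReplayDetected, SessionUnknown, SessionExpired }

impl AuthError {
    pub fn reason(self) -> &'static str {
        match self {
            AuthError::SignatureInvalid => "signature_invalid",
            AuthError::ReplayDetected => "replay_detected",
            AuthError::SessionUnknown => "session_unknown",
            AuthError::SessionExpired => "session_expired",
        }
    }
}

/// token || seq (u64 big-endian, assumed encoding) || payload
pub fn signed_bytes(token: &str, seq: u64, payload: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(token.len() + 8 + payload.len());
    msg.extend_from_slice(token.as_bytes());
    msg.extend_from_slice(&seq.to_be_bytes());
    msg.extend_from_slice(payload);
    msg
}

/// Examines every byte instead of returning at the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

struct Session { expires_at: Instant, last_seen: Option<u64> }

pub struct HmacOperatorValidator {
    key: Vec<u8>,
    mac: MacFn,
    sessions: Mutex<HashMap<String, Session>>,      // in-process registry
    rejections: Mutex<HashMap<&'static str, u64>>,  // auth_rejections_total{reason}
}

impl HmacOperatorValidator {
    pub fn new(key: Vec<u8>, mac: MacFn) -> Self {
        Self { key, mac, sessions: Mutex::default(), rejections: Mutex::default() }
    }

    pub fn add_session(&self, token: &str, ttl: Duration, now: Instant) {
        let session = Session { expires_at: now + ttl, last_seen: None };
        self.sessions.lock().unwrap().insert(token.to_owned(), session);
    }

    pub fn validate(&self, cmd: &SignedCommand, now: Instant) -> Result<ValidatedCommand, AuthError> {
        let result = self.check(cmd, now);
        if let Err(e) = &result {
            *self.rejections.lock().unwrap().entry(e.reason()).or_insert(0) += 1;
        }
        result
    }

    pub fn rejections(&self, reason: &str) -> u64 {
        self.rejections.lock().unwrap().get(reason).copied().unwrap_or(0)
    }

    fn check(&self, cmd: &SignedCommand, now: Instant) -> Result<ValidatedCommand, AuthError> {
        let mut sessions = self.sessions.lock().unwrap();
        let session = sessions.get_mut(&cmd.session_token).ok_or(AuthError::SessionUnknown)?;
        if now >= session.expires_at {
            return Err(AuthError::SessionExpired);
        }
        let expected = (self.mac)(&self.key, &signed_bytes(&cmd.session_token, cmd.sequence_number, &cmd.payload));
        if !ct_eq(&expected, &cmd.signature) {
            return Err(AuthError::SignatureInvalid);
        }
        // Monotonic per session, never global.
        if matches!(session.last_seen, Some(last) if cmd.sequence_number <= last) {
            return Err(AuthError::ReplayDetected);
        }
        session.last_seen = Some(cmd.sequence_number);
        Ok(ValidatedCommand { session_token: cmd.session_token.clone(), sequence_number: cmd.sequence_number, payload: cmd.payload.clone() })
    }
}
```

The signature is checked before the replay test. As a result, a forged command carrying an old sequence number is counted as `signature_invalid`, and unauthenticated input never touches `last_seen`. The hand-rolled `ct_eq` illustrates the idea only, because the compiler gives no constant-time guarantee for it.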
## Scope
### Included
- Validator trait + HMAC default impl.
- Per-session sequence-number tracker.
- Session registry (in-process, in-memory).
- Rejection-reason metrics.
### Excluded
- The transport (Ground Station → operator_bridge); this task validates already-deserialized `SignedCommand`s.
- Session creation handshake from Ground Station (handled at session establishment; this task only consumes sessions).
- Dispatch to `scan_controller` / `mission_executor` (tasks 41 + 42).
## Acceptance Criteria
**AC-1: Valid signed command passes**
Given a `SignedCommand` with correct HMAC over `(token, seq, payload)` and a known session with `seq > last_seen`
When `validate(cmd)` runs
Then it returns `Ok(ValidatedCommand)`; `last_seen` advances to `seq`.
**AC-2: Invalid signature rejected**
Given a `SignedCommand` whose HMAC does not match the computed value
When `validate(cmd)` runs
Then it returns `Err(AuthError::SignatureInvalid)`; `auth_rejections_total{reason: signature_invalid}` increments; sequence number is NOT advanced.
**AC-3: Replay detected**
Given a session with `last_seen = 10`
When a command arrives with `sequence_number = 10` (or `< 10`)
Then it returns `Err(AuthError::ReplayDetected)`; `auth_rejections_total{reason: replay_detected}` increments.
**AC-4: Unknown / expired session rejected**
Given a command bearing a session token not in the registry (or whose TTL has elapsed)
When `validate(cmd)` runs
Then it returns `Err(AuthError::SessionUnknown)` or `Err(AuthError::SessionExpired)` respectively; the matching `auth_rejections_total{reason}` counter increments.
**AC-5: Sustained signature failures escalate health**
Given the `auth_rejections_total{reason: signature_invalid}` rate exceeds the configured `signature_failure_red_threshold` per minute
When health is sampled
Then health returns red and surfaces the failure rate.
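The sliding 60 s window behind AC-5 can be sketched with a queue of failure timestamps. `SigFailureGate` and `Health` are hypothetical names. Red means the number of signature failures in the last 60 s is strictly greater than `signature_failure_red_threshold`, which is one reading of "exceeds ... per minute".

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Green,
    Red { failures_last_window: usize }, // surfaces the failure rate
}

/// Hypothetical sliding-window gate over signature_invalid rejections.
pub struct SigFailureGate {
    window: Duration,
    red_threshold: usize,
    failures: VecDeque<Instant>, // oldest at the front
}

impl SigFailureGate {
    pub fn new(red_threshold: usize) -> Self {
        Self { window: Duration::from_secs(60), red_threshold, failures: VecDeque::new() }
    }

    /// Call once per signature_invalid rejection.
    pub fn record_failure(&mut self, at: Instant) {
        self.failures.push_back(at);
    }

    /// Evict failures older than the window, then compare against the threshold.
    pub fn sample(&mut self, now: Instant) -> Health {
        while let Some(&oldest) = self.failures.front() {
            if now.duration_since(oldest) >= self.window {
                self.failures.pop_front();
            } else {
                break;
            }
        }
        let n = self.failures.len();
        if n > self.red_threshold {
            Health::Red { failures_last_window: n }
        } else {
            Health::Green
        }
    }
}
```

Storing one timestamp per failure keeps the sketch exact. Under a sustained flood, per-second buckets would bound memory at 60 counters.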
## Non-Functional Requirements
**Performance**
- `validate` ≤1 ms p99.
**Security**
- Reject-then-log; never log the raw payload of a rejected command at info level (size-cap + redact).
- No timing oracle — use constant-time HMAC compare.
## Contract
- Canonical typed model: `data_model.md §OperatorCommand`, `§SignedCommand`. Open Q9 (`architecture.md §8`) is acknowledged; this task ships a working HMAC default and a trait so resolution can swap.
## Runtime Completeness
- **Named capability**: operator command authentication + replay-protection + session binding.
- **Production code that must exist**: real HMAC validator; real per-session monotonic counter; real session registry; constant-time compare.
- **Unacceptable substitutes**: "accept all signed commands as valid" and tracking the sequence number globally instead of per session (cross-session replay) are both unacceptable.
@@ -1,66 +0,0 @@
# POI Surface to Operator + Deadline Carriage
**Task**: AZ-679_operator_bridge_poi_surface
**Name**: POI surface event format + deadline carriage + push via telemetry_stream
**Description**: Translate `POI` events from `scan_controller` into the wire format defined in `architecture.md §7.10 Drone ⇄ Operator Sync Message Format`. Carry the operator-decision deadline computed by `scan_controller`. Push via `telemetry_stream`.
**Complexity**: 3 points
**Dependencies**: AZ-640_initial_structure, AZ-675_telemetry_stream_grpc_server
**Component**: operator_bridge
**Tracker**: AZ-679
**Epic**: AZ-628
## Problem
The operator UI consumes `POI` events with all the evidence required to decide (Tier-1 evidence summary, Tier-2 path/concealment scores when present, optional VLM status, photo metadata, deadline). The wire format is fixed by `architecture.md §7.10`; this task implements the in→out mapping and the deadline carriage (computed upstream by `scan_controller`'s confidence-scaled window).
## Outcome
- `PoiSurfaceMapper::map(poi: &Poi) -> OperatorPoiEvent` produces the wire-format message.
- Fields populated: `poi_id`, `mgrs`, `class_group`, `confidence`, `vlm_status`, `tier2_evidence_summary`, `photo_metadata`, `deadline_unix_ms`.
- On `scan_controller` dequeue (cap rotation / age-out / completion), emit a `PoiDequeued { poi_id, reason }` event.
- Health: `pois_surfaced_per_min`.
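The `Poi` to `OperatorPoiEvent` mapping can be sketched with the field list above. All field types here are assumptions. So are the `VlmStatus` variants other than `Disabled`, the `vlm_label` field (taken from AC-2) and `photo_metadata` as a plain string. `architecture.md §7.10` remains the authority. The sketch shows three things: the deadline is passed through untouched, the Tier-2 summary is carried whenever present, and `vlm_label` is explicitly null unless the VLM has produced one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus { Disabled, Pending, Completed }

impl VlmStatus {
    fn wire(self) -> &'static str {
        match self {
            VlmStatus::Disabled => "disabled",
            VlmStatus::Pending => "pending",
            VlmStatus::Completed => "completed",
        }
    }
}

/// Hypothetical shape of the upstream POI from scan_controller.
pub struct Poi {
    pub id: String,
    pub mgrs: String,
    pub class_group: String,
    pub confidence: f32,
    pub vlm_status: VlmStatus,
    pub vlm_label: Option<String>,
    pub tier2_evidence_summary: Option<String>,
    pub photo_metadata: String,
    pub deadline_unix_ms: i64,
}

#[derive(Debug, PartialEq)]
pub struct OperatorPoiEvent {
    pub poi_id: String,
    pub mgrs: String,
    pub class_group: String,
    pub confidence: f32,
    pub vlm_status: &'static str,
    pub vlm_label: Option<String>,
    pub tier2_evidence_summary: Option<String>,
    pub photo_metadata: String,
    pub deadline_unix_ms: i64,
}

pub struct PoiSurfaceMapper;

impl PoiSurfaceMapper {
    pub fn map(poi: &Poi) -> OperatorPoiEvent {
        OperatorPoiEvent {
            poi_id: poi.id.clone(),
            mgrs: poi.mgrs.clone(),
            class_group: poi.class_group.clone(),
            confidence: poi.confidence,
            // Explicit status so the operator never has to infer absence.
            vlm_status: poi.vlm_status.wire(),
            vlm_label: match poi.vlm_status {
                VlmStatus::Completed => poi.vlm_label.clone(),
                _ => None,
            },
            // Carried whenever present; never dropped as "optional".
            tier2_evidence_summary: poi.tier2_evidence_summary.clone(),
            photo_metadata: poi.photo_metadata.clone(),
            // Computed upstream by scan_controller; passed through unchanged.
            deadline_unix_ms: poi.deadline_unix_ms,
        }
    }
}
```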
## Scope
### Included
- `Poi` → wire format mapping.
- Deadline carriage (already computed upstream).
- Dequeue event emission.
### Excluded
- POI queue + rate-cap (lives in `scan_controller` task 44).
- Operator command dispatch (task 41).
- Transport (lives in `telemetry_stream`).
## Acceptance Criteria
**AC-1: All required fields populated**
Given a `Poi` with Tier-1 + Tier-2 + VLM evidence
When `map(poi)` runs
Then every required field of the resulting `OperatorPoiEvent` per `architecture.md §7.10` is non-empty, or explicitly null with `vlm_status` reflecting the evidence state.
**AC-2: VLM-disabled case carries explicit status**
Given a `Poi` whose VLM evidence is `status: Disabled`
When `map(poi)` runs
Then `OperatorPoiEvent.vlm_status = "disabled"` and `vlm_label = null`; operator can render without inferring absence.
**AC-3: Dequeue emits event**
Given a surfaced POI with id X
When `scan_controller` rotates the queue and X is dequeued
Then a `PoiDequeued { poi_id: X, reason }` event is emitted to `telemetry_stream`.
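Dequeue emission can be sketched against a sink trait. The commit adds `TelemetrySink::push_operator_event`, but its real signature is not shown here, so the trait below, `PoiDequeueEmitter`, `DequeueReason` and `RecordingSink` are all hypothetical. `DequeueReason` mirrors the three dequeue causes named in the Outcome.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DequeueReason { Rotated, AgedOut, Completed }

#[derive(Debug, Clone, PartialEq)]
pub enum OperatorEvent {
    PoiDequeued { poi_id: String, reason: DequeueReason },
}

/// Hypothetical shape of the sink extension (Topic::OperatorEvent).
pub trait TelemetrySink: Send + Sync {
    fn push_operator_event(&self, event: OperatorEvent);
}

pub struct PoiDequeueEmitter<S: TelemetrySink> {
    sink: S,
}

impl<S: TelemetrySink> PoiDequeueEmitter<S> {
    pub fn new(sink: S) -> Self {
        Self { sink }
    }

    pub fn sink(&self) -> &S {
        &self.sink
    }

    /// Called when scan_controller rotates, ages out or completes a POI.
    pub fn on_dequeued(&self, poi_id: &str, reason: DequeueReason) {
        self.sink.push_operator_event(OperatorEvent::PoiDequeued { poi_id: poi_id.to_owned(), reason });
    }
}

/// In-memory sink, useful for asserting AC-3 in tests.
#[derive(Default)]
pub struct RecordingSink {
    pub events: Mutex<Vec<OperatorEvent>>,
}

impl TelemetrySink for RecordingSink {
    fn push_operator_event(&self, event: OperatorEvent) {
        self.events.lock().unwrap().push(event);
    }
}
```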
## Non-Functional Requirements
**Performance**
- POI surface mapping: ≤1 ms p99.
- POI surface → operator visible: ≤1 s under normal modem (end-to-end with `telemetry_stream`).
## Contract
- Canonical typed model: `data_model.md §POI`. Wire format: `architecture.md §7.10`.
## Runtime Completeness
- **Named capability**: POI surface in canonical wire format.
- **Production code that must exist**: real mapper; real dequeue emission.
- **Unacceptable substitutes**: dropping `tier2_evidence_summary` because "it's optional" is unacceptable when the evidence exists in the POI.