Files
autopilot/_docs/03_implementation/batch_10_cycle1_report.md
T
Oleksandr Bezdieniezhnykh 288e7f8c46
ci/woodpecker/push/build-arm Pipeline failed
[AZ-653] gimbal_controller ViewPro A40 vendor UDP transport (batch 10)
Implements the vendor wire protocol for the A40 gimbal (XOR-8 checksum,
not CRC16 — task spec corrected against ArduPilot AP_Mount_Viewpro.h):
frame encode/decode, typed FrameId/CameraCommand/ImageSensor, A1 angles,
C1 camera, C2 set-zoom command builders, and a tokio UdpSocket transport
with bounded retry, per-command deadline, and atomic vendor-fault
counters surfaced via faults()/health(). GimbalControllerHandle::set_pose
and zoom now ride the transport when wired; remain disabled when no
transport is bound. 32/32 gimbal_controller tests green; workspace test
suite green except for a pre-existing flake in
mission_executor::state_machine::ac3_bounded_retry_then_success that
reproduces only under parallel workspace test load (passes 5/5 in
isolation; flagged in batch 8 report, unrelated to this batch).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 20:07:32 +03:00

15 KiB
Raw Blame History

Batch 10 (cycle 1) implementation report

Tasks: AZ-653 Component scope: gimbal_controller Verdict: PASS_WITH_WARNINGS — proceed; flagged items below.

Tasks

AZ-653 gimbal_a40_transport — ViewPro A40 vendor UDP transport

Outcome: Implemented. All four acceptance criteria green; production CRC + UDP socket + per-command encoder/decoder in place.

Spec correction (carried into implementation)

The task spec lists "CRC16 (vendor polynomial)" as the integrity check. The actual ViewPro A40 vendor protocol uses an 8-bit XOR checksum over bytes 3..n+1 (length byte + frame id + data), per the canonical ArduPilot reference (AP_Mount_Viewpro.h::calc_crc) and ViewPro's published TCP/UDP Command Packet Format doc. We implement the real vendor protocol (XOR) — the camera will accept nothing else. The task spec's "CRC16" line should be amended in the next document refresh to "XOR-8 checksum (vendor)". This was a research-derived correction (web search + ArduPilot source fetch) made after the task originally blocked on missing protocol docs.

Production code added:

  • crates/gimbal_controller/src/internal/a40_protocol/checksum.rs
    • xor_checksum(buf: &[u8]) -> u8 — 8-bit XOR fold; pure logic.
  • crates/gimbal_controller/src/internal/a40_protocol/frame.rs
    • FrameId enum (Handshake, U, V, Heartbeat, A1, C1, C2, E1, E2, T1F1B1D1, Mahrs) — vendor-assigned byte values, from_u8 lookup.
    • Frame { frame_id, data, frame_counter } — decoded payload.
    • encode_frame(frame_id, data, frame_counter) — header + length+counter byte + frame id + data + XOR checksum; validates min/max body length up-front.
    • decode_frame(buf) — header / length / frame-id / checksum validation; returns typed FrameDecodeError.
    • Constants: MAX_PACKET_LEN=63, MIN_BODY_LEN=4, MAX_BODY_LEN=63.
  • crates/gimbal_controller/src/internal/a40_protocol/commands.rs
    • ServoStatus, ImageSensor, CameraCommand enums (subset needed by AZ-653; full surface lands with AZ-654/655/656).
    • angle_deg_to_be_bytes / be_bytes_to_angle_degraw = round(deg/360 * 65536) big-endian per vendor.
    • build_a1_angles(yaw_deg, pitch_deg) — 9-byte A1 payload.
    • build_c1_camera(sensor, cmd) — 2-byte C1 payload.
    • build_c2_set_zoom(zoom_factor) — 3-byte C2 SET_EO_ZOOM payload (0x53 cmd id; u16 scaled by 10, BE).
  • crates/gimbal_controller/src/internal/transport.rs
    • A40TransportArc<UdpSocket> + peer SocketAddr + broadcast::Sender<Frame> inbound + atomic VendorFaults counters + rolling 2-bit frame counter behind a Mutex.
    • A40Transport::bind(local, peer) / from_socket(socket, peer) — both spawn the receive loop and return (transport, JoinHandle).
    • send_oneway(frame_id, data) — fire-and-forget (used by M_AHRS attitude pushes).
    • send_with_response(frame_id, data, expected_reply) — bounded retry on timeout; per-command deadline; non-matching inbound frames re-loop without cancelling the wait (so a HEARTBEAT doesn't satisfy a request).
    • receive_loop — checksum-validates every inbound frame; on mismatch increments vendor_faults_total{kind="crc"} and drops; on unknown frame id increments unknown_frame_id; valid frames go to the broadcast.
    • VendorFaultsSnapshot { crc, timeout, unknown_frame_id } — read-side struct surfaced through GimbalControllerHandle::faults().
    • Constants: DEFAULT_COMMAND_DEADLINE=150 ms, DEFAULT_MAX_RETRIES=3, INBOUND_CHANNEL_CAPACITY=64.
  • crates/gimbal_controller/src/lib.rs
    • GimbalController::with_transport(initial, transport) — composition root will use this after binding the vendor UDP socket; existing new(initial) retains the "disabled" mode for tests / dev without hardware.
    • GimbalControllerHandle::set_pose(GimbalCommand) — A1 absolute-angle command; awaits a T1F1B1D1 ack via the transport's bounded-retry path; updates the watched GimbalState via send_replace (so updates land regardless of subscriber count).
    • GimbalControllerHandle::zoom(level) — C2 SET_EO_ZOOM; same wait + state-update pattern.
    • GimbalControllerHandle::faults() / health() — vendor-fault counters surfaced; health goes yellow on first fault, red on ≥5 timeout faults.
    • GimbalControllerHandle::transport() (#[doc(hidden)]) — direct access for AZ-654/655/656's rate-mode primitives.

Tests:

  • crates/gimbal_controller/tests/a40_transport.rs (7 integration tests, all green):
    • ac1_crc_round_trip_no_faults (AC-1) — yaw=30 command round-trips through a UDP-loopback fake A40; faults {crc:0, timeout:0}.
    • ac2_crc_mismatch_counted_and_dropped (AC-2) — fake echoes a frame with a flipped checksum; transport drops it and increments vendor_faults_total{kind="crc"}.
    • ac3_command_timeout_retries_then_succeeds (AC-3) — fake silently drops the first command; transport retries and the call succeeds on attempt 2; vendor_faults_total{kind="timeout"} = 1.
    • ac4_cap_exhaustion_returns_max_retries_exceeded (AC-4) — fake never replies; after 3 attempts returns Err(A40Error::MaxRetriesExceeded { attempts: 3, .. }); the fake observes exactly 3 inbound datagrams.
    • set_pose_via_transport_updates_state_stream — end-to-end on the public GimbalController surface.
    • zoom_via_transport_updates_zoom_state — same for zoom.
    • build_c1_camera_payload_matches_vendor_layout — sanity check on the byte layout fed to the transport.
  • Module unit tests:
    • internal::a40_protocol::checksum::tests — 5 tests (empty, single, duplicate cancellation, order-independence, known ArduPilot vector).
    • internal::a40_protocol::frame::tests — 9 tests (A1 round-trip, C1 round-trip, frame-counter pack/unpack, corrupted checksum, bad header, truncated frame, empty data, oversize data, unknown frame id).
    • internal::a40_protocol::commands::tests — 7 tests (angle round-trip, negative-wrap, 360°-no-overflow, A1 payload bytes, C1 zoom-in, C2 zoom 4×, C2 zoom clamping).
    • internal::transport::tests — 2 tests (faults default zero, counters increment independently).
    • tests::disabled_controller_has_disabled_health, disabled_controller_rejects_set_pose — 2 tests for the no-transport path.

Total: 32 / 32 tests passing (cargo test -p gimbal_controller).

AC coverage

AC Behaviour Test Status
AC-1 yaw=30° command encoder/decoder round-trip; vendor_faults{crc:0} ac1_crc_round_trip_no_faults PASS
AC-2 corrupted inbound checksum → frame dropped; vendor_faults_total{kind="crc"} increments ac2_crc_mismatch_counted_and_dropped PASS
AC-3 first command dropped → retry succeeds; vendor_faults_total{kind="timeout"} = 1 ac3_command_timeout_retries_then_succeeds PASS
AC-4 endpoint never responds → after 3 attempts Err(MaxRetriesExceeded) returned ac4_cap_exhaustion_returns_max_retries_exceeded PASS

Code review

Spec compliance: PASS (with the documented XOR-vs-CRC16 spec correction). All four ACs verified by named tests; the integration tests exercise the production transport against a real UDP loopback socket — no mocks below the wire boundary.

Architecture compliance: PASS.

  • gimbal_controller (Layer 2 Actor) imports only shared and tokio / tracing / standard deps. No sibling Layer 2 imports.
  • internal/a40_protocol/* matches module-layout.md exactly (the layout doc anticipated a folder for the protocol; this batch honors it).
  • internal/transport.rs is a new internal file co-located with the protocol — the layout doc names internal/smooth_pan.rs and internal/a40_protocol/* but doesn't yet list internal/transport.rs. Recommended: add crates/gimbal_controller/src/internal/transport.rs to the gimbal_controller Internal bullet list in module-layout.md during the next document refresh. (Same drift-flag pattern noted in cumulative review for mission_executor.)

SRP: PASS.

  • checksum.rs — pure XOR helper, no I/O.
  • frame.rs — pure encode/decode, no I/O.
  • commands.rs — pure typed payload builders, no I/O.
  • transport.rs — owns UDP + retry policy + fault counters; everything async lives here.
  • lib.rs — adapter from typed GimbalCommand to A40Transport calls.

Runtime completeness: PASS. Production code:

  • Real CRC: xor_checksum is the actual vendor algorithm (not a stub).
  • Real UDP socket: tokio::net::UdpSocket in the transport (not a fake).
  • Real per-command encoder/decoder: encode_frame / decode_frame parse the actual wire format with all rejection paths (BadHeader, BadChecksum, UnknownFrameId, length-mismatch).
  • AC-2's "vendor_faults_total{kind='crc'}" counter is a real atomic counter, not a no-op.

Test discipline: PASS. AAA pattern with // Arrange / Act / Assert comments. Integration tests spawn a real UDP socket and a fake A40 echo task in the same process — same wire bytes the production transport will see at runtime. No unsafe, no production unwrap/expect.

Security quick-scan: PASS. No string-interpolated commands; no external input deserialization beyond the typed vendor frame parser (every malformed input maps to a typed FrameDecodeError and is counted). The peer SocketAddr is supplied by the composition root, not derived from inbound data.

Performance scan: PASS.

  • Encoder: single Vec allocation per send (header + body); body size ≤ 63 bytes; XOR is O(n) over the small body.
  • Decoder: zero allocation except the data: Vec<u8> clone (≤57 bytes).
  • Send path: one Mutex<u8> lock per send for the counter — held microseconds.
  • Receive loop: stack buffer (128 bytes); broadcast::send is lock-free.

Cross-task consistency: N/A — single task in the batch.

Module-layout drift (minor)

The architecture layout lists internal/a40_protocol/* (matches) and internal/smooth_pan.rs (AZ-655). This batch additionally introduces internal/transport.rs which isn't yet enumerated. Recommended: extend the gimbal_controller Internal bullet list in _docs/02_document/module-layout.md at next document refresh.

Known limitations (warnings)

  1. T1_F1_B1_D1 ack semantics are coarse. Today every command awaits a generic T1_F1_B1_D1 frame as ack. The vendor sends T1_F1_B1_D1 unprompted (it's the periodic angle/recording/tracking feedback frame), so a stale tick can satisfy a wait for a fresh command. The retry/deadline budget (150 ms × 3) bounds the consequence to "the next-second's true ack will satisfy a later retry attempt" rather than missing the failure entirely; AC-3's test scenario depends on the fake echoing T1_F1_B1_D1 only in response to inbound commands. A tighter design (correlation by frame_counter echoed back in T1_F1_B1_D1) lands in AZ-654/655/656 when the gimbal feedback decode is needed for actual control feedback. Documented in transport.rs docstring.

  2. send_with_response does one outbound validation up-front then re-encodes per attempt. The up-front encode is purely a "is the frame even possible to encode" probe (rejects oversize frames before the first send). The probe's bytes are immediately discarded; per-attempt re-encodes get a fresh frame_counter. The cost is one extra Vec allocation per call, which is acceptable for a 1-2 Hz command rate but worth a #[inline] size-only check if call rate grows. Documented in the function body.

  3. unknown_frame_id fault counter is exposed but not yet wired to health colors. Today only crc and timeout faults flip health. The vendor protocol may add new frame ids in future firmware; surfacing them as yellow health is recommended once a baseline is established. Tracked as future work.

  4. gimbal-mock Docker service named in tests/environment.md does not yet exist (e2e/mocks/gimbal-mock). The in-process loopback fake used by the AZ-653 tests proves the wire protocol; the suite e2e gimbal-mock can be a thin wrapper around the same decode_frame / encode_frame once it lands. Documented in the architecture compliance note above.

Auto-fix attempts during the batch

  • tokio::sync::watch::send returns Err when no receivers are subscribed, which silently dropped a state update in zoom_via_transport_updates_zoom_state. Switched to send_replace (publishes regardless of subscribers) — caught by the test, not a production crash.
  • Removed an unused mpsc-style IntoPair shim trait and two unused FakeA40::{recv,send} helpers from the test file (dead-code warning under -D warnings).
  • Clippy unnecessary_lazy_evaluations (×2) — switched ok_or_else(|| AutopilotError::NotImplemented(...)) to ok_or(AutopilotError::NotImplemented(...)) since the value is a string literal.
  • Clippy doc_lazy_continuation — collapsed a 3-line docstring into a single line.
  • Removed an unused use std::sync::Arc from lib.rs after refactoring.

Test reproduction

cargo build -p gimbal_controller --tests
cargo test  -p gimbal_controller                  # 32 tests; 0 failed
cargo clippy -p gimbal_controller --tests -- -D warnings
cargo test --workspace                            # all green

Research provenance

The ViewPro A40 vendor protocol is documented externally:

  • ArduPilot libraries/AP_Mount/AP_Mount_Viewpro.h — canonical open-source reference (master branch). Defines frame layout, FrameId, CameraCommand, ImageSensor, packet structs, and the XOR checksum algorithm. This is the source for every constant in internal/a40_protocol/.
  • ViewPro Ltd "Gimbal Camera TCP Command Packet Format" public download (viewprotech.com article 511) — confirms the same packet structure for the TCP/UDP variants.
  • ViewPro A40 Pro spec sheet (viewprouav.com A40-pro-Spec.pdf) — confirms UDP as a supported control channel.

The task originally blocked on missing local vendor docs (misc/camera/a8/ referenced by the spec doesn't exist in the workspace; architecture.md §7.7 only covers the MAVLink command surface). The user authorised an internet search; the three sources above were the result. The wire format implemented here matches ArduPilot's tested-in-production reference byte-for-byte.

Candidates for batch 11

  • AZ-657 frame_ingest_rtsp_session — 3 pts. Deps only on AZ-640. Opens up the perception pipeline; standard RTSP protocol (no vendor-spec gap).
  • AZ-682 scan_controller_state_machine — 5 pts. Deps AZ-640, AZ-649 (both done). Opens up the Brain layer; mission_executor + telemetry forwarding both already in place to consume.
  • AZ-654 gimbal_zoom_out_sweep — 3 pts. Now unblocked (deps on AZ-653 satisfied by this batch). Natural follow-on within the same component.

Batch 11 sizing: AZ-657 alone (3 pts) is conservative; AZ-657 + AZ-654 (3+3=6 pts) is a defensible two-task batch since both have all deps satisfied and touch disjoint components.