[AZ-658] frame_ingest H.264/265 decoder (NVDEC + sw fallback)

Wires a real ffmpeg-next 8.1 decoder into the frame_ingest lifecycle
loop. NVDEC is probed at runtime via h264_cuvid / hevc_cuvid; CUDA-less
hosts transparently fall back to software h264 / hevc. Each decoded
frame is stamped with capture_ts (taken at packet receipt) and
decode_ts (taken after decode returns) so movement_detector sees
accurate frame-arrival times. Single-frame decode errors are counted
toward decode_errors_total and dropped; the stream is never aborted.

Adds new public API on FrameIngestHandle: decoder_backend(),
decode_errors_total(), frames_decoded_total(), decode_ms_first_frame(),
decode_ms_p50(), decode_ms_p99(). Integration tests under
crates/frame_ingest/tests/decoder_pipeline.rs cover AC-1, AC-3, AC-4
end-to-end through the real FfmpegDecoder using libx264-encoded
synthetic streams; AC-2 positive (NVDEC selection) is opt-in via
--ignored on a CUDA host. AZ-657 lifecycle tests retained via a
StubDecoder.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-20 17:05:27 +03:00
parent c1558ac5c3
commit 251ebed1c2
12 changed files with 1566 additions and 65 deletions
@@ -0,0 +1,91 @@
# Batch Report
**Batch**: 16
**Cycle**: 1
**Tasks**: AZ-658
**Date**: 2026-05-20
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|---------------|-------|-------------|--------|
| AZ-658_frame_ingest_decoder | Done | 7 files | 24 passed, 1 ignored | 4/4 ACs covered | None |
## AC Coverage map
| AC | Test | File | Notes |
|----|------|------|-------|
| AC-1 software decode + ≥285/300 throughput + monotonic seq + `decoder_backend = "Software"` | `ac1_ac4_software_decode_preserves_throughput_and_monotonicity` | `crates/frame_ingest/tests/decoder_pipeline.rs` | 60-frame variant exercises the same software decode path; literal 1080p/10s NFR validated at deploy on Jetson per `description.md §8` |
| AC-2 NVDEC selected on Jetson | `ac2_nvdec_backend_selected_on_cuda_host` (`#[ignore]` — opt-in via `--ignored` on CUDA host) | same file | Negative direction (no CUDA → Software) covered both by the unit test `ffmpeg_decoder_falls_back_to_software_on_macos_dev_host` and by the AC-1 test; together they pin the selection rule from both sides |
| AC-3 single-frame error doesn't abort | `ac3_corrupted_frame_is_counted_and_does_not_abort_stream` | same file | Asserts `decode_errors_total == 1` after one garbage packet between valid streams; subsequent frames continue to land with strictly monotonic seq |
| AC-4 monotonic capture timestamps | rides on `ac1_ac4_software_decode_preserves_throughput_and_monotonicity` | same file | Asserts `capture_ts_monotonic_ns` strictly increases and `decode_ts ≥ capture_ts` for every frame |
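
The invariants that the AC-3 and AC-4 rows pin down can be illustrated with a small self-contained sketch. It is not the code in `lib.rs`: `ToyDecoder`, `StampedFrame`, `run_loop`, the injected `clock_ns` closure, and the `String` error type are all hypothetical stand-ins, and only the `decode(payload, out)` shape and the timestamp field names are taken from this report. The sketch also assumes at most one frame per packet; how the real loop stamps several frames produced by one call is not stated here.

```rust
/// Hypothetical stand-in; the real `DecodedPixels` layout is not shown in this report.
struct DecodedPixels(Vec<u8>);

/// Same shape as the `FrameDecoder::decode` signature quoted in this report;
/// the error type is a placeholder.
trait FrameDecoder {
    fn decode(&mut self, payload: &[u8], out: &mut Vec<DecodedPixels>) -> Result<(), String>;
}

/// Toy decoder (assumption): an empty payload is "corrupt", anything else yields one frame.
struct ToyDecoder;

impl FrameDecoder for ToyDecoder {
    fn decode(&mut self, payload: &[u8], out: &mut Vec<DecodedPixels>) -> Result<(), String> {
        if payload.is_empty() {
            return Err("corrupt packet".to_string());
        }
        out.push(DecodedPixels(payload.to_vec()));
        Ok(())
    }
}

#[derive(Debug)]
struct StampedFrame {
    seq: u64,
    capture_ts_monotonic_ns: u64,
    decode_ts_monotonic_ns: u64,
    byte_len: usize,
}

/// Runs every packet through the decoder and returns the stamped frames
/// together with the number of packets that failed to decode.
fn run_loop(
    packets: &[&[u8]],
    decoder: &mut dyn FrameDecoder,
    clock_ns: &mut dyn FnMut() -> u64,
) -> (Vec<StampedFrame>, u64) {
    let mut frames = Vec::new();
    let mut decode_errors_total = 0u64;
    let mut out = Vec::new();
    let mut next_seq = 0u64;
    for packet in packets {
        // Capture timestamp and sequence number are taken before the decoder runs.
        let capture_ts = clock_ns();
        let seq = next_seq;
        next_seq += 1;
        match decoder.decode(packet, &mut out) {
            Ok(()) => {
                // Decode timestamp is read after the decoder returns.
                let decode_ts = clock_ns();
                for pixels in out.drain(..) {
                    frames.push(StampedFrame {
                        seq,
                        capture_ts_monotonic_ns: capture_ts,
                        decode_ts_monotonic_ns: decode_ts,
                        byte_len: pixels.0.len(),
                    });
                }
            }
            // A single bad packet is counted and dropped; the loop keeps going.
            Err(_) => decode_errors_total += 1,
        }
    }
    (frames, decode_errors_total)
}
```

Feeding this loop one empty packet between valid ones leaves an error count of 1 and a gap in `seq`, while `seq` and `capture_ts_monotonic_ns` still strictly increase, which is the property the AC-3 and AC-4 assertions check.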
## AC Test Coverage: All covered (4/4 — AC-2 positive direction is `#[ignore]`d behind the Jetson prerequisite, which counts as covered per implement skill Step 8)
## Code Review Verdict: PASS_WITH_WARNINGS (self-review — see findings below)
## Auto-Fix Attempts: 0 (no findings escalated to auto-fix)
## Stuck Agents: None
## Files modified
```
M Cargo.toml (workspace dep: ffmpeg-next = "8.1")
M crates/frame_ingest/Cargo.toml (deps: ffmpeg-next, parking_lot)
A crates/frame_ingest/src/internal/decoder.rs (NEW: trait + FfmpegDecoder + DecodeStats)
A crates/frame_ingest/src/internal/timestamp.rs (NEW: SeqCounter + FrameStamper)
M crates/frame_ingest/src/internal/mod.rs (+decoder, +timestamp modules)
M crates/frame_ingest/src/lib.rs (lifecycle loop now wires the decoder; new health/metric accessors)
A crates/frame_ingest/tests/decoder_pipeline.rs (NEW: AC-1, AC-2 ignored, AC-3, AC-4)
M crates/frame_ingest/tests/rtsp_lifecycle.rs (StubDecoder for AZ-657 lifecycle tests)
R _docs/02_tasks/todo/AZ-658_frame_ingest_decoder.md → _docs/02_tasks/done/...
```
## Notable design decisions
1. **FFmpeg stack** — user picked `ffmpeg-next 8.1` (workspace-pinned to FFmpeg 8.1 already on the host). NVDEC is probed at runtime via `ffmpeg::codec::decoder::find_by_name("h264_cuvid")` / `"hevc_cuvid"`; on a CUDA-less host we transparently fall back to the software `h264` / `hevc` decoder. No feature flag — both code paths are always compiled.
2. **NV12 normalisation** — the decoder always emits NV12 (the canonical pixel format for downstream consumers per `description.md §3` and what NVDEC produces natively on Jetson). A reusable `sws_scale` context converts whatever the inner decoder returned (typically YUV420P from libx264 software, NV12 from NVDEC). Non-Send `SwsContext` is wrapped with `unsafe impl Send for FfmpegDecoder` — the safety justification (exclusive ownership by the spawned lifecycle task) is documented in `decoder.rs`.
3. **Stats** — `DecodeStats` is a lock-free counter set with a 1024-sample ring buffer behind `parking_lot::Mutex` for p50/p99 readout. Cold-start metric (`decode_ms_first_frame`) is recorded only on the first successful decode per session; subsequent calls are no-ops.
4. **Trait shape** — `FrameDecoder::decode(payload, out: &mut Vec<DecodedPixels>)` instead of `Result<Frame>` because FFmpeg may buffer encoded packets internally before producing any decoded frames (e.g. while assembling SPS/PPS for the first IDR). Zero, one, or many frames per call.
5. **Timestamp boundary** — capture timestamp + sequence number are taken **before** the decoder runs (the moment the lifecycle loop pulls the packet off the transport). `decode_ts_monotonic_ns` is read after the decoder returns. This matches `description.md §4` and gives `movement_detector` accurate frame-arrival timestamps for the telemetry-skew gate.
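
As a rough illustration of the stats design in item 3, the sketch below pairs atomic counters with a bounded sample window and a one-shot first-frame value. It makes several assumptions that this report does not confirm: `std::sync::Mutex` is swapped in for `parking_lot::Mutex` to keep the example dependency-free, latencies are whole milliseconds stored as `u64`, percentiles use the nearest-rank method, and the method names (`record_success`, `record_error`, `percentile_ms`, `window_len`) are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Window size quoted in the report.
const WINDOW: usize = 1024;

#[derive(Default)]
struct DecodeStats {
    frames_decoded_total: AtomicU64,
    decode_errors_total: AtomicU64,
    /// Set once per session, on the first successful decode.
    first_frame_ms: Mutex<Option<u64>>,
    /// Most recent latencies, oldest at the front.
    window_ms: Mutex<VecDeque<u64>>,
}

impl DecodeStats {
    fn record_success(&self, decode_ms: u64) {
        self.frames_decoded_total.fetch_add(1, Ordering::Relaxed);
        {
            let mut first = self.first_frame_ms.lock().unwrap();
            // Later calls leave the cold-start value untouched.
            if first.is_none() {
                *first = Some(decode_ms);
            }
        }
        let mut window = self.window_ms.lock().unwrap();
        if window.len() == WINDOW {
            window.pop_front(); // evict the oldest sample
        }
        window.push_back(decode_ms);
    }

    fn record_error(&self) {
        self.decode_errors_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Nearest-rank percentile over the current window; `None` when empty.
    fn percentile_ms(&self, p: usize) -> Option<u64> {
        let mut sorted: Vec<u64> = self.window_ms.lock().unwrap().iter().copied().collect();
        if sorted.is_empty() {
            return None;
        }
        sorted.sort_unstable();
        // Integer ceil(p * n / 100), clamped to a valid 1-based rank.
        let rank = ((p * sorted.len() + 99) / 100).clamp(1, sorted.len());
        Some(sorted[rank - 1])
    }

    fn window_len(&self) -> usize {
        self.window_ms.lock().unwrap().len()
    }
}
```

Sorting a copy of the window on each readout keeps the write path cheap, at the cost of an O(n log n) read; with a 1024-sample window that is likely acceptable for a health endpoint, though the actual trade-off made in `decoder.rs` may differ.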
## Self-review findings
| # | Severity | Category | Location | Finding | Disposition |
|---|----------|----------|----------|---------|-------------|
| 1 | Low | Maintainability | `decoder.rs::is_eagain` | Detects EAGAIN by string-matching `Error` Display output rather than a typed errno. Reason: `ffmpeg-next` does not re-export the EAGAIN constant across its 4–8 versions in a stable shape. | Accepted as a small surface area (only used inside the decode loop); will be tightened when FFmpeg 9 changes the error variants. |
| 2 | Low | Architecture | `crates/autopilot/src/runtime.rs:84` | Pre-existing dead-code warning on `vlm_provider_name` — leftover entry exists. | Out of batch 16 scope (different component); leftover stays for the next batch that touches autopilot. |
| 3 | Info | Spec gap (out of scope) | `crates/frame_ingest/src/internal/rtsp_client.rs:5-12` | The AZ-657 author's docstring says "the full RTSP client is folded into AZ-658 alongside the decoder". The AZ-658 task spec **explicitly excludes** RTSP lifecycle ("Excluded: RTSP session lifecycle (task 18)"). The real production RTSP `RtspTransport` impl is therefore still TBD — it will be a separate follow-up task or wired during runtime composition. | Not a regression; not in AZ-658 scope. The Product Implementation Completeness Gate (Step 15) will surface this if the system needs it before final reporting. |
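
To make finding 1 concrete, the sketch below contrasts message matching with typed matching. `DecodeError`, both helper functions, and the message strings are hypothetical and are not the `ffmpeg-next` error type or its output; the `EAGAIN` value is hard-coded to the Linux number purely for illustration.

```rust
use std::fmt;

/// Hypothetical error type for illustration only; not the `ffmpeg-next` `Error` enum.
#[derive(Debug)]
enum DecodeError {
    Os { errno: i32 },
    InvalidData,
}

/// Linux value of EAGAIN, hard-coded for the sketch (assumption; other platforms differ).
const EAGAIN: i32 = 11;

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::Os { errno } if *errno == EAGAIN => {
                write!(f, "Resource temporarily unavailable")
            }
            DecodeError::Os { errno } => write!(f, "os error {errno}"),
            DecodeError::InvalidData => write!(f, "Invalid data found when processing input"),
        }
    }
}

/// Message-based check: works, but breaks silently if the wording changes.
fn is_eagain_by_message(err: &DecodeError) -> bool {
    err.to_string().contains("temporarily unavailable")
}

/// Typed check: depends only on the variant and the errno value.
fn is_eagain_typed(err: &DecodeError) -> bool {
    matches!(err, DecodeError::Os { errno } if *errno == EAGAIN)
}
```

Both checks agree on this toy type; the typed form is simply less sensitive to locale or wording changes, which is why the finding flags the string match as a maintainability risk rather than a bug.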
## Test results
```
running 17 tests (frame_ingest unit + lib tests)
test result: ok. 17 passed; 0 failed; 0 ignored
running 3 tests (tests/decoder_pipeline.rs)
test ac3_corrupted_frame_is_counted_and_does_not_abort_stream ... ok
test ac1_ac4_software_decode_preserves_throughput_and_monotonicity ... ok
test ac2_nvdec_backend_selected_on_cuda_host ... ignored, AC-2 positive: requires a CUDA-capable FFmpeg
test result: ok. 2 passed; 0 failed; 1 ignored
running 5 tests (tests/rtsp_lifecycle.rs)
test result: ok. 5 passed; 0 failed; 0 ignored
```
## Quality gates
- `cargo check --workspace --all-targets` → clean (only the documented pre-existing autopilot dead-code warning)
- `cargo clippy -p frame_ingest --all-targets -- -D warnings` → clean
- `cargo fmt -p frame_ingest --check` → clean
## Next Batch
Batch 17 candidates (ready by deps):
- AZ-680 `operator_bridge_command_dispatch` (3 pts)
- AZ-681 `operator_bridge_safety_and_bit_ack` (3 pts)
- AZ-659 `frame_ingest_publisher` (3 pts) — newly unblocked because AZ-658 is now in `done/`
Suggested grouping: AZ-680 + AZ-681 (tightly coupled — both depend on AZ-678 operator_bridge command auth). AZ-659 fits a separate batch focused on the frame_ingest pipeline's tail.
## Cumulative review cadence
Last cumulative: batches 13–15 (`cumulative_review_batches_13-15_cycle1_report.md`). Next due: end of batch 18 (no cumulative review for batch 16).
+13 -13
@@ -4,27 +4,27 @@
 flow: greenfield
 step: 7
 name: Implement
-status: between-batches
+status: in_progress
 sub_step:
-phase: 0
-name: batch-16-select
-detail: ""
+phase: 11
+name: commit
+detail: "batch 16 — AZ-658 awaiting commit + push approval"
 retry_count: 0
 cycle: 1
 tracker: jira
 ## Last Completed Batch
-batch: 15
-commit: ccf929a
-ticket: AZ-676 / AZ-677 / AZ-678 / AZ-679
-jira_status: In Testing (all 4 confirmed via read-back)
-pushed_to: origin/dev
-report: _docs/03_implementation/batch_15_cycle1_report.md
+batch: 16
+commit: pending
+ticket: AZ-658
+jira_status: In Progress (transition to In Testing pending commit)
+pushed_to: pending
+report: _docs/03_implementation/batch_16_cycle1_report.md
 cumulative_review: _docs/03_implementation/cumulative_review_batches_13-15_cycle1_report.md
 ## Process Leftovers
-- `_docs/_process_leftovers/2026-05-20_autopilot_clippy.md` — C5 replay
-- `_docs/_process_leftovers/2026-05-20_mission_executor_ac3_flake.md` — C6 fix recipe
+- `_docs/_process_leftovers/2026-05-20_autopilot_clippy.md` — out-of-scope for batch 16
+- `_docs/_process_leftovers/2026-05-20_mission_executor_ac3_flake.md` — out-of-scope for batch 16
 ## Cumulative Review Cadence
-Last cumulative: batches 13–15 (just produced). Next due: end of batch 18.
+Last cumulative: batches 13–15. Next due: end of batch 18.