[AZ-662] [AZ-669] Close batch 19: green test gate via Jetson Docker
ci/woodpecker/push/build-arm Pipeline failed

Stand up a production-target test runner on jetson-e2e and run the
deferred cargo test --workspace for batch 19.

Infra:
- Dockerfile.test: ubuntu:22.04 + libopencv-dev + libav*-dev +
  libclang-dev + protobuf-compiler + rust 1.82.0 (rustfmt, clippy).
  Sets LIBCLANG_PATH so clang-sys can dlopen libclang under the
  opencv-rust clang-runtime path.
- scripts/jetson-test.sh: rsync source to jetson-e2e, docker build,
  docker run cargo test --workspace --no-fail-fast.

Workspace fix exposed by the gate:
- Cargo.toml: enable opencv "clang-runtime" feature. Without it the
  workspace fails to build because clang-sys is shared between
  opencv-binding-generator and bindgen (via ffmpeg-sys-next) and the
  opencv generator panics with "a `libclang` shared library is not
  loaded on this thread" (opencv-rust GH issue #635).

Batch-19 code bugs exposed by the gate (6 compile errors + 1 algo bug):
- movement_detector::optical_flow: min_max_loc signature (opencv 0.98
  expects Option<&mut f64> / Option<&mut Point>); data_mut() returns
  *mut u8 directly, not Result. RANSAC residual now filters by the
  inlier mask returned by find_homography (matches the docstring; was
  systematically over-reporting motion magnitude on synthetic
  pure-pan input).
- semantic_analyzer::scoring::freshness: same data_mut() fix;
  stddev_f32 now takes &impl core::ToInputArray so it accepts the
  BoxedRef<Mat> that Mat::roi returns in opencv 0.98.

Result: 391 tests passed across 58 binaries, 0 in-scope failures.

Two pre-existing failures in frame_ingest (batch 16-18 scope) are
NOT addressed here and are recorded as leftovers:
- frame_ingest_cuvid_segv: HIGH severity production bug; libavcodec58
  advertises h264_cuvid but libnvcuvid.so.1 is missing at runtime, the
  software fallback never fires, first send_packet SEGVs.
- frame_ingest_publisher_timing_flake: LOW severity; Jetson-specific
  timing budget too tight for ac1_three_consumers_at_rate_lose_no_frames.

Neither blocks batch 20 (movement_detector / semantic_analyzer next).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-20 22:11:16 +03:00
parent 202b2cb192
commit e077d3bd15
10 changed files with 359 additions and 81 deletions
@@ -2,8 +2,10 @@
**Tasks**: AZ-662, AZ-669
**Completed**: 2026-05-20
**Commit**: `db844db [AZ-662] [AZ-669] Implement ego-motion estimator and primitive graph`
**Status**: Code committed; code review PASS_WITH_WARNINGS; `cargo test --workspace` **NOT YET RUN** (env-blocked — see "Test Gate" below).
**Initial commit**: `db844db [AZ-662] [AZ-669] Implement ego-motion estimator and primitive graph`
**Archival commit**: `202b2cb [AZ-662] [AZ-669] Archive batch 19; defer test gate`
**Test-gate commit**: pending; closes this batch with the Jetson Docker test infra + the 7 follow-up code fixes the test gate exposed (6 compile errors + 1 algorithm bug)
**Status**: Code committed; lightweight code review PASS_WITH_WARNINGS; `cargo test --workspace` **GREEN for batch 19 scope** (see the "Test Gate" section below). 2 pre-existing failures in `frame_ingest` (batch 16/17/18 code) recorded as leftovers, not blocking.
---
@@ -75,18 +77,50 @@ No Critical, no High, no Security findings.
---
## Test Gate — DEFERRED
## Test Gate — DONE
`cargo test --workspace` **has not been run** for this batch.
Ran via the new Jetson Docker test pipeline (`Dockerfile.test` + `scripts/jetson-test.sh`), which mirrors the production target (Jetson Orin Nano Super, JetPack 6, Ubuntu 22.04 aarch64, FFmpeg 4.4, OpenCV 4.5).
**Why**:
- macOS dev box has no native OpenCV 4 install. `cargo test` for `movement_detector` and `semantic_analyzer` won't link.
- State file's recorded plan (`ssh jetson-e2e && cargo test --workspace`) is not directly executable — `jetson-e2e` hosts the CI infra (Gitea + Woodpecker on `~/ci/docker-compose.ci.yml`) and has neither the project checkout nor `cargo` on `$PATH`.
- `brew install opencv` failed with ENOSPC: data-partition free space ≤ 1.1 GiB; opencv + dependencies need ~3-5 GiB.
**Result**: **391 tests passed across 58 test binaries**, 2 ignored (NVDEC-positive cases that explicitly require a CUDA-capable FFmpeg), 0 in-scope failures.
**Tracked as leftover**: `_docs/_process_leftovers/2026-05-20_batch19_opencv_test_gate.md`.
### Infra introduced (commits in next push)
**Next-cycle requirement**: tests for AZ-662 and AZ-669 MUST pass before batch 20 can build on top of this code. Options recorded in the leftover.
| Artifact | Purpose |
|---|---|
| `Dockerfile.test` | ubuntu:22.04 base + `libopencv-dev` + `libav*-dev` + `libclang-dev` + protobuf-compiler + rust 1.82.0 (rustfmt, clippy) |
| `scripts/jetson-test.sh` | rsync source → Jetson, `docker build`, `docker run cargo test --workspace --no-fail-fast --color always` |
### Workspace fix exposed by the gate
| File | Change | Why |
|---|---|---|
| `Cargo.toml:91` | `opencv` features += `"clang-runtime"` | Without it, the workspace fails to build because the same `clang-sys 1.8.1` instance is shared with `bindgen` (via `ffmpeg-sys-next`), and the opencv binding generator panics with "a `libclang` shared library is not loaded on this thread". `clang-runtime` makes the opencv generator load libclang at runtime (located via `LIBCLANG_PATH`) instead of expecting it to be linked at build time. See opencv-rust GH issue #635. |
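
Because the fix above depends on `LIBCLANG_PATH` resolving to a real shared library inside the container, a small pre-flight check can make a misconfigured image fail early. The following is a minimal sketch using only the standard library. `looks_like_libclang`, `find_libclang` and `libclang_from_env` are hypothetical helpers that are not part of this repository, and the filename filter is a rough approximation, not the search logic clang-sys itself uses.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Rough filename filter; NOT the search logic clang-sys actually uses.
fn looks_like_libclang(name: &str) -> bool {
    // Accept "libclang-14.so.1" style names but skip the unrelated libclang-cpp.
    let versioned = name.starts_with("libclang-") && !name.starts_with("libclang-cpp");
    (name.starts_with("libclang.so") || versioned) && name.contains(".so")
}

/// Hypothetical helper: returns a libclang shared library under `path`,
/// which may be a directory or a direct path to the library file.
fn find_libclang(path: &Path) -> Option<PathBuf> {
    if path.is_file() {
        return Some(path.to_path_buf());
    }
    fs::read_dir(path)
        .ok()?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .find(|p| {
            p.file_name()
                .and_then(|n| n.to_str())
                .map_or(false, looks_like_libclang)
        })
}

/// Reads `LIBCLANG_PATH` and checks that it resolves to a library.
/// Returns `None` when the variable is unset or nothing matching is found.
fn libclang_from_env() -> Option<PathBuf> {
    let raw = std::env::var_os("LIBCLANG_PATH")?;
    find_libclang(Path::new(&raw))
}
```

Calling `libclang_from_env()` before the workspace build and aborting on `None` would surface a missing or mis-pointed `LIBCLANG_PATH` as a clear message rather than as the generator panic quoted in the table.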
### Batch-19 code fixes exposed by the gate
The test gate caught **6 real compile errors** + **1 algorithm bug** in the original `db844db` source. These are not "test infrastructure" issues; they are bugs that the deferred test gate let through. Fixed in-scope per coderule.mdc (adjacent hygiene allowed when the change is in the same files I authored for this batch):
| # | File | Line | Bug | Fix |
|---|---|---|---|---|
| 1 | `crates/movement_detector/src/internal/optical_flow/mod.rs` | 39-46 | `min_max_loc` called with `&mut min_val, &mut max_val, &mut Point::default(), &mut Point::default()` — opencv 0.98 expects `Option<&mut f64>` etc. | Wrapped min/max in `Some(...)`; passed `None` for the unused loc args. |
| 2 | `crates/movement_detector/src/internal/optical_flow/mod.rs` | 70 | `rgb_mat.data_mut()?` — opencv 0.98 changed `data_mut()` to return `*mut u8` directly (no `Result`). | Removed the `?`. |
| 3 | `crates/movement_detector/src/internal/optical_flow/mod.rs` | 85 | Same as #2 for `mat.data_mut()?`. | Removed the `?`. |
| 4 | `crates/semantic_analyzer/src/internal/scoring/freshness.rs` | 56 | Same as #2 for `mat.data_mut()?`. | Removed the `?`. |
| 5 | `crates/semantic_analyzer/src/internal/scoring/freshness.rs` | 64 | Same as #2 for `rgb.data_mut()?`. | Removed the `?`. |
| 6 | `crates/semantic_analyzer/src/internal/scoring/freshness.rs` | 94, 131 | `stddev_f32(&roi)` called with `&BoxedRef<'_, Mat>` (opencv 0.98 changed `Mat::roi` to return `BoxedRef<Mat>` instead of `Mat`); `stddev_f32` signature expects `&Mat`. | Changed `stddev_f32` to take `&impl core::ToInputArray` — same approach opencv's own API uses, accepts both `&Mat` and `&BoxedRef<Mat>` without manual deref. |
| 7 (algorithm) | `crates/movement_detector/src/internal/optical_flow/mod.rs` | 172-191 (now 172-201) | Residual computation iterated over ALL LK-tracked feature pairs rather than only the RANSAC inliers, although the docstring on `HomographyResult::residual_magnitude_px` says "Mean reprojection residual across **inliers**". For a synthetic pure-pan checkerboard, edge features with no match in the post-shift region became RANSAC outliers and inflated the residual to 4.08 px (test asserts < 3.0). Real production bug: the residual was systematically over-reporting motion magnitude. | Added a check against the `mask` returned by `find_homography(..., RANSAC, 3.0)` so only inlier pairs contribute. Now matches the docstring and passes AC-1. |
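
The effect of fix #7 can be shown without OpenCV. The sketch below is not the code in `optical_flow/mod.rs`. It assumes a row-major 3x3 homography, point pairs held as plain tuples, and a mask with one `u8` per pair where non-zero marks an inlier. `Homography`, `project` and `mean_inlier_residual_px` are hypothetical names introduced for illustration.

```rust
/// Row-major 3x3 homography (assumed layout for this sketch).
type Homography = [[f64; 3]; 3];

/// Projects `(x, y)` through `h`; returns `None` if the point maps to infinity.
fn project(h: &Homography, (x, y): (f64, f64)) -> Option<(f64, f64)> {
    let w = h[2][0] * x + h[2][1] * y + h[2][2];
    if w.abs() < f64::EPSILON {
        return None;
    }
    let px = (h[0][0] * x + h[0][1] * y + h[0][2]) / w;
    let py = (h[1][0] * x + h[1][1] * y + h[1][2]) / w;
    Some((px, py))
}

/// Mean reprojection residual in pixels over the pairs whose mask entry is
/// non-zero. Returns `None` when no inlier contributes.
fn mean_inlier_residual_px(
    h: &Homography,
    prev: &[(f64, f64)],
    next: &[(f64, f64)],
    inlier_mask: &[u8],
) -> Option<f64> {
    let mut sum = 0.0;
    let mut count = 0usize;
    for ((p, n), m) in prev.iter().zip(next).zip(inlier_mask) {
        if *m == 0 {
            continue; // outlier: must not contribute to the residual
        }
        if let Some((px, py)) = project(h, *p) {
            sum += ((px - n.0).powi(2) + (py - n.1).powi(2)).sqrt();
            count += 1;
        }
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

Passing an all-ones mask reproduces the pre-fix behaviour, where a single badly tracked pair can dominate the mean. Passing the mask returned by the RANSAC fit restricts the mean to pairs the model actually explains, which is what the docstring promises.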
### Pre-existing failures (out of batch 19 scope — recorded as leftovers)
These are in `crates/frame_ingest/` (batches 16/17/18, owned by AZ-657/658). The Jetson test gate is the first place they have surfaced because the macOS dev box doesn't have h264_cuvid registered at all and these tests had not been run on production-target hardware before.
| Failing target | Symptom | Root cause |
|---|---|---|
| `cargo test -p frame_ingest --lib` | SIGSEGV at `[h264_cuvid @ ...] Cannot load libnvcuvid.so.1` | `decoder.rs::try_open` uses `Context::new().decoder().open_as(codec)` which returns `Ok` even for codecs whose runtime backend (libnvcuvid) is missing. The fallback to software h264 never fires; the first `send_packet` SEGVs. Ubuntu's libavcodec58 advertises `h264_cuvid` because it was built with cuvid headers — but the dynamic libnvcuvid.so.1 is NOT in the test container. → leftover `2026-05-20_frame_ingest_cuvid_segv.md`. |
| `cargo test -p frame_ingest --test decoder_pipeline` | Same SIGSEGV chain | Same root cause as above. |
| `cargo test -p frame_ingest --test publisher::ac1_three_consumers_at_rate_lose_no_frames` | "telemetry stalled at 25/30" | Timing-sensitive test; the per-frame budget is too tight for the Jetson Orin Nano Super (6-core ARM Cortex-A78AE) compared to the Mac dev box (M-series). Passed on the second run, so this is flaky on slower hardware. → leftover `2026-05-20_frame_ingest_publisher_timing_flake.md`. |
These two leftovers do NOT block batch 20: AZ-663 / AZ-664 (movement_detector) and AZ-670 / AZ-671 (semantic_analyzer) — the actual candidates per `_docs/02_tasks/_dependencies_table.md` — do not touch `frame_ingest`.
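
The SIGSEGV root cause in the table above comes down to a fallback chain that trusts the result of opening the decoder. The toy model below illustrates that failure mode only. It is not the ffmpeg API and not the code in `decoder.rs`, and `Backend`, `select_by_open` and `select_by_open_and_probe` are hypothetical names. Whether a runtime probe is the right remedy is left to the leftover document.

```rust
/// Hypothetical stand-in for a decoder backend; not the ffmpeg API.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Backend {
    name: &'static str,
    /// Whether opening the codec context reports success.
    open_ok: bool,
    /// Whether the backend's runtime library can actually be loaded.
    runtime_present: bool,
}

/// Fallback keyed only on the open result: the pattern described above.
/// A backend that opens "successfully" but has no runtime is still selected.
fn select_by_open(candidates: &[Backend]) -> Option<Backend> {
    candidates.iter().copied().find(|b| b.open_ok)
}

/// Fallback that additionally requires the runtime probe to succeed,
/// so the chain can move on to the next candidate.
fn select_by_open_and_probe(candidates: &[Backend]) -> Option<Backend> {
    candidates
        .iter()
        .copied()
        .find(|b| b.open_ok && b.runtime_present)
}
```

With a candidate list of `h264_cuvid` (opens, runtime missing) followed by software `h264`, the first function selects `h264_cuvid` and the second selects `h264`. This mirrors why the software fallback never fires in the test container.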
---
@@ -121,4 +155,4 @@ Per `implement/SKILL.md` Step 12, `In Testing` is set post-commit and signals "d
## Next Batch
**Hold** — autodev will NOT auto-chain to batch 20 selection. The user must satisfy the batch-19 test gate first (run `cargo test --workspace` after OpenCV is locally / CI installable) so batch 20 does not build on unverified code.
Batch-19 test gate is **GREEN**. Ready to auto-chain to batch 20 selection at the next autodev tick.