gps-denied-onboard/tests/e2e/Dockerfile.jetson
Oleksandr Bezdieniezhnykh 6586208f83 [AZ-615] Fix Jetson harness base image (l4t-base/l4t-pytorch tags don't exist)
Operator-reported: `nvcr.io/nvidia/l4t-base:r36.4.0` fails to pull.
Investigation against the live registries confirmed:

  * `nvcr.io/nvidia/l4t-base` — deprecated in JetPack 6, no r36 tags
    (forum thread "L4T Base docker image for Jetpack 6.2 (r36.4.3)",
    GitHub dusty-nv/jetson-containers#883).
  * `nvcr.io/nvidia/l4t-pytorch` — no r36 tags at all. Newest is
    r35.2.1-pth2.0-py3 (too old for our torch>=2.2 floor).
  * `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — exists but ships no PyTorch.
  * `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) — exists, ~6.3 GB ARM64,
    PyTorch + torchvision + opencv pre-baked, maintained by dusty-nv
    (NVIDIA's Jetson containers maintainer).

Switched Dockerfile.jetson base to `dustynv/l4t-pytorch:r36.4.0`.
Forward-compatible with the host's R36.5 BSP (NVIDIA containers
tolerate one minor BSP ahead on the host side).

Setup doc fixes:
  * smoke-test command now uses `l4t-jetpack:r36.4.0` (the official
    replacement for the deprecated `l4t-base`)
  * keygen step explicitly states it produces BOTH halves (private +
    .pub) in one go
  * ssh-copy-id + ssh config show how to specify a custom port
  * troubleshooting table gets a new row for the `l4t-base not found`
    case so the next dev hits the answer in 30 seconds

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-18 02:02:26 +03:00


# Tier-2 e2e-runner image — Jetson Orin Nano (JetPack 6.x, L4T R36.x).
#
# AZ-615: companion image to `tests/e2e/Dockerfile` (Colima/Tier-1 smoke
# harness) that runs the full Reality Gate — including C3 matcher + C7
# inference — against a CUDA-capable GPU.
#
# Hardware contract (operator-confirmed, 2026-05-17):
# * Jetson Orin Nano, JetPack 6.2.2+b24, L4T R36.5.0
# * nvidia-container-toolkit ≥ 1.16
# * `docker run --runtime=nvidia ... nvidia-smi` returns the GPU
#
# Image layout mirrors the Colima Dockerfile (so AC-4 AST scan + bind
# mounts work the same way):
#   /opt/pyproject.toml
#   /opt/src/gps_denied_onboard/...      (SUT package, editable install)
#   /opt/tests/...                       (bind-mounted from host)
#   /opt/_docs/00_problem/input_data/    (bind-mounted from host)
#
# Build context is the repo root (see `docker-compose.test.jetson.yml`
# → `services.e2e-runner.build.context`).
#
# BUILD HOST: this image MUST be built ON the Jetson — cross-building
# from x86 macOS produces images that miss Tegra-specific shared libs
# the nvidia-container-runtime later mounts at run time.
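#
# Pre-build guard (a sketch, not part of the compose wiring): Jetson Linux
# hosts carry /etc/nv_tegra_release, so a missing file is a strong hint that
# you are about to cross-build. Assumes docker-compose.test.jetson.yml sits at
# the repo root and defines the `e2e-runner` service referenced above.
#   if [ -f /etc/nv_tegra_release ]; then
#     docker compose -f docker-compose.test.jetson.yml build e2e-runner
#   else
#     echo "no /etc/nv_tegra_release: not a Jetson host, build on the Orin" >&2
#   fi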
# ---------------------------------------------------------------------------
# Base — dustynv/l4t-pytorch ships JetPack runtime + PyTorch wheel for `.cuda()`
#
# Tag selection rationale (verified 2026-05-17 against the live registries):
#
# - `nvcr.io/nvidia/l4t-base` was deprecated in JetPack 6 (forums:
#   "L4T Base docker image for Jetpack 6.2 (r36.4.3)" / Issue #883 in
#   dusty-nv/jetson-containers). The image no longer publishes r36 tags.
# - `nvcr.io/nvidia/l4t-pytorch` has NO r36 tags published. The newest
#   official l4t-pytorch tag is r35.2.1-pth2.0-py3, too old for our
#   torch >= 2.2 floor in pyproject.toml `[inference]`.
# - `nvcr.io/nvidia/l4t-jetpack:r36.4.0` exists (CUDA + cuDNN + TensorRT
#   bundled) but ships NO PyTorch; we'd have to install the Jetson
#   PyTorch wheel from developer.download.nvidia.com manually.
# - `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) is the de-facto Jetson
#   PyTorch image: maintained by dusty-nv (NVIDIA's Jetson containers
#   maintainer), bakes torch / torchvision / opencv / ONNX runtime for
#   JetPack 6, ARM64, ~6.3 GB. Forward-compatible with the host's
#   slightly newer R36.5 BSP (NVIDIA containers tolerate one minor BSP
#   ahead on the host side).
#
# Verify availability before build:
# docker pull dustynv/l4t-pytorch:r36.4.0
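# or, to confirm the tag resolves without downloading the ~6.3 GB image
# (a sketch; only the manifest is fetched from Docker Hub):
#   docker manifest inspect dustynv/l4t-pytorch:r36.4.0 > /dev/null \
#     && echo "tag exists"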
FROM dustynv/l4t-pytorch:r36.4.0 AS runtime
ARG DEBIAN_FRONTEND=noninteractive
# System deps mirror tests/e2e/Dockerfile + the Jetson runtime stack:
# * build-essential / libpq-dev / libspatialindex-dev — same as Colima
# * python3-pip / python3-venv — l4t-pytorch ships python but not always venv
# * libgl1 + libglib2.0-0 — OpenCV runtime libs (same reason as Colima)
# * libpq5 + libspatialindex-c6 — runtime side of psycopg + rtree
# Note: CUDA / cuDNN / TensorRT come pre-baked in the base image; do NOT
# attempt to apt-install them (doing so would conflict with the
# Tegra-specific libs the runtime mounts).
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        build-essential \
        libpq-dev \
        libspatialindex-dev \
        libpq5 \
        libspatialindex-c6 \
        libgl1 \
        libglib2.0-0 \
        python3-pip \
        python3-venv \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /opt
# Editable SUT install. Skipping the `[inference]` extra because PyTorch +
# torchvision are already provided by the l4t-pytorch base image with
# Tegra-specific CUDA builds; reinstalling them from PyPI would clobber
# the Tegra wheels with generic aarch64 ones that are not built against
# JetPack's CUDA / cuDNN / cuBLAS stack and so cannot drive Orin's GPU.
COPY pyproject.toml README.md ./
COPY src ./src
# `--break-system-packages` is needed because the l4t-pytorch base image
# uses an externally-managed Python environment (PEP 668). The alternative
# would be to layer a venv on top, but a plain venv does not see the
# pre-installed torch, and any torch pip resolves into it would shadow the
# Tegra-tuned wheel and break `.cuda()`. The image IS the environment;
# installing into the system interpreter is the path of least drift.
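#
# To check that premise on a given base tag (a sketch): PEP 668 places an
# EXTERNALLY-MANAGED marker file in the interpreter's stdlib directory, and
# pip only understands `--break-system-packages` from 23.0.1 onwards.
#   docker run --rm --entrypoint python3 dustynv/l4t-pytorch:r36.4.0 -c \
#     "import os, sysconfig; print(os.path.isfile(os.path.join(sysconfig.get_path('stdlib'), 'EXTERNALLY-MANAGED')))"
#   docker run --rm --entrypoint pip3 dustynv/l4t-pytorch:r36.4.0 --version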
RUN pip3 install --no-cache-dir --break-system-packages -e ".[dev]"
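#
# Post-build sanity check (a sketch; `<image>` is a placeholder for whatever
# tag docker-compose.test.jetson.yml assigns to this build). It confirms the
# editable install left the base image's CUDA-enabled torch in place:
#   docker run --rm --runtime=nvidia --entrypoint python3 <image> -c \
#     "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# A `None` CUDA version means a CPU-only wheel replaced the Jetson build;
# a CUDA version with `False` more likely points at the runtime wiring
# (e.g. a missing `--runtime=nvidia`).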
# ENTRYPOINT mirrors the Colima Dockerfile — pytest discovers both
# `tests/e2e/replay/` (heavy tier2 ACs run with GPS_DENIED_TIER=2) and
# any future `tests/e2e/scenarios/` additions. Rootdir resolves to /opt
# via the COPY'd pyproject.toml so `from tests.e2e.replay._helpers import ...`
# works inside the test files.
ENTRYPOINT ["pytest", "-q", "/opt/tests/e2e/"]
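#
# Typical tier-2 invocation from the repo root (a sketch; assumes the compose
# file supplies the nvidia runtime and the /opt/tests + /opt/_docs bind mounts
# listed above, and does not already set GPS_DENIED_TIER itself):
#   docker compose -f docker-compose.test.jetson.yml run --rm \
#     -e GPS_DENIED_TIER=2 e2e-runner
# Arguments after the service name are appended to the pytest ENTRYPOINT,
# e.g. `... e2e-runner -x` stops at the first failure.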