Mirror of https://github.com/azaion/gps-denied-onboard.git (synced 2026-06-21 07:11:13 +00:00)
[AZ-615] Fix Jetson harness base image (l4t-base/l4t-pytorch tags don't exist)
Operator-reported: `nvcr.io/nvidia/l4t-base:r36.4.0` fails to pull.
Investigation against the live registries confirmed:
* `nvcr.io/nvidia/l4t-base` — deprecated in JetPack 6, no r36 tags
(forum thread "L4T Base docker image for Jetpack 6.2 (r36.4.3)",
GitHub dusty-nv/jetson-containers#883).
* `nvcr.io/nvidia/l4t-pytorch` — no r36 tags at all. Newest is
r35.2.1-pth2.0-py3 (too old for our torch>=2.2 floor).
* `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — exists but ships no PyTorch.
* `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) — exists, ~6.3 GB ARM64,
PyTorch + torchvision + opencv pre-baked, maintained by dusty-nv
(NVIDIA's Jetson containers maintainer).
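The registry findings above can be re-checked with a short script. This is a minimal sketch, assuming a Docker CLI that supports `docker manifest inspect` and has network access; the helper names `image_exists` and `report_images` are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: report which candidate base images resolve in their registry.
# Assumption: `docker manifest inspect` is available and the registries are reachable.

# image_exists IMAGE -> exit status 0 if the registry returns a manifest for IMAGE.
image_exists() {
    docker manifest inspect "$1" >/dev/null 2>&1
}

# report_images IMAGE... -> prints one "ok" or "fail" line per image.
report_images() {
    local image
    for image in "$@"; do
        if image_exists "$image"; then
            printf 'ok    %s\n' "$image"
        else
            printf 'fail  %s\n' "$image"
        fi
    done
}
```

Example call: `report_images nvcr.io/nvidia/l4t-base:r36.4.0 nvcr.io/nvidia/l4t-jetpack:r36.4.0 dustynv/l4t-pytorch:r36.4.0`. A `fail` line means only that the lookup did not succeed; a missing tag, a registry that demands login, and a network error all look the same to this sketch.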
Switched Dockerfile.jetson base to `dustynv/l4t-pytorch:r36.4.0`.
Forward-compatible with the host's R36.5 BSP (NVIDIA containers
tolerate one minor BSP ahead on the host side).
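The gap between the host BSP and the image tag can be computed rather than eyeballed. The sketch below assumes the host exposes its L4T release in the first line of `/etc/nv_tegra_release` in the usual `# R36 (release), REVISION: 5.0, ...` form; the function names are hypothetical. It only measures the gap: the "one minor ahead" tolerance is this commit's claim, not something the script verifies.

```shell
#!/usr/bin/env bash
# Sketch: compare the host L4T release with the release encoded in an image tag.
# Assumption: the release header looks like "# R36 (release), REVISION: 5.0, ...".

# l4t_version LINE -> prints e.g. "36.5.0" for a release header line.
l4t_version() {
    printf '%s\n' "$1" |
        sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.]*\),.*/\1.\2/p'
}

# tag_version TAG -> strips the leading "r" and any "-suffix" ("r36.4.0-x" -> "36.4.0").
tag_version() {
    local v="${1#r}"
    printf '%s\n' "${v%%-*}"
}

# minor_gap HOST CONTAINER -> prints host minor minus container minor.
# Returns non-zero (and prints nothing) when the major versions differ.
minor_gap() {
    local host_major="${1%%.*}" ctr_major="${2%%.*}"
    local host_rest="${1#*.}" ctr_rest="${2#*.}"
    [ "$host_major" = "$ctr_major" ] || return 1
    echo $(( ${host_rest%%.*} - ${ctr_rest%%.*} ))
}
```

On the Jetson this could be run as `minor_gap "$(l4t_version "$(head -n 1 /etc/nv_tegra_release)")" "$(tag_version r36.4.0)"`; a result of `0` or `1` matches the situation this commit describes.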
Setup doc fixes:
* smoke-test command now uses `l4t-jetpack:r36.4.0` (the official
replacement for the deprecated `l4t-base`)
* keygen step explicitly states it produces BOTH halves (private +
.pub) in one go
* ssh-copy-id + ssh config show how to specify a custom port
* troubleshooting table gets a new row for the `l4t-base not found`
case so the next dev hits the answer in 30 seconds
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -37,12 +37,18 @@ Operator-confirmed environment (2026-05-17):
 ### 1. SSH key + alias (on the Mac)
 
 ```bash
-# Generate a dedicated keypair (separate from your daily-dev key)
+# Generate a dedicated keypair (separate from your daily-dev key).
+# This command produces BOTH halves in one go:
+# ~/.ssh/id_ed25519_jetson_e2e — private (keep secret, never share)
+# ~/.ssh/id_ed25519_jetson_e2e.pub — public (push to Jetson below)
 ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519_jetson_e2e \
 -C "jetson-e2e $(date +%Y-%m-%d)"
 
-# Push the public half to the Jetson (asks for the Jetson password once)
+# Push the public half to the Jetson (asks for the Jetson password once).
+# Add `-p <port>` if the Jetson's sshd listens on a non-default port:
 ssh-copy-id -i ~/.ssh/id_ed25519_jetson_e2e.pub <jetson-user>@<jetson-ip>
+# or with a custom port:
+# ssh-copy-id -p <port> -i ~/.ssh/id_ed25519_jetson_e2e.pub <jetson-user>@<jetson-ip>
 
 # Verify the Jetson's host key (run this ON the Jetson, via HDMI/serial,
 # not over the LAN you're about to trust):
@@ -50,11 +56,13 @@ ssh-copy-id -i ~/.ssh/id_ed25519_jetson_e2e.pub <jetson-user>@<jetson-ip>
 # Then compare against what the Mac sees on first connect. Accept only
 # if they match.
 
-# Wire up ~/.ssh/config (gitignored, never committed)
+# Wire up ~/.ssh/config (gitignored, never committed). Add `Port <port>`
+# if the Jetson's sshd listens on a non-default port.
 cat >> ~/.ssh/config <<'EOF'
 Host jetson-e2e
 HostName <jetson-ip>
 User <jetson-user>
+Port 22
 IdentityFile ~/.ssh/id_ed25519_jetson_e2e
 IdentitiesOnly yes
 AddKeysToAgent yes
@@ -95,14 +103,24 @@ Then `sudo systemctl reload ssh`.
 
 ### 4. Verify the Jetson Docker + GPU pipeline
 
+`nvcr.io/nvidia/l4t-base` was deprecated in JetPack 6 — use
+`l4t-jetpack` (the official replacement) for the smoke test:
+
 ```bash
 ssh jetson-e2e 'docker run --rm --runtime=nvidia --gpus all \
-nvcr.io/nvidia/l4t-base:r36.4.0 nvidia-smi'
+nvcr.io/nvidia/l4t-jetpack:r36.4.0 nvidia-smi'
 ```
 
-Expected output: a `nvidia-smi`-style table listing the Orin GPU. If
+Expected output: an `nvidia-smi`-style table listing the Orin GPU. If
 this fails with "runtime not found" or "no GPU devices", install
-`nvidia-container-toolkit` and `sudo systemctl restart docker`.
+`nvidia-container-toolkit` and `sudo systemctl restart docker`. If it
+fails with `pull access denied`, run `docker login nvcr.io` once (NGC
+API key from developer.nvidia.com — most public images don't require
+auth, but the registry sometimes prompts).
+
+If `nvidia-smi` works on the host directly (it does — driver 540.5.0,
+CUDA 12.6, Orin detected) but the container can't see the GPU, the
+problem is always nvidia-container-toolkit, not the driver.
 
 ### 5. Confirm disk + swap
 
@@ -162,7 +180,8 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
 | `docker: Error response from daemon: could not select device driver "nvidia"` | nvidia-container-toolkit missing or daemon not restarted after install | `sudo apt install nvidia-container-toolkit && sudo systemctl restart docker` |
 | `torch.cuda.is_available() == False` inside the container | `runtime: nvidia` block missing, or building on x86 host | Verify `docker-compose.test.jetson.yml` has `runtime: nvidia`; rebuild on the Jetson |
 | `replay.auto_sync.ac8_validation_failed` | AZ-614 (tlog time-base mismatch) — not a harness bug | Fix AZ-614 in `tests/e2e/replay/_tlog_synth.py` |
-| `pull access denied for nvcr.io/nvidia/l4t-pytorch` | NGC requires login for some tags | `docker login nvcr.io` (use NGC API key from developer.nvidia.com) |
+| `not found` / `tag not found` on `nvcr.io/nvidia/l4t-base:r36.*` | `l4t-base` was deprecated in JetPack 6 | use `l4t-jetpack:r36.4.0` for smoke tests; the harness itself uses `dustynv/l4t-pytorch:r36.4.0` |
+| `pull access denied for nvcr.io/nvidia/...` | NGC requires login for some tags | `docker login nvcr.io` (use NGC API key from developer.nvidia.com) |
 
 ## Related Jira
 
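The setup doc changed above attributes "host sees the GPU, container does not" to nvidia-container-toolkit. One way to narrow that down is to ask Docker whether a runtime named `nvidia` is registered at all. This is a minimal sketch, assuming `docker info --format '{{json .Runtimes}}'` prints the runtime map as JSON; the function names are hypothetical and not part of the harness.

```shell
#!/usr/bin/env bash
# Sketch: check whether the Docker daemon has an "nvidia" runtime registered.
# Assumption: the runtime map is printed as JSON keyed by runtime name.

# has_nvidia_runtime -> exit status 0 if a runtime named "nvidia" is listed.
has_nvidia_runtime() {
    docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}

# diagnose_gpu_container -> prints a one-line hint based on the check above.
diagnose_gpu_container() {
    if has_nvidia_runtime; then
        echo "nvidia runtime registered: check that the container is started with it"
    else
        echo "nvidia runtime missing: install nvidia-container-toolkit and restart docker"
    fi
}
```

Run `diagnose_gpu_container` on the Jetson (for example via `ssh jetson-e2e`) before rebuilding the image; a registered runtime points at the run or compose configuration rather than at the toolkit install.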
@@ -24,17 +24,29 @@
 # the nvidia-container-runtime later mounts at run time.
 
 # ---------------------------------------------------------------------------
-# Base — l4t-pytorch ships JetPack runtime + PyTorch wheel ready for `.cuda()`
+# Base — dustynv/l4t-pytorch ships JetPack runtime + PyTorch wheel for `.cuda()`
 #
-# Tag selection: NGC publishes l4t-pytorch on a slight lag from L4T BSP
-# releases. With BSP R36.5 on the device, the closest stable NGC tag at
-# author time is `r36.4.0-pth2.3-py3`. NVIDIA containers are
-# forward-compatible across one minor BSP (the container's userspace
-# can be slightly older than the host's L4T kernel). If a `r36.5.0-*`
-# tag is published, prefer it.
+# Tag selection rationale (verified 2026-05-17 against the live registries):
 #
-# Image lookup at run time: `docker manifest inspect nvcr.io/nvidia/l4t-pytorch:r36.4.0-pth2.3-py3`
-FROM nvcr.io/nvidia/l4t-pytorch:r36.4.0-pth2.3-py3 AS runtime
+# - `nvcr.io/nvidia/l4t-base` was deprecated in JetPack 6 (forums:
+# "L4T Base docker image for Jetpack 6.2 (r36.4.3)" / Issue #883 in
+# dusty-nv/jetson-containers). The image no longer publishes r36 tags.
+# - `nvcr.io/nvidia/l4t-pytorch` has NO r36 tags published. The newest
+# official l4t-pytorch tag is r35.2.1-pth2.0-py3 — too old for our
+# torch >= 2.2 floor in pyproject.toml `[inference]`.
+# - `nvcr.io/nvidia/l4t-jetpack:r36.4.0` exists (CUDA + cuDNN + TensorRT
+# bundled) but ships NO PyTorch — we'd have to install the Jetson
+# PyTorch wheel from developer.download.nvidia.com manually.
+# - `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) is the de-facto Jetson
+# PyTorch image: maintained by dusty-nv (NVIDIA's Jetson containers
+# maintainer), bakes torch / torchvision / opencv / ONNX runtime for
+# JetPack 6, ARM64, ~6.3 GB. Forward-compatible with the host's
+# slightly newer R36.5 BSP (NVIDIA containers tolerate one minor BSP
+# ahead on the host side).
+#
+# Verify availability before build:
+# docker pull dustynv/l4t-pytorch:r36.4.0
+FROM dustynv/l4t-pytorch:r36.4.0 AS runtime
 
 ARG DEBIAN_FRONTEND=noninteractive
 # System deps mirror tests/e2e/Dockerfile + the Jetson runtime stack: