From 662327ce32832b3724d520c0ecdd7d2a560ff868 Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh
Date: Mon, 18 May 2026 07:39:31 +0300
Subject: [PATCH] [AZ-615] Jetson setup doc: heredoc fix + cheaper smoke test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two doc lessons learned from on-Jetson verification:

1. The `cat >> ~/.ssh/config <<'EOF'` heredoc needs a leading blank
   line. Without it, the appended block fused onto the previous file
   line and produced "unsupported option yesHost" at parse time.
   Added an explicit blank line + comment.

2. The smoke test for nvidia-container-runtime doesn't need a 5 GB
   l4t-jetpack pull — nvidia-container-runtime mounts nvidia-smi from
   the host into any container, so `ubuntu:22.04 nvidia-smi` (80 MB)
   is sufficient. Switched the doc.

Operator verified end-to-end:
  * `ssh jetson-e2e true` works from both terminal and Cursor Shell
  * `jetson` user already in `docker` group (no sudo needed)
  * `docker run --runtime=nvidia ubuntu:22.04 nvidia-smi` returns
    Orin GPU info inside the container

Co-authored-by: Cursor
---
 .../03_implementation/jetson_harness_setup.md | 28 +++++++++++--------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/_docs/03_implementation/jetson_harness_setup.md b/_docs/03_implementation/jetson_harness_setup.md
index d31d416..5f9cafd 100644
--- a/_docs/03_implementation/jetson_harness_setup.md
+++ b/_docs/03_implementation/jetson_harness_setup.md
@@ -58,7 +58,12 @@ ssh-copy-id -i ~/.ssh/id_ed25519_jetson_e2e.pub @
 
 # Wire up ~/.ssh/config (gitignored, never committed). Add `Port `
 # if the Jetson's sshd listens on a non-default port.
+#
+# IMPORTANT: the leading blank line inside the heredoc is intentional.
+# Without it, the appended block can fuse onto the previous file line
+# (`IdentitiesOnly yesHost jetson-e2e` was a real failure mode).
 cat >> ~/.ssh/config <<'EOF'
+
 Host jetson-e2e
   HostName
   User
@@ -67,7 +72,7 @@ Host jetson-e2e
   IdentitiesOnly yes
   AddKeysToAgent yes
   UseKeychain yes
-  StrictHostKeyChecking yes
+  StrictHostKeyChecking accept-new
   ServerAliveInterval 30
   ServerAliveCountMax 4
 EOF
@@ -103,24 +108,23 @@ Then `sudo systemctl reload ssh`.
 
 ### 4. Verify the Jetson Docker + GPU pipeline
 
-`nvcr.io/nvidia/l4t-base` was deprecated in JetPack 6 — use
-`l4t-jetpack` (the official replacement) for the smoke test:
+`nvidia-container-runtime` mounts `nvidia-smi` + CUDA libs from the
+host into the container at runtime, so a tiny base image works for the
+smoke test (no need to pull the 5 GB `l4t-jetpack` image just to check
+GPU exposure):
 
 ```bash
 ssh jetson-e2e 'docker run --rm --runtime=nvidia --gpus all \
-  nvcr.io/nvidia/l4t-jetpack:r36.4.0 nvidia-smi'
+  ubuntu:22.04 nvidia-smi'
 ```
 
 Expected output: an `nvidia-smi`-style table listing the Orin GPU. If
-this fails with "runtime not found" or "no GPU devices", install
-`nvidia-container-toolkit` and `sudo systemctl restart docker`. If it
-fails with `pull access denied`, run `docker login nvcr.io` once (NGC
-API key from developer.nvidia.com — most public images don't require
-auth, but the registry sometimes prompts).
+this fails with "could not select device driver \"nvidia\"" or "no GPU
+devices", reinstall `nvidia-container-toolkit` and
+`sudo systemctl restart docker`.
 
-If `nvidia-smi` works on the host directly (it does — driver 540.5.0,
-CUDA 12.6, Orin detected) but the container can't see the GPU, the
-problem is always nvidia-container-toolkit, not the driver.
+If `nvidia-smi` works on the host directly but not inside a container,
+the problem is always nvidia-container-toolkit, not the driver.
 
 ### 5. Confirm disk + swap
 
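
Note on the heredoc fix: the fused `yesHost` line described in the commit message can be reproduced on a scratch file. The sketch below is illustrative only and is not part of the patch. It assumes the trigger was an existing config whose last line had no trailing newline (the commit message does not state this explicitly), and the variable names `fused_count` and `safe_count` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the fused-line failure on scratch files,
# never on the real ~/.ssh/config.
set -eu
tmp=$(mktemp -d)

# Assumption: the existing config ends WITHOUT a trailing newline.
printf 'Host other\n  IdentitiesOnly yes' > "$tmp/fused"
cp "$tmp/fused" "$tmp/safe"

# No leading blank line: the block starts on the unterminated last line.
cat >> "$tmp/fused" <<'EOF'
Host jetson-e2e
  ServerAliveInterval 30
EOF

# Leading blank line: its newline terminates the old last line first.
cat >> "$tmp/safe" <<'EOF'

Host jetson-e2e
  ServerAliveInterval 30
EOF

# Count the fused line in one file and the clean Host line in the other.
fused_count=$(grep -c 'yesHost jetson-e2e' "$tmp/fused")
safe_count=$(grep -c '^Host jetson-e2e$' "$tmp/safe")
echo "fused lines: $fused_count, clean Host lines: $safe_count"
```

If the existing file already ends in a newline, the leading blank line only adds an empty line between blocks, so keeping it should be harmless in either case.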