# Jetson e2e Harness — Operator Setup

AZ-615 / AZ-602 cycle-2. Documents the one-time operator-side setup that makes `scripts/run-tests-jetson.sh` work against a Jetson Orin Nano reachable from the developer Mac over SSH.

## Why a separate Jetson harness exists

The Colima/Tier-1 smoke harness (`docker-compose.test.yml` + `tests/e2e/Dockerfile`) verifies wiring, env config, fixture loading, auto-sync, and JSONL schema: everything UP TO the GPU boundary. But all three C7 inference strategies (`pytorch_fp16_runtime.py`, `tensorrt_runtime.py`, `onnx_trt_ep_runtime.py`) are CUDA-only by design (`model.half().cuda()` on `pytorch_fp16_runtime.py:189`, no CPU fallback). The full Reality Gate, including C3 matcher + C7 inference, therefore needs a CUDA-capable host.

The Jetson harness runs the same test tree (`tests/e2e/`) on the Jetson with `GPS_DENIED_TIER=2`, which turns OFF the auto-skip for `@pytest.mark.tier2` tests (see `tests/conftest.py:31-44`).

## Hardware contract

Operator-confirmed environment (2026-05-17):

* Jetson Orin Nano dev kit
* JetPack 6.2.2+b24
* L4T R36.5.0 (Jan 2026)
* nvidia-container-toolkit 1.16.2
* ≥ 30 GB free on `/var/lib/docker` (l4t-pytorch base image ~7 GB + build cache + fixture volumes)
* Swap enabled (Orin Nano has 8 GB RAM, and loading PyTorch + TensorRT causes memory spikes)

## One-time setup

### 1. SSH key + alias (on the Mac)

```bash
# Generate a dedicated keypair (separate from your daily-dev key).
# This command produces BOTH halves in one go:
#   ~/.ssh/id_ed25519_jetson_e2e      — private (keep secret, never share)
#   ~/.ssh/id_ed25519_jetson_e2e.pub  — public (push to Jetson below)
ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519_jetson_e2e \
  -C "jetson-e2e $(date +%Y-%m-%d)"

# Push the public half to the Jetson (asks for the Jetson password once).
# Add `-p ` if the Jetson's sshd listens on a non-default port:
ssh-copy-id -i ~/.ssh/id_ed25519_jetson_e2e.pub @
# or with a custom port:
# ssh-copy-id -p -i ~/.ssh/id_ed25519_jetson_e2e.pub @

# Verify the Jetson's host key (run this ON the Jetson, via HDMI/serial,
# not over the LAN you're about to trust):
#   ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub
# Do this before the ssh-copy-id above, since that is the Mac's first
# connect: compare against the fingerprint it prints, and accept only
# if they match.

# Wire up ~/.ssh/config (gitignored, never committed). Add `Port `
# if the Jetson's sshd listens on a non-default port.
#
# IMPORTANT: the leading blank line inside the heredoc is intentional.
# Without it, the appended block can fuse onto the previous file line
# (`IdentitiesOnly yesHost jetson-e2e` was a real failure mode).
cat >> ~/.ssh/config <<'EOF'

Host jetson-e2e
  HostName 
  User 
  Port 22
  IdentityFile ~/.ssh/id_ed25519_jetson_e2e
  IdentitiesOnly yes
  AddKeysToAgent yes
  UseKeychain yes
  StrictHostKeyChecking accept-new
  ServerAliveInterval 30
  ServerAliveCountMax 4
EOF

# Cache the passphrase into macOS Keychain (one-time)
ssh-add --apple-use-keychain ~/.ssh/id_ed25519_jetson_e2e
```

### 2. Restrict the key's scope on the Jetson (recommended)

Edit `~/.ssh/authorized_keys` on the Jetson and prefix the line that the `ssh-copy-id` step appended:

```
from="",no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-ed25519 AAAA… jetson-e2e
```

Optionally lock the key to "only run the e2e driver" by adding `command="docker compose -f /home/jetson/gps-denied-onboard/docker-compose.test.jetson.yml up --abort-on-container-exit"`. The key then can't get a general shell and can only invoke that one command. Because sshd runs the forced command for every connection made with the key, this also blocks the `rsync` and `build` steps of `scripts/run-tests-jetson.sh`. Only use `command=` on a second key that is dedicated to triggering runs.

### 3. Harden sshd (optional, recommended for an exposed test rig)

On the Jetson, create `/etc/ssh/sshd_config.d/10-e2e.conf`:

```
PasswordAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
```

Check the syntax with `sudo sshd -t`, then run `sudo systemctl reload ssh`.

### 4. Verify the Jetson Docker + GPU pipeline

`nvidia-container-runtime` mounts `nvidia-smi` + CUDA libs from the host into the container at runtime. A tiny base image is therefore enough for the smoke test, so there's no need to pull the 5 GB `l4t-jetpack` image just to check GPU exposure:

```bash
ssh jetson-e2e 'docker run --rm --runtime=nvidia --gpus all \
  ubuntu:22.04 nvidia-smi'
```

Expected output: an `nvidia-smi`-style table listing the Orin GPU. If this fails with `could not select device driver "nvidia"` or `no GPU devices`, reinstall `nvidia-container-toolkit` and run `sudo systemctl restart docker`. If `nvidia-smi` works on the host directly but not inside a container, the problem lies in the container runtime layer (nvidia-container-toolkit and Docker's runtime config), not in the driver.

### 5. Confirm disk + swap

```bash
ssh jetson-e2e 'df -h /var/lib/docker && swapon --show && free -h'
```

You need ≥ 30 GB free on `/var/lib/docker`. Swap should be at least 4 GB (JetPack default is 4 GB zram).

## Running the harness

From the developer Mac, repo root:

```bash
bash scripts/run-tests-jetson.sh
```

What happens:

1. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`, `__pycache__`, build artefacts; LFS pointers transfer as text).
2. `ssh jetson-e2e docker compose -f docker-compose.test.jetson.yml build e2e-runner`
3. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`
4. stdout / stderr stream to the Mac terminal, and the exit code propagates.

Override the alias or remote dir if your setup differs:

```bash
JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
  bash scripts/run-tests-jetson.sh
```

## Smoke vs. Reality Gate split — at a glance

| Test category | Marker | Colima (Tier-1) | Jetson (Tier-2) |
|---------------|--------|-----------------|-----------------|
| AC-4a AST scan | (none) | runs | runs |
| AC-4b byte-equality | (none) | runs | runs |
| AC-7 skip-gate self-check | (none) | runs | runs |
| AC-9 helper unit tests | (none) | runs | runs |
| AC-1 / AC-2 / AC-3 / AC-5 / AC-6 (heavy) | `tier2` | **SKIPPED** | runs |
| AC-8 operator workflow | `skip` (AZ-616 blocks) | skipped | skipped |

The `GPS_DENIED_TIER` env var controls the auto-skip:

* `GPS_DENIED_TIER=1` (Colima default) → tests marked `tier2`, `gpu`, or `docker` are auto-skipped via `tests/conftest.py:31-44`.
* `GPS_DENIED_TIER=2` (Jetson default) → all markers active; everything runs (subject to other skip gates like `RUN_REPLAY_E2E`).

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `cannot reach 'ssh jetson-e2e' non-interactively` | Key not loaded into ssh-agent (passphrase not cached), or key not in `authorized_keys` | `ssh-add -l` on Mac; check `~/.ssh/authorized_keys` on Jetson |
| `docker: Error response from daemon: could not select device driver "nvidia"` | nvidia-container-toolkit missing, or daemon not restarted after install | `sudo apt install nvidia-container-toolkit && sudo systemctl restart docker` |
| `torch.cuda.is_available() == False` inside the container | `runtime: nvidia` block missing, or image built on an x86 host | Verify `docker-compose.test.jetson.yml` has `runtime: nvidia`; rebuild on the Jetson |
| `replay.auto_sync.ac8_validation_failed` | AZ-614 (tlog time-base mismatch); not a harness bug | Fix AZ-614 in `tests/e2e/replay/_tlog_synth.py` |
| `not found` / `tag not found` on `nvcr.io/nvidia/l4t-base:r36.*` | `l4t-base` was deprecated in JetPack 6 | Use `ubuntu:22.04` (step 4) or `l4t-jetpack:r36.4.0` for smoke tests; the harness itself uses `dustynv/l4t-pytorch:r36.4.0` |
| `pull access denied for nvcr.io/nvidia/...` | NGC requires login for some tags | `docker login nvcr.io` (username `$oauthtoken`, password = NGC API key from developer.nvidia.com) |

## Related Jira

* AZ-615 — this harness (Jetson runner story)
* AZ-616 — replace `mock-sat` with real `../satellite-provider` service
* AZ-617 — mark heavy ACs with `tier2` (already applied; this story documents and verifies the auto-skip)
* AZ-614 — tlog time-base mismatch (currently blocks the heavy ACs from reaching the GPU stage)
* AZ-602 — parent Epic: E2E Tier-1 harness rehabilitation
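
## Sketch: scripting the step-2 key restriction

Step 2 ("Restrict the key's scope on the Jetson") is a hand edit, but the same change can be scripted on the Jetson. The function below is a hypothetical sketch and is not part of the repo. Its name (`restrict_e2e_key`) is an assumption. It also assumes that the key line carries no options yet and that its comment starts with `jetson-e2e`, which the `ssh-keygen -C` call in step 1 produces. The `from=` pattern is a parameter because this README leaves it site-specific.

```bash
# Hypothetical helper (not part of the repo): prefix the jetson-e2e key line in
# an authorized_keys file with the step-2 restrictions. Keeps a .bak copy.
#   $1 = authorized_keys path
#   $2 = from= pattern (site-specific, e.g. the Mac's address or subnet)
restrict_e2e_key() {
  _keys="$1"
  _opts="from=\"$2\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding"
  cp -p "$_keys" "$_keys.bak" || return 1
  # Only rewrite lines that still start with the bare key type (no options yet)
  # and whose comment (field 3) starts with jetson-e2e; pass everything else through.
  # Writing into the existing file (instead of mv-ing a new one) keeps its mode/owner.
  awk -v opts="$_opts" '
    $1 == "ssh-ed25519" && $3 ~ /^jetson-e2e/ { print opts " " $0; next }
    { print }
  ' "$_keys.bak" > "$_keys" || { cp -p "$_keys.bak" "$_keys"; return 1; }
}
```

For example, on the Jetson: `restrict_e2e_key ~/.ssh/authorized_keys '192.168.1.*'`. The pattern here is only illustrative. Lines that already carry options are skipped, so re-running the function is harmless. Keeping the original file's permissions matters because sshd's `StrictModes` check rejects an `authorized_keys` file with loose permissions.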
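
## Sketch: disk + swap preflight

The thresholds from "Hardware contract" and step 5 can be checked mechanically instead of by eye. This is a hypothetical sketch and is not part of the repo:

* The function names and the `MIN_DOCKER_FREE_GIB` / `MIN_SWAP_MIB` overrides are assumptions.
* "GB" is read as GiB, matching what `df -h` and `free -h` print.
* The functions take `df -Pk` output and `/proc/meminfo` contents as strings, so the parsing can be exercised without a Jetson.

```bash
# Hypothetical preflight helpers (not part of the repo). Defaults mirror this
# README; "GB" is treated as GiB. Override either variable before sourcing.
: "${MIN_DOCKER_FREE_GIB:=30}"   # >= 30 GB free on /var/lib/docker
: "${MIN_SWAP_MIB:=4096}"        # "at least 4 GB" swap; adjust if your target differs

# Read `df -Pk <path>` output on stdin and print the Available column in whole GiB.
# POSIX layout: a header line, then one data line whose 4th field is Available (KiB).
docker_free_gib() {
  awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Read /proc/meminfo on stdin and print SwapTotal in whole MiB (meminfo reports kB).
swap_total_mib() {
  awk '$1 == "SwapTotal:" { printf "%d\n", $2 / 1024 }'
}

# $1 = `df -Pk /var/lib/docker` output, $2 = /proc/meminfo contents.
# Returns 0 when both thresholds are met, 1 otherwise; reasons go to stderr.
check_jetson_capacity() {
  _free=$(printf '%s\n' "$1" | docker_free_gib)
  _swap=$(printf '%s\n' "$2" | swap_total_mib)
  _rc=0
  if [ "${_free:-0}" -lt "$MIN_DOCKER_FREE_GIB" ]; then
    echo "FAIL: ${_free:-0} GiB free on /var/lib/docker (need $MIN_DOCKER_FREE_GIB)" >&2
    _rc=1
  fi
  if [ "${_swap:-0}" -lt "$MIN_SWAP_MIB" ]; then
    echo "FAIL: ${_swap:-0} MiB swap (need $MIN_SWAP_MIB)" >&2
    _rc=1
  fi
  return "$_rc"
}
```

Usage from the Mac: `check_jetson_capacity "$(ssh jetson-e2e 'df -Pk /var/lib/docker')" "$(ssh jetson-e2e cat /proc/meminfo)"`. The `-P` flag pins the POSIX one-line-per-filesystem layout, so the Available figure is always the fourth field of the second line, even for long device names.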
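
## Sketch: what the harness runs

The steps under "Running the harness" map onto roughly three remote commands. The function below is an illustrative reconstruction, useful for checking what an alias or remote-dir override will do. It is NOT the contents of `scripts/run-tests-jetson.sh`. The names `jetson_e2e_run`, `_e2e_do`, and `DRY_RUN` are hypothetical. The `rsync` excludes cover only `.git` and `__pycache__`; the real script also skips build artefacts that this README doesn't list.

```bash
# Illustrative sketch only; NOT the real scripts/run-tests-jetson.sh.
# With DRY_RUN=1, each command is printed instead of executed.
_e2e_do() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

jetson_e2e_run() {
  _host="${JETSON_SSH_ALIAS:-jetson-e2e}"
  # Relative path: rsync and ssh both resolve it against the Jetson's home dir.
  _dir="${JETSON_REMOTE_DIR:-gps-denied-onboard}"
  _compose="docker compose -f docker-compose.test.jetson.yml"

  # Step 1: sync the tree (assumed excludes; LFS pointers go over as plain text).
  _e2e_do rsync -az --exclude=.git --exclude=__pycache__ ./ "$_host:$_dir/" || return
  # Step 2: build the runner image on the Jetson itself (aarch64 + nvidia runtime).
  _e2e_do ssh "$_host" "cd $_dir && $_compose build e2e-runner" || return
  # Steps 3 + 4: output streams back over ssh, and ssh exits with the remote
  # status, which compose takes from the e2e-runner container.
  _e2e_do ssh "$_host" "cd $_dir && $_compose up --abort-on-container-exit --exit-code-from e2e-runner"
}
```

For example, `DRY_RUN=1 JETSON_SSH_ALIAS=other-host jetson_e2e_run` prints the three commands without contacting any host. The default remote dir is relative rather than `~/...`, so the Mac's shell has no tilde to expand before the path reaches the Jetson.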