[AZ-615] [AZ-617] Add Jetson e2e harness + tier2 marks

C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime) is
CUDA-only by design: `model.half().cuda()` is hard-wired with no CPU
fallback. The Colima/Tier-1 smoke harness can never exercise C3 matcher
or C7 inference. Once AZ-614 fixes the tlog time-base mismatch and the
pipeline reaches those stages, Colima runs would hard-fail at `.cuda()`
instead of cleanly skipping.

This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:

* tests/e2e/Dockerfile.jetson: l4t-pytorch:r36.4.0-pth2.3-py3 base,
  same /opt layout as the Colima image so AC-4 AST scan + bind mounts
  work identically. Built ON the Jetson via run-tests-jetson.sh.
* docker-compose.test.jetson.yml: mirrors docker-compose.test.yml but
  with `runtime: nvidia`, GPU device exposure, and GPS_DENIED_TIER=2
  (turns OFF the tier2 auto-skip).
* scripts/run-tests-jetson.sh: rsync → ssh build → ssh up,
  exit-code-from e2e-runner so the local exit code reflects the remote
  test verdict. No credentials in the repo; uses `ssh jetson-e2e` alias
  resolved via ~/.ssh/config.
* _docs/03_implementation/jetson_harness_setup.md: one-time SSH key +
  alias + sshd hardening + GPU verification steps. Documents the smoke
  vs. Reality Gate split + the GPS_DENIED_TIER switch.

AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py. Reuses
the existing tier2 marker + auto-skip in tests/conftest.py (scope
revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9 stay
unmarked because they don't touch CUDA.

Defers to follow-up Jira:

* AZ-614: Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
  actually reaching the GPU stage on the Jetson)
* AZ-616: replace mock-sat with real ../satellite-provider service

Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.
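
For operators whose SSH alias differs, the script reads the alias from `JETSON_SSH_ALIAS` and falls back to `jetson-e2e`. A minimal sketch of that fallback follows; `resolve_alias` is a hypothetical helper and `jetson-lab` is a made-up alias, neither is part of this commit:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the script's `${JETSON_SSH_ALIAS:-jetson-e2e}` default.
resolve_alias() {
    echo "${JETSON_SSH_ALIAS:-jetson-e2e}"
}

# With the variable unset, the built-in default is used.
unset JETSON_SSH_ALIAS
default_alias="$(resolve_alias)"

# A one-shot override for a single call; `jetson-lab` is illustrative only.
override_alias="$(JETSON_SSH_ALIAS=jetson-lab resolve_alias)"

echo "default:  ${default_alias}"
echo "override: ${override_alias}"
```

The same pattern would apply to a real run, e.g. `JETSON_SSH_ALIAS=jetson-lab bash scripts/run-tests-jetson.sh`; `JETSON_REMOTE_DIR` is overridden the same way.
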
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 19:21:12 +00:00 · 2026-05-18 01:57:23 +03:00
parent c2934b8686
commit 9c13ab3bd0
5 changed files with 477 additions and 0 deletions
@@ -0,0 +1,115 @@
+#!/usr/bin/env bash
+# AZ-615: drive the Tier-2 Reality Gate e2e harness on a remote Jetson.
+#
+# Runs from the developer Mac. Assumes:
+#   * `ssh jetson-e2e` works via key auth + ~/.ssh/config (see
+#     _docs/03_implementation/jetson_harness_setup.md for one-time setup).
+#   * The Jetson has docker + nvidia-container-toolkit + ≥ 30 GB free on
+#     /var/lib/docker.
+#
+# Flow:
+#   1. rsync the working tree to the Jetson under ~/gps-denied-onboard/
+#      (excluding .git, build artefacts, and large binary fixtures).
+#   2. ssh into the Jetson and `docker compose build` the e2e-runner image
+#      against tests/e2e/Dockerfile.jetson.
+#   3. ssh again and `docker compose up --abort-on-container-exit
+#      --exit-code-from e2e-runner` so the local exit code reflects the
+#      remote test verdict.
+#   4. stdout / stderr stream back to the Mac terminal.
+#
+# Exit code propagates the docker-compose exit code (which == the
+# e2e-runner container's exit code, which == pytest's verdict).
+
+set -euo pipefail
+
+# ----------------------------------------------------------------------
+# Configuration
+
+SSH_ALIAS="${JETSON_SSH_ALIAS:-jetson-e2e}"
+REMOTE_DIR="${JETSON_REMOTE_DIR:-gps-denied-onboard}"  # relative to the remote $HOME (a quoted ~ is not expanded by the remote cd)
+COMPOSE_FILE="docker-compose.test.jetson.yml"
+
+# Repo root regardless of where the script is invoked from.
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+# ----------------------------------------------------------------------
+# Pre-flight
+
+if ! command -v rsync >/dev/null 2>&1; then
+    echo "ERROR: rsync not on PATH; install with 'brew install rsync' (macOS) or 'apt install rsync' (Debian/Ubuntu)" >&2
+    exit 64
+fi
+
+if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "${SSH_ALIAS}" true 2>/dev/null; then
+    cat >&2 <<EOF
+ERROR: cannot reach 'ssh ${SSH_ALIAS}' non-interactively. Configure
+       ~/.ssh/config + agent-based key auth per
+       _docs/03_implementation/jetson_harness_setup.md.
+EOF
+    exit 65
+fi
+
+echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
+echo "[run-tests-jetson] remote dir:      ${REMOTE_DIR}"
+echo "[run-tests-jetson] compose file:    ${COMPOSE_FILE}"
+
+# ----------------------------------------------------------------------
+# Step 1: sync source
+
+# Exclusions kept deliberately narrow — we want the full src/, tests/,
+# _docs/, docker-compose*.yml, scripts/, pyproject.toml. We exclude:
+#   * .git — huge, no value on the Jetson
+#   * __pycache__ / *.pyc — host-arch bytecode, regenerated on Jetson
+#   * _build / build / dist — local CMake / setuptools output trees
+#   * node_modules — frontend artefacts, not needed by the harness
+#   * .venv / venv — host venv, would clobber the Jetson's Python env
+#   * .DS_Store — macOS metadata
+#   * *.tlog / *.bin / *.engine — large fixtures that exist on Jetson
+#     either via a separate fixture-sync step or are produced by the SUT
+# Git LFS pointers (134 B files) DO transfer — they're text. The
+# Jetson runs `git lfs pull` lazily for any LFS-tracked fixture it
+# actually needs.
+echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
+rsync -avz --delete \
+    --exclude=.git/ \
+    --exclude='__pycache__/' \
+    --exclude='*.pyc' \
+    --exclude=_build/ \
+    --exclude=build/ \
+    --exclude=dist/ \
+    --exclude=node_modules/ \
+    --exclude=.venv/ \
+    --exclude=venv/ \
+    --exclude=.DS_Store \
+    --exclude='*.tlog' --exclude='*.bin' --exclude='*.engine' \
+    "${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"
+
+# ----------------------------------------------------------------------
+# Step 2: build the e2e-runner image on the Jetson
+
+# The image MUST be built on the Jetson — see Dockerfile.jetson comment
+# about Tegra-specific libs.
+echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
+# shellcheck disable=SC2087  # we want the heredoc to expand on the local side
+ssh "${SSH_ALIAS}" bash -s <<EOF
+set -euo pipefail
+cd "${REMOTE_DIR}"
+docker compose -f "${COMPOSE_FILE}" build e2e-runner
+EOF
+
+# ----------------------------------------------------------------------
+# Step 3: run
+
+# `--abort-on-container-exit` plus `--exit-code-from e2e-runner` makes
+# docker-compose propagate the runner's exit code, which we propagate
+# back to the local terminal via `ssh` returning that code. So `bash
+# scripts/run-tests-jetson.sh && echo OK` does the right thing locally.
+echo "[run-tests-jetson] docker compose up e2e-runner (on Jetson)"
+ssh "${SSH_ALIAS}" bash -s <<EOF
+set -euo pipefail
+cd "${REMOTE_DIR}"
+exec docker compose -f "${COMPOSE_FILE}" up \
+    --abort-on-container-exit \
+    --exit-code-from e2e-runner
+EOF
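
The exit-code chain described in Step 3 can be sanity-checked without a Jetson. The sketch below is a hypothetical stand-in, not part of the commit: `run_remote` replaces `ssh "${SSH_ALIAS}" bash -s` with a local `bash -s`, and `exit 3` is an assumed value standing in for the code that `docker compose up --exit-code-from e2e-runner` would return.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `ssh "${SSH_ALIAS}" bash -s`: a local bash that
# reads its script from stdin, as the remote bash does in the real script.
run_remote() {
    bash -s
}

# `|| status=$?` records a non-zero exit without aborting a caller that
# runs under `set -e`.
status=0
run_remote <<'EOF' || status=$?
set -euo pipefail
exit 3   # assumed value: pretend the e2e-runner container exited with 3
EOF

echo "propagated exit code: ${status}"
```

With real `ssh`, the remote command's status is returned the same way, while `ssh` itself exits 255 on a connection error, so a 255 from the script most likely points at transport rather than at the tests.
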