gps-denied-onboard/scripts/run-tests-jetson.sh
Oleksandr Bezdieniezhnykh 9c13ab3bd0 [AZ-615] [AZ-617] Add Jetson e2e harness + tier2 marks
C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design — `model.half().cuda()` is hard-wired with no
CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3
matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch
and the pipeline reaches those stages, Colima runs would hard-fail at
`.cuda()` instead of cleanly skipping.

This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:

  * tests/e2e/Dockerfile.jetson  — l4t-pytorch:r36.4.0-pth2.3-py3 base,
    same /opt layout as the Colima image so AC-4 AST scan + bind mounts
    work identically. Built ON the Jetson via run-tests-jetson.sh.
  * docker-compose.test.jetson.yml — mirrors docker-compose.test.yml
    but with `runtime: nvidia`, GPU device exposure, and
    GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
  * scripts/run-tests-jetson.sh — rsync → ssh build → ssh up,
    exit-code-from e2e-runner so the local exit code reflects the
    remote test verdict. No credentials in the repo; uses
    `ssh jetson-e2e` alias resolved via ~/.ssh/config.
  * _docs/03_implementation/jetson_harness_setup.md — one-time SSH
    key + alias + sshd hardening + GPU verification steps. Documents
    the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.

AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked — they don't touch CUDA.

Defers to follow-up Jira:

  * AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
    actually reaching the GPU stage on the Jetson)
  * AZ-616 — replace mock-sat with real ../satellite-provider service

Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-18 01:57:23 +03:00


#!/usr/bin/env bash
# AZ-615: drive the Tier-2 Reality Gate e2e harness on a remote Jetson.
#
# Runs from the developer Mac. Assumes:
#   * `ssh jetson-e2e` works via key auth + ~/.ssh/config (see
#     _docs/03_implementation/jetson_harness_setup.md for one-time setup).
#   * The Jetson has docker + nvidia-container-toolkit + ≥ 30 GB free on
#     /var/lib/docker.
#
# Flow:
#   1. rsync the working tree to the Jetson under ~/gps-denied-onboard/
#      (excluding .git, build artefacts, and virtualenvs; Git LFS pointer
#      files do transfer).
#   2. ssh into the Jetson and `docker compose build` the e2e-runner image
#      against tests/e2e/Dockerfile.jetson.
#   3. ssh again and `docker compose up --abort-on-container-exit
#      --exit-code-from e2e-runner` so the local exit code reflects the
#      remote test verdict.
#   4. stdout / stderr stream back to the Mac terminal.
#
# Exit code propagates the docker-compose exit code (which == the
# e2e-runner container's exit code, which == pytest's verdict).
set -euo pipefail
# ----------------------------------------------------------------------
# Configuration
SSH_ALIAS="${JETSON_SSH_ALIAS:-jetson-e2e}"
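# Relative paths resolve against the remote user's home directory for both
# rsync and ssh. Avoid a leading "~" here: the value is later used as
# `cd "${REMOTE_DIR}"` on the Jetson, and a tilde inside double quotes is
# not expanded, so `cd "~/gps-denied-onboard"` would fail.
REMOTE_DIR="${JETSON_REMOTE_DIR:-gps-denied-onboard}"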
COMPOSE_FILE="docker-compose.test.jetson.yml"
# Repo root regardless of where the script is invoked from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# ----------------------------------------------------------------------
# Pre-flight
if ! command -v rsync >/dev/null 2>&1; then
    echo "ERROR: rsync not on PATH; install with 'brew install rsync' or apt" >&2
    exit 64
fi
if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "${SSH_ALIAS}" true 2>/dev/null; then
    cat >&2 <<EOF
ERROR: cannot reach 'ssh ${SSH_ALIAS}' non-interactively. Configure
~/.ssh/config + agent-based key auth per
_docs/03_implementation/jetson_harness_setup.md.
EOF
    exit 65
fi
echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
echo "[run-tests-jetson] remote dir: ${REMOTE_DIR}"
echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"
# ----------------------------------------------------------------------
# Step 1: sync source
# Exclusions kept deliberately narrow: we want the full src/, tests/,
# _docs/, docker-compose*.yml, scripts/, pyproject.toml. We exclude:
#   * .git: huge, no value on the Jetson
#   * __pycache__ / *.pyc: host-arch bytecode, regenerated on Jetson
#   * _build / build / dist: local CMake / setuptools output trees
#   * node_modules: frontend artefacts, not needed by the harness
#   * .venv / venv: host venv, would clobber the Jetson's Python env
#   * .DS_Store: macOS metadata
#   * *.tlog / *.bin / *.engine: large fixtures that exist on the Jetson
#     either via a separate fixture-sync step or are produced by the SUT
# Git LFS pointers (134 B files) DO transfer, since they are plain text.
# The Jetson runs `git lfs pull` lazily for any LFS-tracked fixture it
# actually needs.
echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
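# Excluded paths are also protected from --delete on the Jetson, so
# fixtures that exist only on the remote side survive the sync.
rsync -avz --delete \
    --exclude=.git/ \
    --exclude='__pycache__/' \
    --exclude='*.pyc' \
    --exclude=_build/ \
    --exclude=build/ \
    --exclude=dist/ \
    --exclude=node_modules/ \
    --exclude=.venv/ \
    --exclude=venv/ \
    --exclude=.DS_Store \
    --exclude='*.tlog' \
    --exclude='*.bin' \
    --exclude='*.engine' \
    "${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"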
# ----------------------------------------------------------------------
# Step 2: build the e2e-runner image on the Jetson
# The image MUST be built on the Jetson — see Dockerfile.jetson comment
# about Tegra-specific libs.
echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
# shellcheck disable=SC2087 # we want the heredoc to expand on the local side
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
cd "${REMOTE_DIR}"
docker compose -f "${COMPOSE_FILE}" build e2e-runner
EOF
# ----------------------------------------------------------------------
# Step 3: run
# `--abort-on-container-exit` plus `--exit-code-from e2e-runner` makes
# docker-compose propagate the runner's exit code, which we propagate
# back to the local terminal via `ssh` returning that code. So `bash
# scripts/run-tests-jetson.sh && echo OK` does the right thing locally.
echo "[run-tests-jetson] docker compose up e2e-runner (on Jetson)"
# shellcheck disable=SC2087  # we want the heredoc to expand on the local side
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
cd "${REMOTE_DIR}"
exec docker compose -f "${COMPOSE_FILE}" up \
    --abort-on-container-exit \
    --exit-code-from e2e-runner
EOF
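
The exit-code propagation that Step 3 relies on can be tried without a Jetson. The sketch below is not part of the script: `run_remote` is a hypothetical stand-in that replaces `ssh "${SSH_ALIAS}" bash -s` with a local `bash -s`, and `FAKE_VERDICT` is an assumed container exit code. It illustrates two things under those assumptions: an unquoted `EOF` heredoc is expanded by the calling shell before the child reads it, and the child's exit status comes back to the caller unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `ssh "${SSH_ALIAS}" bash -s`: a local child
# shell that reads its script from stdin, so no Jetson is required.
run_remote() {
  bash -s
}

FAKE_VERDICT=3   # assumed exit code of the e2e-runner container

# Unquoted EOF: ${FAKE_VERDICT} is expanded by the calling shell before
# the child shell sees the script text. The `|| status=$?` guard captures
# the child's exit status without aborting the caller.
status=0
run_remote <<EOF || status=$?
set -euo pipefail
exec bash -c 'exit ${FAKE_VERDICT}'
EOF

echo "exit code seen by the caller: ${status}"
```

With a real `ssh` in place of `bash -s`, the same pattern is what lets `bash scripts/run-tests-jetson.sh && echo OK` reflect the remote verdict, provided the connection itself succeeds (ssh reports its own failures as exit code 255).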