mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 19:21:12 +00:00
[AZ-615] [AZ-617] Add Jetson e2e harness + tier2 marks
C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design — `model.half().cuda()` is hard-wired with no
CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3
matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch
and the pipeline reaches those stages, Colima runs would hard-fail at
`.cuda()` instead of cleanly skipping.
This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:
* tests/e2e/Dockerfile.jetson — l4t-pytorch:r36.4.0-pth2.3-py3 base,
same /opt layout as the Colima image so AC-4 AST scan + bind mounts
work identically. Built ON the Jetson via run-tests-jetson.sh.
* docker-compose.test.jetson.yml — mirrors docker-compose.test.yml
but with `runtime: nvidia`, GPU device exposure, and
GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
* scripts/run-tests-jetson.sh — rsync → ssh build → ssh up with
`--exit-code-from e2e-runner`, so the local exit code reflects the
remote test verdict. No credentials in the repo; uses
`ssh jetson-e2e` alias resolved via ~/.ssh/config.
* _docs/03_implementation/jetson_harness_setup.md — one-time SSH
key + alias + sshd hardening + GPU verification steps. Documents
the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.
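
As a rough illustration of the exit-code path described for scripts/run-tests-jetson.sh, the sketch below replaces `ssh jetson-e2e bash -s` with a local `bash -s` so it runs without a Jetson. `run_remote` and `propagated_code` are hypothetical helper names, not part of the harness:

```shell
#!/usr/bin/env bash
# Sketch only: shows how an exit code travels back through `bash -s <<EOF`.
# run_remote is a hypothetical stand-in for `ssh jetson-e2e bash -s`;
# it runs a local bash so the example needs no remote host.
run_remote() {
  bash -s
}

# Runs a tiny "remote" script that exits with the requested code and
# prints the code the caller observed.
propagated_code() {
  local wanted="$1" rc=0
  # Unquoted EOF: ${wanted} is expanded locally before the text is sent.
  run_remote <<EOF || rc=$?
set -euo pipefail
exit ${wanted}
EOF
  echo "${rc}"
}

propagated_code 0   # prints 0
propagated_code 3   # prints 3
```

In the real script the last remote command is `docker compose up --exit-code-from e2e-runner`, so the value observed locally is the runner's exit status. ssh itself exits with 255 when the connection fails, which is worth keeping apart from a test failure.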
AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked — they don't touch CUDA.
Defers to follow-up Jira:
* AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
actually reaching the GPU stage on the Jetson)
* AZ-616 — replace mock-sat with real ../satellite-provider service
Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
Executable
+115
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
# AZ-615: drive the Tier-2 Reality Gate e2e harness on a remote Jetson.
#
# Runs from the developer Mac. Assumes:
#   * `ssh jetson-e2e` works via key auth + ~/.ssh/config (see
#     _docs/03_implementation/jetson_harness_setup.md for one-time setup).
#   * The Jetson has docker + nvidia-container-toolkit + ≥ 30 GB free on
#     /var/lib/docker.
#
# Flow:
#   1. rsync the working tree to the Jetson under ~/gps-denied-onboard/
#      (excluding .git and build artefacts; Git LFS pointers do transfer).
#   2. ssh into the Jetson and `docker compose build` the e2e-runner image
#      against tests/e2e/Dockerfile.jetson.
#   3. ssh again and `docker compose up --abort-on-container-exit
#      --exit-code-from e2e-runner` so the local exit code reflects the
#      remote test verdict.
#   4. stdout / stderr stream back to the Mac terminal.
#
# Exit code propagates the docker-compose exit code (which == the
# e2e-runner container's exit code, which == pytest's verdict).

set -euo pipefail

# ----------------------------------------------------------------------
# Configuration

SSH_ALIAS="${JETSON_SSH_ALIAS:-jetson-e2e}"
# Relative to the remote home directory. A `~` here is fragile: the
# heredocs below send it inside double quotes (`cd "~/..."`), where the
# remote shell does not expand it.
REMOTE_DIR="${JETSON_REMOTE_DIR:-gps-denied-onboard}"
COMPOSE_FILE="docker-compose.test.jetson.yml"

# Repo root regardless of where the script is invoked from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# ----------------------------------------------------------------------
# Pre-flight

if ! command -v rsync >/dev/null 2>&1; then
    echo "ERROR: rsync not on PATH — install with 'brew install rsync' or apt" >&2
    exit 64
fi

if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "${SSH_ALIAS}" true 2>/dev/null; then
    cat >&2 <<EOF
ERROR: cannot reach 'ssh ${SSH_ALIAS}' non-interactively. Configure
       ~/.ssh/config + agent-based key auth per
       _docs/03_implementation/jetson_harness_setup.md.
EOF
    exit 65
fi

echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
echo "[run-tests-jetson] remote dir: ${REMOTE_DIR}"
echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"

# ----------------------------------------------------------------------
# Step 1: sync source

# Exclusions kept deliberately narrow — we want the full src/, tests/,
# _docs/, docker-compose*.yml, scripts/, pyproject.toml. We exclude:
#   * .git — huge, no value on the Jetson
#   * __pycache__ / *.pyc — host-arch bytecode, regenerated on Jetson
#   * _build / build / dist — local CMake / setuptools output trees
#   * node_modules — frontend artefacts, not needed by the harness
#   * .venv / venv — host venv, would clobber the Jetson's Python env
#   * .DS_Store — macOS metadata
#   * *.engine — large artefacts that either reach the Jetson via a
#     separate fixture-sync step or are produced there by the SUT
# *.tlog / *.bin are NOT excluded, so they sync like any other file.
# Git LFS pointers (134 B files) DO transfer — they're text. The
# Jetson runs `git lfs pull` lazily for any LFS-tracked fixture it
# actually needs.
echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
rsync -avz --delete \
    --exclude=.git/ \
    --exclude='__pycache__/' \
    --exclude='*.pyc' \
    --exclude=_build/ \
    --exclude=build/ \
    --exclude=dist/ \
    --exclude=node_modules/ \
    --exclude=.venv/ \
    --exclude=venv/ \
    --exclude=.DS_Store \
    --exclude='*.engine' \
    "${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"

# ----------------------------------------------------------------------
# Step 2: build the e2e-runner image on the Jetson

# The image MUST be built on the Jetson — see Dockerfile.jetson comment
# about Tegra-specific libs.
echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
# shellcheck disable=SC2087  # we want the heredoc to expand on the local side
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
cd "${REMOTE_DIR}"
docker compose -f "${COMPOSE_FILE}" build e2e-runner
EOF

# ----------------------------------------------------------------------
# Step 3: run

# `--abort-on-container-exit` plus `--exit-code-from e2e-runner` makes
# docker-compose propagate the runner's exit code, which we propagate
# back to the local terminal via `ssh` returning that code. So `bash
# scripts/run-tests-jetson.sh && echo OK` does the right thing locally.
echo "[run-tests-jetson] docker compose up e2e-runner (on Jetson)"
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
cd "${REMOTE_DIR}"
exec docker compose -f "${COMPOSE_FILE}" up \
    --abort-on-container-exit \
    --exit-code-from e2e-runner
EOF
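
The SC2087 note in step 2 relies on the heredoc delimiter being unquoted, so `${REMOTE_DIR}` and `${COMPOSE_FILE}` are substituted on the Mac before the text reaches the Jetson. A minimal sketch of the difference, assuming a local `bash -s` as a stand-in for `ssh jetson-e2e bash -s` (DEMO_DIR and both function names are hypothetical and not part of the harness):

```shell
#!/usr/bin/env bash
# Sketch only: which side expands a variable inside `bash -s <<EOF`?
# The child bash plays the role of the Jetson.

DEMO_DIR="local-value"   # deliberately not exported: the child cannot see it

# Unquoted delimiter: ${DEMO_DIR} is substituted before the text is sent.
expanded_by_sender() {
  bash -s <<EOF
echo "dir=${DEMO_DIR}"
EOF
}

# Quoted delimiter: the text is sent verbatim and the receiving shell
# expands it, falling back to the default because DEMO_DIR is unset there.
expanded_by_receiver() {
  bash -s <<'EOF'
echo "dir=${DEMO_DIR:-unset-on-remote}"
EOF
}

expanded_by_sender     # prints dir=local-value
expanded_by_receiver   # prints dir=unset-on-remote
```

The same local expansion also consumes the backslash-newline pairs in the step 3 heredoc, so the Jetson receives the `docker compose up` command as a single line.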