mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 22:01:13 +00:00
9c13ab3bd0
C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design — `model.half().cuda()` is hard-wired with no
CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3
matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch
and the pipeline reaches those stages, Colima runs would hard-fail at
`.cuda()` instead of cleanly skipping.
This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:
* tests/e2e/Dockerfile.jetson — l4t-pytorch:r36.4.0-pth2.3-py3 base,
  same /opt layout as the Colima image so AC-4 AST scan + bind mounts
  work identically. Built ON the Jetson via run-tests-jetson.sh.
* docker-compose.test.jetson.yml — mirrors docker-compose.test.yml
  but with `runtime: nvidia`, GPU device exposure, and
  GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
* scripts/run-tests-jetson.sh — rsync → ssh build → ssh up,
  exit-code-from e2e-runner so the local exit code reflects the
  remote test verdict. No credentials in the repo; uses
  `ssh jetson-e2e` alias resolved via ~/.ssh/config.
* _docs/03_implementation/jetson_harness_setup.md — one-time SSH
  key + alias + sshd hardening + GPU verification steps. Documents
  the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.
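The exit-code chain described for scripts/run-tests-jetson.sh rests on two behaviours: a shell fed through `bash -s` exits with the status of the script it read from stdin, and `ssh` exits with the status of the remote command. A minimal local sketch, in which plain `bash -s` stands in for `ssh jetson-e2e bash -s` and the helper name `run_remote_like` is hypothetical:

```shell
#!/usr/bin/env bash

# Local stand-in for `ssh <alias> bash -s <<EOF ... EOF`.
# The heredoc is unquoted, so ${verdict} expands on the calling side
# before the inner shell ever sees the script text.
run_remote_like() {
    local verdict="$1"
    bash -s <<EOF
set -euo pipefail
exit ${verdict}
EOF
}

rc=0
run_remote_like 3 || rc=$?
echo "exit status seen by caller: ${rc}"
# prints: exit status seen by caller: 3
```

With a real `ssh` in place of the local `bash -s`, a connection failure would surface as status 255 rather than a test verdict, which is worth keeping in mind when interpreting a non-zero result.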
AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked — they don't touch CUDA.
Defers to follow-up Jira:
* AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
actually reaching the GPU stage on the Jetson)
* AZ-616 — replace mock-sat with real ../satellite-provider service
Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
116 lines
4.4 KiB
Bash
Executable File
#!/usr/bin/env bash
# AZ-615: drive the Tier-2 Reality Gate e2e harness on a remote Jetson.
#
# Runs from the developer Mac. Assumes:
#   * `ssh jetson-e2e` works via key auth + ~/.ssh/config (see
#     _docs/03_implementation/jetson_harness_setup.md for one-time setup).
#   * The Jetson has docker + nvidia-container-toolkit + ≥ 30 GB free on
#     /var/lib/docker.
#
# Flow:
#   1. rsync the working tree to the Jetson under ~/gps-denied-onboard/
#      (excluding .git and build artefacts; Git LFS pointer files do
#      transfer).
#   2. ssh into the Jetson and `docker compose build` the e2e-runner image
#      against tests/e2e/Dockerfile.jetson.
#   3. ssh again and `docker compose up --abort-on-container-exit
#      --exit-code-from e2e-runner` so the local exit code reflects the
#      remote test verdict.
#   4. stdout / stderr stream back to the Mac terminal.
#
# Exit code propagates the docker-compose exit code (which == the
# e2e-runner container's exit code, which == pytest's verdict).

set -euo pipefail

# ----------------------------------------------------------------------
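# Configuration

SSH_ALIAS="${JETSON_SSH_ALIAS:-jetson-e2e}"
# Relative paths resolve against the remote login directory (normally the
# remote user's home) for both rsync and ssh. A leading `~` is avoided on
# purpose: it would stay literal inside the double-quoted `cd` in the
# heredocs below and the remote shell would not expand it.
REMOTE_DIR="${JETSON_REMOTE_DIR:-gps-denied-onboard}"
COMPOSE_FILE="docker-compose.test.jetson.yml"

# Repo root regardless of where the script is invoked from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# ----------------------------------------------------------------------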
# Pre-flight

if ! command -v rsync >/dev/null 2>&1; then
    echo "ERROR: rsync not on PATH — install with 'brew install rsync' or apt" >&2
    exit 64
fi

if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "${SSH_ALIAS}" true 2>/dev/null; then
    cat >&2 <<EOF
ERROR: cannot reach 'ssh ${SSH_ALIAS}' non-interactively. Configure
       ~/.ssh/config + agent-based key auth per
       _docs/03_implementation/jetson_harness_setup.md.
EOF
    exit 65
fi

echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
echo "[run-tests-jetson] remote dir:      ${REMOTE_DIR}"
echo "[run-tests-jetson] compose file:    ${COMPOSE_FILE}"

# ----------------------------------------------------------------------
# Step 1: sync source

# Exclusions kept deliberately narrow — we want the full src/, tests/,
# _docs/, docker-compose*.yml, scripts/, pyproject.toml. We exclude:
#   * .git — huge, no value on the Jetson
#   * __pycache__ / *.pyc — host-generated bytecode, regenerated on Jetson
#   * _build / build / dist — local CMake / setuptools output trees
#   * node_modules — frontend artefacts, not needed by the harness
#   * .venv / venv — host venv, would clobber the Jetson's Python env
#   * .DS_Store — macOS metadata
#   * *.engine — large artefacts that either exist on the Jetson via a
#     separate fixture-sync step or are produced by the SUT
# *.tlog and *.bin fixtures are not excluded by the rsync call below, so
# they transfer with the rest of the tree.
# Git LFS pointers (134 B files) DO transfer — they're text. The
# Jetson runs `git lfs pull` lazily for any LFS-tracked fixture it
# actually needs.
echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
rsync -avz --delete \
    --exclude=.git/ \
    --exclude='__pycache__/' \
    --exclude='*.pyc' \
    --exclude=_build/ \
    --exclude=build/ \
    --exclude=dist/ \
    --exclude=node_modules/ \
    --exclude=.venv/ \
    --exclude=venv/ \
    --exclude=.DS_Store \
    --exclude='*.engine' \
    "${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"

# ----------------------------------------------------------------------
# Step 2: build the e2e-runner image on the Jetson

# The image MUST be built on the Jetson — see Dockerfile.jetson comment
# about Tegra-specific libs.
echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
# shellcheck disable=SC2087  # we want the heredoc to expand on the local side
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
cd "${REMOTE_DIR}"
docker compose -f "${COMPOSE_FILE}" build e2e-runner
EOF

# ----------------------------------------------------------------------
# Step 3: run

# `--abort-on-container-exit` plus `--exit-code-from e2e-runner` makes
# docker-compose propagate the runner's exit code, which we propagate
# back to the local terminal via `ssh` returning that code. So `bash
# scripts/run-tests-jetson.sh && echo OK` does the right thing locally.
echo "[run-tests-jetson] docker compose up e2e-runner (on Jetson)"
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
cd "${REMOTE_DIR}"
exec docker compose -f "${COMPOSE_FILE}" up \
    --abort-on-container-exit \
    --exit-code-from e2e-runner
EOF
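
When overriding JETSON_REMOTE_DIR, note that the script interpolates the value inside double quotes (`cd "${REMOTE_DIR}"`), and a shell only expands `~` when it is unquoted. A small sketch of the difference, with HOME pinned to a made-up value purely so the output is predictable:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo-only value (an assumption, not a real account) so the
# expansion below is deterministic.
HOME=/home/demo

# Quoted: the tilde is an ordinary character and stays as-is.
quoted="~/gps-denied-onboard"

# Unquoted at the start of an assignment value: replaced with $HOME.
unquoted=~/gps-denied-onboard

echo "quoted:   ${quoted}"
echo "unquoted: ${unquoted}"
# prints:
#   quoted:   ~/gps-denied-onboard
#   unquoted: /home/demo/gps-denied-onboard
```

An absolute path, or a path relative to the remote home directory, sidesteps the issue, for example `JETSON_REMOTE_DIR=/home/ubuntu/gps-denied-onboard bash scripts/run-tests-jetson.sh` (the path shown is illustrative).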