mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 20:31:12 +00:00
9c13ab3bd0
C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design: `model.half().cuda()` is hard-wired with no
CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3
matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch
and the pipeline reaches those stages, Colima runs would hard-fail at
`.cuda()` instead of cleanly skipping.

This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:

* tests/e2e/Dockerfile.jetson: l4t-pytorch:r36.4.0-pth2.3-py3 base,
  same /opt layout as the Colima image so AC-4 AST scan + bind mounts
  work identically. Built ON the Jetson via run-tests-jetson.sh.
* docker-compose.test.jetson.yml: mirrors docker-compose.test.yml
  but with `runtime: nvidia`, GPU device exposure, and
  GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
* scripts/run-tests-jetson.sh: rsync → ssh build → ssh up,
  exit-code-from e2e-runner so the local exit code reflects the
  remote test verdict. No credentials in the repo; uses
  `ssh jetson-e2e` alias resolved via ~/.ssh/config.
* _docs/03_implementation/jetson_harness_setup.md: one-time SSH
  key + alias + sshd hardening + GPU verification steps. Documents
  the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.
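
The rsync → ssh build → ssh up flow and its exit-code propagation can be sketched as follows. This is an illustration in Python, not the contents of scripts/run-tests-jetson.sh (which is a shell script not shown here). The remote directory name, the rsync flags, and the helper names `build_commands` and `run` are assumptions; only the `jetson-e2e` alias, the compose file name, and `exit-code-from e2e-runner` come from the commit message.

```python
import subprocess

REMOTE = "jetson-e2e"  # SSH alias named in the commit message
REMOTE_DIR = "gps-denied-onboard"  # hypothetical checkout path on the Jetson
COMPOSE_FILE = "docker-compose.test.jetson.yml"


def build_commands(remote=REMOTE, remote_dir=REMOTE_DIR):
    """Return the three stages (sync, build, run) as argv lists."""
    compose = f"cd {remote_dir} && docker compose -f {COMPOSE_FILE}"
    return [
        # Stage 1: copy the working tree to the Jetson (flags are an assumption).
        ["rsync", "-az", "./", f"{remote}:{remote_dir}/"],
        # Stage 2: build the image on the Jetson itself.
        ["ssh", remote, f"{compose} build"],
        # Stage 3: run the stack; compose exits with the e2e-runner's status,
        # and ssh passes the remote command's exit status back to the caller.
        ["ssh", remote, f"{compose} up --exit-code-from e2e-runner"],
    ]


def run(commands, runner=subprocess.call):
    """Run stages in order; return the first non-zero exit code, else 0."""
    for argv in commands:
        code = runner(argv)
        if code != 0:
            return code
    return 0
```

A caller would end with `raise SystemExit(run(build_commands()))` so that the local exit code matches the remote test verdict. The `runner` parameter exists only so the flow can be exercised without a Jetson attached.
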

AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked because they don't touch CUDA.
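
For readers unfamiliar with the pattern, a marker-driven auto-skip of this kind might look like the sketch below. It is not the code in tests/conftest.py, which is not shown here. The helper name `tier2_enabled`, the skip reason text, and the rule that any value other than `2` leaves tier2 tests skipped are assumptions; `pytest_collection_modifyitems`, `pytest.mark.skip`, and `item.add_marker` are standard pytest APIs.

```python
import os


def tier2_enabled(environ=os.environ):
    """True only when the run opts in to tier2 via GPS_DENIED_TIER=2."""
    return environ.get("GPS_DENIED_TIER", "").strip() == "2"


def pytest_collection_modifyitems(config, items):
    """Skip every test marked tier2 unless GPS_DENIED_TIER=2 is set."""
    # Imported lazily so the helper above stays importable without pytest.
    import pytest

    if tier2_enabled():
        return
    skip = pytest.mark.skip(reason="tier2 needs GPS_DENIED_TIER=2 (CUDA host)")
    for item in items:
        if "tier2" in item.keywords:
            item.add_marker(skip)
```

Under this sketch a Colima run reports the marked ACs as skipped and never reaches `.cuda()`, while the Jetson compose file's GPS_DENIED_TIER=2 lets them run.
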

Defers to follow-up Jira:

* AZ-614: Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
  actually reaching the GPU stage on the Jetson)
* AZ-616: replace mock-sat with real ../satellite-provider service

Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
84 lines
3.8 KiB
Docker
# Tier-2 e2e-runner image: Jetson Orin Nano (JetPack 6.x, L4T R36.x).
#
# AZ-615: companion image to `tests/e2e/Dockerfile` (Colima/Tier-1 smoke
# harness) that runs the full Reality Gate, including C3 matcher + C7
# inference, against a CUDA-capable GPU.
#
# Hardware contract (operator-confirmed, 2026-05-17):
#   * Jetson Orin Nano, JetPack 6.2.2+b24, L4T R36.5.0
#   * nvidia-container-toolkit ≥ 1.16
#   * `docker run --runtime=nvidia ... nvidia-smi` returns the GPU
#
# Image layout mirrors the Colima Dockerfile (so AC-4 AST scan + bind
# mounts work the same way):
#   /opt/pyproject.toml
#   /opt/src/gps_denied_onboard/...      (SUT package, editable install)
#   /opt/tests/...                       (bind-mounted from host)
#   /opt/_docs/00_problem/input_data/    (bind-mounted from host)
#
# Build context is the repo root (see `docker-compose.test.jetson.yml`
# → `services.e2e-runner.build.context`).
#
# BUILD HOST: this image MUST be built ON the Jetson; cross-building
# from x86 macOS produces images that miss Tegra-specific shared libs
# the nvidia-container-runtime later mounts at run time.

# ---------------------------------------------------------------------------
# Base: l4t-pytorch ships JetPack runtime + PyTorch wheel ready for `.cuda()`
#
# Tag selection: NGC publishes l4t-pytorch on a slight lag from L4T BSP
# releases. With BSP R36.5 on the device, the closest stable NGC tag at
# author time is `r36.4.0-pth2.3-py3`. NVIDIA containers are
# forward-compatible across one minor BSP (the container's userspace
# can be slightly older than the host's L4T kernel). If a `r36.5.0-*`
# tag is published, prefer it.
#
# Image lookup at run time: `docker manifest inspect nvcr.io/nvidia/l4t-pytorch:r36.4.0-pth2.3-py3`
FROM nvcr.io/nvidia/l4t-pytorch:r36.4.0-pth2.3-py3 AS runtime

ARG DEBIAN_FRONTEND=noninteractive
# System deps mirror tests/e2e/Dockerfile + the Jetson runtime stack:
#   * build-essential / libpq-dev / libspatialindex-dev: same as Colima
#   * python3-pip / python3-venv: l4t-pytorch ships python but not always venv
#   * libgl1 + libglib2.0-0: OpenCV runtime libs (same reason as Colima)
#   * libpq5 + libspatialindex-c6: runtime side of psycopg + rtree
# Note: CUDA / cuDNN / TensorRT come pre-baked in the base image; do NOT
# attempt to apt-install them (would conflict with the Tegra-specific libs
# the runtime mounts).
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        build-essential \
        libpq-dev \
        libspatialindex-dev \
        libpq5 \
        libspatialindex-c6 \
        libgl1 \
        libglib2.0-0 \
        python3-pip \
        python3-venv \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt

# Editable SUT install. Skipping the `[inference]` extra because PyTorch +
# torchvision are already provided by the l4t-pytorch base image with
# Tegra-specific CUDA builds; reinstalling them from PyPI would clobber
# the Tegra wheels with generic builds that lack the CUDA / cuDNN /
# cuBLAS linkage required by Orin.
COPY pyproject.toml README.md ./
COPY src ./src

# `--break-system-packages` is needed because the l4t-pytorch base image
# uses an externally-managed Python environment (PEP 668). The alternative
# would be to layer a venv on top of the pre-installed torch, but that
# would shadow the Tegra-tuned torch wheel and break `.cuda()`. The image
# IS the environment; embracing system-pip is the path of least drift.
RUN pip3 install --no-cache-dir --break-system-packages -e ".[dev]"

# ENTRYPOINT mirrors the Colima Dockerfile: pytest discovers both
# `tests/e2e/replay/` (heavy tier2 ACs run with GPS_DENIED_TIER=2) and
# any future `tests/e2e/scenarios/` additions. Rootdir resolves to /opt
# via the COPY'd pyproject.toml so `from tests.e2e.replay._helpers import ...`
# works inside the test files.
ENTRYPOINT ["pytest", "-q", "/opt/tests/e2e/"]