Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-21 14:41:12 +00:00
9c13ab3bd0
C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design: `model.half().cuda()` is hard-wired with no
CPU fallback, so the Colima/Tier-1 smoke harness can never exercise the
C3 matcher or C7 inference. Once AZ-614 fixes the tlog time-base
mismatch and the pipeline reaches those stages, Colima runs would
hard-fail at `.cuda()` instead of cleanly skipping.
This commit lays down the Jetson companion harness and wires it into
the existing `tier2` auto-skip:
* tests/e2e/Dockerfile.jetson — l4t-pytorch:r36.4.0-pth2.3-py3 base,
same /opt layout as the Colima image so AC-4 AST scan + bind mounts
work identically. Built ON the Jetson via run-tests-jetson.sh.
* docker-compose.test.jetson.yml — mirrors docker-compose.test.yml
but with `runtime: nvidia`, GPU device exposure, and
GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
* scripts/run-tests-jetson.sh — rsync → ssh build → ssh up,
`up --exit-code-from e2e-runner`, so the local exit code reflects the
remote test verdict. No credentials in the repo; uses
`ssh jetson-e2e` alias resolved via ~/.ssh/config.
* _docs/03_implementation/jetson_harness_setup.md — one-time SSH
key + alias + sshd hardening + GPU verification steps. Documents
the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.
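The run-tests-jetson.sh flow described above (rsync, remote build, remote `up` with the runner's exit code propagated back) can be sketched in Python. This is not the actual script: the remote checkout directory `gps-denied-onboard` (relative to the Jetson user's home), the rsync flags and the helper names `build_commands` / `run` are assumptions for illustration. It relies on two documented behaviours: `docker compose up --exit-code-from SERVICE` returns that service's exit code, and `ssh` exits with the status of the remote command.

```python
# Hypothetical sketch of the rsync -> ssh build -> ssh up flow; not the repo's script.
import shlex
import subprocess
import sys

HOST = "jetson-e2e"  # ssh alias, resolved via ~/.ssh/config
REMOTE_DIR = "gps-denied-onboard"  # assumption: checkout path relative to the remote $HOME
COMPOSE = "docker compose -f docker-compose.test.jetson.yml"


def build_commands(host=HOST, remote_dir=REMOTE_DIR):
    """Return the argv lists for the three steps: sync, remote build, remote run."""
    cd = f"cd {shlex.quote(remote_dir)}"
    return [
        # 1. Mirror the working tree to the Jetson (relative remote paths land in $HOME).
        ["rsync", "-az", "--delete", "--exclude=.git", "./", f"{host}:{remote_dir}/"],
        # 2. Build the images on the Jetson itself (ARM64, l4t-pytorch base).
        ["ssh", host, f"{cd} && {COMPOSE} build"],
        # 3. Run the stack; --exit-code-from makes `up` exit with e2e-runner's code,
        #    and ssh in turn exits with the remote command's status.
        ["ssh", host, f"{cd} && {COMPOSE} up --exit-code-from e2e-runner"],
    ]


def run(commands):
    """Run the steps in order; stop at the first failure and return its exit code."""
    for argv in commands:
        returncode = subprocess.run(argv).returncode
        if returncode != 0:
            return returncode
    return 0
```

Calling `sys.exit(run(build_commands()))` under a `__main__` guard would give the behaviour the commit describes: the local exit code equals the remote test verdict.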
AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked — they don't touch CUDA.
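A minimal sketch of how a `tier2` marker plus an env-driven auto-skip can fit together in a conftest. The real hook lives in tests/conftest.py and may differ: the helper name `skip_reason`, the Tier-1 default, the skip message and the inclusion of a `gpu` marker (taken from the compose file's comment about the `tier2` / `gpu` auto-skip) are assumptions.

```python
# Hypothetical conftest sketch; the real auto-skip is in tests/conftest.py.
import os

TIER_ENV = "GPS_DENIED_TIER"
GATED_MARKERS = ("tier2", "gpu")  # assumption: markers that need a CUDA host


def skip_reason(marker_names, env=os.environ):
    """Return a skip reason for a gated test when not running at Tier 2, else None."""
    tier = env.get(TIER_ENV, "1")  # assumption: unset means Tier-1 (Colima smoke)
    if tier == "2":
        return None
    hits = sorted(set(marker_names) & set(GATED_MARKERS))
    if hits:
        return f"{'/'.join(hits)} test needs {TIER_ENV}=2 (CUDA host)"
    return None


def pytest_collection_modifyitems(config, items):
    """pytest hook: attach a skip marker to every gated test below Tier 2."""
    import pytest  # pytest is always importable when pytest itself calls this hook

    for item in items:
        reason = skip_reason({mark.name for mark in item.iter_markers()})
        if reason:
            item.add_marker(pytest.mark.skip(reason=reason))
```

An AC opts in with `@pytest.mark.tier2`; under Colima it is reported as skipped rather than failing at `.cuda()`, and with `GPS_DENIED_TIER=2` it runs.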
Deferred to follow-up Jira issues:
* AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
actually reaching the GPU stage on the Jetson)
* AZ-616 — replace mock-sat with real ../satellite-provider service
Not run yet: the operator must complete the one-time SSH setup before
scripts/run-tests-jetson.sh can be executed end-to-end. Setup steps are
documented in jetson_harness_setup.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
99 lines
3.4 KiB
YAML
# Tier-2 e2e harness for Jetson Orin Nano (JetPack 6.x, L4T R36.x).
#
# AZ-615: companion compose file to `docker-compose.test.yml` that runs
# the full Reality Gate on a CUDA-capable host. Used via `ssh jetson-e2e
# "docker compose -f docker-compose.test.jetson.yml up ..."`, driven by
# `scripts/run-tests-jetson.sh`.
#
# Differences vs. docker-compose.test.yml:
# * `runtime: nvidia` plus an nvidia GPU device reservation
#   (`deploy.resources.reservations.devices`) on `e2e-runner`, so the SUT
#   can resolve `model.half().cuda()` against the Orin GPU.
# * `GPS_DENIED_TIER=2` turns OFF the auto-skip for `@pytest.mark.tier2`
#   ACs (see tests/conftest.py:31-44), so the heavy ACs (AC-1, AC-2, AC-3,
#   AC-5, AC-6) actually run.
# * Builds from `tests/e2e/Dockerfile.jetson` (l4t-pytorch base).
# * companion / operator-orchestrator / db / mock-sat still come from the
#   root `docker-compose.yml` via `extends:` (same as Colima); they have
#   ARM64 tags via the existing build pipeline.
#
# Satellite-provider integration (the real .NET service at ../satellite-provider/)
# is tracked separately under AZ-616 and will land as a follow-up patch to this
# file once the auth + tile-source strategy is decided.

services:
  companion:
    extends:
      file: docker-compose.yml
      service: companion
    environment:
      LOG_LEVEL: INFO

  operator-orchestrator:
    extends:
      file: docker-compose.yml
      service: operator-orchestrator

  mock-sat:
    extends:
      file: docker-compose.yml
      service: mock-sat

  db:
    extends:
      file: docker-compose.yml
      service: db

  e2e-runner:
    build:
      context: .
      dockerfile: tests/e2e/Dockerfile.jetson
    image: gps-denied-onboard/e2e-runner:jetson
    # nvidia-container-runtime exposes the Tegra GPU + libcuda mounts.
    # Without this block the container starts but `torch.cuda.is_available()`
    # returns False and every tier2 AC errors at `.cuda()`.
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      companion:
        condition: service_healthy
      mock-sat:
        condition: service_healthy
      db:
        condition: service_healthy
    environment:
      # Same FullSystemConfig env block as Colima; see the comments in
      # docker-compose.test.yml for the per-var rationale.
      GPS_DENIED_FC_PROFILE: ardupilot_plane
      # Tier-2 turns OFF the `tier2` / `gpu` auto-skip in tests/conftest.py
      # so the heavy ACs in tests/e2e/replay/test_derkachi_1min.py actually
      # execute. This is the WHOLE POINT of the Jetson harness.
      GPS_DENIED_TIER: "2"
      DB_URL: postgresql://gps_denied:dev@db:5432/gps_denied
      SATELLITE_PROVIDER_URL: http://mock-sat:5100
      COMPANION_URL: http://companion:8080
      CAMERA_CALIBRATION_PATH: /opt/tests/fixtures/calibration/adti26.json
      LOG_LEVEL: INFO
      LOG_SINK: console
      INFERENCE_BACKEND: pytorch_fp16
      FDR_PATH: /var/lib/gps-denied/fdr
      TILE_CACHE_PATH: /var/lib/gps-denied/tiles
      MAVLINK_SIGNING_KEY: /opt/tests/fixtures/mavlink_signing/dev_key
      RUN_REPLAY_E2E: "1"
      BUILD_REPLAY_SINK_JSONL: "ON"
    volumes:
      - ./tests:/opt/tests:ro
      - ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro
      - fdr-data:/var/lib/gps-denied/fdr
      - tile-data:/var/lib/gps-denied/tiles

volumes:
  db-data: {}
  fdr-data: {}
  tile-data: {}
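The three Tier-2 differences listed in the header comment can be checked mechanically before shipping the tree to the Jetson. Below is a hypothetical preflight sketch, not part of the repository: `env_map`, `tier2_problems` and `load_compose` are illustrative names, and PyYAML is assumed to be installed on the operator machine for loading the file.

```python
# Hypothetical Tier-2 preflight check for docker-compose.test.jetson.yml.


def env_map(environment):
    """Normalise a compose `environment:` block (mapping or KEY=VALUE list) to a dict."""
    if isinstance(environment, dict):
        return dict(environment)
    result = {}
    for entry in environment or []:
        key, _, value = entry.partition("=")
        result[key] = value
    return result


def tier2_problems(compose):
    """List the Tier-2 invariants that the `e2e-runner` service does not satisfy."""
    runner = (compose.get("services") or {}).get("e2e-runner") or {}
    problems = []
    if runner.get("runtime") != "nvidia":
        problems.append("e2e-runner: runtime is not nvidia")
    devices = (
        runner.get("deploy", {}).get("resources", {}).get("reservations", {}).get("devices", [])
    )
    if not any(d.get("driver") == "nvidia" and "gpu" in d.get("capabilities", []) for d in devices):
        problems.append("e2e-runner: no nvidia GPU device reservation")
    # str() so that both a quoted "2" and an unquoted 2 count as Tier 2.
    if str(env_map(runner.get("environment")).get("GPS_DENIED_TIER")) != "2":
        problems.append("e2e-runner: GPS_DENIED_TIER is not 2")
    return problems


def load_compose(path="docker-compose.test.jetson.yml"):
    """Parse the compose file with PyYAML (assumed to be installed)."""
    import yaml  # imported lazily so tier2_problems() has no third-party dependency

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)
```

For the file above, `tier2_problems(load_compose())` should return an empty list; a non-empty list is a cheap reason to stop before the rsync/ssh round-trip.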