azaion/gps-denied-onboard

mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 21:01:12 +00:00

Commit Graph

Author	SHA1	Message	Date

Oleksandr Bezdieniezhnykh

58a1678417

[AZ-615] Dockerfile.jetson: fix pip indices + prerelease resolver

Three discoveries from on-Jetson build (image builds clean in ~3m18s
after fixes; gtsam-4.3a0, torch 2.4.0+cuda, cv2 4.11.0 all import OK
inside container running --runtime=nvidia):

1. dustynv/l4t-pytorch's /etc/pip.conf bakes in a local Jetson mirror
   (jetson.webredirect.org) that's only reachable from the maintainer
   LAN. pip's DNS lookup fails everywhere else. Wipe the config and
   pin --index-url to upstream PyPI.
2. The image ships pip 24.2. The SUT's `gtsam<5.0,>=4.2` constraint
   matches ONLY gtsam-4.3a0 on PyPI (no stable aarch64 wheels), and
   pip 24.x rejects pre-releases unless --pre is set. The Colima
   image lands on the same wheel because its pip 26.x has explicit
   fallback-to-pre-release logic. Bump pip before installing the SUT
   to align resolver behavior across both harnesses.
3. Skip the [inference] extra entirely — the base image ships
   Tegra-tuned torch / torchvision that re-pip would clobber with
   x86 builds lacking cuDNN/cuBLAS for Orin.
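
A minimal Python sketch of the install order the three points describe: ignore the baked-in pip config, pin the index to upstream PyPI, upgrade pip first, then install the SUT without the [inference] extra. This is an illustration, not the contents of Dockerfile.jetson. It swaps "wipe /etc/pip.conf" for pip's documented `PIP_CONFIG_FILE=os.devnull` switch, and `pip_env`, `pip_install_plan`, and the `/opt/sut` path are hypothetical names.

```python
import os
import sys

PYPI_INDEX = "https://pypi.org/simple"  # upstream PyPI simple index


def pip_env(base_env=None):
    """Return an environment in which pip loads no configuration files."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: env override used here instead of deleting /etc/pip.conf.
    # pip documents that setting PIP_CONFIG_FILE to os.devnull disables
    # loading of all configuration files.
    env["PIP_CONFIG_FILE"] = os.devnull
    return env


def pip_install_plan(sut_path="/opt/sut"):  # hypothetical install path
    """Return the pip invocations in the order described above."""
    pip = [sys.executable, "-m", "pip", "install", "--index-url", PYPI_INDEX]
    return [
        pip + ["--upgrade", "pip"],  # step 2: bump pip before the SUT
        pip + [sut_path],            # step 3: no "[inference]" extra
    ]
```

Each command would be run with `subprocess.run(cmd, env=pip_env(), check=True)`, so that both steps see the same index and neither reads the image's mirror settings.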

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-18 08:02:54 +03:00

Oleksandr Bezdieniezhnykh

6586208f83

[AZ-615] Fix Jetson harness base image (l4t-base/l4t-pytorch tags don't exist)

Operator-reported: `nvcr.io/nvidia/l4t-base:r36.4.0` fails to pull.
Investigation against the live registries confirmed:

  * `nvcr.io/nvidia/l4t-base` — deprecated in JetPack 6, no r36 tags
    (forum thread "L4T Base docker image for Jetpack 6.2 (r36.4.3)",
    GitHub dusty-nv/jetson-containers#883).
  * `nvcr.io/nvidia/l4t-pytorch` — no r36 tags at all. Newest is
    r35.2.1-pth2.0-py3 (too old for our torch>=2.2 floor).
  * `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — exists but ships no PyTorch.
  * `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) — exists, ~6.3 GB ARM64,
    PyTorch + torchvision + opencv pre-baked, maintained by dusty-nv
    (NVIDIA's Jetson containers maintainer).

Switched Dockerfile.jetson base to `dustynv/l4t-pytorch:r36.4.0`.
Forward-compatible with the host's R36.5 BSP (NVIDIA containers
tolerate one minor BSP ahead on the host side).
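
The torch floor check applied to the registry findings above can be sketched as a small tag parser. This is only an illustration: `torch_version_from_tag` and `meets_torch_floor` are hypothetical helpers, and the `pthX.Y` naming is taken from the tags quoted in this message. Tags without a `pth` component (such as `r36.4.0`) carry no torch version, so the helper reports "unknown" and the version has to be checked inside the container instead.

```python
import re

TORCH_FLOOR = (2, 2)  # from the "torch>=2.2" floor quoted above


def torch_version_from_tag(tag):
    """Extract (major, minor) from a 'pthX.Y' tag component, or None."""
    match = re.search(r"pth(\d+)\.(\d+)", tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_torch_floor(tag, floor=TORCH_FLOOR):
    """Return True/False when the tag encodes a torch version, else None."""
    version = torch_version_from_tag(tag)
    if version is None:
        return None  # tag does not say; verify inside the image
    return version >= floor
```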

Setup doc fixes:
  * smoke-test command now uses `l4t-jetpack:r36.4.0` (the official
    replacement for the deprecated `l4t-base`)
  * keygen step explicitly states it produces BOTH halves (private +
    .pub) in one go
  * ssh-copy-id + ssh config show how to specify a custom port
  * troubleshooting table gets a new row for the `l4t-base not found`
    case so the next dev hits the answer in 30 seconds

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-18 02:02:26 +03:00

Oleksandr Bezdieniezhnykh

9c13ab3bd0

[AZ-615] [AZ-617] Add Jetson e2e harness + tier2 marks

C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design — `model.half().cuda()` is hard-wired with no
CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3
matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch
and the pipeline reaches those stages, Colima runs would hard-fail at
`.cuda()` instead of cleanly skipping.

This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:

  * tests/e2e/Dockerfile.jetson  — l4t-pytorch:r36.4.0-pth2.3-py3 base,
    same /opt layout as the Colima image so AC-4 AST scan + bind mounts
    work identically. Built ON the Jetson via run-tests-jetson.sh.
  * docker-compose.test.jetson.yml — mirrors docker-compose.test.yml
    but with `runtime: nvidia`, GPU device exposure, and
    GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
  * scripts/run-tests-jetson.sh — rsync → ssh build → ssh up,
    exit-code-from e2e-runner so the local exit code reflects the
    remote test verdict. No credentials in the repo; uses
    `ssh jetson-e2e` alias resolved via ~/.ssh/config.
  * _docs/03_implementation/jetson_harness_setup.md — one-time SSH
    key + alias + sshd hardening + GPU verification steps. Documents
    the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.
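
The rsync → ssh build → ssh up flow of scripts/run-tests-jetson.sh can be sketched in Python as follows. This is a hedged illustration of the sequence, not the script itself: `jetson_run_plan`, `run_plan`, the `REMOTE_DIR` value, and the rsync flags are assumptions. The `jetson-e2e` alias, the compose file name, and `--exit-code-from e2e-runner` come from the bullets above.

```python
import subprocess

SSH_ALIAS = "jetson-e2e"  # alias resolved via ~/.ssh/config
COMPOSE_FILE = "docker-compose.test.jetson.yml"
REMOTE_DIR = "gps-denied-onboard"  # hypothetical remote checkout directory


def jetson_run_plan(local_dir="."):
    """Return the three commands: sync sources, build image, run tests."""
    compose = f"cd {REMOTE_DIR} && docker compose -f {COMPOSE_FILE}"
    source = f"{local_dir.rstrip('/')}/"
    return [
        ["rsync", "-az", "--delete", source, f"{SSH_ALIAS}:{REMOTE_DIR}/"],
        ["ssh", SSH_ALIAS, f"{compose} build"],
        # ssh exits with the remote command's status, and compose exits with
        # the e2e-runner container's status, so the verdict reaches the caller.
        ["ssh", SSH_ALIAS, f"{compose} up --exit-code-from e2e-runner"],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run commands in order; stop at, and return, the first non-zero code."""
    for argv in plan:
        code = runner(argv).returncode
        if code != 0:
            return code
    return 0
```

Calling `sys.exit(run_plan(jetson_run_plan()))` would make the local exit code mirror the remote test result, which is the property the bullet describes.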

AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked — they don't touch CUDA.
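
For readers unfamiliar with the pattern, a tier2 auto-skip of the kind referenced above is commonly written as a collection hook. The sketch below is an assumption about its shape, not the contents of tests/conftest.py: `tier2_enabled`, the default tier of "1", and the skip reason are hypothetical. Only the `tier2` marker name and the `GPS_DENIED_TIER=2` switch come from this message.

```python
import os


def tier2_enabled(environ=None):
    """True when GPS_DENIED_TIER=2, i.e. the auto-skip is turned off."""
    environ = os.environ if environ is None else environ
    # Assumption: any value other than "2" (including unset) means Tier 1.
    return environ.get("GPS_DENIED_TIER", "1") == "2"


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip tier2-marked tests unless running on Tier 2."""
    import pytest  # imported lazily so the helper above stays stdlib-only

    if tier2_enabled():
        return
    skip = pytest.mark.skip(reason="tier2: needs CUDA; set GPS_DENIED_TIER=2")
    for item in items:
        if "tier2" in item.keywords:
            item.add_marker(skip)
```

With a hook like this, a test decorated with `@pytest.mark.tier2` is reported as skipped on the Colima harness and runs normally on the Jetson, where the compose file sets `GPS_DENIED_TIER=2`.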

Defers to follow-up Jira:

  * AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
    actually reaching the GPU stage on the Jetson)
  * AZ-616 — replace mock-sat with real ../satellite-provider service

Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-18 01:57:23 +03:00

3 Commits