mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 10:31:13 +00:00
[AZ-615] Step-11 report + state: Jetson harness first end-to-end run

Records the first Jetson Tier-2 run results in the step-11 report:
17 pass / 5 fail / 1 skip / 1 xfail (24 total, 10m09s), identical to
Colima because all 5 failures hit AZ-614 (tlog time-base mismatch)
BEFORE reaching the GPU. So the infrastructure is proven (image
builds, GPU exposed inside container, SUT subprocess runs to the
auto-sync stage) but the heavy ACs haven't yet exercised ALIKED /
DISK LightGlue. Fixing AZ-614 is the gating prerequisite to actually
drive the GPU stages.

Also captures lessons learned that are now in the setup doc:

* Only dustynv/l4t-pytorch:r36.4.0 is a usable Jetson PyTorch base
  on Docker Hub for R36 / JetPack 6 (l4t-base deprecated, official
  l4t-pytorch has no R36 tags).
* The dustynv image bakes a maintainer-LAN-only pip mirror into
  /etc/pip.conf; it must be wiped and --index-url pinned to pypi.org.
* pip 24.2 (image default) rejects the gtsam-4.3a0 pre-release; pip 26.x
  accepts the same wheel for `gtsam<5.0,>=4.2` because there are no
  stable aarch64 builds. Upgrade pip in the build, don't relax the pin.
* nvidia-container-runtime mounts nvidia-smi from the host, so the GPU
  smoke test needs only ubuntu:22.04 (80 MB), not l4t-jetpack (5 GB).

Autodev state advances to phase 7 / jetson-harness-online.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -314,3 +314,75 @@ This is the same family as H-13 / `AZ-611` (stationary FT-P-01) but on the movin

### Reality Gate verdict

**Cycle-2 verdict for Step 11**: Reality Gate signal is now REAL: the SUT runs end-to-end for ~21 s on the Derkachi fixture and surfaces a real auto-sync bug. Pre-Track 1, the gate was a vacuous "exit 0 with 0 tests collected" that hid every SUT issue. Track 1 was the minimum investment to make the gate honest; future cycles (Track 2 + AZ-614) will turn the failing ACs green.

## Cycle-2 addendum: Jetson harness brought online (AZ-615)

The Colima harness above is "Tier-1": ARM Linux without a GPU. The SUT's
`pytorch_fp16_runtime` (and `tensorrt_runtime`) hard-code `.cuda()` calls,
so anything past auto-sync can ONLY be exercised against a real GPU. The
operator's Jetson Orin Nano (JetPack 6.2.2+b24, L4T R36.5.0,
nvidia-container-toolkit ≥ 1.16) was wired in as the Tier-2 harness.

Net-new artifacts (committed under AZ-615):

* `tests/e2e/Dockerfile.jetson`: `FROM dustynv/l4t-pytorch:r36.4.0` with
  Tegra-tuned torch / torchvision pre-baked. It wipes the image's stale
  `/etc/pip.conf` (jetson.webredirect.org is maintainer-LAN only),
  upgrades pip 24→26 so the `gtsam<5.0,>=4.2` constraint resolves to
  the only PyPI wheel for aarch64 (`4.3a0`, same as Colima), and installs
  the SUT editable via system pip + `--break-system-packages`.
* `docker-compose.test.jetson.yml`: mirror of `docker-compose.test.yml`
  with `runtime: nvidia`, `deploy.resources.reservations.devices`, and
  `GPS_DENIED_TIER: "2"` so the auto-skip hook in `tests/conftest.py`
  runs the heavy ACs instead of skipping them.
* `scripts/run-tests-jetson.sh`: rsync → ssh build → ssh up wrapper.
  The operator-side SSH alias `jetson-e2e` is documented in
  `_docs/03_implementation/jetson_harness_setup.md`.
* `@pytest.mark.tier2` applied to AC-1, AC-2, AC-3, AC-5, AC-6 in
  `tests/e2e/replay/test_derkachi_1min.py` so the same test file is the
  source of truth for both harnesses (Colima auto-skips tier2 via the
  existing `pytest_collection_modifyitems` hook).
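
The tier gating described above can be sketched roughly as follows. This is a minimal illustration, not the contents of the project's `tests/conftest.py`: the helper name `tier2_enabled`, the default tier of `"1"` when the variable is unset, and the skip reason text are assumptions. Only the `GPS_DENIED_TIER` variable, the `tier2` marker, and the `pytest_collection_modifyitems` hook name come from the report.

```python
import os

# Variable name taken from the compose file described above.
TIER_ENV_VAR = "GPS_DENIED_TIER"


def tier2_enabled(environ=None):
    """Return True when the harness declares itself Tier-2 (GPU present).

    Assumption: an unset variable means Tier-1.
    """
    env = os.environ if environ is None else environ
    return env.get(TIER_ENV_VAR, "1").strip() == "2"


def pytest_collection_modifyitems(config, items):
    """Skip tests marked `tier2` unless the Tier-2 harness is active."""
    if tier2_enabled():
        return  # Jetson: run the heavy ACs as collected.
    # Imported here so the helper above stays importable without pytest.
    import pytest

    skip_tier2 = pytest.mark.skip(reason="needs Tier-2 (GPU) harness")
    for item in items:
        if "tier2" in item.keywords:
            item.add_marker(skip_tier2)
```

With a hook of this shape, the same test file serves both harnesses: the compose file decides the tier, and the tests only carry the marker.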

### Jetson smoke run (first end-to-end, 2026-05-18)

| Outcome | Count | Tests |
|---------|-------|-------|
| PASSED | 17 | AC-4 AST scan, AC-7 skip-gate, 14× AC-9 helpers |
| FAILED | 5 | AC-1, AC-2, AC-5, AC-6 pace-realtime, AC-6 pace-asap |
| SKIPPED | 1 | AC-8 (unchanged: D-PROJ-2 mock-sat stub) |
| XFAIL | 1 | AC-3 (unchanged: calibration intrinsics unknown) |
| **Wall clock** | **10m09s** | (vs ~5m on Colima) |

**Same 5 failures as Colima, same root cause** (`replay.auto_sync.ac8_validation_failed`,
offset_ms=1699999995666). AZ-614 reproduces on Jetson because the synth
tlog time-base bug is architecture-independent: the heavy ACs die at
auto-sync, BEFORE any frame reaches the GPU. So this run validated the
infrastructure (image builds, GPU exposed, SUT runs, pytest collects 24)
but did NOT yet exercise ALIKED / DISK LightGlue on the actual GPU. The
2× wall-clock delta vs Colima is the cost of CUDA + torch + TensorRT
initialization in the per-test SUT subprocess.
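
One way to see why the reported `offset_ms` cannot be a genuine sync offset is to read it as milliseconds since the Unix epoch. The sketch below does that, under stated assumptions: the report does not say which two clocks AZ-614 mixes, and the one-day plausibility threshold and both function names are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical threshold: no real sync offset for a short replay clip
# should come anywhere near one day.
MAX_PLAUSIBLE_OFFSET_MS = 24 * 60 * 60 * 1000


def offset_as_utc(offset_ms):
    """Interpret an offset as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=offset_ms)


def looks_like_time_base_mismatch(offset_ms, limit_ms=MAX_PLAUSIBLE_OFFSET_MS):
    """Flag offsets so large that the two clocks likely use different bases."""
    return abs(offset_ms) > limit_ms


if __name__ == "__main__":
    offset_ms = 1699999995666  # value from the failing auto-sync log
    print(offset_as_utc(offset_ms).isoformat())
    print(looks_like_time_base_mismatch(offset_ms))
```

Read this way, the offset lands in November 2023, which is the size one would expect when an epoch-based timestamp is subtracted from a near-zero relative one. That is consistent with a time-base mismatch, but it does not by itself identify which side is wrong.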

**Implication for Track 2**: fixing AZ-614 is the gating prerequisite for
ANY Reality-Gate-grade signal from the heavy ACs. Until then, Jetson and
Colima are indistinguishable: the same light ACs pass and the same heavy
ACs fail. Once AZ-614 lands, the two harnesses divide cleanly: Colima keeps
exercising the light path (AC-4 / AC-7 / AC-9 plus auto-sync), Jetson
covers the heavy path (AC-1 / AC-2 / AC-5 / AC-6 plus the GPU inference
stages they entail).

### Lessons learned (committed to setup doc)

* `nvcr.io/nvidia/l4t-base` is deprecated in JetPack 6; `l4t-pytorch`
  has no R36 tags; `l4t-jetpack:r36.4.0` exists but ships no PyTorch.
  `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) is the only off-the-shelf
  Jetson base image with Tegra-tuned PyTorch wheels for R36.
* `nvidia-container-runtime` mounts `nvidia-smi` + CUDA libs from the
  host into any container at runtime, so the GPU-exposure smoke test
  doesn't need a 5 GB `l4t-jetpack` pull; running `nvidia-smi` in
  `ubuntu:22.04` (80 MB) suffices.
* The dustynv image bakes a private pip mirror into `/etc/pip.conf`;
  builds on any other network must wipe it AND pin `--index-url` to
  upstream PyPI.
* Git LFS-tracked fixtures (the 269 MB Derkachi mp4) must be
  pre-smudged on the Mac BEFORE the rsync step; otherwise the Jetson
  receives the 134 B pointer and tests fail at fixture-load.
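
The LFS lesson above lends itself to a pre-flight check before the rsync step. The following is a sketch under stated assumptions, not part of `scripts/run-tests-jetson.sh`: the function names and the `*.mp4` default pattern are hypothetical. The pointer signature line and the under-1024-byte size limit come from the Git LFS pointer file format.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# The LFS spec requires pointer files to be smaller than this.
MAX_POINTER_BYTES = 1024


def is_lfs_pointer(path):
    """Return True if `path` is an un-smudged Git LFS pointer file."""
    p = Path(path)
    if p.stat().st_size >= MAX_POINTER_BYTES:
        return False  # Too large to be a pointer, so it is real content.
    with p.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unsmudged_fixtures(root, patterns=("*.mp4",)):
    """List fixture files under `root` that are still LFS pointers."""
    root = Path(root)
    return sorted(
        p for pattern in patterns for p in root.rglob(pattern) if is_lfs_pointer(p)
    )
```

A wrapper could call `unsmudged_fixtures` on the fixtures directory and abort before syncing if the list is non-empty, so the failure shows up on the Mac rather than at fixture-load on the Jetson.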