loader/_docs/02_document/deployment/provisioning_runbook.md
Oleksandr Bezdieniezhnykh cfed26ff8c [AZ-187] Docker & hardening
2026-04-17 18:48:55 +03:00

Jetson device provisioning runbook

This runbook describes the end-to-end flow to fuse, flash, and provision device identities so the Azaion Loader can authenticate against the admin/resource APIs. It supports Jetson Orin Nano, Orin NX 8GB, and Orin NX 16GB devices. Board configuration is auto-detected from the USB product ID.

The scripts/provision_devices.sh script automates the entire flow: detecting connected Jetsons, auto-installing L4T if needed, setting up Docker with the Loader container, optionally hardening the OS, registering device identities via the admin API, writing credentials, fusing, and flashing.

After provisioning, each Jetson boots into a production-ready state with Docker Compose running the Loader container.

Prerequisites

  • Ubuntu amd64 provisioning workstation with bash, curl, jq, wget, lsusb.
  • Admin API reachable from the workstation (base URL configured in scripts/.env).
  • An ApiAdmin account on the admin API (email and password in scripts/.env).
  • sudo access on the workstation.
  • USB-C cables that support both power and data transfer.
  • Physical label/sticker materials for serial numbers.
  • Internet access on first run (to download L4T BSP if not already installed).
  • Loader Docker image tar file (see Preparing the Loader image).

The NVIDIA L4T BSP and sample rootfs are downloaded and installed automatically to /opt/nvidia/Linux_for_Tegra if not already present. No manual L4T setup is required.

Configuration

Copy scripts/.env.example to scripts/.env and fill in values:

ADMIN_EMAIL=admin@azaion.com
ADMIN_PASSWORD=<your ApiAdmin password>
API_URL=https://admin.azaion.com
LOADER_IMAGE_TAR=/path/to/loader-image.tar

Optional overrides (auto-detected/defaulted if omitted):

L4T_VERSION=r36.4.4
L4T_DIR=/opt/nvidia/Linux_for_Tegra
ROOTFS_DIR=/opt/nvidia/Linux_for_Tegra/rootfs
RESOURCE_API_URL=https://admin.azaion.com
LOADER_DEV_STAGE=main
LOADER_IMAGE=localhost:5000/loader:arm
FLASH_TARGET=nvme0n1p1
HARDEN=true

The .env file is git-ignored and must not be committed.
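
As an illustration of how the settings above might be consumed, the following sketch loads a .env file and stops early when a required key is missing. The helper names load_env and require_env are hypothetical and are not taken from provision_devices.sh. It also assumes that values containing spaces or shell metacharacters are quoted in the file.

```bash
# Sketch only: load_env and require_env are hypothetical helper names.

load_env() {
  # $1 = path to the .env file
  local env_file="$1"
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  set -a            # export every variable assigned while sourcing
  . "$env_file"
  set +a
}

require_env() {
  # Arguments: names of settings that must be non-empty.
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name-}"   # portable indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      echo "required setting $name is empty or unset" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use from the repository root:
#   load_env scripts/.env && require_env ADMIN_EMAIL ADMIN_PASSWORD API_URL LOADER_IMAGE_TAR
```

Checking all required keys before any device is touched keeps a configuration mistake from surfacing halfway through a fuse or flash step.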

Preparing the Loader image

The provisioning script requires a Loader Docker image tar to pre-load onto each device. Options:

From CI (recommended): Download the loader-image.tar artifact from the Woodpecker CI pipeline for the target branch.

Local build (requires arm64 builder or BuildKit cross-compilation):

docker build -f Dockerfile -t localhost:5000/loader:arm .
docker save localhost:5000/loader:arm -o loader-image.tar

Set LOADER_IMAGE_TAR in .env to the absolute path of the resulting tar file.

Supported devices

| USB Product ID | Model | Board Config (auto-detected) |
|---|---|---|
| 0955:7523 | Jetson Orin Nano | jetson-orin-nano-devkit |
| 0955:7323 | Jetson Orin NX 16GB | jetson-orin-nx-devkit |
| 0955:7423 | Jetson Orin NX 8GB | jetson-orin-nx-devkit |

The script scans for all NVIDIA USB devices (lsusb -d 0955:), matches them against the table above, and displays the model name next to each detected device.
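
The lookup described here can be sketched as a small mapping over lsusb output. The function names jetson_model_for_id and list_jetsons are hypothetical. The IDs and board configs are copied from the supported-devices table, and the real script may structure its detection differently.

```bash
# Sketch only: function names are hypothetical, IDs come from the supported-devices table.

jetson_model_for_id() {
  # $1 = USB vendor:product ID. Prints "model|board config" or fails for unknown IDs.
  case "$1" in
    0955:7523) echo "Jetson Orin Nano|jetson-orin-nano-devkit" ;;
    0955:7323) echo "Jetson Orin NX 16GB|jetson-orin-nx-devkit" ;;
    0955:7423) echo "Jetson Orin NX 8GB|jetson-orin-nx-devkit" ;;
    *) return 1 ;;
  esac
}

list_jetsons() {
  # Reads lsusb lines ("Bus 001 Device 004: ID 0955:7523 ...") on stdin and prints
  # one tab-separated row per supported device: bus:device, ID, model, board config.
  local w1 bus w2 dev w3 id rest info
  while read -r w1 bus w2 dev w3 id rest; do
    case "$id" in
      0955:*) ;;
      *) continue ;;
    esac
    if info=$(jetson_model_for_id "$id"); then
      printf '%s:%s\t%s\t%s\t%s\n' "$bus" "${dev%:}" "$id" "${info%%|*}" "${info#*|}"
    else
      echo "warning: unknown NVIDIA product ID $id" >&2
    fi
  done
}

# Example use:
#   lsusb -d 0955: | list_jetsons
```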

Admin API contract (device registration)

The script calls:

  1. POST {API_URL}/login with {"email":"<admin>","password":"<password>"} to obtain a JWT.
  2. POST {API_URL}/devices with Authorization: Bearer <token> and no request body.
    • 200 or 201: returns {"serial":"azj-NNNN","email":"azj-NNNN@azaion.com","password":"<32-hex-chars>"}.
    • The server auto-assigns the next sequential serial number.
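
The two calls can be sketched with curl and jq as follows. The function names are hypothetical, and the name of the JWT field in the login response ("token") is an assumption, because the contract above only states that a JWT is returned.

```bash
# Sketch only: requires curl and jq. Function names are hypothetical, and the
# "token" field name in the login response is an assumption.

is_success_status() {
  # The contract above treats 200 and 201 as success for POST /devices.
  case "$1" in
    200|201) return 0 ;;
    *) return 1 ;;
  esac
}

api_login() {
  # $1 = API_URL, $2 = admin email, $3 = admin password. Prints the JWT.
  local body response
  body=$(jq -n --arg email "$2" --arg password "$3" \
    '{email: $email, password: $password}') || return 1
  response=$(curl -fsS -X POST "$1/login" \
    -H 'Content-Type: application/json' -d "$body") || return 1
  printf '%s' "$response" | jq -er '.token'
}

api_register_device() {
  # $1 = API_URL, $2 = JWT. Prints the JSON response body on 200/201.
  local response status body
  response=$(curl -sS -X POST "$1/devices" \
    -H "Authorization: Bearer $2" -w '\n%{http_code}') || return 1
  status=$(printf '%s\n' "$response" | tail -n 1)   # last line is the HTTP status
  body=$(printf '%s\n' "$response" | sed '$d')      # everything before it is the body
  if ! is_success_status "$status"; then
    echo "POST /devices failed with HTTP $status" >&2
    return 1
  fi
  printf '%s\n' "$body"
}

# Example use:
#   token=$(api_login "$API_URL" "$ADMIN_EMAIL" "$ADMIN_PASSWORD")
#   device_json=$(api_register_device "$API_URL" "$token")
#   serial=$(printf '%s' "$device_json" | jq -r '.serial')
```

Building the login body with jq rather than string concatenation keeps passwords that contain quotes or backslashes correctly escaped.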

Device identity and device.conf

For each registered device, the script writes:

{ROOTFS_DIR}/etc/azaion/device.conf

On the flashed device this becomes /etc/azaion/device.conf with:

  • AZAION_DEVICE_EMAIL=azj-NNNN@azaion.com
  • AZAION_DEVICE_PASSWORD=<32-hex-chars>

File permissions are set to 600.
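
A minimal sketch of this step, assuming a hypothetical helper named write_device_conf. The real script probably runs the equivalent with sudo, since the staged rootfs is normally owned by root.

```bash
# Sketch only: write_device_conf is a hypothetical helper name.

write_device_conf() {
  # $1 = rootfs staging directory, $2 = device email, $3 = device password
  local dir="$1/etc/azaion" conf
  conf="$dir/device.conf"
  mkdir -p "$dir"
  (
    umask 077   # a newly created file is never readable by group or others
    printf 'AZAION_DEVICE_EMAIL=%s\nAZAION_DEVICE_PASSWORD=%s\n' "$2" "$3" > "$conf"
  )
  chmod 600 "$conf"   # also covers the case where the file already existed
}

# Example use:
#   write_device_conf "$ROOTFS_DIR" "azj-NNNN@azaion.com" "<32-hex-chars>"
```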

Docker and application setup

The scripts/setup_rootfs_docker.sh script prepares the rootfs before flashing. It runs automatically as part of provision_devices.sh. What it installs:

| Component | Details |
|---|---|
| Docker Engine + Compose plugin | Installed via apt in chroot from Docker's official repository |
| NVIDIA Container Toolkit | GPU passthrough for containers; nvidia set as default runtime |
| Production compose file | /opt/azaion/docker-compose.yml; defines the loader service |
| Loader image | Pre-loaded from LOADER_IMAGE_TAR at /opt/azaion/loader-image.tar |
| Boot service | azaion-loader.service; loads the image tar on first boot, starts compose |

Device filesystem layout after flash

/etc/azaion/device.conf                  Per-device credentials
/etc/docker/daemon.json                  Docker config (NVIDIA default runtime)
/opt/azaion/docker-compose.yml           Production compose file
/opt/azaion/boot.sh                      Boot startup script
/opt/azaion/loader-image.tar             Initial Loader image (deleted after first boot)
/opt/azaion/models/                      Model storage
/opt/azaion/state/                       Update manager state
/etc/systemd/system/azaion-loader.service  Systemd unit

First boot sequence

  1. systemd starts docker.service
  2. azaion-loader.service runs /opt/azaion/boot.sh
  3. boot.sh runs docker load -i /opt/azaion/loader-image.tar (first boot only), then deletes the tar
  4. boot.sh runs docker compose -f /opt/azaion/docker-compose.yml up -d
  5. The Loader container starts, reads /etc/azaion/device.conf, authenticates with the API
  6. The update manager begins polling for updates
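
Steps 3 and 4 of the sequence above can be sketched as a single function. This is not the shipped boot.sh: the function name start_loader and the AZAION_DIR override are hypothetical, and the override exists only so the logic can be exercised outside /opt/azaion.

```bash
# Sketch only: not the shipped boot.sh. start_loader and AZAION_DIR are hypothetical.

start_loader() {
  local base="${AZAION_DIR:-/opt/azaion}"
  local image_tar="$base/loader-image.tar"

  # First boot only: the tar exists until it has been loaded once.
  if [ -f "$image_tar" ]; then
    # Delete the tar only after a successful load, so a failed load is retried.
    docker load -i "$image_tar" || return 1
    rm -f "$image_tar"
  fi

  docker compose -f "$base/docker-compose.yml" up -d
}
```

Using the presence of the tar as the first-boot marker means no separate state file is needed to decide whether the image still has to be loaded.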

Security hardening

The scripts/harden_rootfs.sh script applies production security hardening to the rootfs. It runs automatically unless --no-harden is passed.

| Measure | Details |
|---|---|
| SSH disabled | sshd.service and ssh.service masked; sshd_config removed |
| Getty masked | getty@.service and serial-getty@.service masked; no login prompt |
| Serial console disabled | console=ttyTCU0 / console=ttyS0 removed from extlinux.conf |
| Sysctl hardening | ptrace blocked, core dumps disabled, kernel pointers hidden, ICMP redirects off |
| Root locked | Root account password-locked in /etc/shadow |

To provision without hardening (e.g. for development devices):

./scripts/provision_devices.sh --no-harden

Or set HARDEN=false in .env.
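
To make two of these measures concrete, the sketch below shows how unit masking and sysctl hardening could be applied to a staged rootfs. The function names, the drop-in file name, and the specific sysctl values are assumptions. The keys are standard Linux sysctls, but harden_rootfs.sh may set different ones.

```bash
# Sketch only: function names, the drop-in file name, and the chosen values are
# assumptions, not the contents of harden_rootfs.sh.

mask_unit() {
  # $1 = rootfs directory, $2 = unit name.
  # systemd treats a unit file symlinked to /dev/null as masked.
  mkdir -p "$1/etc/systemd/system"
  ln -sf /dev/null "$1/etc/systemd/system/$2"
}

write_sysctl_hardening() {
  # $1 = rootfs directory
  mkdir -p "$1/etc/sysctl.d"
  cat > "$1/etc/sysctl.d/99-azaion-hardening.conf" <<'EOF'
# ptrace blocked
kernel.yama.ptrace_scope = 3
# core dumps disabled
fs.suid_dumpable = 0
kernel.core_pattern = |/bin/false
# kernel pointers hidden
kernel.kptr_restrict = 2
# ICMP redirects off
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
EOF
}

# Example use:
#   for unit in sshd.service ssh.service getty@.service serial-getty@.service; do
#     mask_unit "$ROOTFS_DIR" "$unit"
#   done
#   write_sysctl_hardening "$ROOTFS_DIR"
```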

Step-by-step flow

1. Connect Jetsons in recovery mode

Connect one or more Jetson devices via USB-C. Put each device into recovery mode: hold the Force Recovery button, press and release Power, then release Force Recovery after 2 seconds.

Verify with lsusb -d 0955: (the trailing colon is part of the command). Each recovery-mode Jetson appears as NVIDIA Corp. APX.

2. Run the provisioning script

From the loader repository root:

./scripts/provision_devices.sh

The script will:

  1. Install dependencies -- installs lsusb, curl, jq, wget via apt; adds qemu-user-static and binfmt-support on x86 hosts for cross-arch chroot.
  2. Install L4T -- if L4T BSP is not present at L4T_DIR, downloads the BSP and sample rootfs, extracts them, and runs apply_binaries.sh. This only happens on first run.
  3. Set up Docker -- installs Docker Engine, NVIDIA Container Toolkit, compose file, and Loader image into the rootfs via chroot (setup_rootfs_docker.sh).
  4. Harden OS (unless --no-harden) -- disables SSH, getty, serial console, applies sysctl hardening (harden_rootfs.sh).
  5. Authenticate -- logs in to the admin API to get a JWT.
  6. Scan USB -- detects all supported Jetson devices in recovery mode, displays model names.
  7. Display selection UI -- lists detected devices with numbers and model type.
  8. Prompt for selection -- enter device numbers (e.g. 1 3 4), or 0 for all.
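
The selection rule in step 8 (a list of device numbers, or 0 for all) can be sketched as a small validator. The function name resolve_selection is hypothetical, and the actual prompt handling in provision_devices.sh may differ.

```bash
# Sketch only: resolve_selection is a hypothetical helper name.

resolve_selection() {
  # $1 = number of detected devices, remaining arguments = the numbers entered.
  # Prints the selected device numbers, one per line, sorted and de-duplicated.
  local count="$1" n i
  shift
  [ "$#" -gt 0 ] || { echo "no devices selected" >&2; return 1; }

  for n in "$@"; do
    case "$n" in
      ''|*[!0-9]*) echo "not a number: $n" >&2; return 1 ;;
    esac
    if [ "$n" -eq 0 ]; then
      # 0 means "all detected devices"
      i=1
      while [ "$i" -le "$count" ]; do
        echo "$i"
        i=$((i + 1))
      done
      return 0
    fi
    [ "$n" -le "$count" ] || { echo "out of range: $n" >&2; return 1; }
  done

  printf '%s\n' "$@" | sort -nu
}

# Example use (word splitting of $choice is intentional):
#   read -r choice
#   selected=$(resolve_selection "$device_count" $choice)
```

Rejecting out-of-range and non-numeric input before anything is printed avoids registering a device serial for a selection that cannot be flashed.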

3. Per-device provisioning (automatic)

For each selected device, the script runs sequentially:

  1. Register -- calls POST /devices to get server-assigned serial, email, and password.
  2. Write device.conf -- embeds credentials in the rootfs staging directory.
  3. Fuse -- runs odmfuse.sh targeting the specific USB device instance. Board config is auto-detected from the USB product ID.
  4. Power-cycle prompt -- asks the admin to power-cycle the device and re-enter recovery mode.
  5. Flash -- runs flash.sh with the auto-detected board config to write the rootfs (including device.conf, Docker, and application files) to the device. Default target is nvme0n1p1 (NVMe SSD); override with FLASH_TARGET in .env (e.g. mmcblk0p1 for eMMC).
  6. Sticker prompt -- displays the assigned serial and asks the admin to apply a physical label.

4. Apply serial labels

After each device is flashed, the script prints the assigned serial (e.g. azj-0042). Apply a label/sticker with this serial to the device enclosure for physical identification.

5. First boot

Power on the Jetson. Docker starts automatically, and the boot service loads the Loader image and starts Docker Compose. The Loader service reads AZAION_DEVICE_EMAIL and AZAION_DEVICE_PASSWORD from /etc/azaion/device.conf and uses them to authenticate with the admin API via POST /login. The update manager then begins checking for updates.

6. Smoke verification

  • From another host: a GET /health request to the Loader on port 8080 returns healthy.
  • docker ps on the device (if unhardened) shows the loader container running.
  • Optional: trigger a resource or unlock smoke test against a staging API.
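
The health check can be polled from the workstation while the device finishes its first boot. The sketch below assumes plain HTTP and treats any successful HTTP response as healthy, because the exact response body is not specified here. The function name wait_for_health is hypothetical.

```bash
# Sketch only: wait_for_health is a hypothetical helper name. Plain HTTP and
# "any successful response counts as healthy" are assumptions.

wait_for_health() {
  # $1 = device host or IP, $2 = max attempts (default 30), $3 = delay in seconds (default 5)
  local url="http://$1:8080/health" attempts="${2:-30}" delay="${3:-5}" i=1
  while :; do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "loader healthy after $i attempt(s)"
      return 0
    fi
    [ "$i" -lt "$attempts" ] || break
    i=$((i + 1))
    sleep "$delay"
  done
  echo "loader did not become healthy at $url" >&2
  return 1
}

# Example use:
#   wait_for_health <device-ip> 30 5
```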

Troubleshooting

| Symptom | Check |
|---|---|
| No devices found by script | USB cables, recovery mode entry sequence, lsusb -d 0955: |
| Unknown product ID warning | Device is an NVIDIA USB device but not in the supported models table. Check SKU. |
| L4T download fails | Internet access, NVIDIA download servers availability, L4T_VERSION value |
| Login fails (HTTP 401) | ADMIN_EMAIL and ADMIN_PASSWORD in .env; account must have ApiAdmin role |
| POST /devices fails | Admin API logs; ensure AZ-196 endpoint is deployed |
| Fuse fails | L4T version compatibility, USB connection stability, sudo access |
| Flash fails | Rootfs contents, USB device still in recovery mode after power-cycle, verify FLASH_TARGET matches your storage (NVMe vs eMMC) |
| Docker setup fails in chroot | Verify qemu-user-static was installed (auto-installed on x86 hosts); check internet in chroot |
| Loader container not starting | Check docker logs on device; verify /etc/azaion/device.conf exists and has correct permissions |
| Loader cannot log in after boot | device.conf path and permissions; password must match the account created by POST /devices |
| Cannot SSH to hardened device | Expected behavior. Use --no-harden for dev devices, or reflash with USB recovery mode |

Security notes

  • Treat device.conf as a secret at rest; restrict its file permissions and apply disk encryption per your product policy.
  • The .env file contains ApiAdmin credentials -- do not commit it. It is listed in .gitignore.
  • Prefer short-lived credentials or key rotation if the admin API supports it; this runbook describes the baseline manufacturing flow.
  • Hardened devices have no SSH, no serial console, and no interactive login. Field debug requires USB recovery mode reflash with --no-harden.