diff --git a/_docs/02_document/deployment/provisioning_runbook.md b/_docs/02_document/deployment/provisioning_runbook.md
index 29d4ce1..9149434 100644
--- a/_docs/02_document/deployment/provisioning_runbook.md
+++ b/_docs/02_document/deployment/provisioning_runbook.md
@@ -1,102 +1,222 @@
 # Jetson device provisioning runbook
 
-This runbook describes the end-to-end flow to fuse, flash, provision a device identity, and reach a state where the Azaion Loader can authenticate against the admin/resource APIs. It targets a Jetson Orin Nano class device; adapt paths and NVIDIA bundle versions to your manufacturing image.
+This runbook describes the end-to-end flow to fuse, flash, and provision device identities so the Azaion Loader can authenticate against the admin/resource APIs. It supports Jetson Orin Nano, Orin NX 8GB, and Orin NX 16GB devices. Board configuration is auto-detected from the USB product ID.
+
+The `scripts/provision_devices.sh` script automates the entire flow: detecting connected Jetsons, auto-installing L4T if needed, setting up Docker with the Loader container, optionally hardening the OS, registering device identities via the admin API, writing credentials, fusing, and flashing.
+
+After provisioning, each Jetson boots into a production-ready state with Docker Compose running the Loader container.
 
 ## Prerequisites
 
-- Provisioning workstation with bash, curl, openssl, python3, and USB/network access to the Jetson in recovery or mass-storage mode as required by your flash tools.
-- Admin API reachable from the workstation (base URL, for example `https://admin.internal.example.com`).
-- NVIDIA Jetson Linux Driver Package (L4T) and flash scripts for your SKU (for example `odmfuse.sh`, `flash.sh` from the board support package).
-- Root filesystem staging directory on the workstation that will be merged into the image before `flash.sh` (often a `Linux_for_Tegra/rootfs/` tree or an extracted sample rootfs overlay).
+- Ubuntu amd64 provisioning workstation with bash, curl, jq, wget, lsusb.
+- Admin API reachable from the workstation (base URL configured in `scripts/.env`).
+- An ApiAdmin account on the admin API (email and password in `scripts/.env`).
+- `sudo` access on the workstation.
+- USB-C cables that support both power and data transfer.
+- Physical label/sticker materials for serial numbers.
+- Internet access on first run (to download L4T BSP if not already installed).
+- Loader Docker image tar file (see [Preparing the Loader image](#preparing-the-loader-image)).
 
-## Admin API contract (provisioning)
+The NVIDIA L4T BSP and sample rootfs are downloaded and installed automatically to `/opt/nvidia/Linux_for_Tegra` if not already present. No manual L4T setup is required.
 
-The `scripts/provision_device.sh` script expects:
+## Configuration
 
-1. **POST** `{admin_base}/users` with JSON body `{"email":"","password":"","role":"CompanionPC"}`
-   - **201** or **200**: user created.
-   - **409**: user with this email already exists (idempotent re-run).
+Copy `scripts/.env.example` to `scripts/.env` and fill in values:
 
-2. **PATCH** `{admin_base}/users/password` with JSON body `{"email":"","password":""}`
-   - Used when POST returns **409** so the password in `device.conf` matches the account after re-provisioning.
-   - **200** or **204**: password updated.
+```
+ADMIN_EMAIL=admin@azaion.com
+ADMIN_PASSWORD=
+API_URL=https://admin.azaion.com
+LOADER_IMAGE_TAR=/path/to/loader-image.tar
+```
 
-Adjust URL paths or JSON field names in the script if your deployment uses a different but equivalent contract.
+Optional overrides (auto-detected/defaulted if omitted):
+
+```
+L4T_VERSION=r36.4.4
+L4T_DIR=/opt/nvidia/Linux_for_Tegra
+ROOTFS_DIR=/opt/nvidia/Linux_for_Tegra/rootfs
+RESOURCE_API_URL=https://admin.azaion.com
+LOADER_DEV_STAGE=main
+LOADER_IMAGE=localhost:5000/loader:arm
+FLASH_TARGET=nvme0n1p1
+HARDEN=true
+```
+
+The `.env` file is git-ignored and must not be committed.
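As a pre-flight check, the four required keys from the Configuration section can be verified before the provisioning script is run. The following is a minimal sketch and not part of the repository: `missing_env_keys` is a hypothetical helper, and it assumes a plain `KEY=VALUE` file with no quoting, no `export` prefixes, and no inline comments.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): print every required key
# that is absent or empty in a plain KEY=VALUE env file.
# Assumes no quoting, no "export" prefixes, and no inline comments.
missing_env_keys() (
  env_file="$1"
  for key in ADMIN_EMAIL ADMIN_PASSWORD API_URL LOADER_IMAGE_TAR; do
    # Use the last assignment of the key, as a sourced file would.
    value="$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1)"
    if [ -z "$value" ]; then
      printf '%s\n' "$key"
    fi
  done
)
```

Calling `missing_env_keys scripts/.env` prints one key name per line and prints nothing when all four keys have values.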
+
+## Preparing the Loader image
+
+The provisioning script requires a Loader Docker image tar to pre-load onto each device. Options:
+
+**From CI (recommended):** Download the `loader-image.tar` artifact from the Woodpecker CI pipeline for the target branch.
+
+**Local build (requires arm64 builder or BuildKit cross-compilation):**
+
+```bash
+docker build -f Dockerfile -t localhost:5000/loader:arm .
+docker save localhost:5000/loader:arm -o loader-image.tar
+```
+
+Set `LOADER_IMAGE_TAR` in `.env` to the absolute path of the resulting tar file.
+
+## Supported devices
+
+| USB Product ID | Model | Board Config (auto-detected) |
+| --- | --- | --- |
+| 0955:7523 | Jetson Orin Nano | jetson-orin-nano-devkit |
+| 0955:7323 | Jetson Orin NX 16GB | jetson-orin-nx-devkit |
+| 0955:7423 | Jetson Orin NX 8GB | jetson-orin-nx-devkit |
+
+The script scans for all NVIDIA USB devices (`lsusb -d 0955:`), matches them against the table above, and displays the model name next to each detected device.
+
+## Admin API contract (device registration)
+
+The script calls:
+
+1. **POST** `{API_URL}/login` with `{"email":"","password":""}` to obtain a JWT.
+2. **POST** `{API_URL}/devices` with `Authorization: Bearer <token>` and no request body.
+   - **200** or **201**: returns `{"serial":"azj-NNNN","email":"azj-NNNN@azaion.com","password":"<32-hex-chars>"}`.
+   - The server auto-assigns the next sequential serial number.
 
 ## Device identity and `device.conf`
 
-For serial **AZJN-0042**, the script creates email **azaion-jetson-0042@azaion.com** (suffix is the segment after the last hyphen in the serial, lowercased). The password is 32 hexadecimal characters from `openssl rand -hex 16`.
+For each registered device, the script writes:
 
-The script writes:
+`{ROOTFS_DIR}/etc/azaion/device.conf`
 
-`{rootfs_staging}/etc/azaion/device.conf`
+On the flashed device this becomes `/etc/azaion/device.conf` with:
 
-On the flashed device this becomes **`/etc/azaion/device.conf`** with:
+- `AZAION_DEVICE_EMAIL=azj-NNNN@azaion.com`
+- `AZAION_DEVICE_PASSWORD=<32-hex-chars>`
 
-- `AZAION_DEVICE_EMAIL=...`
-- `AZAION_DEVICE_PASSWORD=...`
+File permissions are set to **600**.
 
-File permissions on the staging file are set to **600**. Ensure your image build preserves ownership and permissions appropriate for the service user that runs the Loader.
+## Docker and application setup
+
+The `scripts/setup_rootfs_docker.sh` script prepares the rootfs before flashing. It runs automatically as part of `provision_devices.sh`. What it installs:
+
+| Component | Details |
+| --- | --- |
+| Docker Engine + Compose plugin | Installed via apt in chroot from Docker's official repository |
+| NVIDIA Container Toolkit | GPU passthrough for containers; nvidia set as default runtime |
+| Production compose file | `/opt/azaion/docker-compose.yml` — defines the `loader` service |
+| Loader image | Pre-loaded from `LOADER_IMAGE_TAR` at `/opt/azaion/loader-image.tar` |
+| Boot service | `azaion-loader.service` — loads the image tar on first boot, starts compose |
+
+### Device filesystem layout after flash
+
+```
+/etc/azaion/device.conf                      Per-device credentials
+/etc/docker/daemon.json                      Docker config (NVIDIA default runtime)
+/opt/azaion/docker-compose.yml               Production compose file
+/opt/azaion/boot.sh                          Boot startup script
+/opt/azaion/loader-image.tar                 Initial Loader image (deleted after first boot)
+/opt/azaion/models/                          Model storage
+/opt/azaion/state/                           Update manager state
+/etc/systemd/system/azaion-loader.service    Systemd unit
+```
+
+### First boot sequence
+
+1. systemd starts `docker.service`
+2. `azaion-loader.service` runs `/opt/azaion/boot.sh`
+3. `boot.sh` runs `docker load -i /opt/azaion/loader-image.tar` (first boot only), then deletes the tar
+4. `boot.sh` runs `docker compose -f /opt/azaion/docker-compose.yml up -d`
+5. The Loader container starts, reads `/etc/azaion/device.conf`, authenticates with the API
+6. The update manager begins polling for updates
+
+## Security hardening
+
+The `scripts/harden_rootfs.sh` script applies production security hardening to the rootfs. It runs automatically unless `--no-harden` is passed.
+
+| Measure | Details |
+| --- | --- |
+| SSH disabled | `sshd.service` and `ssh.service` masked; `sshd_config` removed |
+| Getty masked | `getty@.service` and `serial-getty@.service` masked — no login prompt |
+| Serial console disabled | `console=ttyTCU0` / `console=ttyS0` removed from `extlinux.conf` |
+| Sysctl hardening | ptrace blocked, core dumps disabled, kernel pointers hidden, ICMP redirects off |
+| Root locked | Root account password-locked in `/etc/shadow` |
+
+To provision without hardening (e.g. for development devices):
+
+```bash
+./scripts/provision_devices.sh --no-harden
+```
+
+Or set `HARDEN=false` in `.env`.
 
 ## Step-by-step flow
 
-### 1. Unbox and record the serial
+### 1. Connect Jetsons in recovery mode
 
-Read the manufacturing label or use your factory barcode process. Example serial: `AZJN-0042`.
+Connect one or more Jetson devices via USB-C. Put each device into recovery mode: hold Force Recovery button, press Power, release Power, then release Force Recovery after 2 seconds.
 
-### 2. Fuse (if your product requires it)
+Verify with `lsusb -d 0955:` -- each recovery-mode Jetson appears as `NVIDIA Corp. APX`.
 
-Run your approved **fuse** workflow (for example NVIDIA `odmfuse.sh` or internal wrapper). This task does not replace secure boot or fTPM scripts; complete them per your security phase checklist before or after provisioning, according to your process.
+### 2. Run the provisioning script
 
-### 3. Prepare the rootfs staging tree
-
-Extract or sync the rootfs you will flash into a directory on the workstation, for example:
-
-`/work/images/orin-nano/rootfs-staging/`
-
-Ensure `etc/` exists or can be created under this tree.
-
-### 4. Provision the CompanionPC user and embed credentials
-
-From the Loader repository root (or using an absolute path to the script):
+From the loader repository root:
 
 ```bash
-./scripts/provision_device.sh \
-  --serial AZJN-0042 \
-  --api-url "https://admin.internal.example.com" \
-  --rootfs-dir "/work/images/orin-nano/rootfs-staging"
+./scripts/provision_devices.sh
 ```
 
-Confirm the script prints success and that `rootfs-staging/etc/azaion/device.conf` exists.
+The script will:
 
-Re-running the same command for the same serial must not create a duplicate user; the script updates the password via **PATCH** when POST returns **409**.
+1. **Install dependencies** -- installs lsusb, curl, jq, wget via apt; adds `qemu-user-static` and `binfmt-support` on x86 hosts for cross-arch chroot.
+2. **Install L4T** -- if L4T BSP is not present at `L4T_DIR`, downloads the BSP and sample rootfs, extracts them, and runs `apply_binaries.sh`. This only happens on first run.
+3. **Set up Docker** -- installs Docker Engine, NVIDIA Container Toolkit, compose file, and Loader image into the rootfs via chroot (`setup_rootfs_docker.sh`).
+4. **Harden OS** (unless `--no-harden`) -- disables SSH, getty, serial console, applies sysctl hardening (`harden_rootfs.sh`).
+5. **Authenticate** -- logs in to the admin API to get a JWT.
+6. **Scan USB** -- detects all supported Jetson devices in recovery mode, displays model names.
+7. **Display selection UI** -- lists detected devices with numbers and model type.
+8. **Prompt for selection** -- enter device numbers (e.g. `1 3 4`), or `0` for all.
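The scan and matching steps in the list above can be illustrated in isolation. The sketch below is not the repository's code: `board_config_for_lsusb_line` is a hypothetical function that extracts the `vendor:product` ID from one line of `lsusb` output and maps it to the board configs from the supported devices table. The sample line in the comment is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: map one line of `lsusb` output to a board config.
# The ID-to-config mapping is copied from the supported devices table.
board_config_for_lsusb_line() {
  # "Bus 001 Device 005: ID 0955:7523 NVIDIA Corp. APX" -> "0955:7523"
  _id="${1#*ID }"
  _id="${_id%% *}"
  case "$_id" in
    0955:7523) echo "jetson-orin-nano-devkit" ;;
    0955:7323|0955:7423) echo "jetson-orin-nx-devkit" ;;
    *) return 1 ;;  # not one of the supported recovery-mode IDs
  esac
}
```

A non-zero return status marks a line that should be skipped, which mirrors how unsupported NVIDIA USB devices are left out of the selection list.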
 
-If the admin API requires authentication (Bearer token, mTLS), extend the script or shell wrapper to pass the required `curl` headers or use a local proxy; the stock script assumes network-restricted admin access without extra headers.
+### 3. Per-device provisioning (automatic)
 
-### 5. Flash the device
+For each selected device, the script runs sequentially:
 
-Run your normal **flash** procedure (for example `flash.sh` or SDK Manager) so the staged rootfs—including `etc/azaion/device.conf`—is written to the device storage.
+1. **Register** -- calls `POST /devices` to get server-assigned serial, email, and password.
+2. **Write device.conf** -- embeds credentials in the rootfs staging directory.
+3. **Fuse** -- runs `odmfuse.sh` targeting the specific USB device instance. Board config is auto-detected from the USB product ID.
+4. **Power-cycle prompt** -- asks the admin to power-cycle the device and re-enter recovery mode.
+5. **Flash** -- runs `flash.sh` with the auto-detected board config to write the rootfs (including `device.conf`, Docker, and application files) to the device. Default target is `nvme0n1p1` (NVMe SSD); override with `FLASH_TARGET` in `.env` (e.g. `mmcblk0p1` for eMMC).
+6. **Sticker prompt** -- displays the assigned serial and asks the admin to apply a physical label.
 
-### 6. First boot
+### 4. Apply serial labels
 
-Power the Jetson, complete first-boot configuration if any, and verify the Loader service starts. The Loader should read `AZAION_DEVICE_EMAIL` and `AZAION_DEVICE_PASSWORD` from `/etc/azaion/device.conf`, then use them when calling **POST /login** on the Loader HTTP API (which forwards credentials to the configured resource API per your deployment). After a successful login path, the device can request resources and unlock flows as designed.
+After each device is flashed, the script prints the assigned serial (e.g. `azj-0042`). Apply a label/sticker with this serial to the device enclosure for physical identification.
 
-### 7. Smoke verification
+### 5. First boot
 
-- From another host: Loader **GET /health** returns healthy.
-- **POST /login** on the Loader with the same email and password as in `device.conf` returns success (for example `{"status":"ok"}` in the reference implementation).
-- Optional: trigger your normal resource or unlock smoke test against a staging API.
+Power the Jetson. Docker starts automatically, loads the Loader image, and starts Docker Compose. The Loader service reads `AZAION_DEVICE_EMAIL` and `AZAION_DEVICE_PASSWORD` from `/etc/azaion/device.conf` and uses them to authenticate with the admin API via `POST /login`. The update manager begins checking for updates.
+
+### 6. Smoke verification
+
+- From another host: Loader `GET /health` on port 8080 returns healthy.
+- `docker ps` on the device (if unhardened) shows the loader container running.
+- Optional: trigger a resource or unlock smoke test against a staging API.
 
 ## Troubleshooting
 
 | Symptom | Check |
-|--------|--------|
-| curl fails to reach admin API | DNS, VPN, firewall, and `API_URL` trailing slash (script strips one trailing slash). |
-| HTTP 4xx/5xx from POST /users | Admin logs; confirm role value **CompanionPC** and email uniqueness rules. |
-| 409 then failure on PATCH | Implement or enable **PATCH /users/password** (or change script to match your upsert API). |
-| Loader cannot log in | `device.conf` path, permissions, and that the password in the file matches the account after the last successful provision. |
+| --- | --- |
+| No devices found by script | USB cables, recovery mode entry sequence, `lsusb -d 0955:` |
+| Unknown product ID warning | Device is an NVIDIA USB device but not in the supported models table. Check SKU. |
+| L4T download fails | Internet access, NVIDIA download servers availability, `L4T_VERSION` value |
+| Login fails (HTTP 401) | `ADMIN_EMAIL` and `ADMIN_PASSWORD` in `.env`; account must have ApiAdmin role |
+| POST /devices fails | Admin API logs; ensure AZ-196 endpoint is deployed |
+| Fuse fails | L4T version compatibility, USB connection stability, sudo access |
+| Flash fails | Rootfs contents, USB device still in recovery mode after power-cycle, verify `FLASH_TARGET` matches your storage (NVMe vs eMMC) |
+| Docker setup fails in chroot | Verify `qemu-user-static` was installed (auto-installed on x86 hosts); check internet in chroot |
+| Loader container not starting | Check `docker logs` on device; verify `/etc/azaion/device.conf` exists and has correct permissions |
+| Loader cannot log in after boot | `device.conf` path and permissions; password must match the account created by POST /devices |
+| Cannot SSH to hardened device | Expected behavior. Use `--no-harden` for dev devices, or reflash with USB recovery mode |
 
 ## Security notes
 
 - Treat `device.conf` as a secret at rest; restrict file permissions and disk encryption per your product policy.
+- The `.env` file contains ApiAdmin credentials -- do not commit it. It is listed in `.gitignore`.
 - Prefer short-lived credentials or key rotation if the admin API supports it; this runbook describes the baseline manufacturing flow.
+- Hardened devices have no SSH, no serial console, and no interactive login. Field debug requires USB recovery mode reflash with `--no-harden`.
diff --git a/scripts/.env.example b/scripts/.env.example
new file mode 100644
index 0000000..f0f6726
--- /dev/null
+++ b/scripts/.env.example
@@ -0,0 +1,23 @@
+API_URL=https://admin.azaion.com
+ADMIN_EMAIL=admin@azaion.com
+ADMIN_PASSWORD=
+
+# Path to the Loader Docker image tar (required).
+# Build with: docker save localhost:5000/loader:arm -o loader-image.tar
+LOADER_IMAGE_TAR=
+
+# Optional overrides (auto-detected if omitted):
+# L4T_VERSION=r36.4.4
+# L4T_DIR=/opt/nvidia/Linux_for_Tegra
+# ROOTFS_DIR=/opt/nvidia/Linux_for_Tegra/rootfs
+
+# Flash target (default: nvme0n1p1 for NVMe SSD, use mmcblk0p1 for eMMC):
+# FLASH_TARGET=nvme0n1p1
+
+# Loader runtime configuration (defaults shown):
+# RESOURCE_API_URL=https://admin.azaion.com
+# LOADER_DEV_STAGE=main
+# LOADER_IMAGE=localhost:5000/loader:arm
+
+# Security hardening (default: true). Set to false or use --no-harden flag.
+# HARDEN=true
diff --git a/scripts/ensure_l4t.sh b/scripts/ensure_l4t.sh
new file mode 100755
index 0000000..cfbe1fa
--- /dev/null
+++ b/scripts/ensure_l4t.sh
@@ -0,0 +1,62 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+L4T_VERSION="${L4T_VERSION:-r36.4.4}"
+L4T_DIR="${L4T_DIR:-/opt/nvidia/Linux_for_Tegra}"
+ROOTFS_DIR="${ROOTFS_DIR:-${L4T_DIR}/rootfs}"
+
+l4t_download_url() {
+  local version="$1"
+  local major minor patch
+  IFS='.' read -r major minor patch <<< "${version#r}"
+  echo "https://developer.nvidia.com/downloads/embedded/l4t/r${major}_release_v${minor}.${patch}/release/Jetson_Linux_${version}_aarch64.tbz2"
+}
+
+rootfs_download_url() {
+  local version="$1"
+  local major minor patch
+  IFS='.' read -r major minor patch <<< "${version#r}"
+  echo "https://developer.nvidia.com/downloads/embedded/l4t/r${major}_release_v${minor}.${patch}/release/Tegra_Linux_Sample-Root-Filesystem_${version}_aarch64.tbz2"
+}
+
+if [[ -f "$L4T_DIR/flash.sh" ]]; then
+  echo "L4T BSP already installed at $L4T_DIR"
+else
+  echo "L4T BSP not found at $L4T_DIR"
+  echo "Downloading and installing L4T $L4T_VERSION..."
+
+  bsp_url="$(l4t_download_url "$L4T_VERSION")"
+  rootfs_url="$(rootfs_download_url "$L4T_VERSION")"
+  tmp_dir="$(mktemp -d)"
+
+  echo " Downloading BSP from $bsp_url ..."
+  wget -q --show-progress -O "$tmp_dir/bsp.tbz2" "$bsp_url"
+
+  echo " Downloading Sample Root Filesystem from $rootfs_url ..."
+  wget -q --show-progress -O "$tmp_dir/rootfs.tbz2" "$rootfs_url"
+
+  echo " Extracting BSP to $(dirname "$L4T_DIR")/ ..."
+  sudo mkdir -p "$(dirname "$L4T_DIR")"
+  sudo tar -xjf "$tmp_dir/bsp.tbz2" -C "$(dirname "$L4T_DIR")"
+
+  echo " Extracting rootfs to $L4T_DIR/rootfs/ ..."
+  sudo tar -xjf "$tmp_dir/rootfs.tbz2" -C "$L4T_DIR/rootfs/"
+
+  echo " Running apply_binaries.sh ..."
+  sudo "$L4T_DIR/apply_binaries.sh"
+
+  rm -rf "$tmp_dir"
+  echo "L4T $L4T_VERSION installed to $L4T_DIR"
+fi
+
+for tool in flash.sh odmfuse.sh; do
+  if [[ ! -f "$L4T_DIR/$tool" ]]; then
+    echo "ERROR: $L4T_DIR/$tool not found after L4T setup" >&2
+    exit 1
+  fi
+done
+
+NV_RELEASE="$L4T_DIR/rootfs/etc/nv_tegra_release"
+if [[ -f "$NV_RELEASE" ]]; then
+  echo "L4T release: $(head -1 "$NV_RELEASE")"
+fi
diff --git a/scripts/harden_rootfs.sh b/scripts/harden_rootfs.sh
new file mode 100755
index 0000000..4ef6cf6
--- /dev/null
+++ b/scripts/harden_rootfs.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOTFS="${ROOTFS_DIR:-/opt/nvidia/Linux_for_Tegra/rootfs}"
+
+if [[ ! -d "$ROOTFS" ]]; then
+  echo "ERROR: Rootfs directory not found: $ROOTFS" >&2
+  exit 1
+fi
+
+echo "=== Hardening rootfs: $ROOTFS ==="
+
+echo "[1/5] Disabling SSH..."
+for unit in sshd.service ssh.service; do
+  sudo ln -sf /dev/null "$ROOTFS/etc/systemd/system/$unit" 2>/dev/null || true
+done
+sudo rm -f "$ROOTFS/etc/ssh/sshd_config"
+
+echo "[2/5] Masking getty and serial console services..."
+for unit in "getty@.service" "serial-getty@.service"; do
+  sudo ln -sf /dev/null "$ROOTFS/etc/systemd/system/$unit"
+done
+
+echo "[3/5] Disabling serial console in bootloader config..."
+EXTLINUX="$ROOTFS/boot/extlinux/extlinux.conf"
+if [[ -f "$EXTLINUX" ]]; then
+  sudo sed -i 's/console=ttyTCU0[^ ]*//' "$EXTLINUX"
+  sudo sed -i 's/console=ttyS0[^ ]*//' "$EXTLINUX"
+  sudo sed -i 's/  */ /g' "$EXTLINUX"
+fi
+
+echo "[4/5] Applying sysctl hardening..."
+sudo tee "$ROOTFS/etc/sysctl.d/99-azaion-hardening.conf" > /dev/null <<'EOF'
+kernel.yama.ptrace_scope = 3
+kernel.core_pattern = |/bin/false
+kernel.kptr_restrict = 2
+kernel.dmesg_restrict = 1
+net.ipv4.conf.all.rp_filter = 1
+net.ipv4.conf.default.rp_filter = 1
+net.ipv4.conf.all.accept_redirects = 0
+net.ipv4.conf.default.accept_redirects = 0
+net.ipv4.conf.all.send_redirects = 0
+net.ipv4.conf.default.send_redirects = 0
+EOF
+
+echo "[5/5] Locking root account..."
+if [[ -f "$ROOTFS/etc/shadow" ]]; then
+  sudo sed -i 's|^root:[^:]*:|root:!:|' "$ROOTFS/etc/shadow"
+fi
+
+echo ""
+echo "Hardening complete."
diff --git a/scripts/provision_devices.sh b/scripts/provision_devices.sh
new file mode 100755
index 0000000..9f6cadb
--- /dev/null
+++ b/scripts/provision_devices.sh
@@ -0,0 +1,292 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ENV_FILE="$SCRIPT_DIR/.env"
+
+HARDEN="${HARDEN:-true}"
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --no-harden) HARDEN="false"; shift ;;
+    *) echo "ERROR: Unknown option: $1" >&2; exit 1 ;;
+  esac
+done
+
+NVIDIA_VENDOR="0955"
+
+declare -A PID_TO_MODEL=(
+  ["7523"]="Orin Nano"
+  ["7323"]="Orin NX 16GB"
+  ["7423"]="Orin NX 8GB"
+)
+
+declare -A PID_TO_BOARD_CONFIG=(
+  ["7523"]="jetson-orin-nano-devkit"
+  ["7323"]="jetson-orin-nx-devkit"
+  ["7423"]="jetson-orin-nx-devkit"
+)
+
+require_env_var() {
+  local name="$1"
+  local val="${!name:-}"
+  if [[ -z "$val" ]]; then
+    echo "ERROR: $name is not set in $ENV_FILE" >&2
+    exit 1
+  fi
+}
+
+api_post() {
+  local url="$1"; shift
+  local response
+  response="$(curl -sS -w "\n%{http_code}" -X POST "$url" -H "Content-Type: application/json" "$@")"
+  echo "$response"
+}
+
+provision_single_device() {
+  local device_line="$1"
+  local usb_id="$2"
+  local board_config="$3"
+
+  echo "[Step 1/5] Registering device with admin API..."
+  local reg_response reg_http reg_body
+  reg_response="$(api_post "${API_URL}/devices" -H "Authorization: Bearer $TOKEN")"
+  reg_http="$(echo "$reg_response" | tail -1)"
+  reg_body="$(echo "$reg_response" | sed '$d')"
+
+  if [[ "$reg_http" != "200" && "$reg_http" != "201" ]]; then
+    echo "ERROR: Device registration failed (HTTP $reg_http)" >&2
+    echo "$reg_body" >&2
+    RESULTS+=("$usb_id | FAILED | registration error HTTP $reg_http")
+    return 1
+  fi
+
+  local serial dev_email dev_password
+  serial="$(echo "$reg_body" | jq -r '.serial // .Serial // empty')"
+  dev_email="$(echo "$reg_body" | jq -r '.email // .Email // empty')"
+  dev_password="$(echo "$reg_body" | jq -r '.password // .Password // empty')"
+
+  if [[ -z "$serial" || -z "$dev_email" || -z "$dev_password" ]]; then
+    echo "ERROR: Incomplete response from POST /devices" >&2
+    RESULTS+=("$usb_id | FAILED | incomplete API response")
+    return 1
+  fi
+
+  echo " Assigned serial: $serial"
+  echo " Email: $dev_email"
+
+  echo "[Step 2/5] Writing device.conf to rootfs staging..."
+  local conf_dir="${ROOTFS_DIR}/etc/azaion"
+  sudo mkdir -p "$conf_dir"
+  local conf_path="${conf_dir}/device.conf"
+  printf 'AZAION_DEVICE_EMAIL=%s\nAZAION_DEVICE_PASSWORD=%s\n' "$dev_email" "$dev_password" \
+    | sudo tee "$conf_path" > /dev/null
+  sudo chmod 600 "$conf_path"
+  echo " Written: $conf_path"
+
+  echo "[Step 3/5] Fusing device (odmfuse.sh)..."
+  if ! sudo "$L4T_DIR/odmfuse.sh" --instance "$usb_id" 2>&1; then
+    echo "ERROR: Fusing failed for $serial ($usb_id)" >&2
+    RESULTS+=("$usb_id | $serial | FAILED | fuse error")
+    return 1
+  fi
+
+  echo ""
+  echo "[$serial] Fuse complete."
+  read -rp " Power-cycle the device and put it back in recovery mode. Press Enter when ready..."
+
+  echo "[Step 4/5] Flashing device (flash.sh)..."
+  if ! sudo "$L4T_DIR/flash.sh" "$board_config" "$FLASH_TARGET" --instance "$usb_id" 2>&1; then
+    echo "ERROR: Flashing failed for $serial ($usb_id)" >&2
+    RESULTS+=("$usb_id | $serial | FAILED | flash error")
+    return 1
+  fi
+
+  echo ""
+  echo "[$serial] Flash complete."
+  echo " >>> Apply sticker with serial: $serial <<<"
+  read -rp " Power-cycle for first boot. Press Enter when done..."
+
+  echo "[Step 5/5] $serial provisioned successfully."
+  RESULTS+=("$usb_id | $serial | OK")
+}
+
+# --- main ---
+
+if [[ ! -f "$ENV_FILE" ]]; then
+  echo "ERROR: $ENV_FILE not found. Copy .env.example to .env and fill in values." >&2
+  exit 1
+fi
+
+set -a
+source "$ENV_FILE"
+set +a
+
+for var in ADMIN_EMAIL ADMIN_PASSWORD API_URL LOADER_IMAGE_TAR; do
+  require_env_var "$var"
+done
+API_URL="${API_URL%/}"
+
+RESOURCE_API_URL="${RESOURCE_API_URL:-$API_URL}"
+LOADER_DEV_STAGE="${LOADER_DEV_STAGE:-main}"
+LOADER_IMAGE="${LOADER_IMAGE:-localhost:5000/loader:arm}"
+FLASH_TARGET="${FLASH_TARGET:-nvme0n1p1}"
+
+L4T_VERSION="${L4T_VERSION:-r36.4.4}"
+L4T_DIR="${L4T_DIR:-/opt/nvidia/Linux_for_Tegra}"
+ROOTFS_DIR="${ROOTFS_DIR:-$L4T_DIR/rootfs}"
+
+export L4T_VERSION L4T_DIR ROOTFS_DIR RESOURCE_API_URL LOADER_DEV_STAGE LOADER_IMAGE LOADER_IMAGE_TAR
+
+echo "=== Installing host dependencies ==="
+sudo apt-get update -qq
+sudo apt-get install -y usbutils curl jq wget
+[[ "$(uname -m)" != "aarch64" ]] && sudo apt-get install -y qemu-user-static binfmt-support
+echo ""
+
+echo "=== L4T BSP setup ==="
+"$SCRIPT_DIR/ensure_l4t.sh"
+echo ""
+
+echo "=== Setting up rootfs (Docker + application) ==="
+"$SCRIPT_DIR/setup_rootfs_docker.sh"
+echo ""
+
+if [[ "$HARDEN" == "true" ]]; then
+  echo "=== Applying security hardening ==="
+  "$SCRIPT_DIR/harden_rootfs.sh"
+  echo ""
+else
+  echo "=== Security hardening SKIPPED (--no-harden) ==="
+  echo ""
+fi
+
+echo "=== Authenticating with admin API ==="
+LOGIN_JSON="$(printf '{"email":"%s","password":"%s"}' "$ADMIN_EMAIL" "$ADMIN_PASSWORD")"
+LOGIN_RESPONSE="$(api_post "${API_URL}/login" -d "$LOGIN_JSON")"
+LOGIN_HTTP="$(echo "$LOGIN_RESPONSE" | tail -1)"
+LOGIN_BODY="$(echo "$LOGIN_RESPONSE" | sed '$d')"
+
+if [[ "$LOGIN_HTTP" != "200" ]]; then
+  echo "ERROR: Login failed (HTTP $LOGIN_HTTP)" >&2
+  echo "$LOGIN_BODY" >&2
+  exit 1
+fi
+
+TOKEN="$(echo "$LOGIN_BODY" | jq -r '.token // .Token // empty')"
+if [[ -z "$TOKEN" ]]; then
+  echo "ERROR: No token in login response" >&2
+  echo "$LOGIN_BODY" >&2
+  exit 1
+fi
+echo "Authenticated successfully."
+echo ""
+
+echo "=== Scanning for Jetson devices in recovery mode ==="
+LSUSB_OUTPUT="$(lsusb -d "${NVIDIA_VENDOR}:" 2>/dev/null || true)"
+
+if [[ -z "$LSUSB_OUTPUT" ]]; then
+  echo "No Jetson devices found in recovery mode."
+  echo "Ensure devices are connected via USB and in recovery mode (hold Force Recovery, press Power)."
+  exit 0
+fi
+
+DEVICES=()
+DEVICE_PIDS=()
+DEVICE_MODELS=()
+while IFS= read -r line; do
+  pid="$(echo "$line" | grep -oP "${NVIDIA_VENDOR}:\K[0-9a-fA-F]+")"
+  if [[ -n "${PID_TO_BOARD_CONFIG[$pid]:-}" ]]; then
+    DEVICES+=("$line")
+    DEVICE_PIDS+=("$pid")
+    DEVICE_MODELS+=("${PID_TO_MODEL[$pid]:-Unknown (PID $pid)}")
+  fi
+done <<< "$LSUSB_OUTPUT"
+
+DEVICE_COUNT="${#DEVICES[@]}"
+
+if [[ "$DEVICE_COUNT" -eq 0 ]]; then
+  echo "No supported Jetson devices found."
+  echo "Detected NVIDIA USB devices but none matched known Jetson Orin product IDs."
+  exit 0
+fi
+
+echo ""
+echo "Select device(s) to provision."
+echo " one device, e.g. 1"
+echo " some devices, e.g. 1 3 4"
+echo " or all devices: 0"
+echo ""
+echo "--------------------------------------------"
+echo "Connected Jetson devices (recovery mode):"
+echo "--------------------------------------------"
+for i in "${!DEVICES[@]}"; do
+  num=$((i + 1))
+  printf "%-3s %-16s %s\n" "$num" "[${DEVICE_MODELS[$i]}]" "${DEVICES[$i]}"
+done
+echo "--------------------------------------------"
+echo "0 - provision all devices"
+echo ""
+
+read -rp "Your selection: " SELECTION
+
+SELECTED_INDICES=()
+if [[ "$SELECTION" == "0" ]]; then
+  for i in "${!DEVICES[@]}"; do
+    SELECTED_INDICES+=("$i")
+  done
+else
+  for num in $SELECTION; do
+    if [[ "$num" =~ ^[0-9]+$ ]] && (( num >= 1 && num <= DEVICE_COUNT )); then
+      SELECTED_INDICES+=("$((num - 1))")
+    else
+      echo "ERROR: Invalid selection: $num (must be 1-$DEVICE_COUNT or 0 for all)" >&2
+      exit 1
+    fi
+  done
+fi
+
+if [[ ${#SELECTED_INDICES[@]} -eq 0 ]]; then
+  echo "No devices selected."
+  exit 0
+fi
+
+echo ""
+echo "=== Provisioning ${#SELECTED_INDICES[@]} device(s) ==="
+echo ""
+
+RESULTS=()
+
+for idx in "${SELECTED_INDICES[@]}"; do
+  DEVICE_LINE="${DEVICES[$idx]}"
+  USB_ID="$(echo "$DEVICE_LINE" | grep -oP 'Bus \K[0-9]+')-$(echo "$DEVICE_LINE" | grep -oP 'Device \K[0-9]+')"
+  BOARD_CONFIG="${PID_TO_BOARD_CONFIG[${DEVICE_PIDS[$idx]}]:-}"
+  if [[ -z "$BOARD_CONFIG" ]]; then
+    echo "ERROR: Unknown Jetson product ID: $NVIDIA_VENDOR:${DEVICE_PIDS[$idx]}" >&2
+    RESULTS+=("$USB_ID | FAILED | unknown product ID")
+    continue
+  fi
+  echo "--------------------------------------------"
+  echo "Device: ${DEVICE_MODELS[$idx]} — $DEVICE_LINE"
+  echo "USB instance: $USB_ID"
+  echo "Board config: $BOARD_CONFIG"
+  echo "--------------------------------------------"
+
+  provision_single_device "$DEVICE_LINE" "$USB_ID" "$BOARD_CONFIG" || true
+  echo ""
+done
+
+CONF_CLEANUP="${ROOTFS_DIR}/etc/azaion/device.conf"
+if [[ -f "$CONF_CLEANUP" ]]; then
+  sudo rm -f "$CONF_CLEANUP"
+fi
+
+echo ""
+echo "========================================"
+echo " Provisioning Summary"
+echo "========================================"
+printf "%-12s | %-10s | %s\n" "USB ID" "Serial" "Status"
+echo "----------------------------------------"
+for r in "${RESULTS[@]}"; do
+  echo "$r"
+done
+echo "========================================"
diff --git a/scripts/setup_rootfs_docker.sh b/scripts/setup_rootfs_docker.sh
new file mode 100755
index 0000000..2ff94a8
--- /dev/null
+++ b/scripts/setup_rootfs_docker.sh
@@ -0,0 +1,179 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOTFS="${ROOTFS_DIR:-/opt/nvidia/Linux_for_Tegra/rootfs}"
+LOADER_IMAGE_TAR="${LOADER_IMAGE_TAR:-}"
+RESOURCE_API_URL="${RESOURCE_API_URL:-https://api.azaion.com}"
+LOADER_DEV_STAGE="${LOADER_DEV_STAGE:-main}"
+LOADER_IMAGE="${LOADER_IMAGE:-localhost:5000/loader:arm}"
+
+if [[ ! -d "$ROOTFS" ]]; then
+  echo "ERROR: Rootfs directory not found: $ROOTFS" >&2
+  exit 1
+fi
+
+if [[ -z "$LOADER_IMAGE_TAR" ]]; then
+  echo "ERROR: LOADER_IMAGE_TAR not set. Set it in .env to the Loader Docker image tar path." >&2
+  exit 1
+fi
+
+if [[ ! -f "$LOADER_IMAGE_TAR" ]]; then
+  echo "ERROR: Loader image tar not found: $LOADER_IMAGE_TAR" >&2
+  exit 1
+fi
+
+cleanup_mounts() {
+  for mp in proc sys dev/pts dev; do
+    sudo umount "$ROOTFS/$mp" 2>/dev/null || true
+  done
+  if [[ -f "$ROOTFS/etc/resolv.conf.setup-bak" ]]; then
+    sudo mv "$ROOTFS/etc/resolv.conf.setup-bak" "$ROOTFS/etc/resolv.conf"
+  fi
+}
+
+setup_mounts() {
+  for mp in proc sys dev dev/pts; do
+    mountpoint -q "$ROOTFS/$mp" 2>/dev/null && sudo umount "$ROOTFS/$mp" 2>/dev/null || true
+  done
+  sudo mount --bind /proc "$ROOTFS/proc"
+  sudo mount --bind /sys "$ROOTFS/sys"
+  sudo mount --bind /dev "$ROOTFS/dev"
+  sudo mount --bind /dev/pts "$ROOTFS/dev/pts"
+  if [[ -f "$ROOTFS/etc/resolv.conf" ]]; then
+    sudo cp "$ROOTFS/etc/resolv.conf" "$ROOTFS/etc/resolv.conf.setup-bak"
+  fi
+  sudo cp /etc/resolv.conf "$ROOTFS/etc/resolv.conf"
+}
+
+if [[ "$(uname -m)" != "aarch64" ]]; then
+  if [[ ! -f "$ROOTFS/usr/bin/qemu-aarch64-static" ]]; then
+    sudo cp /usr/bin/qemu-aarch64-static "$ROOTFS/usr/bin/"
+  fi
+fi
+
+trap cleanup_mounts EXIT
+
+echo "=== Setting up Docker in rootfs ==="
+echo " Rootfs: $ROOTFS"
+echo " Image tar: $LOADER_IMAGE_TAR"
+echo ""
+
+setup_mounts
+
+if sudo chroot "$ROOTFS" docker --version &>/dev/null; then
+  echo "[1/6] Docker already installed, skipping..."
+else
+  echo "[1/6] Installing Docker Engine..."
+  sudo chroot "$ROOTFS" bash -c '
+    apt-get update
+    apt-get install -y ca-certificates curl gnupg
+    install -m 0755 -d /etc/apt/keyrings
+    curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
+    chmod a+r /etc/apt/keyrings/docker.asc
+    . /etc/os-release
+    echo "deb [arch=arm64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $VERSION_CODENAME stable" > /etc/apt/sources.list.d/docker.list
+    apt-get update
+    apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
+    apt-get clean
+    rm -rf /var/lib/apt/lists/*
+  '
+fi
+
+if sudo chroot "$ROOTFS" dpkg -l nvidia-container-toolkit 2>/dev/null | grep -q '^ii'; then
+  echo "[2/6] NVIDIA Container Toolkit already installed, skipping..."
+else
+  echo "[2/6] Installing NVIDIA Container Toolkit..."
+  sudo chroot "$ROOTFS" bash -c '
+    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
+      | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
+      | sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g" \
+      > /etc/apt/sources.list.d/nvidia-container-toolkit.list
+    apt-get update
+    apt-get install -y nvidia-container-toolkit
+    apt-get clean
+    rm -rf /var/lib/apt/lists/*
+  '
+fi
+
+echo "[3/6] Configuring Docker daemon (NVIDIA default runtime)..."
+sudo mkdir -p "$ROOTFS/etc/docker"
+sudo tee "$ROOTFS/etc/docker/daemon.json" > /dev/null <<'EOF'
+{
+  "default-runtime": "nvidia",
+  "runtimes": {
+    "nvidia": {
+      "path": "nvidia-container-runtime",
+      "runtimeArgs": []
+    }
+  }
+}
+EOF
+
+echo "[4/6] Enabling Docker and containerd services..."
+sudo mkdir -p "$ROOTFS/etc/systemd/system/multi-user.target.wants"
+sudo ln -sf /lib/systemd/system/docker.service \
+  "$ROOTFS/etc/systemd/system/multi-user.target.wants/docker.service"
+sudo ln -sf /lib/systemd/system/containerd.service \
+  "$ROOTFS/etc/systemd/system/multi-user.target.wants/containerd.service"
+
+echo "[5/6] Creating Azaion application layout..."
+sudo mkdir -p "$ROOTFS/opt/azaion/models"
+sudo mkdir -p "$ROOTFS/opt/azaion/state"
+
+sudo tee "$ROOTFS/opt/azaion/docker-compose.yml" > /dev/null < /dev/null <<'EOF'
+#!/bin/bash
+set -e
+if [ -f /opt/azaion/loader-image.tar ]; then
+  docker load -i /opt/azaion/loader-image.tar
+  rm -f /opt/azaion/loader-image.tar
+fi
+docker compose -f /opt/azaion/docker-compose.yml up -d
+EOF
+sudo chmod 755 "$ROOTFS/opt/azaion/boot.sh"
+
+sudo tee "$ROOTFS/etc/systemd/system/azaion-loader.service" > /dev/null <<'EOF'
+[Unit]
+Description=Azaion Loader
+After=docker.service
+Requires=docker.service
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/opt/azaion/boot.sh
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo ln -sf /etc/systemd/system/azaion-loader.service \
+  "$ROOTFS/etc/systemd/system/multi-user.target.wants/azaion-loader.service"
+
+echo ""
+echo "Docker setup complete."
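
Once `setup_rootfs_docker.sh` reports completion, the staged tree can be compared against the "Device filesystem layout after flash" list in the runbook. The following is a rough sketch under stated assumptions rather than part of this change: `missing_rootfs_paths` is a hypothetical helper, and `device.conf` is left out on purpose because it is written later, once per device.

```shell
#!/bin/sh
# Hypothetical post-setup check (not in the repository): print each path
# from the runbook's filesystem layout that is missing under the rootfs.
# device.conf is excluded because it is written per device at provision time.
missing_rootfs_paths() (
  rootfs="$1"
  for rel in \
    etc/docker/daemon.json \
    opt/azaion/docker-compose.yml \
    opt/azaion/boot.sh \
    opt/azaion/loader-image.tar \
    opt/azaion/models \
    opt/azaion/state \
    etc/systemd/system/azaion-loader.service
  do
    [ -e "$rootfs/$rel" ] || printf '%s\n' "/$rel"
  done
)
```

For example, `missing_rootfs_paths /opt/nvidia/Linux_for_Tegra/rootfs` prints nothing when every expected file and directory is staged.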
+sudo mkdir -p "$ROOTFS/etc/docker" +sudo tee "$ROOTFS/etc/docker/daemon.json" > /dev/null <<'EOF' +{ + "default-runtime": "nvidia", + "runtimes": { + "nvidia": { + "path": "nvidia-container-runtime", + "runtimeArgs": [] + } + } +} +EOF + +echo "[4/6] Enabling Docker and containerd services..." +sudo mkdir -p "$ROOTFS/etc/systemd/system/multi-user.target.wants" +sudo ln -sf /lib/systemd/system/docker.service \ + "$ROOTFS/etc/systemd/system/multi-user.target.wants/docker.service" +sudo ln -sf /lib/systemd/system/containerd.service \ + "$ROOTFS/etc/systemd/system/multi-user.target.wants/containerd.service" + +echo "[5/6] Creating Azaion application layout..." +sudo mkdir -p "$ROOTFS/opt/azaion/models" +sudo mkdir -p "$ROOTFS/opt/azaion/state" + +sudo tee "$ROOTFS/opt/azaion/docker-compose.yml" > /dev/null < /dev/null <<'EOF' +#!/bin/bash +set -e +if [ -f /opt/azaion/loader-image.tar ]; then + docker load -i /opt/azaion/loader-image.tar + rm -f /opt/azaion/loader-image.tar +fi +docker compose -f /opt/azaion/docker-compose.yml up -d +EOF +sudo chmod 755 "$ROOTFS/opt/azaion/boot.sh" + +sudo tee "$ROOTFS/etc/systemd/system/azaion-loader.service" > /dev/null <<'EOF' +[Unit] +Description=Azaion Loader +After=docker.service +Requires=docker.service + +[Service] +Type=oneshot +RemainAfterExit=yes +ExecStart=/opt/azaion/boot.sh + +[Install] +WantedBy=multi-user.target +EOF + +sudo ln -sf /etc/systemd/system/azaion-loader.service \ + "$ROOTFS/etc/systemd/system/multi-user.target.wants/azaion-loader.service" + +echo "" +echo "Docker setup complete."