diff --git a/_docs/02_document/deployment/provisioning_runbook.md b/_docs/02_document/deployment/provisioning_runbook.md
index 9149434..4500d6b 100644
--- a/_docs/02_document/deployment/provisioning_runbook.md
+++ b/_docs/02_document/deployment/provisioning_runbook.md
@@ -1,222 +1,14 @@
-# Jetson device provisioning runbook
+# Jetson device provisioning runbook (moved)
 
-This runbook describes the end-to-end flow to fuse, flash, and provision device identities so the Azaion Loader can authenticate against the admin/resource APIs. It supports Jetson Orin Nano, Orin NX 8GB, and Orin NX 16GB devices. Board configuration is auto-detected from the USB product ID.
+The provisioning runbook and its scripts (`provision_devices.sh`, `ensure_l4t.sh`, `setup_rootfs_docker.sh`, `harden_rootfs.sh`) no longer live inside the `loader` repository.
 
-The `scripts/provision_devices.sh` script automates the entire flow: detecting connected Jetsons, auto-installing L4T if needed, setting up Docker with the Loader container, optionally hardening the OS, registering device identities via the admin API, writing credentials, fusing, and flashing.
+They were relocated to the suite meta-repo because device provisioning is a fleet/manufacturing concern, not a Loader concern — it sets up the base OS, Docker, identity, and hardening that all services share.
 
-After provisioning, each Jetson boots into a production-ready state with Docker Compose running the Loader container.
+- **New location:** `suite/provisioning/`
+- **Runbook:** `suite/provisioning/README.md`
 
-## Prerequisites
+If you are working inside the loader repo in isolation, the scripts are not available here; clone the suite meta-repo to run provisioning.
 
-- Ubuntu amd64 provisioning workstation with bash, curl, jq, wget, lsusb.
-- Admin API reachable from the workstation (base URL configured in `scripts/.env`).
-- An ApiAdmin account on the admin API (email and password in `scripts/.env`).
-- `sudo` access on the workstation.
-- USB-C cables that support both power and data transfer.
-- Physical label/sticker materials for serial numbers.
-- Internet access on first run (to download L4T BSP if not already installed).
-- Loader Docker image tar file (see [Preparing the Loader image](#preparing-the-loader-image)).
+## Note on Loader retirement
 
-The NVIDIA L4T BSP and sample rootfs are downloaded and installed automatically to `/opt/nvidia/Linux_for_Tegra` if not already present. No manual L4T setup is required.
-
-## Configuration
-
-Copy `scripts/.env.example` to `scripts/.env` and fill in values:
-
-```
-ADMIN_EMAIL=admin@azaion.com
-ADMIN_PASSWORD=
-API_URL=https://admin.azaion.com
-LOADER_IMAGE_TAR=/path/to/loader-image.tar
-```
-
-Optional overrides (auto-detected/defaulted if omitted):
-
-```
-L4T_VERSION=r36.4.4
-L4T_DIR=/opt/nvidia/Linux_for_Tegra
-ROOTFS_DIR=/opt/nvidia/Linux_for_Tegra/rootfs
-RESOURCE_API_URL=https://admin.azaion.com
-LOADER_DEV_STAGE=main
-LOADER_IMAGE=localhost:5000/loader:arm
-FLASH_TARGET=nvme0n1p1
-HARDEN=true
-```
-
-The `.env` file is git-ignored and must not be committed.
-
-## Preparing the Loader image
-
-The provisioning script requires a Loader Docker image tar to pre-load onto each device. Options:
-
-**From CI (recommended):** Download the `loader-image.tar` artifact from the Woodpecker CI pipeline for the target branch.
-
-**Local build (requires arm64 builder or BuildKit cross-compilation):**
-
-```bash
-docker build -f Dockerfile -t localhost:5000/loader:arm .
-docker save localhost:5000/loader:arm -o loader-image.tar
-```
-
-Set `LOADER_IMAGE_TAR` in `.env` to the absolute path of the resulting tar file.
-
-## Supported devices
-
-| USB Product ID | Model | Board Config (auto-detected) |
-| --- | --- | --- |
-| 0955:7523 | Jetson Orin Nano | jetson-orin-nano-devkit |
-| 0955:7323 | Jetson Orin NX 16GB | jetson-orin-nx-devkit |
-| 0955:7423 | Jetson Orin NX 8GB | jetson-orin-nx-devkit |
-
-The script scans for all NVIDIA USB devices (`lsusb -d 0955:`), matches them against the table above, and displays the model name next to each detected device.
-
-## Admin API contract (device registration)
-
-The script calls:
-
-1. **POST** `{API_URL}/login` with `{"email":"","password":""}` to obtain a JWT.
-2. **POST** `{API_URL}/devices` with `Authorization: Bearer ` and no request body.
-   - **200** or **201**: returns `{"serial":"azj-NNNN","email":"azj-NNNN@azaion.com","password":"<32-hex-chars>"}`.
-   - The server auto-assigns the next sequential serial number.
-
-## Device identity and `device.conf`
-
-For each registered device, the script writes:
-
-`{ROOTFS_DIR}/etc/azaion/device.conf`
-
-On the flashed device this becomes `/etc/azaion/device.conf` with:
-
-- `AZAION_DEVICE_EMAIL=azj-NNNN@azaion.com`
-- `AZAION_DEVICE_PASSWORD=<32-hex-chars>`
-
-File permissions are set to **600**.
-
-## Docker and application setup
-
-The `scripts/setup_rootfs_docker.sh` script prepares the rootfs before flashing. It runs automatically as part of `provision_devices.sh`. What it installs:
-
-| Component | Details |
-| --- | --- |
-| Docker Engine + Compose plugin | Installed via apt in chroot from Docker's official repository |
-| NVIDIA Container Toolkit | GPU passthrough for containers; nvidia set as default runtime |
-| Production compose file | `/opt/azaion/docker-compose.yml` — defines the `loader` service |
-| Loader image | Pre-loaded from `LOADER_IMAGE_TAR` at `/opt/azaion/loader-image.tar` |
-| Boot service | `azaion-loader.service` — loads the image tar on first boot, starts compose |
-
-### Device filesystem layout after flash
-
-```
-/etc/azaion/device.conf                     Per-device credentials
-/etc/docker/daemon.json                     Docker config (NVIDIA default runtime)
-/opt/azaion/docker-compose.yml              Production compose file
-/opt/azaion/boot.sh                         Boot startup script
-/opt/azaion/loader-image.tar                Initial Loader image (deleted after first boot)
-/opt/azaion/models/                         Model storage
-/opt/azaion/state/                          Update manager state
-/etc/systemd/system/azaion-loader.service   Systemd unit
-```
-
-### First boot sequence
-
-1. systemd starts `docker.service`
-2. `azaion-loader.service` runs `/opt/azaion/boot.sh`
-3. `boot.sh` runs `docker load -i /opt/azaion/loader-image.tar` (first boot only), then deletes the tar
-4. `boot.sh` runs `docker compose -f /opt/azaion/docker-compose.yml up -d`
-5. The Loader container starts, reads `/etc/azaion/device.conf`, authenticates with the API
-6. The update manager begins polling for updates
-
-## Security hardening
-
-The `scripts/harden_rootfs.sh` script applies production security hardening to the rootfs. It runs automatically unless `--no-harden` is passed.
-
-| Measure | Details |
-| --- | --- |
-| SSH disabled | `sshd.service` and `ssh.service` masked; `sshd_config` removed |
-| Getty masked | `getty@.service` and `serial-getty@.service` masked — no login prompt |
-| Serial console disabled | `console=ttyTCU0` / `console=ttyS0` removed from `extlinux.conf` |
-| Sysctl hardening | ptrace blocked, core dumps disabled, kernel pointers hidden, ICMP redirects off |
-| Root locked | Root account password-locked in `/etc/shadow` |
-
-To provision without hardening (e.g. for development devices):
-
-```bash
-./scripts/provision_devices.sh --no-harden
-```
-
-Or set `HARDEN=false` in `.env`.
-
-## Step-by-step flow
-
-### 1. Connect Jetsons in recovery mode
-
-Connect one or more Jetson devices via USB-C. Put each device into recovery mode: hold Force Recovery button, press Power, release Power, then release Force Recovery after 2 seconds.
-
-Verify with `lsusb -d 0955:` -- each recovery-mode Jetson appears as `NVIDIA Corp. APX`.
-
-### 2. Run the provisioning script
-
-From the loader repository root:
-
-```bash
-./scripts/provision_devices.sh
-```
-
-The script will:
-
-1. **Install dependencies** -- installs lsusb, curl, jq, wget via apt; adds `qemu-user-static` and `binfmt-support` on x86 hosts for cross-arch chroot.
-2. **Install L4T** -- if L4T BSP is not present at `L4T_DIR`, downloads the BSP and sample rootfs, extracts them, and runs `apply_binaries.sh`. This only happens on first run.
-3. **Set up Docker** -- installs Docker Engine, NVIDIA Container Toolkit, compose file, and Loader image into the rootfs via chroot (`setup_rootfs_docker.sh`).
-4. **Harden OS** (unless `--no-harden`) -- disables SSH, getty, serial console, applies sysctl hardening (`harden_rootfs.sh`).
-5. **Authenticate** -- logs in to the admin API to get a JWT.
-6. **Scan USB** -- detects all supported Jetson devices in recovery mode, displays model names.
-7. **Display selection UI** -- lists detected devices with numbers and model type.
-8. **Prompt for selection** -- enter device numbers (e.g. `1 3 4`), or `0` for all.
-
-### 3. Per-device provisioning (automatic)
-
-For each selected device, the script runs sequentially:
-
-1. **Register** -- calls `POST /devices` to get server-assigned serial, email, and password.
-2. **Write device.conf** -- embeds credentials in the rootfs staging directory.
-3. **Fuse** -- runs `odmfuse.sh` targeting the specific USB device instance. Board config is auto-detected from the USB product ID.
-4. **Power-cycle prompt** -- asks the admin to power-cycle the device and re-enter recovery mode.
-5. **Flash** -- runs `flash.sh` with the auto-detected board config to write the rootfs (including `device.conf`, Docker, and application files) to the device. Default target is `nvme0n1p1` (NVMe SSD); override with `FLASH_TARGET` in `.env` (e.g. `mmcblk0p1` for eMMC).
-6. **Sticker prompt** -- displays the assigned serial and asks the admin to apply a physical label.
-
-### 4. Apply serial labels
-
-After each device is flashed, the script prints the assigned serial (e.g. `azj-0042`). Apply a label/sticker with this serial to the device enclosure for physical identification.
-
-### 5. First boot
-
-Power the Jetson. Docker starts automatically, loads the Loader image, and starts Docker Compose. The Loader service reads `AZAION_DEVICE_EMAIL` and `AZAION_DEVICE_PASSWORD` from `/etc/azaion/device.conf` and uses them to authenticate with the admin API via `POST /login`. The update manager begins checking for updates.
-
-### 6. Smoke verification
-
-- From another host: Loader `GET /health` on port 8080 returns healthy.
-- `docker ps` on the device (if unhardened) shows the loader container running.
-- Optional: trigger a resource or unlock smoke test against a staging API.
-
-## Troubleshooting
-
-| Symptom | Check |
-| --- | --- |
-| No devices found by script | USB cables, recovery mode entry sequence, `lsusb -d 0955:` |
-| Unknown product ID warning | Device is an NVIDIA USB device but not in the supported models table. Check SKU. |
-| L4T download fails | Internet access, NVIDIA download servers availability, `L4T_VERSION` value |
-| Login fails (HTTP 401) | `ADMIN_EMAIL` and `ADMIN_PASSWORD` in `.env`; account must have ApiAdmin role |
-| POST /devices fails | Admin API logs; ensure AZ-196 endpoint is deployed |
-| Fuse fails | L4T version compatibility, USB connection stability, sudo access |
-| Flash fails | Rootfs contents, USB device still in recovery mode after power-cycle, verify `FLASH_TARGET` matches your storage (NVMe vs eMMC) |
-| Docker setup fails in chroot | Verify `qemu-user-static` was installed (auto-installed on x86 hosts); check internet in chroot |
-| Loader container not starting | Check `docker logs` on device; verify `/etc/azaion/device.conf` exists and has correct permissions |
-| Loader cannot log in after boot | `device.conf` path and permissions; password must match the account created by POST /devices |
-| Cannot SSH to hardened device | Expected behavior. Use `--no-harden` for dev devices, or reflash with USB recovery mode |
-
-## Security notes
-
-- Treat `device.conf` as a secret at rest; restrict file permissions and disk encryption per your product policy.
-- The `.env` file contains ApiAdmin credentials -- do not commit it. It is listed in `.gitignore`.
-- Prefer short-lived credentials or key rotation if the admin API supports it; this runbook describes the baseline manufacturing flow.
-- Hardened devices have no SSH, no serial console, and no interactive login. Field debug requires USB recovery mode reflash with `--no-harden`.
+The provisioning flow in `suite/provisioning/` still references Loader today because `setup_rootfs_docker.sh` has not yet been rewritten for the Scenario X deployment model (Watchtower + rclone + flight-gate). The rewrite is tracked as a follow-up in `suite/ci/README.md`. Until the rewrite lands, newly-provisioned devices continue to boot with the legacy Loader service and will need a separate migration step to the new deployment model.
diff --git a/scripts/.env.example b/scripts/.env.example
deleted file mode 100644
index f0f6726..0000000
--- a/scripts/.env.example
+++ /dev/null
@@ -1,23 +0,0 @@
-API_URL=https://admin.azaion.com
-ADMIN_EMAIL=admin@azaion.com
-ADMIN_PASSWORD=
-
-# Path to the Loader Docker image tar (required).
-# Build with: docker save localhost:5000/loader:arm -o loader-image.tar
-LOADER_IMAGE_TAR=
-
-# Optional overrides (auto-detected if omitted):
-# L4T_VERSION=r36.4.4
-# L4T_DIR=/opt/nvidia/Linux_for_Tegra
-# ROOTFS_DIR=/opt/nvidia/Linux_for_Tegra/rootfs
-
-# Flash target (default: nvme0n1p1 for NVMe SSD, use mmcblk0p1 for eMMC):
-# FLASH_TARGET=nvme0n1p1
-
-# Loader runtime configuration (defaults shown):
-# RESOURCE_API_URL=https://admin.azaion.com
-# LOADER_DEV_STAGE=main
-# LOADER_IMAGE=localhost:5000/loader:arm
-
-# Security hardening (default: true). Set to false or use --no-harden flag.
-# HARDEN=true
diff --git a/scripts/ensure_l4t.sh b/scripts/ensure_l4t.sh
deleted file mode 100755
index cfbe1fa..0000000
--- a/scripts/ensure_l4t.sh
+++ /dev/null
@@ -1,62 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-L4T_VERSION="${L4T_VERSION:-r36.4.4}"
-L4T_DIR="${L4T_DIR:-/opt/nvidia/Linux_for_Tegra}"
-ROOTFS_DIR="${ROOTFS_DIR:-${L4T_DIR}/rootfs}"
-
-l4t_download_url() {
-    local version="$1"
-    local major minor patch
-    IFS='.' read -r major minor patch <<< "${version#r}"
-    echo "https://developer.nvidia.com/downloads/embedded/l4t/r${major}_release_v${minor}.${patch}/release/Jetson_Linux_${version}_aarch64.tbz2"
-}
-
-rootfs_download_url() {
-    local version="$1"
-    local major minor patch
-    IFS='.' read -r major minor patch <<< "${version#r}"
-    echo "https://developer.nvidia.com/downloads/embedded/l4t/r${major}_release_v${minor}.${patch}/release/Tegra_Linux_Sample-Root-Filesystem_${version}_aarch64.tbz2"
-}
-
-if [[ -f "$L4T_DIR/flash.sh" ]]; then
-    echo "L4T BSP already installed at $L4T_DIR"
-else
-    echo "L4T BSP not found at $L4T_DIR"
-    echo "Downloading and installing L4T $L4T_VERSION..."
-
-    bsp_url="$(l4t_download_url "$L4T_VERSION")"
-    rootfs_url="$(rootfs_download_url "$L4T_VERSION")"
-    tmp_dir="$(mktemp -d)"
-
-    echo " Downloading BSP from $bsp_url ..."
-    wget -q --show-progress -O "$tmp_dir/bsp.tbz2" "$bsp_url"
-
-    echo " Downloading Sample Root Filesystem from $rootfs_url ..."
-    wget -q --show-progress -O "$tmp_dir/rootfs.tbz2" "$rootfs_url"
-
-    echo " Extracting BSP to $(dirname "$L4T_DIR")/ ..."
-    sudo mkdir -p "$(dirname "$L4T_DIR")"
-    sudo tar -xjf "$tmp_dir/bsp.tbz2" -C "$(dirname "$L4T_DIR")"
-
-    echo " Extracting rootfs to $L4T_DIR/rootfs/ ..."
-    sudo tar -xjf "$tmp_dir/rootfs.tbz2" -C "$L4T_DIR/rootfs/"
-
-    echo " Running apply_binaries.sh ..."
-    sudo "$L4T_DIR/apply_binaries.sh"
-
-    rm -rf "$tmp_dir"
-    echo "L4T $L4T_VERSION installed to $L4T_DIR"
-fi
-
-for tool in flash.sh odmfuse.sh; do
-    if [[ ! -f "$L4T_DIR/$tool" ]]; then
-        echo "ERROR: $L4T_DIR/$tool not found after L4T setup" >&2
-        exit 1
-    fi
-done
-
-NV_RELEASE="$L4T_DIR/rootfs/etc/nv_tegra_release"
-if [[ -f "$NV_RELEASE" ]]; then
-    echo "L4T release: $(head -1 "$NV_RELEASE")"
-fi
diff --git a/scripts/harden_rootfs.sh b/scripts/harden_rootfs.sh
deleted file mode 100755
index 4ef6cf6..0000000
--- a/scripts/harden_rootfs.sh
+++ /dev/null
@@ -1,52 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-ROOTFS="${ROOTFS_DIR:-/opt/nvidia/Linux_for_Tegra/rootfs}"
-
-if [[ ! -d "$ROOTFS" ]]; then
-    echo "ERROR: Rootfs directory not found: $ROOTFS" >&2
-    exit 1
-fi
-
-echo "=== Hardening rootfs: $ROOTFS ==="
-
-echo "[1/5] Disabling SSH..."
-for unit in sshd.service ssh.service; do
-    sudo ln -sf /dev/null "$ROOTFS/etc/systemd/system/$unit" 2>/dev/null || true
-done
-sudo rm -f "$ROOTFS/etc/ssh/sshd_config"
-
-echo "[2/5] Masking getty and serial console services..."
-for unit in "getty@.service" "serial-getty@.service"; do
-    sudo ln -sf /dev/null "$ROOTFS/etc/systemd/system/$unit"
-done
-
-echo "[3/5] Disabling serial console in bootloader config..."
-EXTLINUX="$ROOTFS/boot/extlinux/extlinux.conf"
-if [[ -f "$EXTLINUX" ]]; then
-    sudo sed -i 's/console=ttyTCU0[^ ]*//' "$EXTLINUX"
-    sudo sed -i 's/console=ttyS0[^ ]*//' "$EXTLINUX"
-    sudo sed -i 's/  */ /g' "$EXTLINUX"
-fi
-
-echo "[4/5] Applying sysctl hardening..."
-sudo tee "$ROOTFS/etc/sysctl.d/99-azaion-hardening.conf" > /dev/null <<'EOF'
-kernel.yama.ptrace_scope = 3
-kernel.core_pattern = |/bin/false
-kernel.kptr_restrict = 2
-kernel.dmesg_restrict = 1
-net.ipv4.conf.all.rp_filter = 1
-net.ipv4.conf.default.rp_filter = 1
-net.ipv4.conf.all.accept_redirects = 0
-net.ipv4.conf.default.accept_redirects = 0
-net.ipv4.conf.all.send_redirects = 0
-net.ipv4.conf.default.send_redirects = 0
-EOF
-
-echo "[5/5] Locking root account..."
-if [[ -f "$ROOTFS/etc/shadow" ]]; then
-    sudo sed -i 's|^root:[^:]*:|root:!:|' "$ROOTFS/etc/shadow"
-fi
-
-echo ""
-echo "Hardening complete."
diff --git a/scripts/provision_devices.sh b/scripts/provision_devices.sh
deleted file mode 100755
index 9f6cadb..0000000
--- a/scripts/provision_devices.sh
+++ /dev/null
@@ -1,292 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-ENV_FILE="$SCRIPT_DIR/.env"
-
-HARDEN="${HARDEN:-true}"
-while [[ $# -gt 0 ]]; do
-    case "$1" in
-        --no-harden) HARDEN="false"; shift ;;
-        *) echo "ERROR: Unknown option: $1" >&2; exit 1 ;;
-    esac
-done
-
-NVIDIA_VENDOR="0955"
-
-declare -A PID_TO_MODEL=(
-    ["7523"]="Orin Nano"
-    ["7323"]="Orin NX 16GB"
-    ["7423"]="Orin NX 8GB"
-)
-
-declare -A PID_TO_BOARD_CONFIG=(
-    ["7523"]="jetson-orin-nano-devkit"
-    ["7323"]="jetson-orin-nx-devkit"
-    ["7423"]="jetson-orin-nx-devkit"
-)
-
-require_env_var() {
-    local name="$1"
-    local val="${!name:-}"
-    if [[ -z "$val" ]]; then
-        echo "ERROR: $name is not set in $ENV_FILE" >&2
-        exit 1
-    fi
-}
-
-api_post() {
-    local url="$1"; shift
-    local response
-    response="$(curl -sS -w "\n%{http_code}" -X POST "$url" -H "Content-Type: application/json" "$@")"
-    echo "$response"
-}
-
-provision_single_device() {
-    local device_line="$1"
-    local usb_id="$2"
-    local board_config="$3"
-
-    echo "[Step 1/5] Registering device with admin API..."
-    local reg_response reg_http reg_body
-    reg_response="$(api_post "${API_URL}/devices" -H "Authorization: Bearer $TOKEN")"
-    reg_http="$(echo "$reg_response" | tail -1)"
-    reg_body="$(echo "$reg_response" | sed '$d')"
-
-    if [[ "$reg_http" != "200" && "$reg_http" != "201" ]]; then
-        echo "ERROR: Device registration failed (HTTP $reg_http)" >&2
-        echo "$reg_body" >&2
-        RESULTS+=("$usb_id | FAILED | registration error HTTP $reg_http")
-        return 1
-    fi
-
-    local serial dev_email dev_password
-    serial="$(echo "$reg_body" | jq -r '.serial // .Serial // empty')"
-    dev_email="$(echo "$reg_body" | jq -r '.email // .Email // empty')"
-    dev_password="$(echo "$reg_body" | jq -r '.password // .Password // empty')"
-
-    if [[ -z "$serial" || -z "$dev_email" || -z "$dev_password" ]]; then
-        echo "ERROR: Incomplete response from POST /devices" >&2
-        RESULTS+=("$usb_id | FAILED | incomplete API response")
-        return 1
-    fi
-
-    echo " Assigned serial: $serial"
-    echo " Email: $dev_email"
-
-    echo "[Step 2/5] Writing device.conf to rootfs staging..."
-    local conf_dir="${ROOTFS_DIR}/etc/azaion"
-    sudo mkdir -p "$conf_dir"
-    local conf_path="${conf_dir}/device.conf"
-    printf 'AZAION_DEVICE_EMAIL=%s\nAZAION_DEVICE_PASSWORD=%s\n' "$dev_email" "$dev_password" \
-        | sudo tee "$conf_path" > /dev/null
-    sudo chmod 600 "$conf_path"
-    echo " Written: $conf_path"
-
-    echo "[Step 3/5] Fusing device (odmfuse.sh)..."
-    if ! sudo "$L4T_DIR/odmfuse.sh" --instance "$usb_id" 2>&1; then
-        echo "ERROR: Fusing failed for $serial ($usb_id)" >&2
-        RESULTS+=("$usb_id | $serial | FAILED | fuse error")
-        return 1
-    fi
-
-    echo ""
-    echo "[$serial] Fuse complete."
-    read -rp " Power-cycle the device and put it back in recovery mode. Press Enter when ready..."
-
-    echo "[Step 4/5] Flashing device (flash.sh)..."
-    if ! sudo "$L4T_DIR/flash.sh" "$board_config" "$FLASH_TARGET" --instance "$usb_id" 2>&1; then
-        echo "ERROR: Flashing failed for $serial ($usb_id)" >&2
-        RESULTS+=("$usb_id | $serial | FAILED | flash error")
-        return 1
-    fi
-
-    echo ""
-    echo "[$serial] Flash complete."
-    echo " >>> Apply sticker with serial: $serial <<<"
-    read -rp " Power-cycle for first boot. Press Enter when done..."
-
-    echo "[Step 5/5] $serial provisioned successfully."
-    RESULTS+=("$usb_id | $serial | OK")
-}
-
-# --- main ---
-
-if [[ ! -f "$ENV_FILE" ]]; then
-    echo "ERROR: $ENV_FILE not found. Copy .env.example to .env and fill in values." >&2
-    exit 1
-fi
-
-set -a
-source "$ENV_FILE"
-set +a
-
-for var in ADMIN_EMAIL ADMIN_PASSWORD API_URL LOADER_IMAGE_TAR; do
-    require_env_var "$var"
-done
-API_URL="${API_URL%/}"
-
-RESOURCE_API_URL="${RESOURCE_API_URL:-$API_URL}"
-LOADER_DEV_STAGE="${LOADER_DEV_STAGE:-main}"
-LOADER_IMAGE="${LOADER_IMAGE:-localhost:5000/loader:arm}"
-FLASH_TARGET="${FLASH_TARGET:-nvme0n1p1}"
-
-L4T_VERSION="${L4T_VERSION:-r36.4.4}"
-L4T_DIR="${L4T_DIR:-/opt/nvidia/Linux_for_Tegra}"
-ROOTFS_DIR="${ROOTFS_DIR:-$L4T_DIR/rootfs}"
-
-export L4T_VERSION L4T_DIR ROOTFS_DIR RESOURCE_API_URL LOADER_DEV_STAGE LOADER_IMAGE LOADER_IMAGE_TAR
-
-echo "=== Installing host dependencies ==="
-sudo apt-get update -qq
-sudo apt-get install -y usbutils curl jq wget
-[[ "$(uname -m)" != "aarch64" ]] && sudo apt-get install -y qemu-user-static binfmt-support
-echo ""
-
-echo "=== L4T BSP setup ==="
-"$SCRIPT_DIR/ensure_l4t.sh"
-echo ""
-
-echo "=== Setting up rootfs (Docker + application) ==="
-"$SCRIPT_DIR/setup_rootfs_docker.sh"
-echo ""
-
-if [[ "$HARDEN" == "true" ]]; then
-    echo "=== Applying security hardening ==="
-    "$SCRIPT_DIR/harden_rootfs.sh"
-    echo ""
-else
-    echo "=== Security hardening SKIPPED (--no-harden) ==="
-    echo ""
-fi
-
-echo "=== Authenticating with admin API ==="
-LOGIN_JSON="$(printf '{"email":"%s","password":"%s"}' "$ADMIN_EMAIL" "$ADMIN_PASSWORD")"
-LOGIN_RESPONSE="$(api_post "${API_URL}/login" -d "$LOGIN_JSON")"
-LOGIN_HTTP="$(echo "$LOGIN_RESPONSE" | tail -1)"
-LOGIN_BODY="$(echo "$LOGIN_RESPONSE" | sed '$d')"
-
-if [[ "$LOGIN_HTTP" != "200" ]]; then
-    echo "ERROR: Login failed (HTTP $LOGIN_HTTP)" >&2
-    echo "$LOGIN_BODY" >&2
-    exit 1
-fi
-
-TOKEN="$(echo "$LOGIN_BODY" | jq -r '.token // .Token // empty')"
-if [[ -z "$TOKEN" ]]; then
-    echo "ERROR: No token in login response" >&2
-    echo "$LOGIN_BODY" >&2
-    exit 1
-fi
-echo "Authenticated successfully."
-echo ""
-
-echo "=== Scanning for Jetson devices in recovery mode ==="
-LSUSB_OUTPUT="$(lsusb -d "${NVIDIA_VENDOR}:" 2>/dev/null || true)"
-
-if [[ -z "$LSUSB_OUTPUT" ]]; then
-    echo "No Jetson devices found in recovery mode."
-    echo "Ensure devices are connected via USB and in recovery mode (hold Force Recovery, press Power)."
-    exit 0
-fi
-
-DEVICES=()
-DEVICE_PIDS=()
-DEVICE_MODELS=()
-while IFS= read -r line; do
-    pid="$(echo "$line" | grep -oP "${NVIDIA_VENDOR}:\K[0-9a-fA-F]+")"
-    if [[ -n "${PID_TO_BOARD_CONFIG[$pid]:-}" ]]; then
-        DEVICES+=("$line")
-        DEVICE_PIDS+=("$pid")
-        DEVICE_MODELS+=("${PID_TO_MODEL[$pid]:-Unknown (PID $pid)}")
-    fi
-done <<< "$LSUSB_OUTPUT"
-
-DEVICE_COUNT="${#DEVICES[@]}"
-
-if [[ "$DEVICE_COUNT" -eq 0 ]]; then
-    echo "No supported Jetson devices found."
-    echo "Detected NVIDIA USB devices but none matched known Jetson Orin product IDs."
-    exit 0
-fi
-
-echo ""
-echo "Select device(s) to provision."
-echo " one device, e.g. 1"
-echo " some devices, e.g. 1 3 4"
-echo " or all devices: 0"
-echo ""
-echo "--------------------------------------------"
-echo "Connected Jetson devices (recovery mode):"
-echo "--------------------------------------------"
-for i in "${!DEVICES[@]}"; do
-    num=$((i + 1))
-    printf "%-3s %-16s %s\n" "$num" "[${DEVICE_MODELS[$i]}]" "${DEVICES[$i]}"
-done
-echo "--------------------------------------------"
-echo "0 - provision all devices"
-echo ""
-
-read -rp "Your selection: " SELECTION
-
-SELECTED_INDICES=()
-if [[ "$SELECTION" == "0" ]]; then
-    for i in "${!DEVICES[@]}"; do
-        SELECTED_INDICES+=("$i")
-    done
-else
-    for num in $SELECTION; do
-        if [[ "$num" =~ ^[0-9]+$ ]] && (( num >= 1 && num <= DEVICE_COUNT )); then
-            SELECTED_INDICES+=("$((num - 1))")
-        else
-            echo "ERROR: Invalid selection: $num (must be 1-$DEVICE_COUNT or 0 for all)" >&2
-            exit 1
-        fi
-    done
-fi
-
-if [[ ${#SELECTED_INDICES[@]} -eq 0 ]]; then
-    echo "No devices selected."
-    exit 0
-fi
-
-echo ""
-echo "=== Provisioning ${#SELECTED_INDICES[@]} device(s) ==="
-echo ""
-
-RESULTS=()
-
-for idx in "${SELECTED_INDICES[@]}"; do
-    DEVICE_LINE="${DEVICES[$idx]}"
-    USB_ID="$(echo "$DEVICE_LINE" | grep -oP 'Bus \K[0-9]+')-$(echo "$DEVICE_LINE" | grep -oP 'Device \K[0-9]+')"
-    BOARD_CONFIG="${PID_TO_BOARD_CONFIG[${DEVICE_PIDS[$idx]}]:-}"
-    if [[ -z "$BOARD_CONFIG" ]]; then
-        echo "ERROR: Unknown Jetson product ID: $NVIDIA_VENDOR:${DEVICE_PIDS[$idx]}" >&2
-        RESULTS+=("$USB_ID | FAILED | unknown product ID")
-        continue
-    fi
-    echo "--------------------------------------------"
-    echo "Device: ${DEVICE_MODELS[$idx]} — $DEVICE_LINE"
-    echo "USB instance: $USB_ID"
-    echo "Board config: $BOARD_CONFIG"
-    echo "--------------------------------------------"
-
-    provision_single_device "$DEVICE_LINE" "$USB_ID" "$BOARD_CONFIG" || true
-    echo ""
-done
-
-CONF_CLEANUP="${ROOTFS_DIR}/etc/azaion/device.conf"
-if [[ -f "$CONF_CLEANUP" ]]; then
-    sudo rm -f "$CONF_CLEANUP"
-fi
-
-echo ""
-echo "========================================"
-echo " Provisioning Summary"
-echo "========================================"
-printf "%-12s | %-10s | %s\n" "USB ID" "Serial" "Status"
-echo "----------------------------------------"
-for r in "${RESULTS[@]}"; do
-    echo "$r"
-done
-echo "========================================"
diff --git a/scripts/setup_rootfs_docker.sh b/scripts/setup_rootfs_docker.sh
deleted file mode 100755
index 2ff94a8..0000000
--- a/scripts/setup_rootfs_docker.sh
+++ /dev/null
@@ -1,179 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-ROOTFS="${ROOTFS_DIR:-/opt/nvidia/Linux_for_Tegra/rootfs}"
-LOADER_IMAGE_TAR="${LOADER_IMAGE_TAR:-}"
-RESOURCE_API_URL="${RESOURCE_API_URL:-https://api.azaion.com}"
-LOADER_DEV_STAGE="${LOADER_DEV_STAGE:-main}"
-LOADER_IMAGE="${LOADER_IMAGE:-localhost:5000/loader:arm}"
-
-if [[ ! -d "$ROOTFS" ]]; then
-    echo "ERROR: Rootfs directory not found: $ROOTFS" >&2
-    exit 1
-fi
-
-if [[ -z "$LOADER_IMAGE_TAR" ]]; then
-    echo "ERROR: LOADER_IMAGE_TAR not set. Set it in .env to the Loader Docker image tar path." >&2
-    exit 1
-fi
-
-if [[ ! -f "$LOADER_IMAGE_TAR" ]]; then
-    echo "ERROR: Loader image tar not found: $LOADER_IMAGE_TAR" >&2
-    exit 1
-fi
-
-cleanup_mounts() {
-    for mp in proc sys dev/pts dev; do
-        sudo umount "$ROOTFS/$mp" 2>/dev/null || true
-    done
-    if [[ -f "$ROOTFS/etc/resolv.conf.setup-bak" ]]; then
-        sudo mv "$ROOTFS/etc/resolv.conf.setup-bak" "$ROOTFS/etc/resolv.conf"
-    fi
-}
-
-setup_mounts() {
-    for mp in proc sys dev dev/pts; do
-        mountpoint -q "$ROOTFS/$mp" 2>/dev/null && sudo umount "$ROOTFS/$mp" 2>/dev/null || true
-    done
-    sudo mount --bind /proc "$ROOTFS/proc"
-    sudo mount --bind /sys "$ROOTFS/sys"
-    sudo mount --bind /dev "$ROOTFS/dev"
-    sudo mount --bind /dev/pts "$ROOTFS/dev/pts"
-    if [[ -f "$ROOTFS/etc/resolv.conf" ]]; then
-        sudo cp "$ROOTFS/etc/resolv.conf" "$ROOTFS/etc/resolv.conf.setup-bak"
-    fi
-    sudo cp /etc/resolv.conf "$ROOTFS/etc/resolv.conf"
-}
-
-if [[ "$(uname -m)" != "aarch64" ]]; then
-    if [[ ! -f "$ROOTFS/usr/bin/qemu-aarch64-static" ]]; then
-        sudo cp /usr/bin/qemu-aarch64-static "$ROOTFS/usr/bin/"
-    fi
-fi
-
-trap cleanup_mounts EXIT
-
-echo "=== Setting up Docker in rootfs ==="
-echo " Rootfs: $ROOTFS"
-echo " Image tar: $LOADER_IMAGE_TAR"
-echo ""
-
-setup_mounts
-
-if sudo chroot "$ROOTFS" docker --version &>/dev/null; then
-    echo "[1/6] Docker already installed, skipping..."
-else
-    echo "[1/6] Installing Docker Engine..."
-    sudo chroot "$ROOTFS" bash -c '
-        apt-get update
-        apt-get install -y ca-certificates curl gnupg
-        install -m 0755 -d /etc/apt/keyrings
-        curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
-        chmod a+r /etc/apt/keyrings/docker.asc
-        . /etc/os-release
-        echo "deb [arch=arm64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $VERSION_CODENAME stable" > /etc/apt/sources.list.d/docker.list
-        apt-get update
-        apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
-        apt-get clean
-        rm -rf /var/lib/apt/lists/*
-    '
-fi
-
-if sudo chroot "$ROOTFS" dpkg -l nvidia-container-toolkit 2>/dev/null | grep -q '^ii'; then
-    echo "[2/6] NVIDIA Container Toolkit already installed, skipping..."
-else
-    echo "[2/6] Installing NVIDIA Container Toolkit..."
-    sudo chroot "$ROOTFS" bash -c '
-        curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
-            | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
-        curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
-            | sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g" \
-            > /etc/apt/sources.list.d/nvidia-container-toolkit.list
-        apt-get update
-        apt-get install -y nvidia-container-toolkit
-        apt-get clean
-        rm -rf /var/lib/apt/lists/*
-    '
-fi
-
-echo "[3/6] Configuring Docker daemon (NVIDIA default runtime)..."
-sudo mkdir -p "$ROOTFS/etc/docker"
-sudo tee "$ROOTFS/etc/docker/daemon.json" > /dev/null <<'EOF'
-{
-    "default-runtime": "nvidia",
-    "runtimes": {
-        "nvidia": {
-            "path": "nvidia-container-runtime",
-            "runtimeArgs": []
-        }
-    }
-}
-EOF
-
-echo "[4/6] Enabling Docker and containerd services..."
-sudo mkdir -p "$ROOTFS/etc/systemd/system/multi-user.target.wants"
-sudo ln -sf /lib/systemd/system/docker.service \
-    "$ROOTFS/etc/systemd/system/multi-user.target.wants/docker.service"
-sudo ln -sf /lib/systemd/system/containerd.service \
-    "$ROOTFS/etc/systemd/system/multi-user.target.wants/containerd.service"
-
-echo "[5/6] Creating Azaion application layout..."
-sudo mkdir -p "$ROOTFS/opt/azaion/models"
-sudo mkdir -p "$ROOTFS/opt/azaion/state"
-
-sudo tee "$ROOTFS/opt/azaion/docker-compose.yml" > /dev/null < /dev/null <<'EOF'
-#!/bin/bash
-set -e
-if [ -f /opt/azaion/loader-image.tar ]; then
-    docker load -i /opt/azaion/loader-image.tar
-    rm -f /opt/azaion/loader-image.tar
-fi
-docker compose -f /opt/azaion/docker-compose.yml up -d
-EOF
-sudo chmod 755 "$ROOTFS/opt/azaion/boot.sh"
-
-sudo tee "$ROOTFS/etc/systemd/system/azaion-loader.service" > /dev/null <<'EOF'
-[Unit]
-Description=Azaion Loader
-After=docker.service
-Requires=docker.service
-
-[Service]
-Type=oneshot
-RemainAfterExit=yes
-ExecStart=/opt/azaion/boot.sh
-
-[Install]
-WantedBy=multi-user.target
-EOF
-
-sudo ln -sf /etc/systemd/system/azaion-loader.service \
-    "$ROOTFS/etc/systemd/system/multi-user.target.wants/azaion-loader.service"
-
-echo ""
-echo "Docker setup complete."
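
Porting note: when `ensure_l4t.sh` is re-homed under `suite/provisioning/`, the URL construction is worth checking in isolation, since a mis-parsed `L4T_VERSION` would only surface as a failed download. The sketch below copies `l4t_download_url` from the removed script and prints the URL it builds for the default `r36.4.4`. It downloads nothing, and whether NVIDIA still serves that path is an assumption it does not verify.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Copied from the removed scripts/ensure_l4t.sh: strips the leading "r",
# splits the rest on "." and rebuilds NVIDIA's release directory name.
l4t_download_url() {
    local version="$1"
    local major minor patch
    IFS='.' read -r major minor patch <<< "${version#r}"
    echo "https://developer.nvidia.com/downloads/embedded/l4t/r${major}_release_v${minor}.${patch}/release/Jetson_Linux_${version}_aarch64.tbz2"
}

# Prints the URL for the script's default version; no network access.
l4t_download_url "r36.4.4"
```

For `r36.4.4` the split gives major `36`, minor `4`, patch `4`, so the release directory becomes `r36_release_v4.4`. A version string with fewer than three dot-separated parts would leave `patch` empty and produce a malformed path.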