mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 08:46:33 +00:00
Remove provisioning runbook and related scripts
- Deleted the Jetson device provisioning runbook and associated scripts (`provision_devices.sh`, `ensure_l4t.sh`, `harden_rootfs.sh`, `setup_rootfs_docker.sh`, and `.env.example`) as part of a cleanup effort. This streamlines the project by removing outdated documentation and scripts that are no longer in use.
@@ -1,222 +1,14 @@
|
||||
# Jetson device provisioning runbook
|
||||
# Jetson device provisioning runbook (moved)
|
||||
|
||||
This runbook describes the end-to-end flow to fuse, flash, and provision device identities so the Azaion Loader can authenticate against the admin/resource APIs. It supports Jetson Orin Nano, Orin NX 8GB, and Orin NX 16GB devices. Board configuration is auto-detected from the USB product ID.
|
||||
The provisioning runbook and its scripts (`provision_devices.sh`, `ensure_l4t.sh`, `setup_rootfs_docker.sh`, `harden_rootfs.sh`) no longer live inside the `loader` repository.
|
||||
|
||||
The `scripts/provision_devices.sh` script automates the entire flow: detecting connected Jetsons, auto-installing L4T if needed, setting up Docker with the Loader container, optionally hardening the OS, registering device identities via the admin API, writing credentials, fusing, and flashing.
|
||||
They were relocated to the suite meta-repo because device provisioning is a fleet/manufacturing concern, not a Loader concern — it sets up the base OS, Docker, identity, and hardening that all services share.
|
||||
|
||||
After provisioning, each Jetson boots into a production-ready state with Docker Compose running the Loader container.
|
||||
- **New location:** `suite/provisioning/`
|
||||
- **Runbook:** `suite/provisioning/README.md`
|
||||
|
||||
## Prerequisites
|
||||
If you are working inside the loader repo in isolation, the scripts are not available here; clone the suite meta-repo to run provisioning.
|
||||
|
||||
- Ubuntu amd64 provisioning workstation with bash, curl, jq, wget, lsusb.
- Admin API reachable from the workstation (base URL configured in `scripts/.env`).
- An ApiAdmin account on the admin API (email and password in `scripts/.env`).
- `sudo` access on the workstation.
- USB-C cables that support both power and data transfer.
- Physical label/sticker materials for serial numbers.
- Internet access on first run (to download L4T BSP if not already installed).
- Loader Docker image tar file (see [Preparing the Loader image](#preparing-the-loader-image)).
|
||||
## Note on Loader retirement
|
||||
|
||||
The NVIDIA L4T BSP and sample rootfs are downloaded and installed automatically to `/opt/nvidia/Linux_for_Tegra` if not already present. No manual L4T setup is required.
|
||||
|
||||
## Configuration
|
||||
|
||||
Copy `scripts/.env.example` to `scripts/.env` and fill in values:
|
||||
|
||||
```
ADMIN_EMAIL=admin@azaion.com
ADMIN_PASSWORD=<your ApiAdmin password>
API_URL=https://admin.azaion.com
LOADER_IMAGE_TAR=/path/to/loader-image.tar
```
|
||||
|
||||
Optional overrides (auto-detected/defaulted if omitted):
|
||||
|
||||
```
L4T_VERSION=r36.4.4
L4T_DIR=/opt/nvidia/Linux_for_Tegra
ROOTFS_DIR=/opt/nvidia/Linux_for_Tegra/rootfs
RESOURCE_API_URL=https://admin.azaion.com
LOADER_DEV_STAGE=main
LOADER_IMAGE=localhost:5000/loader:arm
FLASH_TARGET=nvme0n1p1
HARDEN=true
```
|
||||
|
||||
The `.env` file is git-ignored and must not be committed.
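
A missing required key is easiest to catch before any device is touched. The sketch below is one way such a check could look; the function name `missing_env_keys` is hypothetical and is not taken from `provision_devices.sh`. Only the four key names come from the configuration block above.

```bash
# Hypothetical helper (not part of the original scripts): print the required
# .env keys that are unset or empty, and return non-zero if any are missing.
missing_env_keys() {
  missing=""
  for key in ADMIN_EMAIL ADMIN_PASSWORD API_URL LOADER_IMAGE_TAR; do
    # Look up the variable whose name is stored in $key.
    eval "value=\${$key:-}"
    if [ -z "$value" ]; then
      missing="$missing $key"
    fi
  done
  # Strip the leading space before printing the list.
  echo "${missing# }"
  [ -z "$missing" ]
}

# Possible usage, assuming the file lives at scripts/.env:
#   set -a; . scripts/.env; set +a
#   missing_env_keys >/dev/null || echo "fill in scripts/.env first" >&2
```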
|
||||
|
||||
## Preparing the Loader image
|
||||
|
||||
The provisioning script requires a Loader Docker image tar to pre-load onto each device. Options:
|
||||
|
||||
**From CI (recommended):** Download the `loader-image.tar` artifact from the Woodpecker CI pipeline for the target branch.
|
||||
|
||||
**Local build (requires arm64 builder or BuildKit cross-compilation):**
|
||||
|
||||
```bash
docker build -f Dockerfile -t localhost:5000/loader:arm .
docker save localhost:5000/loader:arm -o loader-image.tar
```
|
||||
|
||||
Set `LOADER_IMAGE_TAR` in `.env` to the absolute path of the resulting tar file.
|
||||
|
||||
## Supported devices
|
||||
|
||||
| USB Product ID | Model | Board Config (auto-detected) |
| --- | --- | --- |
| 0955:7523 | Jetson Orin Nano | jetson-orin-nano-devkit |
| 0955:7323 | Jetson Orin NX 16GB | jetson-orin-nx-devkit |
| 0955:7423 | Jetson Orin NX 8GB | jetson-orin-nx-devkit |
|
||||
|
||||
The script scans for all NVIDIA USB devices (`lsusb -d 0955:`), matches them against the table above, and displays the model name next to each detected device.
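
The lookup described above can be sketched as a `case` statement over the product IDs in the table. The function names `jetson_model_for` and `describe_jetsons` are illustrative assumptions, as is the output format; the IDs, model names, and board configs are the ones listed in the table.

```bash
# Map a USB vendor:product ID to "<model>|<board config>".
# Returns non-zero for IDs that are not in the supported-devices table.
jetson_model_for() {
  case "$1" in
    0955:7523) echo "Jetson Orin Nano|jetson-orin-nano-devkit" ;;
    0955:7323) echo "Jetson Orin NX 16GB|jetson-orin-nx-devkit" ;;
    0955:7423) echo "Jetson Orin NX 8GB|jetson-orin-nx-devkit" ;;
    *) return 1 ;;
  esac
}

# Read lsusb lines on stdin and print one summary line per device.
# lsusb lines look like: "Bus 001 Device 012: ID 0955:7523 NVIDIA Corp. APX"
describe_jetsons() {
  while read -r w1 bus w2 dev w3 id rest; do
    dev=${dev%:}   # drop the trailing colon from the device number
    if info=$(jetson_model_for "$id"); then
      echo "bus $bus device $dev: ${info%%|*} (${info#*|})"
    else
      echo "bus $bus device $dev: unknown product ID $id" >&2
    fi
  done
}

# Possible usage on the workstation:
#   lsusb -d 0955: | describe_jetsons
```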
|
||||
|
||||
## Admin API contract (device registration)
|
||||
|
||||
The script calls:
|
||||
|
||||
1. **POST** `{API_URL}/login` with `{"email":"<admin>","password":"<password>"}` to obtain a JWT.
2. **POST** `{API_URL}/devices` with `Authorization: Bearer <token>` and no request body.
   - **200** or **201**: returns `{"serial":"azj-NNNN","email":"azj-NNNN@azaion.com","password":"<32-hex-chars>"}`.
   - The server auto-assigns the next sequential serial number.
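
The two calls above could be issued with `curl` and `jq` roughly as follows. This is a sketch under stated assumptions, not the code from `provision_devices.sh`: the function names are hypothetical, and the runbook does not say which field of the login response carries the JWT, so the `token` field used here is a guess.

```bash
# Log in as the ApiAdmin user and print the JWT.
# ASSUMPTION: the login response has the JWT in a "token" field.
admin_login() {
  # Build the JSON body with jq so special characters are escaped correctly.
  payload=$(jq -n --arg email "$ADMIN_EMAIL" --arg password "$ADMIN_PASSWORD" \
    '{email: $email, password: $password}')
  curl -fsS -X POST "$API_URL/login" \
    -H 'Content-Type: application/json' \
    -d "$payload" | jq -er '.token'
}

# Register a new device; prints the raw JSON with serial, email, password.
# $1 is the JWT returned by admin_login. No request body is sent.
register_device() {
  curl -fsS -X POST "$API_URL/devices" -H "Authorization: Bearer $1"
}

# Possible usage:
#   token=$(admin_login)
#   device_json=$(register_device "$token")
#   serial=$(printf '%s' "$device_json" | jq -r '.serial')
```

With `-f`, `curl` exits non-zero on HTTP errors such as 401, so a failed login stops the flow instead of passing an empty token to the next call.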
|
||||
|
||||
## Device identity and `device.conf`
|
||||
|
||||
For each registered device, the script writes:
|
||||
|
||||
`{ROOTFS_DIR}/etc/azaion/device.conf`
|
||||
|
||||
On the flashed device this becomes `/etc/azaion/device.conf` with:
|
||||
|
||||
- `AZAION_DEVICE_EMAIL=azj-NNNN@azaion.com`
- `AZAION_DEVICE_PASSWORD=<32-hex-chars>`
|
||||
|
||||
File permissions are set to **600**.
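
Writing the file with mode 600 can be sketched as below. The function name `write_device_conf` and its argument order are assumptions made for illustration; the path, the two keys, and the mode are the ones stated above.

```bash
# write_device_conf <rootfs_dir> <email> <password>
# Writes <rootfs_dir>/etc/azaion/device.conf readable by its owner only.
write_device_conf() {
  rootfs_dir=$1
  email=$2
  password=$3
  conf_dir="$rootfs_dir/etc/azaion"
  mkdir -p "$conf_dir"
  # Create the file under a restrictive umask so it is never world-readable,
  # then set the mode explicitly in case the file already existed.
  (
    umask 077
    printf 'AZAION_DEVICE_EMAIL=%s\nAZAION_DEVICE_PASSWORD=%s\n' \
      "$email" "$password" > "$conf_dir/device.conf"
  )
  chmod 600 "$conf_dir/device.conf"
}

# Possible usage with values returned by POST /devices:
#   write_device_conf "$ROOTFS_DIR" "$device_email" "$device_password"
```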
|
||||
|
||||
## Docker and application setup
|
||||
|
||||
The `scripts/setup_rootfs_docker.sh` script prepares the rootfs before flashing. It runs automatically as part of `provision_devices.sh`. What it installs:
|
||||
|
||||
| Component | Details |
| --- | --- |
| Docker Engine + Compose plugin | Installed via apt in chroot from Docker's official repository |
| NVIDIA Container Toolkit | GPU passthrough for containers; nvidia set as default runtime |
| Production compose file | `/opt/azaion/docker-compose.yml`; defines the `loader` service |
| Loader image | Pre-loaded from `LOADER_IMAGE_TAR` at `/opt/azaion/loader-image.tar` |
| Boot service | `azaion-loader.service`; loads the image tar on first boot, starts compose |
|
||||
|
||||
### Device filesystem layout after flash
|
||||
|
||||
```
/etc/azaion/device.conf                    Per-device credentials
/etc/docker/daemon.json                    Docker config (NVIDIA default runtime)
/opt/azaion/docker-compose.yml             Production compose file
/opt/azaion/boot.sh                        Boot startup script
/opt/azaion/loader-image.tar               Initial Loader image (deleted after first boot)
/opt/azaion/models/                        Model storage
/opt/azaion/state/                         Update manager state
/etc/systemd/system/azaion-loader.service  Systemd unit
```
|
||||
|
||||
### First boot sequence
|
||||
|
||||
1. systemd starts `docker.service`
2. `azaion-loader.service` runs `/opt/azaion/boot.sh`
3. `boot.sh` runs `docker load -i /opt/azaion/loader-image.tar` (first boot only), then deletes the tar
4. `boot.sh` runs `docker compose -f /opt/azaion/docker-compose.yml up -d`
5. The Loader container starts, reads `/etc/azaion/device.conf`, authenticates with the API
6. The update manager begins polling for updates
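
Steps 3 and 4 amount to a short script. The following is a sketch of what `boot.sh` might contain, not its actual contents: the function name `start_loader` and the overridable `IMAGE_TAR`, `COMPOSE_FILE`, and `DOCKER` variables are assumptions added so the logic can be exercised off-device.

```bash
# Defaults match the filesystem layout above; the variables themselves are
# hypothetical and exist only to make the sketch testable.
IMAGE_TAR="${IMAGE_TAR:-/opt/azaion/loader-image.tar}"
COMPOSE_FILE="${COMPOSE_FILE:-/opt/azaion/docker-compose.yml}"
DOCKER="${DOCKER:-docker}"

start_loader() {
  # The tar only exists on first boot: load it once, then remove it.
  if [ -f "$IMAGE_TAR" ]; then
    "$DOCKER" load -i "$IMAGE_TAR"
    rm -f "$IMAGE_TAR"
  fi
  # On every boot, bring up the services defined in the compose file.
  "$DOCKER" compose -f "$COMPOSE_FILE" up -d
}

# A real boot script would end by calling: start_loader
```

Using the tar's presence as the first-boot marker means a later boot skips `docker load` without needing a separate state file.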
|
||||
|
||||
## Security hardening
|
||||
|
||||
The `scripts/harden_rootfs.sh` script applies production security hardening to the rootfs. It runs automatically unless `--no-harden` is passed.
|
||||
|
||||
| Measure | Details |
| --- | --- |
| SSH disabled | `sshd.service` and `ssh.service` masked; `sshd_config` removed |
| Getty masked | `getty@.service` and `serial-getty@.service` masked; no login prompt |
| Serial console disabled | `console=ttyTCU0` / `console=ttyS0` removed from `extlinux.conf` |
| Sysctl hardening | ptrace blocked, core dumps disabled, kernel pointers hidden, ICMP redirects off |
| Root locked | Root account password-locked in `/etc/shadow` |
|
||||
|
||||
To provision without hardening (e.g. for development devices):
|
||||
|
||||
```bash
./scripts/provision_devices.sh --no-harden
```
|
||||
|
||||
Or set `HARDEN=false` in `.env`.
|
||||
|
||||
## Step-by-step flow
|
||||
|
||||
### 1. Connect Jetsons in recovery mode
|
||||
|
||||
Connect one or more Jetson devices via USB-C. Put each device into recovery mode: hold the Force Recovery button, press and release Power, then release Force Recovery after 2 seconds.
|
||||
|
||||
Verify with `lsusb -d 0955:` -- each recovery-mode Jetson appears as `NVIDIA Corp. APX`.
|
||||
|
||||
### 2. Run the provisioning script
|
||||
|
||||
From the loader repository root:
|
||||
|
||||
```bash
./scripts/provision_devices.sh
```
|
||||
|
||||
The script will:
|
||||
|
||||
1. **Install dependencies** -- installs lsusb, curl, jq, wget via apt; adds `qemu-user-static` and `binfmt-support` on x86 hosts for cross-arch chroot.
2. **Install L4T** -- if L4T BSP is not present at `L4T_DIR`, downloads the BSP and sample rootfs, extracts them, and runs `apply_binaries.sh`. This only happens on first run.
3. **Set up Docker** -- installs Docker Engine, NVIDIA Container Toolkit, compose file, and Loader image into the rootfs via chroot (`setup_rootfs_docker.sh`).
4. **Harden OS** (unless `--no-harden`) -- disables SSH, getty, serial console, applies sysctl hardening (`harden_rootfs.sh`).
5. **Authenticate** -- logs in to the admin API to get a JWT.
6. **Scan USB** -- detects all supported Jetson devices in recovery mode, displays model names.
7. **Display selection UI** -- lists detected devices with numbers and model type.
8. **Prompt for selection** -- enter device numbers (e.g. `1 3 4`), or `0` for all.
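
The selection rule in step 8 (a list of device numbers, or `0` for all) can be sketched as a small function. The name `resolve_selection` and the error messages are hypothetical; how the real script validates input is not described in this runbook.

```bash
# resolve_selection <device_count> <input>
# Prints the chosen device numbers separated by spaces.
# "0" anywhere in the input selects every device from 1 to <device_count>.
resolve_selection() {
  count=$1
  result=""
  for n in $2; do
    case "$n" in
      *[!0-9]*) echo "not a number: $n" >&2; return 1 ;;
    esac
    if [ "$n" -eq 0 ]; then
      # 0 means "all": rebuild the list as 1..count and stop parsing.
      result=""
      i=1
      while [ "$i" -le "$count" ]; do
        result="$result $i"
        i=$((i + 1))
      done
      break
    fi
    if [ "$n" -gt "$count" ]; then
      echo "no such device: $n" >&2
      return 1
    fi
    result="$result $n"
  done
  echo "${result# }"
}

# Possible usage, with 4 devices detected:
#   read -r answer
#   selected=$(resolve_selection 4 "$answer")
```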
|
||||
|
||||
### 3. Per-device provisioning (automatic)
|
||||
|
||||
For each selected device, the script runs sequentially:
|
||||
|
||||
1. **Register** -- calls `POST /devices` to get server-assigned serial, email, and password.
2. **Write device.conf** -- embeds credentials in the rootfs staging directory.
3. **Fuse** -- runs `odmfuse.sh` targeting the specific USB device instance. Board config is auto-detected from the USB product ID.
4. **Power-cycle prompt** -- asks the admin to power-cycle the device and re-enter recovery mode.
5. **Flash** -- runs `flash.sh` with the auto-detected board config to write the rootfs (including `device.conf`, Docker, and application files) to the device. Default target is `nvme0n1p1` (NVMe SSD); override with `FLASH_TARGET` in `.env` (e.g. `mmcblk0p1` for eMMC).
6. **Sticker prompt** -- displays the assigned serial and asks the admin to apply a physical label.
|
||||
|
||||
### 4. Apply serial labels
|
||||
|
||||
After each device is flashed, the script prints the assigned serial (e.g. `azj-0042`). Apply a label/sticker with this serial to the device enclosure for physical identification.
|
||||
|
||||
### 5. First boot
|
||||
|
||||
Power on the Jetson. Docker starts automatically, then `azaion-loader.service` loads the Loader image (first boot only) and starts Docker Compose. The Loader service reads `AZAION_DEVICE_EMAIL` and `AZAION_DEVICE_PASSWORD` from `/etc/azaion/device.conf` and uses them to authenticate with the admin API via `POST /login`. The update manager begins checking for updates.
|
||||
|
||||
### 6. Smoke verification
|
||||
|
||||
- From another host: Loader `GET /health` on port 8080 returns healthy.
- `docker ps` on the device (if unhardened) shows the loader container running.
- Optional: trigger a resource or unlock smoke test against a staging API.
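
Because the Loader takes some time to come up after first boot, the health check from another host is usually retried. A minimal sketch follows, assuming that any successful HTTP response from `/health` counts as healthy; the function name `wait_for_health` and the default retry values are illustrative, and the response body is not inspected because its format is not documented here.

```bash
# wait_for_health <host> [attempts] [delay_seconds]
# Polls http://<host>:8080/health until it answers with a success status.
wait_for_health() {
  host=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 5 "http://$host:8080/health" >/dev/null 2>&1; then
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Possible usage (the IP address is a placeholder):
#   wait_for_health 192.0.2.10 && echo "Loader is healthy"
```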
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Symptom | Check |
| --- | --- |
| No devices found by script | USB cables, recovery mode entry sequence, `lsusb -d 0955:` |
| Unknown product ID warning | Device is an NVIDIA USB device but not in the supported models table. Check SKU. |
| L4T download fails | Internet access, NVIDIA download servers availability, `L4T_VERSION` value |
| Login fails (HTTP 401) | `ADMIN_EMAIL` and `ADMIN_PASSWORD` in `.env`; account must have ApiAdmin role |
| POST /devices fails | Admin API logs; ensure AZ-196 endpoint is deployed |
| Fuse fails | L4T version compatibility, USB connection stability, sudo access |
| Flash fails | Rootfs contents, USB device still in recovery mode after power-cycle, verify `FLASH_TARGET` matches your storage (NVMe vs eMMC) |
| Docker setup fails in chroot | Verify `qemu-user-static` was installed (auto-installed on x86 hosts); check internet in chroot |
| Loader container not starting | Check `docker logs` on device; verify `/etc/azaion/device.conf` exists and has correct permissions |
| Loader cannot log in after boot | `device.conf` path and permissions; password must match the account created by POST /devices |
| Cannot SSH to hardened device | Expected behavior. Use `--no-harden` for dev devices, or reflash with USB recovery mode |
|
||||
|
||||
## Security notes
|
||||
|
||||
- Treat `device.conf` as a secret at rest; restrict file permissions and apply disk encryption per your product policy.
- The `.env` file contains ApiAdmin credentials -- do not commit it. It is listed in `.gitignore`.
- Prefer short-lived credentials or key rotation if the admin API supports it; this runbook describes the baseline manufacturing flow.
- Hardened devices have no SSH, no serial console, and no interactive login. Field debugging requires a USB recovery mode reflash with `--no-harden`.
|
||||
The provisioning flow in `suite/provisioning/` still references Loader today because `setup_rootfs_docker.sh` has not yet been rewritten for the Scenario X deployment model (Watchtower + rclone + flight-gate). The rewrite is tracked as a follow-up in `suite/ci/README.md`. Until the rewrite lands, newly-provisioned devices continue to boot with the legacy Loader service and will need a separate migration step to the new deployment model.
|
||||
|
||||