Oleksandr Bezdieniezhnykh aeb7f8ca8c Update autopilot workflow and documentation for project cycle completion
- Modified the existing-code workflow to automatically loop back to New Task after project completion without user confirmation.
- Updated the autopilot state to reflect the current step as `done` and status as `completed`.
- Clarified the deployment status report by specifying non-deployed services and their purposes.

These changes enhance the automation of task management and improve documentation clarity.
2026-03-29 05:02:22 +03:00


# Azaion AI Training — Deployment Scripts
## Overview
| Script | Purpose | Location |
|--------|---------|----------|
| `deploy.sh` | Main deployment orchestrator | `scripts/deploy.sh` |
| `generate-config.sh` | Generate `config.yaml` from environment variables | `scripts/generate-config.sh` |
| `pull-images.sh` | Pull Docker images from registry | `scripts/pull-images.sh` |
| `start-services.sh` | Start all services via Docker Compose | `scripts/start-services.sh` |
| `stop-services.sh` | Graceful shutdown with tag backup | `scripts/stop-services.sh` |
| `health-check.sh` | Verify deployment health | `scripts/health-check.sh` |
## Prerequisites
- Docker and Docker Compose installed on target machine
- NVIDIA driver + Docker GPU support (`nvidia-container-toolkit`)
- SSH access to target machine (for remote deployment)
- `.env` file with required environment variables (see `.env.example`)
## Environment Variables
All scripts source `.env` from the project root.

| Variable | Required By | Purpose |
|----------|------------|---------|
| `DEPLOY_HOST` | `deploy.sh` (remote) | SSH target for remote deployment |
| `DEPLOY_USER` | `deploy.sh` (remote) | SSH user (default: `deploy`) |
| `DOCKER_REGISTRY` | `pull-images.sh` | Container registry URL |
| `DOCKER_IMAGE_TAG` | `pull-images.sh` | Image version to deploy (default: `latest`) |
| `AZAION_API_URL` | `generate-config.sh` | Azaion REST API URL |
| `AZAION_API_EMAIL` | `generate-config.sh` | API login email |
| `AZAION_API_PASSWORD` | `generate-config.sh` | API login password |
| `RABBITMQ_HOST` | `generate-config.sh` | RabbitMQ host |
| `RABBITMQ_PORT` | `generate-config.sh` | RabbitMQ port |
| `RABBITMQ_USER` | `generate-config.sh` | RabbitMQ username |
| `RABBITMQ_PASSWORD` | `generate-config.sh` | RabbitMQ password |
| `RABBITMQ_QUEUE_NAME` | `generate-config.sh` | RabbitMQ queue name |
| `AZAION_ROOT_DIR` | `start-services.sh`, `health-check.sh` | Root data directory (default: `/azaion`) |
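
The defaults listed in the table can be applied at the moment `.env` is loaded. The following is a minimal sketch, assuming a hypothetical helper named `load_env`; the actual scripts may load the file differently.

```bash
# Sketch only: load_env is a hypothetical helper, not taken from the real scripts.
load_env() {
  local env_file="${1:-.env}"
  if [ -f "$env_file" ]; then
    set -a              # export every variable assigned while sourcing
    # shellcheck disable=SC1090
    . "$env_file"
    set +a
  fi
  # Defaults documented in the table above; values already set are kept.
  : "${DEPLOY_USER:=deploy}"
  : "${DOCKER_IMAGE_TAG:=latest}"
  : "${AZAION_ROOT_DIR:=/azaion}"
  export DEPLOY_USER DOCKER_IMAGE_TAG AZAION_ROOT_DIR
}
```

Calling `load_env /path/to/.env` keeps any value present in the file or the environment and only fills in the three documented defaults.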
## Script Details
### deploy.sh
Main orchestrator: generates config, pulls images, stops old services, starts new ones, checks health.
```
./scripts/deploy.sh # Deploy latest version (remote if DEPLOY_HOST is set, otherwise local)
./scripts/deploy.sh --rollback # Rollback to previous version
./scripts/deploy.sh --local # Force local mode (skip SSH)
./scripts/deploy.sh --help # Show usage
```
Flow: `generate-config.sh` → `pull-images.sh` → `stop-services.sh` → `start-services.sh` → `health-check.sh`

When `DEPLOY_HOST` is set, commands execute over SSH on the remote server. Without it, runs locally.
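
The flow and the local/remote switch described above can be sketched as follows. The names `run_step`, `deploy`, and `FORCE_LOCAL` are hypothetical, and the sketch assumes the remote user's login directory contains the same `scripts/` folder; the real `deploy.sh` may be organized differently.

```bash
# Sketch only: run_step, deploy and FORCE_LOCAL are hypothetical names.
run_step() {
  local script="$1"
  if [ -n "${DEPLOY_HOST:-}" ] && [ "${FORCE_LOCAL:-0}" != "1" ]; then
    # Remote mode. Assumption: scripts/ exists in the SSH login directory.
    ssh "${DEPLOY_USER:-deploy}@${DEPLOY_HOST}" "./scripts/${script}"
  else
    # Local mode: run the script from the current project root.
    "./scripts/${script}"
  fi
}

deploy() {
  local step
  for step in generate-config.sh pull-images.sh stop-services.sh \
              start-services.sh health-check.sh; do
    echo "==> ${step}"
    # Stop at the first failing step so later steps never run on a broken state.
    run_step "$step" || { echo "step failed: ${step}" >&2; return 1; }
  done
}
```

Stopping at the first failure mirrors what `set -euo pipefail` would do in a script that calls each step in sequence.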
### generate-config.sh
Generates `config.yaml` from environment variables, preserving the existing config format the codebase expects. Validates that all required variables are set before writing.
```
./scripts/generate-config.sh # Generate config.yaml
```
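
The validation step can be sketched as a check that runs before anything is written. `require_vars` is a hypothetical helper name; the layout of `config.yaml` itself is not shown here because this document does not specify it.

```bash
# Sketch only: require_vars is a hypothetical helper name.
require_vars() {
  local missing="" name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      missing="${missing} ${name}"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing required variables:${missing}" >&2
    return 1
  fi
}

# Possible call, using the variables listed for generate-config.sh:
#   require_vars AZAION_API_URL AZAION_API_EMAIL AZAION_API_PASSWORD \
#     RABBITMQ_HOST RABBITMQ_PORT RABBITMQ_USER RABBITMQ_PASSWORD \
#     RABBITMQ_QUEUE_NAME || exit 1
```

Collecting all missing names before failing reports every gap in one run instead of one variable at a time.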
### pull-images.sh
Pulls Docker images for both deployable components from the configured registry.
```
./scripts/pull-images.sh # Pull images
```
Images pulled:
- `${DOCKER_REGISTRY}/azaion/training:${DOCKER_IMAGE_TAG}`
- `${DOCKER_REGISTRY}/azaion/annotation-queue:${DOCKER_IMAGE_TAG}`
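
A minimal sketch of how the two image references could be built and pulled. The function names `image_refs` and `pull_images` are hypothetical; only the image naming scheme comes from the list above.

```bash
# Sketch only: image_refs and pull_images are hypothetical helper names.
image_refs() {
  local registry="${DOCKER_REGISTRY:?DOCKER_REGISTRY must be set}"
  local tag="${DOCKER_IMAGE_TAG:-latest}"   # documented default
  local component
  for component in training annotation-queue; do
    echo "${registry}/azaion/${component}:${tag}"
  done
}

pull_images() {
  local refs ref
  refs="$(image_refs)" || return 1
  for ref in $refs; do
    docker pull "$ref" || return 1
  done
}
```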
### start-services.sh
Creates the data directory tree under `AZAION_ROOT_DIR` (default: `/azaion`) if it does not exist, then runs `docker compose up -d`.
```
./scripts/start-services.sh # Start services
```
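
The two steps can be sketched as below. `start_services` is a hypothetical function name, and only the root directory is created here because the subdirectories of the tree are not listed in this document.

```bash
# Sketch only: start_services is a hypothetical name; the real script
# creates a larger directory tree whose layout is not documented here.
start_services() {
  local root="${AZAION_ROOT_DIR:-/azaion}"
  # -p makes mkdir a no-op when the directory already exists (idempotent).
  mkdir -p "$root" || return 1
  # Assumes the Compose file is in the current directory.
  docker compose up -d
}
```

Because both `mkdir -p` and `docker compose up -d` tolerate an already-prepared state, running the function twice should leave the system unchanged.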
### stop-services.sh
Saves current image tags to `scripts/.previous-tags` for rollback, then stops and removes containers with a 30-second grace period.
```
./scripts/stop-services.sh # Stop services
```
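
One way to save the tags and stop with a grace period is sketched below. This is an assumption-heavy illustration: the function names are hypothetical, the tags file is assumed to hold one image reference per line, and `docker compose down --timeout 30` is one possible way to get the 30-second grace period; the real script may use different commands.

```bash
# Sketch only: function names and the one-reference-per-line file format
# are assumptions, not taken from the real stop-services.sh.
save_previous_tags() {
  local tags_file="${1:-scripts/.previous-tags}"
  local ids
  ids="$(docker compose ps -q)" || return 1
  [ -n "$ids" ] || return 0   # nothing running: keep any earlier tags file
  # .Config.Image is the image reference each container was created from.
  # $ids is intentionally unquoted so each container ID becomes one argument.
  docker inspect --format '{{.Config.Image}}' $ids | sort -u > "$tags_file"
}

stop_services() {
  save_previous_tags "$@" || return 1
  # Stop and remove containers, waiting up to 30 seconds before killing them.
  docker compose down --timeout 30
}
```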
### health-check.sh
Checks container status, GPU availability, disk usage, and queue offset. Returns exit code 0 (healthy) or 1 (unhealthy).
```
./scripts/health-check.sh # Run health check
```
Checks performed:
- Annotation queue and RabbitMQ containers running
- GPU available and temperature < 90°C
- Disk usage < 95% (warning at 80%)
- Queue offset file exists
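
The disk and GPU thresholds above can be expressed as small predicate functions. This is a sketch with hypothetical function names; the container and queue offset checks are omitted because their details are not given in this document.

```bash
# Sketch only: function names are hypothetical; thresholds come from the list above.
disk_status() {          # $1 = disk usage percent as an integer
  if [ "$1" -ge 95 ]; then
    echo "fail"          # unhealthy at 95% and above
  elif [ "$1" -ge 80 ]; then
    echo "warn"          # warning from 80%
  else
    echo "ok"
  fi
}

gpu_temp_ok() {          # $1 = temperature in degrees Celsius
  [ "$1" -lt 90 ]
}

disk_usage_percent() {   # $1 = path to inspect
  # POSIX df output: column 5 of the second line is the capacity, e.g. "42%".
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

gpu_temperature() {
  # Requires the NVIDIA driver; prints one temperature per GPU.
  nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits
}
```

A health check built on these could exit with 1 when `disk_status` prints `fail` or `gpu_temp_ok` returns non-zero, and with 0 otherwise.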
## Common Properties
All scripts:
- Use `#!/bin/bash` with `set -euo pipefail`
- Support `--help` flag
- Source `.env` from project root if present
- Are idempotent
- Support remote execution via SSH (`DEPLOY_HOST` + `DEPLOY_USER`)
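
A skeleton that follows these common properties might look like the sketch below. `PROJECT_ROOT`, `usage`, and `main` are illustrative names, and the sketch assumes the script is stored in `scripts/` directly under the project root.

```bash
#!/bin/bash
set -euo pipefail

# Sketch only: PROJECT_ROOT, usage and main are illustrative names.
# Assumes the script lives in <project root>/scripts/.
SCRIPT_DIR="$(cd "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd)"
PROJECT_ROOT="$(dirname -- "$SCRIPT_DIR")"

usage() {
  echo "Usage: $(basename -- "$0") [--help]"
}

main() {
  case "${1:-}" in
    --help) usage; return 0 ;;
    "") ;;
    *) echo "Unknown option: $1" >&2; usage >&2; return 2 ;;
  esac

  # Source .env from the project root if present.
  if [ -f "${PROJECT_ROOT}/.env" ]; then
    # shellcheck disable=SC1091
    source "${PROJECT_ROOT}/.env"
  fi

  # Script-specific work goes here.
}

main "$@"
```

Resolving the project root from the script's own location lets the script be started from any working directory.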