# Azaion AI Training — Deployment Scripts

## Overview

| Script | Purpose | Location |
|--------|---------|----------|
| `deploy.sh` | Main deployment orchestrator | `scripts/deploy.sh` |
| `generate-config.sh` | Generate `config.yaml` from environment variables | `scripts/generate-config.sh` |
| `pull-images.sh` | Pull Docker images from registry | `scripts/pull-images.sh` |
| `start-services.sh` | Start all services via Docker Compose | `scripts/start-services.sh` |
| `stop-services.sh` | Graceful shutdown with tag backup | `scripts/stop-services.sh` |
| `health-check.sh` | Verify deployment health | `scripts/health-check.sh` |

## Prerequisites

- Docker and Docker Compose installed on the target machine
- NVIDIA driver and Docker GPU support (`nvidia-container-toolkit`)
- SSH access to the target machine (for remote deployment)
- `.env` file with the required environment variables (see `.env.example`)

## Environment Variables

All scripts source `.env` from the project root.

| Variable | Required By | Purpose |
|----------|------------|---------|
| `DEPLOY_HOST` | `deploy.sh` (remote) | SSH target for remote deployment |
| `DEPLOY_USER` | `deploy.sh` (remote) | SSH user (default: `deploy`) |
| `DOCKER_REGISTRY` | `pull-images.sh` | Container registry URL |
| `DOCKER_IMAGE_TAG` | `pull-images.sh` | Image version to deploy (default: `latest`) |
| `AZAION_API_URL` | `generate-config.sh` | Azaion REST API URL |
| `AZAION_API_EMAIL` | `generate-config.sh` | API login email |
| `AZAION_API_PASSWORD` | `generate-config.sh` | API login password |
| `RABBITMQ_HOST` | `generate-config.sh` | RabbitMQ host |
| `RABBITMQ_PORT` | `generate-config.sh` | RabbitMQ port |
| `RABBITMQ_USER` | `generate-config.sh` | RabbitMQ username |
| `RABBITMQ_PASSWORD` | `generate-config.sh` | RabbitMQ password |
| `RABBITMQ_QUEUE_NAME` | `generate-config.sh` | RabbitMQ queue name |
| `AZAION_ROOT_DIR` | `start-services.sh`, `health-check.sh` | Root data directory (default: `/azaion`) |
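
As a starting point, a `.env` might look like the sketch below. Every value is an illustrative placeholder (host names, credentials, port, and queue name are assumptions, not project defaults); only `deploy`, `latest`, and `/azaion` are the defaults documented in the table. The project's own `.env.example` remains the authoritative template.

```bash
# Example .env: all values below are placeholders for illustration only.
DEPLOY_HOST=gpu-server.example.com
# Documented default SSH user.
DEPLOY_USER=deploy
DOCKER_REGISTRY=registry.example.com
# Documented default image tag.
DOCKER_IMAGE_TAG=latest
AZAION_API_URL=https://api.example.com
AZAION_API_EMAIL=trainer@example.com
AZAION_API_PASSWORD=change-me
RABBITMQ_HOST=rabbitmq.example.com
# Placeholder port: use whatever port your broker actually listens on.
RABBITMQ_PORT=5672
RABBITMQ_USER=azaion
RABBITMQ_PASSWORD=change-me
RABBITMQ_QUEUE_NAME=example-queue
# Documented default root data directory.
AZAION_ROOT_DIR=/azaion
```

Leaving `DEPLOY_HOST` out of the file keeps `deploy.sh` in local mode.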

## Script Details

### deploy.sh

Main orchestrator: generates the config, pulls images, stops the old services, starts the new ones, and checks health.

```
./scripts/deploy.sh              # Deploy latest version (remote if DEPLOY_HOST is set, otherwise local)
./scripts/deploy.sh --rollback   # Roll back to the previous version
./scripts/deploy.sh --local      # Force local mode (skip SSH)
./scripts/deploy.sh --help       # Show usage
```

Flow: `generate-config.sh` → `pull-images.sh` → `stop-services.sh` → `start-services.sh` → `health-check.sh`

When `DEPLOY_HOST` is set, the commands execute over SSH on the remote server. When it is unset, or when `--local` is passed, they run locally.
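
The flow and the local/remote switch could be sketched roughly as follows. This is not the actual `deploy.sh`: the function names, the `FORCE_LOCAL` variable (standing in for `--local`), and `REMOTE_DIR` (an assumed checkout path on the target machine) are all hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Steps in the order the flow above lists them.
STEPS=(generate-config.sh pull-images.sh stop-services.sh start-services.sh health-check.sh)

# Print the SSH target, or "local" when steps should run on this machine.
# FORCE_LOCAL=1 is a hypothetical stand-in for the --local flag.
deploy_target() {
  if [[ -n "${DEPLOY_HOST:-}" && "${FORCE_LOCAL:-0}" != "1" ]]; then
    echo "${DEPLOY_USER:-deploy}@${DEPLOY_HOST}"
  else
    echo "local"
  fi
}

# Run one step either locally or over SSH.
run_step() {
  local step="$1" target
  target="$(deploy_target)"
  if [[ "${target}" == "local" ]]; then
    "./scripts/${step}"
  else
    # REMOTE_DIR is an assumption about where the repo lives on the server.
    ssh "${target}" "cd '${REMOTE_DIR:-ai-training}' && ./scripts/${step}"
  fi
}

# Run every step in order; set -e stops the deploy at the first failing step.
deploy() {
  local step
  for step in "${STEPS[@]}"; do
    echo "==> ${step}"
    run_step "${step}"
  done
}
```

Calling `deploy` at the end of the script would start the sequence. Keeping the target decision in one small function makes the local/remote behaviour easy to check without touching SSH.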

### generate-config.sh

Generates `config.yaml` from environment variables, preserving the existing config format the codebase expects. Validates that all required variables are set before writing.

```
./scripts/generate-config.sh    # Generate config.yaml
```
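
The validation step might look something like this sketch. The variable list is taken from the environment variables table; the function names are hypothetical, and the `config.yaml` layout itself is deliberately left out because this document does not describe it.

```bash
#!/bin/bash
set -euo pipefail

# Variables the table lists as required by generate-config.sh.
REQUIRED_VARS=(
  AZAION_API_URL AZAION_API_EMAIL AZAION_API_PASSWORD
  RABBITMQ_HOST RABBITMQ_PORT RABBITMQ_USER RABBITMQ_PASSWORD RABBITMQ_QUEUE_NAME
)

# Print the name of every required variable that is unset or empty.
missing_vars() {
  local name
  for name in "${REQUIRED_VARS[@]}"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "${name}"
    fi
  done
}

# Return non-zero (and explain why) before anything is written to disk.
validate_env() {
  local missing
  missing="$(missing_vars)"
  if [[ -n "${missing}" ]]; then
    echo "Missing required variables:" >&2
    echo "${missing}" >&2
    return 1
  fi
}
```

Running `validate_env` before the write means a half-populated `.env` cannot produce a partial `config.yaml`.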

### pull-images.sh

Pulls Docker images for both deployable components from the configured registry.

```
./scripts/pull-images.sh        # Pull images
```

Images pulled:
- `${DOCKER_REGISTRY}/azaion/training:${DOCKER_IMAGE_TAG}`
- `${DOCKER_REGISTRY}/azaion/annotation-queue:${DOCKER_IMAGE_TAG}`
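
A minimal sketch of how the two image references could be built and pulled, assuming the naming pattern listed above and the documented `latest` default. The function names are hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Print one image reference per deployable component.
image_refs() {
  # Fail with a clear message if the registry is not configured.
  local registry="${DOCKER_REGISTRY:?DOCKER_REGISTRY must be set}"
  local tag="${DOCKER_IMAGE_TAG:-latest}"
  local component
  for component in training annotation-queue; do
    echo "${registry}/azaion/${component}:${tag}"
  done
}

# Pull every image; set -e aborts on the first failed pull.
pull_all() {
  local ref
  for ref in $(image_refs); do
    echo "Pulling ${ref}"
    docker pull "${ref}"
  done
}
```

Pinning `DOCKER_IMAGE_TAG` to a specific version instead of relying on `latest` makes a deployment reproducible and a rollback meaningful.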

### start-services.sh

Creates the root data directory tree (`AZAION_ROOT_DIR`, default `/azaion`) if needed, then runs `docker compose up -d`.

```
./scripts/start-services.sh     # Start services
```
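
The two steps could be sketched as below. The subdirectory names in `SUBDIRS` are placeholders, since this document does not list the real layout under the root directory, and the function names are hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Placeholder subdirectories: replace with the layout the services expect.
SUBDIRS=(example-a example-b)

# Create the data directories; mkdir -p makes this safe to repeat.
ensure_dirs() {
  local root="${AZAION_ROOT_DIR:-/azaion}"
  local sub
  for sub in "${SUBDIRS[@]}"; do
    mkdir -p "${root}/${sub}"
  done
}

# Prepare the directories, then start everything in the background.
start_services() {
  ensure_dirs
  docker compose up -d
}
```

Because `mkdir -p` succeeds when a directory already exists and `docker compose up -d` leaves unchanged running containers alone, running the script twice should be harmless.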

### stop-services.sh

Saves current image tags to `scripts/.previous-tags` for rollback, then stops and removes containers with a 30-second grace period.

```
./scripts/stop-services.sh      # Stop services
```
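
One way to implement the tag backup and graceful stop is sketched here. The format of `.previous-tags` is an assumption (one image reference per line), as are the function names; the real script may record tags differently.

```bash
#!/bin/bash
set -euo pipefail

TAGS_FILE="${TAGS_FILE:-scripts/.previous-tags}"

# Record the image reference of every container in the Compose project.
save_tags() {
  local ids
  ids="$(docker compose ps -q)"
  # With nothing running, keep the previous file so a rollback target survives.
  if [[ -n "${ids}" ]]; then
    # ids is a whitespace-separated list of container IDs, so it is left unquoted.
    # shellcheck disable=SC2086
    docker inspect --format '{{.Config.Image}}' ${ids} | sort -u > "${TAGS_FILE}"
  fi
}

# Save tags first, then stop and remove containers with a 30-second grace period.
stop_services() {
  save_tags
  docker compose down --timeout 30
}
```

The `--timeout 30` option gives each container 30 seconds to exit after the stop signal before it is killed.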

### health-check.sh

Checks container status, GPU availability, disk usage, and queue offset. Returns exit code 0 (healthy) or 1 (unhealthy).

```
./scripts/health-check.sh       # Run health check
```

Checks performed:
- Annotation queue and RabbitMQ containers running
- GPU available and temperature < 90°C
- Disk usage < 95% (warning at 80%)
- Queue offset file exists
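
The disk and GPU thresholds above could be expressed as in the sketch below. It covers only those two checks, because the container names and the queue offset file path are not given in this document. All function names are hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Classify disk usage (integer percent): fail at 95 or more, warn at 80 or more.
disk_status() {
  local pct="$1"
  if (( pct >= 95 )); then echo fail
  elif (( pct >= 80 )); then echo warn
  else echo ok
  fi
}

# Classify GPU temperature in degrees C; a missing reading counts as a failure.
gpu_status() {
  local temp="${1:-}"
  if [[ "${temp}" =~ ^[0-9]+$ ]] && (( temp < 90 )); then echo ok; else echo fail; fi
}

# Used capacity of the filesystem holding the given path, without the % sign.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Hottest GPU temperature reported by nvidia-smi (empty if the tool is unavailable).
gpu_temp() {
  nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits 2>/dev/null \
    | sort -n | tail -n 1
}

# Succeeds (status 0) only when neither check reports "fail".
health_check() {
  local disk gpu
  disk="$(disk_status "$(disk_usage_pct "${AZAION_ROOT_DIR:-/azaion}")")"
  gpu="$(gpu_status "$(gpu_temp || true)")"
  echo "disk=${disk} gpu=${gpu}"
  [[ "${disk}" != "fail" && "${gpu}" != "fail" ]]
}
```

A `warn` result is reported but does not fail the check, matching the "warning at 80%" wording. A caller such as `deploy.sh` can rely on the return status alone.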

## Common Properties

All scripts:
- Use `#!/bin/bash` with `set -euo pipefail`
- Support `--help` flag
- Source `.env` from project root if present
- Are idempotent
- Support remote execution via SSH (`DEPLOY_HOST` + `DEPLOY_USER`)
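
A skeleton that follows these conventions might look like the sketch below. It is an illustration, not a copy of any script in the repository: `usage`, `load_env`, and `main` are hypothetical names, and the project root is assumed to be the parent of the directory that holds the script.

```bash
#!/bin/bash
set -euo pipefail

# Locate the project root, assuming the script lives in <root>/scripts.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"
PROJECT_ROOT="$(dirname "${SCRIPT_DIR}")"

usage() {
  echo "Usage: $(basename "$0") [--help]"
}

# Source the .env file if it exists; a missing file is not an error.
load_env() {
  local env_file="${1:-${PROJECT_ROOT}/.env}"
  if [[ -f "${env_file}" ]]; then
    set -a   # export every variable the file assigns
    # shellcheck disable=SC1090
    source "${env_file}"
    set +a
  fi
}

main() {
  case "${1:-}" in
    --help|-h) usage; return 0 ;;
  esac
  load_env
  # Script-specific work would go here.
}
```

Ending the file with `main "$@"` runs it. `set -euo pipefail` makes the script stop on the first failing command, on any use of an unset variable, and on a failure anywhere in a pipeline.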