mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 05:26:36 +00:00
Azaion AI Training — CI/CD Pipeline
Pipeline Overview
| Stage | Trigger | Quality Gate |
|---|---|---|
| Lint | Every push | Zero lint errors |
| Test | Every push | All tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy | Manual trigger | Health checks pass on target server |
There is no staging environment: the system runs on a single dedicated GPU server. In place of staging, CI runs the test suite on CPU-only runners (annotation queue tests, unit tests), and GPU behavior is verified manually on the target machine.
Stage Details
Lint
- `black --check src/`: Python formatting
- `ruff check src/`: Python linting
- Runs on standard CI runner (no GPU)
Test
- Framework: `pytest`
- Command: `pytest tests/ -v --tb=short`
- Test compose for annotation queue integration tests: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
- GPU-dependent tests (training, export) are excluded from CI; they require a physical GPU and run during manual verification on the target server
- Coverage report published as pipeline artifact
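The workflow in Pipeline Configuration does not show how the coverage report is produced or published. One way this could look is sketched below as extra steps for the `test` job. It assumes `pytest-cov` is installed via `requirements-test.txt` and that the code under test lives in `src/`; the artifact name `coverage-report` is a placeholder, not something the project defines.

```yaml
# Hypothetical steps for the test job (a sketch, not the project's actual config).
# Assumes pytest-cov is available and src/ is the package root.
- run: pytest tests/ -v --tb=short --ignore=tests/gpu --cov=src --cov-report=xml
- uses: actions/upload-artifact@v4
  if: always()                 # still upload the report when tests fail
  with:
    name: coverage-report      # placeholder artifact name
    path: coverage.xml         # default output file of --cov-report=xml
```

Uploading with `if: always()` keeps the report available for failed runs, which is usually when it is most useful.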
Security
- Dependency audit: `pip-audit -r requirements.txt`
- Dependency audit: `pip-audit -r src/annotation-queue/requirements.txt`
- SAST scan: Semgrep with `p/python` ruleset
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings
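The image scan is listed here but has no corresponding step in the workflow under Pipeline Configuration. A minimal sketch of such a step, using the public `aquasecurity/trivy-action`, might look as follows. It assumes the step runs after the image has been pushed and that the runner is already logged in to the registry; where exactly it belongs in the pipeline is a design decision this document does not settle.

```yaml
# Hypothetical Trivy step (a sketch): scans the pushed training image and
# fails the job on critical or high findings.
- uses: aquasecurity/trivy-action@master
  with:
    image-ref: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
    severity: CRITICAL,HIGH    # matches the "block on" rule above
    exit-code: "1"             # non-zero exit when findings match the severity filter
```

A second, identical step with the `azaion/annotation-queue` image reference would cover the other component. Scanning after the push means a vulnerable image can already sit in the registry; building locally first and pushing only after a clean scan avoids that at the cost of a longer job.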
Build
- Docker images built for both components:
  - `docker/training.Dockerfile` → `azaion/training:<git-sha>`
  - `docker/annotation-queue.Dockerfile` → `azaion/annotation-queue:<git-sha>`
- Build cache: Docker layer cache via GitHub Actions cache
- Build runs on standard runner; no GPU is needed for `docker build`
Push
- Registry: configurable via `DOCKER_REGISTRY` secret (e.g., GitHub Container Registry `ghcr.io`, or private registry)
- Authentication: registry login via CI secrets (`DOCKER_REGISTRY_USER`, `DOCKER_REGISTRY_TOKEN`)
Deploy
- Manual trigger only (`workflow_dispatch`): training runs for days, so unattended deploys are risky
- Deployment method: SSH to target GPU server, run deploy scripts (`scripts/deploy.sh`)
- Pre-deploy: pull new images, stop services gracefully
- Post-deploy: start services, run health check script
- Rollback: `scripts/deploy.sh --rollback` redeploys previous image tags
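The rollback path is described only as a script flag. If it should also be triggerable from GitHub Actions, a separate manually dispatched workflow is one option. The sketch below is hypothetical: the workflow and job names are invented for illustration, and it assumes the same `DEPLOY_USER` and `DEPLOY_HOST` secrets and server path used by the deploy job, with SSH access already configured on the runner.

```yaml
# Hypothetical rollback workflow (a sketch, not part of the documented pipeline).
name: Rollback
on:
  workflow_dispatch:           # manual trigger only, like the deploy job
jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: |
          ssh ${{ secrets.DEPLOY_USER }}@${{ secrets.DEPLOY_HOST }} \
            "cd /opt/azaion-training && bash scripts/deploy.sh --rollback"
```

Keeping rollback in its own workflow means it never waits on lint, test, or build jobs when a bad deploy needs to be reverted quickly.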
Pipeline Configuration (GitHub Actions)
    name: CI/CD

    on:
      push:
        branches: [dev, main]
      pull_request:
        branches: [dev]
      workflow_dispatch:
        inputs:
          deploy_target:
            description: "Deploy to target server"
            required: true
            type: boolean
            default: false

    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.10"
          - run: pip install black ruff
          - run: black --check src/
          - run: ruff check src/

      test:
        runs-on: ubuntu-latest
        needs: lint
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.10"
          - run: pip install -r requirements-test.txt
          # GPU-dependent tests under tests/gpu are excluded from CI
          - run: pytest tests/ -v --tb=short --ignore=tests/gpu

      security:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.10"
          - run: pip install pip-audit
          # No "|| true" here: a failed audit must fail the job,
          # otherwise the security gate never blocks anything
          - run: pip-audit -r requirements.txt
          - run: pip-audit -r src/annotation-queue/requirements.txt
          - uses: returntocorp/semgrep-action@v1
            with:
              config: p/python

      build-and-push:
        runs-on: ubuntu-latest
        needs: [test, security]
        # Build and push only when a PR has been merged (pushed) to dev
        if: github.ref == 'refs/heads/dev' && github.event_name == 'push'
        steps:
          - uses: actions/checkout@v4
          - uses: docker/setup-buildx-action@v3
          - uses: docker/login-action@v3
            with:
              registry: ${{ secrets.DOCKER_REGISTRY }}
              username: ${{ secrets.DOCKER_REGISTRY_USER }}
              password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
          - uses: docker/build-push-action@v6
            with:
              context: .
              file: docker/training.Dockerfile
              push: true
              tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
              cache-from: type=gha
              cache-to: type=gha,mode=max
          - uses: docker/build-push-action@v6
            with:
              context: .
              file: docker/annotation-queue.Dockerfile
              push: true
              tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/annotation-queue:${{ github.sha }}
              cache-from: type=gha
              cache-to: type=gha,mode=max

      deploy:
        runs-on: ubuntu-latest
        # build-and-push is skipped on workflow_dispatch runs, and a job whose
        # dependency was skipped is skipped too unless its condition uses a
        # status check function. !cancelled() lets deploy run, while the
        # explicit result checks keep the test and security gates in force.
        needs: [test, security, build-and-push]
        if: ${{ !cancelled() && needs.test.result == 'success' && needs.security.result == 'success' && github.event.inputs.deploy_target == 'true' }}
        environment: production
        steps:
          - uses: actions/checkout@v4
          # Assumes SSH access from the runner is already configured, and that
          # images tagged with this commit SHA were pushed by an earlier run on dev
          - run: |
              ssh ${{ secrets.DEPLOY_USER }}@${{ secrets.DEPLOY_HOST }} \
                "cd /opt/azaion-training && \
                 DOCKER_IMAGE_TAG=${{ github.sha }} \
                 bash scripts/deploy.sh"
Caching Strategy
| Cache | Key | Restore Keys |
|---|---|---|
| pip dependencies | `requirements.txt` hash | `pip-` prefix |
| Docker layers | GitHub Actions cache (BuildKit) | gha- prefix |
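The Docker layer cache is already wired up through `cache-from` and `cache-to` in the build steps, but the pip cache in the first row has no matching step in the workflow. A minimal sketch with `actions/cache`, following the key and restore-key scheme from the table, could be added before the `pip install` step of a job. The cache path assumes pip's default cache location on the Linux runners.

```yaml
# Hypothetical pip cache step (a sketch): place before `pip install`.
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip                              # pip's default cache dir on Linux
    key: pip-${{ hashFiles('requirements.txt') }}   # exact hit when requirements are unchanged
    restore-keys: |
      pip-
```

With the `pip-` restore key, a changed `requirements.txt` still starts from the most recent older cache, so only new or updated packages are downloaded.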
Parallelization
    push event
      ├── lint ──► test ──┐
      │                   ├──► build-and-push ──► deploy (manual)
      └── security ───────┘
Lint and security run in parallel. Test depends on lint. Build depends on both test and security passing.
Notifications
| Event | Channel | Recipients |
|---|---|---|
| Build failure | GitHub PR check | PR author |
| Security alert | GitHub security tab | Repository maintainers |
| Deploy success | GitHub Actions log | Deployment team |
| Deploy failure | GitHub Actions log + email | Deployment team |