ai-training/_docs/04_deploy/ci_cd_pipeline.md
Oleksandr Bezdieniezhnykh aeb7f8ca8c Update autopilot workflow and documentation for project cycle completion
- Modified the existing-code workflow to automatically loop back to New Task after project completion without user confirmation.
- Updated the autopilot state to reflect the current step as `done` and status as `completed`.
- Clarified the deployment status report by specifying non-deployed services and their purposes.

These changes enhance the automation of task management and improve documentation clarity.
2026-03-29 05:02:22 +03:00

Azaion AI Training — CI/CD Pipeline

Pipeline Overview

| Stage | Trigger | Quality Gate |
|---|---|---|
| Lint | Every push | Zero lint errors |
| Test | Every push | All tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy | Manual trigger | Health checks pass on target server |

No staging environment — the system runs on a dedicated GPU server. "Staging" is replaced by the test suite running in CI on CPU-only runners (annotation queue tests, unit tests) and manual GPU verification on the target machine.

Stage Details

Lint

  • black --check src/ — Python formatting
  • ruff check src/ — Python linting
  • Runs on standard CI runner (no GPU)

Test

  • Framework: pytest
  • Command: pytest tests/ -v --tb=short
  • Test compose for annotation queue integration tests: docker compose -f docker-compose.test.yml up --abort-on-container-exit
  • GPU-dependent tests (training, export) are excluded from CI — they require a physical GPU and run during manual verification on the target server
  • Coverage report published as pipeline artifact
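
The GPU exclusion described above can also be expressed with a pytest marker instead of a directory ignore. The following is a minimal sketch, not the project's actual setup: the `gpu` marker name and the `nvidia-smi` lookup as a GPU check are both assumptions.

```python
# conftest.py (sketch): skip tests marked "gpu" when no GPU is visible.
# The "gpu" marker name and the nvidia-smi heuristic are assumptions.
import shutil


def gpu_available(which=shutil.which):
    """Return True if the NVIDIA driver CLI is on PATH (rough heuristic)."""
    return which("nvidia-smi") is not None


def pytest_configure(config):
    # Register the marker so `pytest --strict-markers` accepts it.
    config.addinivalue_line("markers", "gpu: test requires a physical GPU")


def pytest_collection_modifyitems(config, items):
    if gpu_available():
        return
    import pytest  # imported lazily so gpu_available() has no hard dependency

    skip_gpu = pytest.mark.skip(reason="no GPU on this runner")
    for item in items:
        if "gpu" in item.keywords:
            item.add_marker(skip_gpu)
```

With a hook like this, the same `pytest tests/` command could run on both CI runners and the GPU server, and GPU tests would show up as skipped in CI rather than being invisible.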

Security

  • Dependency audit: pip-audit -r requirements.txt
  • Dependency audit: pip-audit -r src/annotation-queue/requirements.txt
  • SAST scan: Semgrep with p/python ruleset
  • Image scan: Trivy on built Docker images
  • Block on: critical or high severity findings
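
The "block on critical or high" rule can be sketched as a small gate over a Trivy JSON report (produced with `trivy image --format json`). The function name is illustrative and the report handling is a simplified assumption. Trivy can also enforce the same rule natively with `--severity CRITICAL,HIGH --exit-code 1`.

```python
# Sketch of a severity gate over a Trivy JSON report.
# Layout assumed: "Results" -> "Vulnerabilities" -> "Severity".
BLOCKING = {"CRITICAL", "HIGH"}


def blocking_findings(report):
    """Return (target, vulnerability id, severity) for each blocking finding."""
    found = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be absent or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                found.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln["Severity"])
                )
    return found
```

A CI step would load the report with `json.load`, print the findings, and exit non-zero when the returned list is not empty.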

Build

  • Docker images built for both components:
    • docker/training.Dockerfile → azaion/training:<git-sha>
    • docker/annotation-queue.Dockerfile → azaion/annotation-queue:<git-sha>
  • Build cache: Docker layer cache via GitHub Actions cache
  • Build runs on standard runner — no GPU needed for docker build
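
The tagging scheme above can be written as a one-line helper. This is only an illustration of the naming convention; the helper name is hypothetical, and in the workflow `<git-sha>` is the full commit SHA from `github.sha`.

```python
# Sketch: compose the image references described above.
# The helper name is illustrative, not part of the project.
COMPONENTS = ("training", "annotation-queue")


def image_ref(registry, component, git_sha):
    """Return <registry>/azaion/<component>:<git-sha>."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    return f"{registry.rstrip('/')}/azaion/{component}:{git_sha}"
```

Because every tag is tied to a commit, a rollback only needs the SHA of the previously deployed commit to address the earlier images.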

Push

  • Registry: configurable via DOCKER_REGISTRY secret (e.g., GitHub Container Registry ghcr.io, or private registry)
  • Authentication: registry login via CI secrets (DOCKER_REGISTRY_USER, DOCKER_REGISTRY_TOKEN)

Deploy

  • Manual trigger only (workflow_dispatch): training runs last for days, so unattended deploys are risky
  • Deployment method: SSH to target GPU server, run deploy scripts (scripts/deploy.sh)
  • Pre-deploy: pull new images, stop services gracefully
  • Post-deploy: start services, run health check script
  • Rollback: scripts/deploy.sh --rollback redeploys previous image tags
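
The post-deploy health check could look roughly like the sketch below: poll an endpoint until it answers or the attempts run out. The source does not describe the health check script, so the HTTP probe, attempt count, and delay are all assumptions.

```python
# Sketch of a post-deploy health check with retries.
# The probe target, attempt count, and delay are hypothetical.
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_healthy(probe, attempts=10, delay=3.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)  # give the service time to start
    return False
```

A caller would pass a probe such as `lambda: http_ok("http://localhost:8080/health")` (a made-up URL) and run `scripts/deploy.sh --rollback` when `wait_healthy` returns False.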

Pipeline Configuration (GitHub Actions)

name: CI/CD

on:
  push:
    branches: [dev, main]
  pull_request:
    branches: [dev]
  workflow_dispatch:
    inputs:
      deploy_target:
        description: "Deploy to target server"
        required: true
        type: boolean
        default: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install black ruff
      - run: black --check src/
      - run: ruff check src/

  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements-test.txt
      - run: pytest tests/ -v --tb=short --ignore=tests/gpu

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install pip-audit
      - run: pip-audit -r requirements.txt
      - run: pip-audit -r src/annotation-queue/requirements.txt
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/python

  build-and-push:
    runs-on: ubuntu-latest
    needs: [test, security]
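    # Also run on manual dispatch from dev: deploy needs this job, and a job
    # whose dependency was skipped is itself skipped, so a push-only condition
    # would mean deploy never runs.
    if: github.ref == 'refs/heads/dev' && (github.event_name == 'push' || github.event_name == 'workflow_dispatch')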
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ secrets.DOCKER_REGISTRY }}
          username: ${{ secrets.DOCKER_REGISTRY_USER }}
          password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/training.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/annotation-queue.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/annotation-queue:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: build-and-push
    if: github.event.inputs.deploy_target == 'true'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: |
          ssh ${{ secrets.DEPLOY_USER }}@${{ secrets.DEPLOY_HOST }} \
            "cd /opt/azaion-training && \
             DOCKER_IMAGE_TAG=${{ github.sha }} \
             bash scripts/deploy.sh"

Caching Strategy

| Cache | Key | Restore Keys |
|---|---|---|
| pip dependencies | requirements.txt hash | pip- prefix |
| Docker layers | GitHub Actions cache (BuildKit) | gha- prefix |
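
The pip cache key idea can be sketched as follows. An exact key match restores the cache built for the same requirements.txt, and the `pip-` restore prefix falls back to the most recent pip cache when the file has changed. The exact key format is an assumption; in GitHub Actions the hash would normally come from `hashFiles('requirements.txt')`.

```python
# Sketch: derive a pip cache key from the requirements file contents.
# The "pip-" prefix matches the table above; the key format is an assumption.
import hashlib
from pathlib import Path


def pip_cache_key(requirements_path, prefix="pip-"):
    """Return prefix + SHA-256 hex digest of the requirements file."""
    digest = hashlib.sha256(Path(requirements_path).read_bytes()).hexdigest()
    return f"{prefix}{digest}"
```

Any edit to requirements.txt changes the digest and therefore the key, which forces a fresh cache entry while the prefix still allows a partial restore.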

Parallelization

push event
  ├── lint ──► test ──┐
  │                   ├──► build-and-push ──► deploy (manual)
  └── security ───────┘

Lint and security run in parallel. Test depends on lint. Build depends on both test and security passing.
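
The dependency graph can be checked mechanically. The sketch below groups the jobs into waves with the standard-library `graphlib`. The wave model is a simplification: a real runner starts each job as soon as its own dependencies finish, so test may begin while security is still running.

```python
# Sketch: derive execution waves from the job dependencies above.
from graphlib import TopologicalSorter

# job -> jobs it depends on (mirrors the `needs:` entries in the workflow)
NEEDS = {
    "lint": [],
    "security": [],
    "test": ["lint"],
    "build-and-push": ["test", "security"],
    "deploy": ["build-and-push"],
}


def waves(needs):
    """Return lists of jobs that can run concurrently, in order."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


print(waves(NEEDS))
# [['lint', 'security'], ['test'], ['build-and-push'], ['deploy']]
```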

Notifications

| Event | Channel | Recipients |
|---|---|---|
| Build failure | GitHub PR check | PR author |
| Security alert | GitHub security tab | Repository maintainers |
| Deploy success | GitHub Actions log | Deployment team |
| Deploy failure | GitHub Actions log + email | Deployment team |