mirror of https://github.com/azaion/ai-training.git synced 2026-04-22 05:26:36 +00:00

Oleksandr Bezdieniezhnykh aeb7f8ca8c Update autopilot workflow and documentation for project cycle completion

- Modified the existing-code workflow to automatically loop back to New Task after project completion without user confirmation.
- Updated the autopilot state to reflect the current step as `done` and status as `completed`.
- Clarified the deployment status report by specifying non-deployed services and their purposes.

These changes enhance the automation of task management and improve documentation clarity.

2026-03-29 05:02:22 +03:00


Azaion AI Training — CI/CD Pipeline

Pipeline Overview

Stage	Trigger	Quality Gate
Lint	Every push	Zero lint errors
Test	Every push	All tests pass
Security	Every push	Zero critical/high CVEs
Build	PR merge to dev	Docker build succeeds
Push	After build	Images pushed to registry
Deploy	Manual trigger	Health checks pass on target server

No staging environment — the system runs on a dedicated GPU server. "Staging" is replaced by the test suite running in CI on CPU-only runners (annotation queue tests, unit tests) and manual GPU verification on the target machine.

Stage Details

Lint

black --check src/ — Python formatting
ruff check src/ — Python linting
Runs on standard CI runner (no GPU)

Test

Framework: pytest
Command: pytest tests/ -v --tb=short
Test compose for annotation queue integration tests: docker compose -f docker-compose.test.yml up --abort-on-container-exit
GPU-dependent tests (training, export) are excluded from CI — they require a physical GPU and run during manual verification on the target server
Coverage report published as pipeline artifact
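The split between CPU-only CI and manual GPU verification can be sketched as a helper that builds the pytest invocation for either environment. This is an illustration only: `gpu_present` and `pytest_args` are hypothetical names, and treating `nvidia-smi` on PATH as evidence of a GPU is an assumption. The `tests/gpu` directory is taken from the `--ignore=tests/gpu` flag in the workflow.

```python
import shutil


def gpu_present() -> bool:
    """Assumption: a usable NVIDIA GPU implies `nvidia-smi` is on PATH."""
    return shutil.which("nvidia-smi") is not None


def pytest_args(has_gpu: bool) -> list[str]:
    """Build the pytest argument list for a CI runner or the GPU server."""
    args = ["tests/", "-v", "--tb=short"]
    if not has_gpu:
        # CI runners have no GPU, so the GPU-dependent tests are left out.
        args.append("--ignore=tests/gpu")
    return args
```

On a CI runner `pytest_args(gpu_present())` would yield the same arguments as the workflow's test step, while on the target server the GPU tests stay in the run.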

Security

Dependency audit: pip-audit -r requirements.txt
Dependency audit: pip-audit -r src/annotation-queue/requirements.txt
SAST scan: Semgrep with p/python ruleset
Image scan: Trivy on built Docker images
Block on: critical or high severity findings
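The "block on critical or high" rule can be illustrated with a small filter over a Trivy JSON report. This is a sketch, not the project's gate: `blocking_findings` is a hypothetical helper, and it assumes the report was produced with `trivy image --format json`, whose output has a top-level `Results` list whose entries carry a `Vulnerabilities` list.

```python
BLOCKING = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> list[str]:
    """Return the IDs of vulnerabilities that should fail the pipeline."""
    found = []
    for result in report.get("Results") or []:
        # Clean targets may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if str(vuln.get("Severity", "")).upper() in BLOCKING:
                found.append(vuln.get("VulnerabilityID", "unknown"))
    return found
```

A CI step could exit non-zero whenever the returned list is non-empty. Trivy can also apply the same gate itself through its `--severity CRITICAL,HIGH --exit-code 1` options, which may be simpler than post-processing the report.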

Build

Docker images built for both components:
- docker/training.Dockerfile → azaion/training:<git-sha>
- docker/annotation-queue.Dockerfile → azaion/annotation-queue:<git-sha>
Build cache: Docker layer cache via GitHub Actions cache
Build runs on standard runner — no GPU needed for docker build

Push

Registry: configurable via DOCKER_REGISTRY secret (e.g., GitHub Container Registry ghcr.io, or private registry)
Authentication: registry login via CI secrets (DOCKER_REGISTRY_USER, DOCKER_REGISTRY_TOKEN)

Deploy

Manual trigger only (workflow_dispatch): training runs for days, so unattended deploys are risky
Deployment method: SSH to target GPU server, run deploy scripts (scripts/deploy.sh)
Pre-deploy: pull new images, stop services gracefully
Post-deploy: start services, run health check script
Rollback: scripts/deploy.sh --rollback redeploys previous image tags
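The post-deploy health check is not shown in this document, so the following is only a sketch of what such a script might do: poll a probe until it succeeds or a retry budget runs out. The names `http_ok` and `wait_healthy`, the retry counts, and the idea that services expose an HTTP health endpoint are all assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 3.0) -> bool:
    """Call `probe` until it succeeds or the attempts are exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the service time to come up
    return False
```

With a hypothetical endpoint, usage could look like `wait_healthy(lambda: http_ok("http://localhost:8080/health"))`. A `False` result would be the signal to run `scripts/deploy.sh --rollback`.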

Pipeline Configuration (GitHub Actions)

name: CI/CD

on:
  push:
    branches: [dev, main]
  pull_request:
    branches: [dev]
  workflow_dispatch:
    inputs:
      deploy_target:
        description: "Deploy to target server"
        required: true
        type: boolean
        default: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install black ruff
      - run: black --check src/
      - run: ruff check src/

  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements-test.txt
      - run: pytest tests/ -v --tb=short --ignore=tests/gpu

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install pip-audit
      - run: pip-audit -r requirements.txt
      - run: pip-audit -r src/annotation-queue/requirements.txt
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/python

  build-and-push:
    runs-on: ubuntu-latest
    needs: [test, security]
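    # Also run on manual dispatch: deploy needs this job and is skipped whenever it is skipped
    if: github.ref == 'refs/heads/dev' && (github.event_name == 'push' || github.event_name == 'workflow_dispatch')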
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ secrets.DOCKER_REGISTRY }}
          username: ${{ secrets.DOCKER_REGISTRY_USER }}
          password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/training.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/annotation-queue.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/annotation-queue:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: build-and-push
    if: github.event.inputs.deploy_target == 'true'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: |
          ssh ${{ secrets.DEPLOY_USER }}@${{ secrets.DEPLOY_HOST }} \
            "cd /opt/azaion-training && \
             DOCKER_IMAGE_TAG=${{ github.sha }} \
             bash scripts/deploy.sh"

Caching Strategy

Cache	Key	Restore Keys
pip dependencies	`requirements.txt` hash	`pip-` prefix
Docker layers	GitHub Actions cache (BuildKit)	`gha-` prefix
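The pip cache row relies on a content-derived key: the key changes only when the requirements file changes, and the `pip-` prefix allows a fallback to an older cache. The sketch below illustrates that idea. `cache_key` and `can_restore` are hypothetical helpers, and the plain SHA-256 digest used here follows the same principle as GitHub's `hashFiles` expression but is not guaranteed to produce the same value.

```python
import hashlib


def cache_key(requirements: bytes, prefix: str = "pip-") -> str:
    """Derive a cache key from the contents of a requirements file."""
    return prefix + hashlib.sha256(requirements).hexdigest()


def can_restore(stored_key: str, wanted_key: str, restore_prefix: str = "pip-") -> bool:
    """Exact hit, or a fallback hit on any stored key sharing the restore prefix."""
    return stored_key == wanted_key or stored_key.startswith(restore_prefix)
```

An unchanged file yields the same key and an exact cache hit. After a dependency bump the key differs, but the prefix match still restores the previous cache so that only the changed packages need downloading.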

Parallelization

push event
  ├── lint ──► test ──┐
  │                   ├──► build-and-push ──► deploy (manual)
  └── security ───────┘

Lint and security run in parallel. Test depends on lint. Build depends on both test and security passing.
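The ordering above can be checked mechanically by treating the `needs:` keys as a dependency graph. The sketch below uses the standard library's `graphlib` (Python 3.9+) to group jobs by the earliest point at which each can start. `NEEDS` and `execution_waves` are illustrative names, not part of the repository.

```python
from graphlib import TopologicalSorter

# Job -> jobs it needs, mirroring the `needs:` keys in the workflow.
NEEDS = {
    "lint": set(),
    "security": set(),
    "test": {"lint"},
    "build-and-push": {"test", "security"},
    "deploy": {"build-and-push"},
}


def execution_waves(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; each wave's dependencies are all in earlier waves."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For this graph `execution_waves(NEEDS)` returns `[['lint', 'security'], ['test'], ['build-and-push'], ['deploy']]`. The waves show the earliest start only: in practice the security job may still be running while test executes.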

Notifications

Event	Channel	Recipients
Build failure	GitHub PR check	PR author
Security alert	GitHub security tab	Repository maintainers
Deploy success	GitHub Actions log	Deployment team
Deploy failure	GitHub Actions log + email	Deployment team
