mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-23 00:56:35 +00:00
aeb7f8ca8c
# Azaion AI Training — CI/CD Pipeline
## Pipeline Overview

| Stage | Trigger | Quality Gate |
|-------|---------|--------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | All tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy | Manual trigger | Health checks pass on target server |

There is no staging environment: the system runs on a dedicated GPU server. In place of staging, the test suite runs in CI on CPU-only runners (annotation queue tests, unit tests), and GPU behavior is verified manually on the target machine.

## Stage Details
### Lint

- `black --check src/`: Python formatting
- `ruff check src/`: Python linting
- Runs on a standard CI runner (no GPU)

### Test

- Framework: `pytest`
- Command: `pytest tests/ -v --tb=short` (CI adds `--ignore=tests/gpu`)
- Test compose for annotation queue integration tests: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
- GPU-dependent tests (training, export) are excluded from CI because they require a physical GPU; they run during manual verification on the target server
- Coverage report published as a pipeline artifact

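The coverage artifact listed above could be produced with steps along these lines. This is a minimal sketch, not the repository's actual configuration: it assumes `pytest-cov` is installed through `requirements-test.txt`, and the report file `coverage.xml` and the artifact name `coverage-report` are hypothetical choices.

```yaml
# Sketch only: would replace the plain pytest step in the `test` job.
# Assumes pytest-cov is available via requirements-test.txt (assumption).
- run: pytest tests/ -v --tb=short --ignore=tests/gpu --cov=src --cov-report=xml
- uses: actions/upload-artifact@v4
  if: always()  # upload the report even when tests fail
  with:
    name: coverage-report  # hypothetical artifact name
    path: coverage.xml
```

Using `if: always()` keeps the report available for failed runs, which is usually when it is most useful.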
### Security

- Dependency audit: `pip-audit -r requirements.txt`
- Dependency audit: `pip-audit -r src/annotation-queue/requirements.txt`
- SAST scan: Semgrep with `p/python` ruleset
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings

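The Trivy image scan listed above has no matching step in the workflow in this document. One way it could be wired in is sketched below, under the assumption that it runs in the `build-and-push` job after the image has been pushed and the registry login is still in effect. The image reference mirrors the tag format used for the training image; the annotation-queue image would need a second, analogous step.

```yaml
# Sketch only: a step for the `build-and-push` job, placed after the push.
- uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in practice
  with:
    image-ref: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
    severity: CRITICAL,HIGH  # matches the "block on critical or high" gate
    exit-code: "1"           # fail the job when matching findings exist
```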
### Build

- Docker images built for both components:
  - `docker/training.Dockerfile` → `azaion/training:<git-sha>`
  - `docker/annotation-queue.Dockerfile` → `azaion/annotation-queue:<git-sha>`
- Build cache: Docker layer cache via GitHub Actions cache
- Build runs on a standard runner; `docker build` needs no GPU

### Push

- Registry: configurable via `DOCKER_REGISTRY` secret (e.g., GitHub Container Registry `ghcr.io`, or private registry)
- Authentication: registry login via CI secrets (`DOCKER_REGISTRY_USER`, `DOCKER_REGISTRY_TOKEN`)

### Deploy

- **Manual trigger only** (`workflow_dispatch`): training runs last for days, so unattended deploys are risky
- Deployment method: SSH to target GPU server, run deploy scripts (`scripts/deploy.sh`)
- Pre-deploy: pull new images, stop services gracefully
- Post-deploy: start services, run health check script
- Rollback: `scripts/deploy.sh --rollback` redeploys previous image tags

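The SSH-based deployment described above needs the runner to hold a private key and to trust the target host, which the workflow in this document does not set up. A minimal sketch of such a step follows. `DEPLOY_SSH_KEY` is a hypothetical secret name, and the `id_ed25519` key type is an assumption.

```yaml
# Sketch only: would run before the `ssh` step in the `deploy` job.
# DEPLOY_SSH_KEY is a hypothetical secret holding the private key.
- name: Configure SSH
  run: |
    mkdir -p ~/.ssh
    echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/id_ed25519
    chmod 600 ~/.ssh/id_ed25519
    # Trust-on-first-use; storing the host key as a secret would be stricter.
    ssh-keyscan -H "${{ secrets.DEPLOY_HOST }}" >> ~/.ssh/known_hosts
```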
## Pipeline Configuration (GitHub Actions)

```yaml
name: CI/CD

on:
  push:
    branches: [dev, main]
  pull_request:
    branches: [dev]
  workflow_dispatch:
    inputs:
      deploy_target:
        description: "Deploy to target server"
        required: true
        type: boolean
        default: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install black ruff
      - run: black --check src/
      - run: ruff check src/

  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements-test.txt
      # GPU tests are excluded; they run manually on the target server.
      - run: pytest tests/ -v --tb=short --ignore=tests/gpu

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install pip-audit
      # No `|| true` here: a swallowed exit code would bypass the security gate.
      - run: pip-audit -r requirements.txt
      - run: pip-audit -r src/annotation-queue/requirements.txt
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/python

  build-and-push:
    runs-on: ubuntu-latest
    needs: [test, security]
    # Also runs on manual dispatch; otherwise this job is skipped and
    # `deploy`, which needs it, could never run.
    if: github.ref == 'refs/heads/dev' && (github.event_name == 'push' || github.event_name == 'workflow_dispatch')
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ secrets.DOCKER_REGISTRY }}
          username: ${{ secrets.DOCKER_REGISTRY_USER }}
          password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/training.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/annotation-queue.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/annotation-queue:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: build-and-push
    # Only set on workflow_dispatch; boolean inputs arrive here as strings.
    if: github.event.inputs.deploy_target == 'true'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: |
          ssh ${{ secrets.DEPLOY_USER }}@${{ secrets.DEPLOY_HOST }} \
            "cd /opt/azaion-training && \
            DOCKER_IMAGE_TAG=${{ github.sha }} \
            bash scripts/deploy.sh"
```

## Caching Strategy

| Cache | Key | Restore Keys |
|-------|-----|--------------|
| pip dependencies | `requirements.txt` hash | `pip-` prefix |
| Docker layers | GitHub Actions cache (BuildKit) | `gha-` prefix |

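The pip row of the table above could be expressed with `actions/cache` roughly as follows. This is a sketch under assumptions: the cache path is pip's default location on Linux runners, and the key hashes `requirements.txt` as the table states, although the `test` job installs from `requirements-test.txt`, so the hashed file may need adjusting.

```yaml
# Sketch only: would go after actions/setup-python in a job that runs pip install.
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip  # pip's default cache directory on Linux runners
    key: pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      pip-
```

The Docker layer cache needs no separate step, since `cache-from: type=gha` and `cache-to: type=gha,mode=max` on the build steps already handle it.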
## Parallelization

```
push event
├── lint ──► test ──┐
│                   ├──► build-and-push ──► deploy (manual)
└── security ───────┘
```

Lint and security run in parallel. Test depends on lint. Build depends on both test and security passing.

## Notifications

| Event | Channel | Recipients |
|-------|---------|------------|
| Build failure | GitHub PR check | PR author |
| Security alert | GitHub security tab | Repository maintainers |
| Deploy success | GitHub Actions log | Deployment team |
| Deploy failure | GitHub Actions log + email | Deployment team |