mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 11:06:35 +00:00
Update autopilot workflow and documentation for project cycle completion
- Modified the existing-code workflow to automatically loop back to New Task after project completion without user confirmation.
- Updated the autopilot state to reflect the current step as `done` and status as `completed`.
- Clarified the deployment status report by specifying non-deployed services and their purposes.

These changes enhance the automation of task management and improve documentation clarity.
@@ -0,0 +1,184 @@
# Azaion AI Training — CI/CD Pipeline

## Pipeline Overview

| Stage | Trigger | Quality Gate |
|-------|---------|--------------|
| Lint | Push to `dev` or `main`; PR to `dev` | Zero lint errors |
| Test | Push to `dev` or `main`; PR to `dev` | All tests pass |
| Security | Push to `dev` or `main`; PR to `dev` | Zero critical/high CVEs |
| Build | PR merge to `dev` | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy | Manual trigger | Health checks pass on target server |

There is no staging environment: the system runs on a dedicated GPU server. In place of staging, the test suite (unit tests and annotation queue tests) runs in CI on CPU-only runners, and GPU-dependent behavior is verified manually on the target machine.

## Stage Details

### Lint

- `black --check src/`: Python formatting
- `ruff check src/`: Python linting
- Runs on a standard CI runner (no GPU)

### Test

- Framework: `pytest`
- Command: `pytest tests/ -v --tb=short`
- Annotation queue integration tests use a dedicated compose file: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
- GPU-dependent tests (training, export) are excluded from CI via `--ignore=tests/gpu`. They require a physical GPU and run during manual verification on the target server
- Coverage report published as a pipeline artifact
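
The compose run and the coverage artifact listed above could be wired into the `test` job roughly as follows. This is a minimal sketch, assuming the `pytest-cov` plugin is listed in `requirements-test.txt`, that coverage is measured over `src/`, and that `coverage-report` is an acceptable artifact name; none of these three details is confirmed by the repository.

```yaml
# Hypothetical additions to the `test` job steps.
- name: Run CPU-only tests with coverage
  # --cov and --cov-report are provided by the pytest-cov plugin (assumed installed)
  run: pytest tests/ -v --tb=short --ignore=tests/gpu --cov=src --cov-report=xml

- name: Run annotation queue integration tests
  run: docker compose -f docker-compose.test.yml up --abort-on-container-exit

- name: Publish coverage report
  if: always()  # upload the report even when a test step failed
  uses: actions/upload-artifact@v4
  with:
    name: coverage-report  # assumed artifact name
    path: coverage.xml     # default file written by --cov-report=xml
```

With `--abort-on-container-exit`, the compose command stops all containers as soon as one exits, so the step finishes when the test container does.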

### Security

- Dependency audit (root requirements): `pip-audit -r requirements.txt`
- Dependency audit (annotation queue): `pip-audit -r src/annotation-queue/requirements.txt`
- SAST scan: Semgrep with the `p/python` ruleset
- Image scan: Trivy on the built Docker images
- Block on: critical or high severity findings
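
The Trivy image scan could be expressed as a workflow step along these lines. This is a sketch only: it uses the public `aquasecurity/trivy-action`, assumes the scan runs after the training image has been built and tagged with the commit SHA, and the `ignore-unfixed` setting is an assumption rather than a documented project policy.

```yaml
# Hypothetical step, placed after the image has been built and tagged.
- name: Scan training image with Trivy
  uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in practice
  with:
    image-ref: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
    format: table
    severity: CRITICAL,HIGH  # mirrors the "block on" rule above
    exit-code: "1"           # a non-zero exit code fails the job when findings exist
    ignore-unfixed: true     # assumption: skip CVEs that have no available fix
```

The same step would be repeated with the `azaion/annotation-queue` image reference to cover both components.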

### Build

- Docker images built for both components:
  - `docker/training.Dockerfile` → `azaion/training:<git-sha>`
  - `docker/annotation-queue.Dockerfile` → `azaion/annotation-queue:<git-sha>`
- Build cache: Docker layer cache via GitHub Actions cache
- Build runs on a standard runner; no GPU is needed for `docker build`

### Push

- Registry: configurable via the `DOCKER_REGISTRY` secret (e.g., GitHub Container Registry `ghcr.io`, or a private registry)
- Authentication: registry login via CI secrets (`DOCKER_REGISTRY_USER`, `DOCKER_REGISTRY_TOKEN`)

### Deploy

- **Manual trigger only** (`workflow_dispatch`): training runs last for days, so unattended deploys are risky
- Deployment method: SSH to the target GPU server and run the deploy script (`scripts/deploy.sh`)
- Pre-deploy: pull new images, stop services gracefully
- Post-deploy: start services, run the health check script
- Rollback: `scripts/deploy.sh --rollback` redeploys the previous image tags
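
The SSH-based deployment above presupposes that the runner can authenticate to the GPU server. One possible way to set that up is sketched below, using the third-party `webfactory/ssh-agent` action. The secret name `DEPLOY_SSH_KEY` is hypothetical, and trusting the host key through `ssh-keyscan` is a convenience assumption; storing a pinned host key in a secret would be stricter.

```yaml
# Hypothetical steps for the `deploy` job, run before the ssh command.
- name: Load deploy key into ssh-agent
  uses: webfactory/ssh-agent@v0.9.0
  with:
    ssh-private-key: ${{ secrets.DEPLOY_SSH_KEY }}  # assumed secret name

- name: Trust the target host key
  run: |
    mkdir -p ~/.ssh
    # Trust-on-first-use: records whatever key the host presents right now
    ssh-keyscan -H "${{ secrets.DEPLOY_HOST }}" >> ~/.ssh/known_hosts
```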

## Pipeline Configuration (GitHub Actions)

```yaml
name: CI/CD

on:
  push:
    branches: [dev, main]
  pull_request:
    branches: [dev]
  workflow_dispatch:
    inputs:
      deploy_target:
        description: "Deploy to target server"
        required: true
        type: boolean
        default: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install black ruff
      - run: black --check src/
      - run: ruff check src/

  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements-test.txt
      # GPU-dependent tests live in tests/gpu and are skipped on CPU-only runners
      - run: pytest tests/ -v --tb=short --ignore=tests/gpu

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install pip-audit
      # NOTE: `|| true` makes this audit non-blocking; findings are reported but do not fail the job
      - run: pip-audit -r requirements.txt || true
      - run: pip-audit -r src/annotation-queue/requirements.txt
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/python

  build-and-push:
    runs-on: ubuntu-latest
    needs: [test, security]
    # Runs only for pushes to dev, i.e. after a PR has been merged
    if: github.ref == 'refs/heads/dev' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ${{ secrets.DOCKER_REGISTRY }}
          username: ${{ secrets.DOCKER_REGISTRY_USER }}
          password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/training.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/training:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/annotation-queue.Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_REGISTRY }}/azaion/annotation-queue:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: build-and-push
    # build-and-push is skipped on manual runs (it only runs on push to dev).
    # Without a status check function, a skipped dependency would skip this job too,
    # so allow it to run as long as no upstream job failed or was cancelled.
    if: ${{ !failure() && !cancelled() && github.event.inputs.deploy_target == 'true' }}
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: |
          ssh ${{ secrets.DEPLOY_USER }}@${{ secrets.DEPLOY_HOST }} \
            "cd /opt/azaion-training && \
            DOCKER_IMAGE_TAG=${{ github.sha }} \
            bash scripts/deploy.sh"
```

## Caching Strategy

| Cache | Key | Restore Keys |
|-------|-----|--------------|
| pip dependencies | `requirements.txt` hash | `pip-` prefix |
| Docker layers | GitHub Actions cache (BuildKit) | `gha-` prefix |
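
The pip row of the table could map onto an `actions/cache` step such as the one below. This is a sketch under the assumption that only the root `requirements.txt` feeds the key; the step name and its placement are illustrative.

```yaml
# Hypothetical step, placed before `pip install` in the lint, test, and security jobs.
- name: Cache pip downloads
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip  # pip's default cache directory on Linux runners
    key: pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      pip-
```

On an exact key miss, the `pip-` restore key falls back to the most recent cache with that prefix, so only changed packages are downloaded again. The Docker layer row is already covered by the `cache-from: type=gha` and `cache-to: type=gha,mode=max` settings in the build steps.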

## Parallelization

```
push event
  ├── lint ──► test ──┐
  │                   ├──► build-and-push ──► deploy (manual)
  └── security ───────┘
```

Lint and security run in parallel. Test depends on lint. Build depends on both test and security passing.

## Notifications

| Event | Channel | Recipients |
|-------|---------|------------|
| Build failure | GitHub PR check | PR author |
| Security alert | GitHub security tab | Repository maintainers |
| Deploy success | GitHub Actions log | Deployment team |
| Deploy failure | GitHub Actions log + email | Deployment team |