Azaion AI Training — Environment Strategy
Environments
| Environment | Purpose | Infrastructure | Data Source |
|---|---|---|---|
| Development | Local developer workflow | docker-compose, local RabbitMQ | Test annotations, small sample dataset |
| Production | Live training on GPU server | Direct host processes or Docker, real RabbitMQ | Real annotations from Azaion platform |
No staging environment — the system is an ML training pipeline on a dedicated GPU server, not a multi-tier web service. Validation happens through the CI test suite (CPU tests) and manual verification on the GPU server before committing to a long training run.
Environment Variables
Required Variables
| Variable | Purpose | Dev Default | Prod Source |
|---|---|---|---|
| `AZAION_API_URL` | Azaion REST API base URL | `https://api.azaion.com` | `.env` on server |
| `AZAION_API_EMAIL` | API login email | dev account | `.env` on server |
| `AZAION_API_PASSWORD` | API login password | dev password | `.env` on server |
| `RABBITMQ_HOST` | RabbitMQ host | `127.0.0.1` (local container) | `.env` on server |
| `RABBITMQ_PORT` | RabbitMQ Streams port | `5552` | `.env` on server |
| `RABBITMQ_USER` | RabbitMQ username | `azaion_receiver` | `.env` on server |
| `RABBITMQ_PASSWORD` | RabbitMQ password | `changeme` | `.env` on server |
| `RABBITMQ_QUEUE_NAME` | Queue name | `azaion-annotations` | `.env` on server |
| `AZAION_ROOT_DIR` | Root data directory | `/azaion` | `.env` on server |
| `AZAION_DATA_DIR` | Validated annotations dir name | `data` | `.env` on server |
| `AZAION_DATA_SEED_DIR` | Unvalidated annotations dir name | `data-seed` | `.env` on server |
| `AZAION_DATA_DELETED_DIR` | Deleted annotations dir name | `data_deleted` | `.env` on server |
| `TRAINING_MODEL` | Base model filename | `yolo26m.pt` | `.env` on server |
| `TRAINING_EPOCHS` | Training epochs | `120` | `.env` on server |
| `TRAINING_BATCH_SIZE` | Training batch size | `11` | `.env` on server |
| `TRAINING_IMGSZ` | Training image size | `1280` | `.env` on server |
| `TRAINING_SAVE_PERIOD` | Checkpoint save interval | `1` | `.env` on server |
| `TRAINING_WORKERS` | Dataloader workers | `24` | `.env` on server |
| `EXPORT_ONNX_IMGSZ` | ONNX export image size | `1280` | `.env` on server |
.env.example
Committed to version control with placeholder values. See .env.example in project root (created in Step 1).
Variable Validation
The config.yaml generation script (part of deploy scripts) validates that all required environment variables are set before writing the config file. Missing variables cause an immediate failure with a clear error listing which variables are absent.
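The check described above could look roughly like the following. This is a minimal sketch, not the actual deploy script: the function names `find_missing` and `require_env` are hypothetical, and treating whitespace-only values as missing is an assumption.

```python
import os

# Names taken from the "Required Variables" table.
REQUIRED_VARS = [
    "AZAION_API_URL", "AZAION_API_EMAIL", "AZAION_API_PASSWORD",
    "RABBITMQ_HOST", "RABBITMQ_PORT", "RABBITMQ_USER",
    "RABBITMQ_PASSWORD", "RABBITMQ_QUEUE_NAME",
    "AZAION_ROOT_DIR", "AZAION_DATA_DIR",
    "AZAION_DATA_SEED_DIR", "AZAION_DATA_DELETED_DIR",
    "TRAINING_MODEL", "TRAINING_EPOCHS", "TRAINING_BATCH_SIZE",
    "TRAINING_IMGSZ", "TRAINING_SAVE_PERIOD", "TRAINING_WORKERS",
    "EXPORT_ONNX_IMGSZ",
]


def find_missing(env, required=REQUIRED_VARS):
    """Return the required names that are unset or blank in `env`."""
    # Assumption: a whitespace-only value counts as missing.
    return [name for name in required if not env.get(name, "").strip()]


def require_env(env=None):
    """Fail immediately, listing every absent variable in one message."""
    env = os.environ if env is None else env
    missing = find_missing(env)
    if missing:
        raise SystemExit(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Collecting all missing names before failing means a single run reports every gap, rather than stopping at the first one.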
Config Generation
The codebase reads configuration from config.yaml, not directly from environment variables. The deployment flow generates config.yaml from environment variables at deploy time:
- `.env` contains all variable values (never committed)
- Deploy script sources `.env` and renders `config.yaml` from a template
- `config.yaml` is placed at the expected location for the application
This preserves the existing code's config reading pattern while externalizing secrets to environment variables.
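As an illustration of the render step, here is a minimal sketch using only the Python standard library. The template keys (`api.url`, `queue.host`, `queue.port`) are hypothetical, since the real `config.yaml` layout is not shown here, and `parse_env` handles only plain `KEY=VALUE` lines rather than full shell syntax.

```python
from string import Template

# Hypothetical template: the real config.yaml keys are not documented here.
CONFIG_TEMPLATE = Template(
    "api:\n"
    "  url: ${AZAION_API_URL}\n"
    "queue:\n"
    "  host: ${RABBITMQ_HOST}\n"
    "  port: ${RABBITMQ_PORT}\n"
)


def parse_env(text):
    """Parse plain KEY=VALUE lines; blank lines and # comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def render_config(env):
    """Fill the template from `env`; raises KeyError for a missing value."""
    return CONFIG_TEMPLATE.substitute(env)
```

Values containing YAML-significant characters (for example a password with `:` or `#`) would need quoting in a real template; this sketch does not handle that.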
Secrets Management
| Environment | Method | Location |
|---|---|---|
| Development | `.env` file (git-ignored) | Project root |
| Production | `.env` file (restricted permissions) | GPU server `/opt/azaion-training/.env` |
Production `.env` file:
- Ownership: `root:deploy` (deploy user's group)
- Permissions: `640` (owner read/write, group read, others none)
- Located outside the Docker build context
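A deploy script could verify the permission bits before reading the file. The helper below is a hypothetical sketch that checks only the mode, not the `root:deploy` ownership.

```python
import os
import stat


def env_file_mode_ok(path, expected=0o640):
    """Return True when the file's permission bits are exactly `expected`."""
    # S_IMODE strips the file-type bits, leaving only the permission bits.
    return stat.S_IMODE(os.stat(path).st_mode) == expected
```

For example, `env_file_mode_ok("/opt/azaion-training/.env")` would return `False` for a file left world-readable at `644`.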
Secrets in this project:
- `AZAION_API_PASSWORD`: API authentication
- `RABBITMQ_PASSWORD`: message queue access
- CDN credentials: auto-provisioned via API at runtime (encrypted `cdn.yaml`), not in `.env`
- Model encryption key: hardcoded in `security.py` (existing pattern, flagged as security concern)
Rotation policy: rotate API and RabbitMQ passwords quarterly. Update .env on the server, restart affected services.
Filesystem Management
| Environment | `/azaion/` Location | Contents |
|---|---|---|
| Development | Docker volume or local dir | Test images, small sample labels |
| Production | Host directory `/azaion/` | Full annotation dataset, trained models, export artifacts |
The /azaion/ directory tree must exist before services start:
/azaion/
├── data/ (validated annotations: images/ + labels/)
├── data-seed/ (unvalidated annotations: images/ + labels/)
├── data_deleted/ (soft-deleted annotations: images/ + labels/)
├── datasets/ (formed training datasets: azaion-YYYY-MM-DD/)
├── models/ (trained models: azaion-YYYY-MM-DD/, azaion.pt)
└── classes.json (annotation class definitions)
Production data is persistent and never deleted by deployment. Docker containers mount this directory as a bind mount.
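One way to make sure the tree exists before services start is an idempotent setup helper. This is a sketch under assumptions: `ensure_tree` is a hypothetical name, and it deliberately does not create `classes.json`, since its contents are not defined here.

```python
from pathlib import Path

# Directory names from the tree above; each annotation dir holds images/ and labels/.
ANNOTATION_DIRS = ["data", "data-seed", "data_deleted"]
OTHER_DIRS = ["datasets", "models"]


def ensure_tree(root):
    """Create any missing directories under `root`; existing content is untouched.

    Returns True when classes.json is already present, False otherwise.
    """
    root = Path(root)
    for name in ANNOTATION_DIRS:
        for sub in ("images", "labels"):
            (root / name / sub).mkdir(parents=True, exist_ok=True)
    for name in OTHER_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return (root / "classes.json").is_file()
```

Because `mkdir` is called with `exist_ok=True`, rerunning the helper on a populated production directory only fills in missing folders and never removes data.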
External Service Configuration
| Service | Dev | Prod |
|---|---|---|
| Azaion REST API | Real API (dev credentials) | Real API (prod credentials) |
| S3-compatible CDN | Auto-provisioned via API | Auto-provisioned via API |
| RabbitMQ | Local container (docker-compose) | Managed instance on network |
| NVIDIA GPU | Host GPU via `--gpus all` | Host GPU via `--gpus all` |
CDN credentials are not in .env — they are fetched from the API at runtime as an encrypted cdn.yaml file, decrypted using the hardware-bound key. This is the existing pattern and does not need environment variable configuration.