# Azaion AI Training - Environment Strategy

## Environments

| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose, local RabbitMQ | Test annotations, small sample dataset |
| Production | Live training on GPU server | Direct host processes or Docker, real RabbitMQ | Real annotations from Azaion platform |

No staging environment: the system is an ML training pipeline on a dedicated GPU server, not a multi-tier web service. Validation happens through the CI test suite (CPU tests) and manual verification on the GPU server before committing to a long training run.

## Environment Variables

### Required Variables

| Variable | Purpose | Dev Default | Prod Source |
|----------|---------|-------------|-------------|
| `AZAION_API_URL` | Azaion REST API base URL | `https://api.azaion.com` | `.env` on server |
| `AZAION_API_EMAIL` | API login email | dev account | `.env` on server |
| `AZAION_API_PASSWORD` | API login password | dev password | `.env` on server |
| `RABBITMQ_HOST` | RabbitMQ host | `127.0.0.1` (local container) | `.env` on server |
| `RABBITMQ_PORT` | RabbitMQ Streams port | `5552` | `.env` on server |
| `RABBITMQ_USER` | RabbitMQ username | `azaion_receiver` | `.env` on server |
| `RABBITMQ_PASSWORD` | RabbitMQ password | `changeme` | `.env` on server |
| `RABBITMQ_QUEUE_NAME` | Queue name | `azaion-annotations` | `.env` on server |
| `AZAION_ROOT_DIR` | Root data directory | `/azaion` | `.env` on server |
| `AZAION_DATA_DIR` | Validated annotations dir name | `data` | `.env` on server |
| `AZAION_DATA_SEED_DIR` | Unvalidated annotations dir name | `data-seed` | `.env` on server |
| `AZAION_DATA_DELETED_DIR` | Deleted annotations dir name | `data_deleted` | `.env` on server |
| `TRAINING_MODEL` | Base model filename | `yolo26m.pt` | `.env` on server |
| `TRAINING_EPOCHS` | Training epochs | `120` | `.env` on server |
| `TRAINING_BATCH_SIZE` | Training batch size | `11` | `.env` on server |
| `TRAINING_IMGSZ` | Training image size | `1280` | `.env` on server |
| `TRAINING_SAVE_PERIOD` | Checkpoint save interval | `1` | `.env` on server |
| `TRAINING_WORKERS` | Dataloader workers | `24` | `.env` on server |
| `EXPORT_ONNX_IMGSZ` | ONNX export image size | `1280` | `.env` on server |

### `.env.example`

Committed to version control with placeholder values. See `.env.example` in project root (created in Step 1).

### Variable Validation

The `config.yaml` generation script (part of deploy scripts) validates that all required environment variables are set before writing the config file. Missing variables cause an immediate failure with a clear error listing which variables are absent.

## Config Generation

The codebase reads configuration from `config.yaml`, not directly from environment variables. The deployment flow generates `config.yaml` from environment variables at deploy time:

1. `.env` contains all variable values (never committed)
2. Deploy script sources `.env` and renders `config.yaml` from a template
3. `config.yaml` is placed at the expected location for the application

This preserves the existing code's config reading pattern while externalizing secrets to environment variables.

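The validation and rendering steps above could look roughly like the following. This is a minimal sketch, not the project's deploy script: the function names `missing_vars` and `render_config` and the key layout inside `CONFIG_TEMPLATE` are hypothetical, since the real `config.yaml` schema is not shown here. Only the variable names come from the Required Variables table.

```python
from string import Template

# Variable names taken from the Required Variables table.
REQUIRED_VARS = [
    "AZAION_API_URL", "AZAION_API_EMAIL", "AZAION_API_PASSWORD",
    "RABBITMQ_HOST", "RABBITMQ_PORT", "RABBITMQ_USER",
    "RABBITMQ_PASSWORD", "RABBITMQ_QUEUE_NAME",
    "AZAION_ROOT_DIR", "AZAION_DATA_DIR", "AZAION_DATA_SEED_DIR",
    "AZAION_DATA_DELETED_DIR",
    "TRAINING_MODEL", "TRAINING_EPOCHS", "TRAINING_BATCH_SIZE",
    "TRAINING_IMGSZ", "TRAINING_SAVE_PERIOD", "TRAINING_WORKERS",
    "EXPORT_ONNX_IMGSZ",
]

# Hypothetical layout: the real config.yaml keys may differ.
# Only a few keys are shown; a full template would cover every variable.
CONFIG_TEMPLATE = Template("""\
api:
  url: "${AZAION_API_URL}"
  email: "${AZAION_API_EMAIL}"
  password: "${AZAION_API_PASSWORD}"
queue:
  host: "${RABBITMQ_HOST}"
  port: ${RABBITMQ_PORT}
  name: "${RABBITMQ_QUEUE_NAME}"
training:
  model: "${TRAINING_MODEL}"
  epochs: ${TRAINING_EPOCHS}
  batch_size: ${TRAINING_BATCH_SIZE}
""")


def missing_vars(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def render_config(env):
    """Validate env, then return the rendered config.yaml text.

    Raises ValueError listing every missing variable, so the deploy
    fails before any file is written.
    """
    missing = missing_vars(env)
    if missing:
        raise ValueError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    # Values are inserted verbatim; secrets containing quotes or
    # backslashes would need YAML escaping in a real script.
    return CONFIG_TEMPLATE.substitute(env)
```

After sourcing `.env`, a deploy script could call `render_config(os.environ)` and write the result to wherever the application expects `config.yaml`. Collecting all missing names before raising matches the stated goal of a single error that lists every absent variable.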
## Secrets Management

| Environment | Method | Location |
|-------------|--------|----------|
| Development | `.env` file (git-ignored) | Project root |
| Production | `.env` file (restricted permissions) | GPU server `/opt/azaion-training/.env` |

Production `.env` file:
- Ownership: `root:deploy` (deploy user's group)
- Permissions: `640` (owner read/write, group read, others none)
- Located outside the Docker build context

Secrets in this project:
- `AZAION_API_PASSWORD`: API authentication
- `RABBITMQ_PASSWORD`: message queue access
- CDN credentials: auto-provisioned via API at runtime (encrypted `cdn.yaml`), not in `.env`
- Model encryption key: hardcoded in `security.py` (existing pattern, flagged as security concern)

Rotation policy: rotate API and RabbitMQ passwords quarterly. Update `.env` on the server, restart affected services.

## Filesystem Management

| Environment | `/azaion/` Location | Contents |
|-------------|-------------------|----------|
| Development | Docker volume or local dir | Test images, small sample labels |
| Production | Host directory `/azaion/` | Full annotation dataset, trained models, export artifacts |

The `/azaion/` directory tree must exist before services start:

```
/azaion/
├── data/           (validated annotations: images/ + labels/)
├── data-seed/      (unvalidated annotations: images/ + labels/)
├── data_deleted/   (soft-deleted annotations: images/ + labels/)
├── datasets/       (formed training datasets: azaion-YYYY-MM-DD/)
├── models/         (trained models: azaion-YYYY-MM-DD/, azaion.pt)
└── classes.json    (annotation class definitions)
```

Production data is persistent and never deleted by deployment. Docker containers mount this directory as a bind mount.

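One way to satisfy the "tree must exist before services start" requirement is an idempotent pre-start check. The sketch below is an assumption about how that could be done, not a description of the existing deploy scripts: `ensure_tree` is a hypothetical helper, and it uses the default directory names from the variable table rather than reading `AZAION_DATA_DIR` and its siblings.

```python
from pathlib import Path

# Default directory names from the tree above; a real script would
# take them from AZAION_DATA_DIR, AZAION_DATA_SEED_DIR, and so on.
ANNOTATION_DIRS = ("data", "data-seed", "data_deleted")
OTHER_DIRS = ("datasets", "models")


def ensure_tree(root):
    """Create any missing directories under root without touching existing data.

    Returns True if classes.json is present, False otherwise.
    """
    root = Path(root)
    for name in ANNOTATION_DIRS:
        # Each annotation directory holds images/ and labels/.
        for sub in ("images", "labels"):
            (root / name / sub).mkdir(parents=True, exist_ok=True)
    for name in OTHER_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # classes.json carries real class definitions, so it is only
    # checked here, never generated.
    return (root / "classes.json").is_file()
```

Because `mkdir` is called with `exist_ok=True`, running the check on every deployment is safe for the persistent production data: existing directories and their contents are left alone.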
## External Service Configuration

| Service | Dev | Prod |
|---------|-----|------|
| Azaion REST API | Real API (dev credentials) | Real API (prod credentials) |
| S3-compatible CDN | Auto-provisioned via API | Auto-provisioned via API |
| RabbitMQ | Local container (docker-compose) | Managed instance on network |
| NVIDIA GPU | Host GPU via `--gpus all` | Host GPU via `--gpus all` |

CDN credentials are not in `.env`; they are fetched from the API at runtime as an encrypted `cdn.yaml` file, decrypted using the hardware-bound key. This is the existing pattern and does not need environment variable configuration.