Mirror of https://github.com/azaion/ai-training.git, synced 2026-04-22 11:36:36 +00:00.
Update .gitignore to include environment secrets and modify autopilot state documentation
- Added `.env` to `.gitignore` to prevent tracking of environment secrets.
- Updated the autopilot state in `_docs/_autopilot_state.md`: changed the step from 9 (Implement) to 13 (Deploy) and updated the sub-step to `1 — Status & Env Check`.

These changes enhance security by ignoring sensitive files and improve clarity in task management.
```diff
@@ -0,0 +1,37 @@
+# Azaion AI Training — Environment Variables
+# Copy to .env and fill in real values. Never commit .env to version control.
+
+# ── Azaion REST API ──
+AZAION_API_URL=https://api.azaion.com
+AZAION_API_EMAIL=uploader@azaion.com
+AZAION_API_PASSWORD=changeme
+
+# ── RabbitMQ Streams ──
+RABBITMQ_HOST=127.0.0.1
+RABBITMQ_PORT=5552
+RABBITMQ_USER=azaion_receiver
+RABBITMQ_PASSWORD=changeme
+RABBITMQ_QUEUE_NAME=azaion-annotations
+
+# ── Filesystem paths ──
+AZAION_ROOT_DIR=/azaion
+AZAION_DATA_DIR=data
+AZAION_DATA_SEED_DIR=data-seed
+AZAION_DATA_DELETED_DIR=data_deleted
+
+# ── Training parameters ──
+TRAINING_MODEL=yolo26m.pt
+TRAINING_EPOCHS=120
+TRAINING_BATCH_SIZE=11
+TRAINING_IMGSZ=1280
+TRAINING_SAVE_PERIOD=1
+TRAINING_WORKERS=24
+
+# ── Export ──
+EXPORT_ONNX_IMGSZ=1280
+
+# ── Docker (deploy scripts) ──
+DOCKER_REGISTRY=registry.example.com
+DOCKER_IMAGE_TAG=latest
+DEPLOY_HOST=gpu-server.example.com
+DEPLOY_USER=deploy
```
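The Filesystem paths block in the env template gives a root directory plus three directory names, and the deployment status report in this commit notes that the `/azaion/` tree must exist on the host, created manually or via a deploy script. A minimal sketch of that setup step follows. It assumes each `AZAION_DATA_*` value is a subdirectory name relative to `AZAION_ROOT_DIR` (the report describes them as directory names) and falls back to the template's example values; `ensure_data_dirs` and `DEFAULTS` are hypothetical names, not code from the repository.

```python
import os
from pathlib import Path
from typing import Mapping

# Fallbacks copied from the env template; real values come from the environment.
DEFAULTS = {
    "AZAION_ROOT_DIR": "/azaion",
    "AZAION_DATA_DIR": "data",
    "AZAION_DATA_SEED_DIR": "data-seed",
    "AZAION_DATA_DELETED_DIR": "data_deleted",
}


def ensure_data_dirs(env: Mapping[str, str] = os.environ) -> list[Path]:
    """Create the root directory and its data subdirectories if missing.

    Hypothetical helper: assumes each AZAION_DATA_* value is a name
    relative to AZAION_ROOT_DIR. Returns the subdirectory paths.
    """

    def get(name: str) -> str:
        return env.get(name, DEFAULTS[name])

    root = Path(get("AZAION_ROOT_DIR"))
    paths = []
    for name in ("AZAION_DATA_DIR", "AZAION_DATA_SEED_DIR", "AZAION_DATA_DELETED_DIR"):
        path = root / get(name)
        # parents=True also creates the root; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Because `mkdir(parents=True, exist_ok=True)` is idempotent, a deploy script could call this on every run rather than only on first install.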
```diff
@@ -23,5 +23,8 @@ venv
 *.jpeg
 *.png
 
+# Environment secrets
+.env
+
 # Test results
 tests/test-results/
```
```diff
@@ -0,0 +1,71 @@
+# Deployment Status Report
+
+**Date**: 2026-03-28
+
+## Component Readiness
+
+| Component | Status | Entry Point | Runtime | Host Requirements |
+|-----------|--------|-------------|---------|-------------------|
+| Training Pipeline | Implemented & Tested | `train.py` | Long-running (days) | GPU server, RTX 4090 (24GB VRAM) |
+| Annotation Queue | Implemented & Tested | `annotation-queue/annotation_queue_handler.py` | Continuous (async) | Any server with network access |
+| Inference Engine | Implemented & Tested | `start_inference.py` | On-demand | GPU-equipped machine |
+| Data Tools | Implemented | `convert-annotations.py`, `dataset-visualiser.py` | Ad-hoc | Developer machine |
+
+Note: Augmentation is not a separate process — it is YOLO's built-in mosaic/mixup within the training pipeline.
+
+## External Dependencies
+
+| Dependency | Protocol | Current Config | Required |
+|-----------|----------|---------------|----------|
+| Azaion REST API | HTTPS (JWT) | `config.yaml` → `api.url` | API URL, email, password |
+| S3-compatible CDN | HTTPS (S3 API) | Encrypted `cdn.yaml` fetched from API at runtime | CDN credentials auto-provisioned via API |
+| RabbitMQ Streams | AMQP (rstream) | `config.yaml` → `queue.*` | Host, port, user, password, queue name |
+| Local Filesystem | POSIX | `/azaion/` hardcoded root | Writable `/azaion/` directory tree |
+| NVIDIA GPU | CUDA 12.1 | Host driver | NVIDIA driver + CUDA 12.1 + TensorRT |
+
+## Infrastructure Prerequisites
+
+| Prerequisite | Status | Notes |
+|-------------|--------|-------|
+| Container registry | Not set up | Needed for Docker-based deployment |
+| CI/CD pipeline | Not set up | Manual deployment currently |
+| DNS / SSL | Not applicable | Runs on GPU server, no public endpoint |
+| GPU server access | Assumed available | RTX 4090 with 24GB VRAM |
+| `/azaion/` directory tree | Must exist on host | Created manually or via deploy script |
+
+## Deployment Blockers
+
+None. All components are implemented and tested. Current deployment is manual (copy files to server, run scripts).
+
+## Environment Variables
+
+All configuration is currently in `config.yaml` (committed with credentials). The deployment plan will externalize these into environment variables.
+
+| Variable | Source | Description |
+|----------|--------|-------------|
+| `AZAION_API_URL` | `config.yaml` → `api.url` | Azaion REST API base URL |
+| `AZAION_API_EMAIL` | `config.yaml` → `api.email` | API login email |
+| `AZAION_API_PASSWORD` | `config.yaml` → `api.password` | API login password |
+| `RABBITMQ_HOST` | `config.yaml` → `queue.host` | RabbitMQ host |
+| `RABBITMQ_PORT` | `config.yaml` → `queue.port` | RabbitMQ port |
+| `RABBITMQ_USER` | `config.yaml` → `queue.consumer_user` | RabbitMQ username |
+| `RABBITMQ_PASSWORD` | `config.yaml` → `queue.consumer_pw` | RabbitMQ password |
+| `RABBITMQ_QUEUE_NAME` | `config.yaml` → `queue.name` | RabbitMQ queue name |
+| `AZAION_ROOT_DIR` | `config.yaml` → `dirs.root` | Root data directory |
+| `AZAION_DATA_DIR` | `config.yaml` → `dirs.data` | Validated annotations directory name |
+| `AZAION_DATA_SEED_DIR` | `config.yaml` → `dirs.data_seed` | Unvalidated annotations directory name |
+| `AZAION_DATA_DELETED_DIR` | `config.yaml` → `dirs.data_deleted` | Deleted annotations directory name |
+| `TRAINING_MODEL` | `config.yaml` → `training.model` | Base model filename |
+| `TRAINING_EPOCHS` | `config.yaml` → `training.epochs` | Training epochs |
+| `TRAINING_BATCH_SIZE` | `config.yaml` → `training.batch` | Training batch size |
+| `TRAINING_IMGSZ` | `config.yaml` → `training.imgsz` | Training image size |
+| `TRAINING_SAVE_PERIOD` | `config.yaml` → `training.save_period` | Checkpoint save interval |
+| `TRAINING_WORKERS` | `config.yaml` → `training.workers` | Dataloader workers |
+| `EXPORT_ONNX_IMGSZ` | `config.yaml` → `export.onnx_imgsz` | ONNX export image size |
+
+## Recommendation
+
+The current codebase reads all configuration from `config.yaml`. The deployment plan should:
+1. Keep `config.yaml` as the single config file (existing pattern)
+2. Generate `config.yaml` from environment variables at deploy time via a template
+3. Never commit real credentials — use `.env` files (git-ignored) or a secret manager
```
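Recommendation 2 in the report calls for generating `config.yaml` from environment variables at deploy time. A minimal sketch follows, assuming that the section and key names in the report's Environment Variables table are the settings that need to be written (any other keys the real `config.yaml` holds are not covered) and that the numeric settings are integers. Instead of a text template, it drives the output from a mapping table, so type conversion and the missing-variable check live in one place; a `string.Template` file would also work but substitutes raw values, leaving YAML quoting of passwords to the operator. `ENV_TO_CONFIG`, `build_config` and `render_config_yaml` are hypothetical names.

```python
import json
import os
from typing import Callable, Mapping

# (environment variable, config.yaml section, key, type) per the report's
# Environment Variables table. Treating the numeric fields as int is an assumption.
ENV_TO_CONFIG: list[tuple[str, str, str, Callable[[str], object]]] = [
    ("AZAION_API_URL", "api", "url", str),
    ("AZAION_API_EMAIL", "api", "email", str),
    ("AZAION_API_PASSWORD", "api", "password", str),
    ("RABBITMQ_HOST", "queue", "host", str),
    ("RABBITMQ_PORT", "queue", "port", int),
    ("RABBITMQ_USER", "queue", "consumer_user", str),
    ("RABBITMQ_PASSWORD", "queue", "consumer_pw", str),
    ("RABBITMQ_QUEUE_NAME", "queue", "name", str),
    ("AZAION_ROOT_DIR", "dirs", "root", str),
    ("AZAION_DATA_DIR", "dirs", "data", str),
    ("AZAION_DATA_SEED_DIR", "dirs", "data_seed", str),
    ("AZAION_DATA_DELETED_DIR", "dirs", "data_deleted", str),
    ("TRAINING_MODEL", "training", "model", str),
    ("TRAINING_EPOCHS", "training", "epochs", int),
    ("TRAINING_BATCH_SIZE", "training", "batch", int),
    ("TRAINING_IMGSZ", "training", "imgsz", int),
    ("TRAINING_SAVE_PERIOD", "training", "save_period", int),
    ("TRAINING_WORKERS", "training", "workers", int),
    ("EXPORT_ONNX_IMGSZ", "export", "onnx_imgsz", int),
]


def build_config(env: Mapping[str, str]) -> dict:
    """Assemble the nested config dict, failing fast on missing variables."""
    missing = [name for name, _, _, _ in ENV_TO_CONFIG if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    config: dict = {}
    for name, section, key, cast in ENV_TO_CONFIG:
        config.setdefault(section, {})[key] = cast(env[name])
    return config


def render_config_yaml(env: Mapping[str, str] = os.environ) -> str:
    # JSON is a subset of YAML 1.2, so this text can be saved as config.yaml
    # without a YAML library; JSON string quoting also keeps values containing
    # ':' or '#' (common in passwords) intact.
    return json.dumps(build_config(env), indent=2)
```

A deploy step could then write `render_config_yaml()` to `config.yaml` after loading the git-ignored `.env`, so real credentials never enter the repository.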
```diff
@@ -2,8 +2,8 @@
 
 ## Current Step
 flow: existing-code
-step: 9
-name: Implement
+step: 13
+name: Deploy
 status: in_progress
-sub_step: 0
+sub_step: 1 — Status & Env Check
 retry_count: 0
```
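After this change the autopilot state sits at sub-step 1, Status & Env Check, of the Deploy step. This commit does not show what that check runs; one plausible version, sketched here as an assumption, reads the variable names from the env template added in this commit and reports which are unset in the current environment or still hold the template's `changeme` placeholder. `parse_env_template`, `check_env` and `PLACEHOLDER` are hypothetical names, and the parser deliberately ignores quoting, `export` prefixes and inline comments.

```python
import os
from typing import Mapping

PLACEHOLDER = "changeme"  # value the env template uses for secrets


def parse_env_template(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    Minimal sketch: no quote handling, `export` prefixes or inline comments.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(template_text: str, env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """List template variables that are unset or still equal the placeholder."""
    names = parse_env_template(template_text)
    missing = [name for name in names if name not in env]
    placeholder = [name for name in names if env.get(name) == PLACEHOLDER]
    return {"missing": missing, "placeholder": placeholder}
```

For loading a real `.env` into the process, the python-dotenv package's `load_dotenv` handles the quoting and `export` cases this sketch skips.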