Azaion.Loader — Architecture
1. System Context
Problem being solved: Azaion's suite of AI/drone services ships as encrypted Docker images. Edge devices need a secure way to authenticate, download encryption keys, decrypt the image archive, and load it into Docker — plus an ongoing mechanism to download and upload encrypted model resources (split into small+big parts for security and CDN offloading).
System boundaries:
- Inside: FastAPI service handling auth, resource management, and Docker image unlock
- Outside: Azaion Resource API, S3-compatible CDN, Docker daemon, external HTTP clients
External systems:
| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| Azaion Resource API | REST (HTTPS) | Both | Authentication, resource download/upload, key fragment retrieval |
| S3-compatible CDN | S3 API (boto3) | Both | Large resource part storage |
| Docker daemon | CLI (subprocess) | Outbound | Load decrypted image archives, inspect images |
| Host OS | CLI (subprocess) | Inbound | Hardware fingerprint collection |
2. Technology Stack
| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | Python + Cython | 3.11 / 3.1.3 | Cython for IP protection (compiled .so) + performance |
| Framework | FastAPI + Uvicorn | latest | Async HTTP, auto-generated OpenAPI docs |
| Database | None | — | Stateless service; all persistence is external |
| Cache | In-memory (module globals) | — | JWT token, hardware fingerprint, CDN config |
| Message Queue | None | — | Synchronous request-response only |
| Container | Docker (python:3.11-slim) | — | Docker CLI installed inside container for docker load |
| CI/CD | Woodpecker CI | — | ARM64 Docker builds pushed to local registry |
Key constraints:
- Must run on ARM64 edge devices
- Requires access to the host Docker daemon through a mounted Docker socket (referred to here as Docker-in-Docker) for image loading
- Cython compilation at build time: .pyx files are compiled to native extensions for IP protection
3. Deployment Model
Environments: Development (local), Production (edge devices)
Infrastructure:
- Containerized via Docker (single container)
- Runs on edge devices with Docker socket access
- No orchestration layer — standalone container
Environment-specific configuration:
| Config | Development | Production |
|---|---|---|
| RESOURCE_API_URL | https://api.azaion.com | https://api.azaion.com (same) |
| IMAGES_PATH | /opt/azaion/images.enc | /opt/azaion/images.enc |
| Secrets | Env vars / cdn.yaml | Env vars / cdn.yaml (encrypted) |
| Logging | stdout + stderr | File (Logs/) + stdout + stderr |
| Docker socket | Mounted from host | Mounted from host |
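The configuration table above can be read as a small settings loader. The following is a minimal sketch, assuming both values arrive as environment variables with the table's values as defaults. The `LoaderConfig` class and `load_config` function are hypothetical names used for illustration and are not taken from the codebase.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical container for the two settings listed in the table."""
    resource_api_url: str
    images_path: str


def load_config(env=None) -> LoaderConfig:
    """Read settings from an environment mapping, falling back to the table values."""
    # Accepting a mapping makes the function testable without touching os.environ.
    env = os.environ if env is None else env
    return LoaderConfig(
        resource_api_url=env.get("RESOURCE_API_URL", "https://api.azaion.com"),
        images_path=env.get("IMAGES_PATH", "/opt/azaion/images.enc"),
    )
```

Calling `load_config()` with no argument reads the process environment; passing a dict overrides it for tests.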
4. Data Model Overview
Core entities:
| Entity | Description | Owned By Component |
|---|---|---|
| Credentials | Email + password pair | 01 Core Models |
| User | Authenticated user with role | 01 Core Models |
| RoleEnum | Authorization role hierarchy | 01 Core Models |
| UnlockState | State machine for unlock workflow | 01 Core Models |
| CDNCredentials | S3 endpoint + read/write key pairs | 03 Resource Mgmt |
Key relationships:
- Credentials → User: login produces a User from JWT claims
- Credentials → CDNCredentials: credentials enable downloading the encrypted cdn.yaml config
Data flow summary:
- Client → Loader → Resource API: authentication, encrypted resource download (small part)
- Client → Loader → CDN: large resource part upload/download
- Client → Loader → Docker: decrypted image archive loading
5. Integration Points
Internal Communication
| From | To | Protocol | Pattern |
|---|---|---|---|
| HTTP API (04) | Resource Mgmt (03) | Direct call | Request-Response |
| Resource Mgmt | Security (02) | Direct call | Request-Response |
| Resource Mgmt | Core Models (01) | Direct call | Read constants |
External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|---|---|---|---|---|
| Azaion Resource API | REST/HTTPS | JWT Bearer | Unknown | Retry once on 401/403; raise on 500/409 |
| S3-compatible CDN | S3 API/HTTPS | Access key pair | Unknown | Return False, log error |
| Docker daemon | CLI/socket | Docker socket | — | Raise CalledProcessError |
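The Resource API failure mode in the table (retry once on 401/403, raise otherwise) could look roughly like the sketch below. It assumes the HTTP call and the login step are passed in as callables. `ApiError`, `request_with_reauth`, `send`, and `login` are hypothetical names. Raising on any remaining status of 400 or above is an assumption that is broader than the 500/409 cases the table names.

```python
class ApiError(Exception):
    """Raised when the Resource API returns a non-recoverable status (hypothetical)."""


def request_with_reauth(send, login, max_reauth=1):
    """Call send(); on 401/403 call login() and retry, at most max_reauth times.

    send  -> callable returning (status_code, body)
    login -> callable that refreshes the cached JWT
    """
    status, body = send()
    attempts = 0
    while status in (401, 403) and attempts < max_reauth:
        login()  # refresh the token, then repeat the same request
        attempts += 1
        status, body = send()
    if status >= 400:
        # Assumption: every remaining error status is surfaced to the caller.
        raise ApiError(f"Resource API returned {status}")
    return body
```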
6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|---|---|---|---|
| Availability | Service uptime | /health endpoint | High |
| Latency (p95) | Varies by resource size | Per-request timing | Medium |
| Data retention | 30 days (logs) | Loguru rotation config | Low |
No explicit SLAs, throughput targets, or recovery objectives are defined in the codebase.
7. Security Architecture
Authentication: JWT Bearer tokens issued by Azaion Resource API. Tokens are decoded without signature verification; the Loader trusts the API server that issued them.
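Decoding a token without checking its signature amounts to base64url-decoding the middle segment. The sketch below illustrates this with the standard library only. `decode_jwt_claims` is a hypothetical name, and the Loader's actual decoding code may differ.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because nothing is verified, the returned claims are only as trustworthy as the channel the token arrived on.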
Authorization: Role-based (RoleEnum: NONE → Operator → Validator → CompanionPC → Admin → ResourceUploader → ApiAdmin). Roles parsed from JWT but not enforced by Loader endpoints.
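The role ordering above could be modeled as an ordered enum. This is a sketch under assumptions: only the ordering comes from the text, the numeric values are placeholders, and `parse_role` is a hypothetical helper that assumes the JWT carries the role as a name string.

```python
from enum import IntEnum


class RoleEnum(IntEnum):
    # Numeric values are placeholders; only the ordering comes from the text.
    NONE = 0
    Operator = 1
    Validator = 2
    CompanionPC = 3
    Admin = 4
    ResourceUploader = 5
    ApiAdmin = 6


def parse_role(claim) -> RoleEnum:
    """Map a role claim (assumed to be a member name) to RoleEnum, defaulting to NONE."""
    try:
        return RoleEnum[claim]
    except (KeyError, TypeError):
        return RoleEnum.NONE
```

An `IntEnum` makes comparisons such as `RoleEnum.Operator < RoleEnum.Admin` possible, although the Loader endpoints do not currently perform such checks.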
Data protection:
- At rest: AES-256-CBC encrypted resources on disk; Docker images stored as an encrypted .enc archive
- In transit: HTTPS for API calls; S3 HTTPS for CDN
- Secrets management: CDN credentials stored in encrypted cdn.yaml downloaded from API; user credentials in memory only
Key derivation:
- Per-user/per-machine keys: SHA-384(email + password + hardware_hash + salt) → used for API resource downloads
- Shared resource key: SHA-384(fixed_salt) → used for big/small resource split encryption
- Hardware binding: SHA-384("Azaion_" + hardware_fingerprint + salt) → ties decryption to specific hardware
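The derivations listed above can be sketched with `hashlib`. The sketch assumes plain string concatenation, UTF-8 encoding, and hex output, none of which the document specifies. The salt values and the helper names (`sha384_hex`, `hardware_hash`, `user_key`) are hypothetical.

```python
import hashlib


def sha384_hex(*parts: str) -> str:
    """Concatenate parts, encode as UTF-8, and return the SHA-384 hex digest."""
    return hashlib.sha384("".join(parts).encode("utf-8")).hexdigest()


def hardware_hash(fingerprint: str, salt: str) -> str:
    """Hardware binding: SHA-384("Azaion_" + hardware_fingerprint + salt)."""
    return sha384_hex("Azaion_", fingerprint, salt)


def user_key(email: str, password: str, hw_hash: str, salt: str) -> str:
    """Per-user/per-machine key: SHA-384(email + password + hardware_hash + salt)."""
    return sha384_hex(email, password, hw_hash, salt)
```

Because the hardware hash feeds into the user key, the same credentials on a different machine produce a different key.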
Audit logging: Application-level logging via Loguru (file + stdout/stderr). No structured audit trail.
8. Key Architectural Decisions
ADR-001: Cython for IP Protection
Context: The loader handles encryption keys and security-sensitive logic that should not be trivially readable.
Decision: Core modules (api_client, security, cdn_manager, hardware_service, credentials, user, constants) are written in Cython and compiled to native .so extensions.
Alternatives considered:
- Pure Python with obfuscation — rejected because obfuscation is reversible
- Compiled language (Rust/Go) — rejected because the loader needs tight integration with the Python ecosystem (FastAPI, boto3)
Consequences: Build step required (setup.py build_ext --inplace); cdef methods not callable from pure Python; debugging compiled extensions is harder.
ADR-002: Binary-Split Resource Scheme
Context: Large model files need secure distribution. Storing entire encrypted files on one server creates a single point of compromise.
Decision: Resources are encrypted, then split into a small part (uploaded to the authenticated API) and a large part (uploaded to CDN). Decryption requires both parts.
Alternatives considered:
- Single encrypted download from API — rejected because of bandwidth/cost for large files
- Unencrypted CDN with signed URLs — rejected because CDN compromise would expose models
Consequences: More complex download/upload logic; local caching of big parts for performance; CDN credentials managed separately from API credentials.
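The split described in ADR-002 can be illustrated on raw bytes. This is a sketch only: it assumes the small part is a prefix of the encrypted payload and that the split size is chosen by the caller. The document does not state how the Loader picks the split point, and `split_resource` and `join_resource` are hypothetical names.

```python
def split_resource(encrypted: bytes, small_size: int):
    """Split already-encrypted bytes into (small_part, big_part) at small_size."""
    if not 0 < small_size < len(encrypted):
        raise ValueError("small_size must leave both parts non-empty")
    return encrypted[:small_size], encrypted[small_size:]


def join_resource(small_part: bytes, big_part: bytes) -> bytes:
    """Reassemble the encrypted payload; decryption needs the full result."""
    return small_part + big_part
```

Under this scheme the small part would go to the authenticated API and the big part to the CDN, so neither store alone holds a decryptable file.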
ADR-003: Docker-in-Docker for Image Loading
Context: The loader needs to inject Docker images into the host Docker daemon on edge devices.
Decision: Mount Docker socket into the loader container; use Docker CLI (docker load, docker image inspect) via subprocess.
Alternatives considered:
- Docker API via Python library — rejected because Docker CLI is simpler and universally available
- Image loading outside the loader — rejected because the unlock workflow needs to be self-contained
Consequences: Container requires Docker socket mount (security implication); Docker CLI must be installed in the container image.
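The CLI calls named in ADR-003 might be wrapped as in the sketch below. It assumes the `docker` binary is on the PATH and the socket is mounted. The function names and the injectable `run` parameter are hypothetical and exist only so the wrappers can be exercised without a Docker daemon.

```python
import subprocess


def docker_load(archive_path: str, run=subprocess.run) -> None:
    """Load a decrypted image archive into the Docker daemon via the CLI."""
    # check=True makes a non-zero exit raise subprocess.CalledProcessError.
    run(["docker", "load", "-i", archive_path], check=True)


def image_exists(image: str, run=subprocess.run) -> bool:
    """Return True if `docker image inspect` exits successfully for the image."""
    result = run(["docker", "image", "inspect", image], capture_output=True)
    return result.returncode == 0
```

Passing the arguments as a list avoids shell interpretation of the archive path or image name.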