loader/_docs/02_document/architecture.md
Oleksandr Bezdieniezhnykh 8f7deb3fca Add E2E tests, fix bugs
Made-with: Cursor
2026-04-13 05:17:48 +03:00

# Azaion.Loader — Architecture
## 1. System Context
**Problem being solved**: Azaion's suite of AI/drone services ships as encrypted Docker images. Edge devices need a secure way to authenticate, download encryption keys, decrypt the image archive, and load it into Docker. They also need an ongoing mechanism to download and upload encrypted model resources, which are split into a small part and a big part for security and CDN offloading.
**System boundaries**:
- **Inside**: FastAPI service handling auth, resource management, and Docker image unlock
- **Outside**: Azaion Resource API, S3-compatible CDN, Docker daemon, external HTTP clients
**External systems**:
| System | Integration Type | Direction | Purpose |
|----------------------|------------------|-----------|--------------------------------------------|
| Azaion Resource API | REST (HTTPS) | Both | Authentication, resource download/upload, key fragment retrieval |
| S3-compatible CDN | S3 API (boto3) | Both | Large resource part storage |
| Docker daemon | CLI (subprocess) | Outbound | Load decrypted image archives, inspect images |
| Host OS | CLI (subprocess) | Inbound | Hardware fingerprint collection |
## 2. Technology Stack
| Layer | Technology | Version | Rationale |
|------------|-------------------------|----------|-----------------------------------------------------------|
| Language | Python + Cython | 3.11 / 3.1.3 | Cython for IP protection (compiled .so) + performance |
| Framework | FastAPI + Uvicorn | latest | Async HTTP, auto-generated OpenAPI docs |
| Database | None | — | Stateless service; all persistence is external |
| Cache | In-memory (module globals)| — | JWT token, hardware fingerprint, CDN config |
| Message Queue | None | — | Synchronous request-response only |
| Container | Docker (python:3.11-slim)| — | Docker CLI installed inside container for `docker load` |
| CI/CD | Woodpecker CI | — | ARM64 Docker builds pushed to local registry |
**Key constraints**:
- Must run on ARM64 edge devices
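- Requires access to the host Docker daemon through a mounted Docker socket for image loading (called Docker-in-Docker in this document, although the container drives the host daemon rather than running a nested one)
- Cython compilation happens at build time: `.pyx` files are compiled to native extensions for IP protection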
## 3. Deployment Model
**Environments**: Development (local), Production (edge devices)
**Infrastructure**:
- Containerized via Docker (single container)
- Runs on edge devices with Docker socket access
- No orchestration layer — standalone container
**Environment-specific configuration**:
| Config | Development | Production |
|-----------------|------------------------------|---------------------------------|
| RESOURCE_API_URL| `https://api.azaion.com` | `https://api.azaion.com` (same) |
| IMAGES_PATH | `/opt/azaion/images.enc` | `/opt/azaion/images.enc` |
| Secrets | Env vars / cdn.yaml | Env vars / cdn.yaml (encrypted) |
| Logging | stdout + stderr | File (Logs/) + stdout + stderr |
| Docker socket | Mounted from host | Mounted from host |
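
The configuration table above can be read as a small settings object. The following is a minimal sketch, assuming `RESOURCE_API_URL` and `IMAGES_PATH` are supplied as environment variables and that the documented values act as defaults; `LoaderConfig` and `load_config` are hypothetical names, not taken from the codebase.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical settings container for the two documented config keys."""
    resource_api_url: str
    images_path: str


def load_config(env=None) -> LoaderConfig:
    """Read settings from an environment mapping, falling back to the documented values."""
    # Accepting a mapping makes the function testable without touching os.environ.
    env = os.environ if env is None else env
    return LoaderConfig(
        resource_api_url=env.get("RESOURCE_API_URL", "https://api.azaion.com"),
        images_path=env.get("IMAGES_PATH", "/opt/azaion/images.enc"),
    )
```

Because development and production share the same values for both keys, a single set of defaults is enough in this sketch; only secrets and logging differ between environments.
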
## 4. Data Model Overview
**Core entities**:
| Entity | Description | Owned By Component |
|---------------|--------------------------------------|--------------------|
| Credentials | Email + password pair | 01 Core Models |
| User | Authenticated user with role | 01 Core Models |
| RoleEnum | Authorization role hierarchy | 01 Core Models |
| UnlockState | State machine for unlock workflow | 01 Core Models |
| CDNCredentials| S3 endpoint + read/write key pairs | 03 Resource Mgmt |
**Key relationships**:
- Credentials → User: login produces a User from JWT claims
- Credentials → CDNCredentials: credentials enable downloading the encrypted cdn.yaml config
**Data flow summary**:
- Client → Loader → Resource API: authentication, encrypted resource download (small part)
- Client → Loader → CDN: large resource part upload/download
- Client → Loader → Docker: decrypted image archive loading
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern |
|----------------|---------------------|--------------|------------------|
| HTTP API (04) | Resource Mgmt (03) | Direct call | Request-Response |
| Resource Mgmt | Security (02) | Direct call | Request-Response |
| Resource Mgmt | Core Models (01) | Direct call | Read constants |
### External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------------|--------------|----------------|-------------|----------------------------------|
| Azaion Resource API | REST/HTTPS | JWT Bearer | Unknown | Retry once on 401/403; raise on 500/409 |
| S3-compatible CDN | S3 API/HTTPS | Access key pair| Unknown | Return False, log error |
| Docker daemon | CLI/socket | Docker socket | — | Raise CalledProcessError |
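
The Resource API failure mode in the table above (retry once on 401/403, raise on 500/409) could be sketched as follows. This is an illustration under assumptions: `call_with_reauth`, `ResourceApiError`, and the `request`/`login` callables are hypothetical, and the real client may structure the retry differently.

```python
from typing import Callable, Tuple

AUTH_ERRORS = {401, 403}   # trigger one re-login and one retry
FATAL_ERRORS = {409, 500}  # raised immediately, no retry


class ResourceApiError(Exception):
    """Hypothetical error type for failed Resource API calls."""


def call_with_reauth(
    request: Callable[[], Tuple[int, bytes]],
    login: Callable[[], None],
) -> bytes:
    """Run `request`; on 401/403 log in again and retry exactly once."""
    status, body = request()
    if status in AUTH_ERRORS:
        login()                   # refresh the JWT
        status, body = request()  # single retry
    if status in AUTH_ERRORS or status in FATAL_ERRORS:
        raise ResourceApiError(f"Resource API returned HTTP {status}")
    # Other status codes are not covered by the table; this sketch passes them through.
    return body
```

Passing the request and login steps as callables keeps the retry policy separate from the HTTP library in use.
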
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|-----------------|-----------------|--------------------------|----------|
| Availability | Service uptime | `/health` endpoint | High |
| Latency (p95) | Varies by resource size | Per-request timing | Medium |
| Data retention | 30 days (logs) | Loguru rotation config | Low |
No explicit SLAs, throughput targets, or recovery objectives are defined in the codebase.
## 7. Security Architecture
**Authentication**: JWT Bearer tokens issued by Azaion Resource API. Tokens are decoded without signature verification; the Loader trusts the API server as the issuer.
**Authorization**: Role-based (RoleEnum: NONE → Operator → Validator → CompanionPC → Admin → ResourceUploader → ApiAdmin). Roles are parsed from the JWT but are not enforced by Loader endpoints.
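
Decoding a JWT without signature verification amounts to base64url-decoding its middle segment. The sketch below uses only the standard library to show the idea; `decode_jwt_claims` is a hypothetical helper, and the Loader may rely on a JWT library instead.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the JWT payload as a dict WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload = parts[1]
    # JWTs strip base64url padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Since nothing here checks the signature, the resulting claims are only as trustworthy as the channel the token arrived on, which is why this approach depends on trusting the API server over HTTPS.
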
**Data protection**:
- At rest: AES-256-CBC encrypted resources on disk; Docker images stored as encrypted `.enc` archive
- In transit: HTTPS for API calls; S3 HTTPS for CDN
- Secrets management: CDN credentials stored in encrypted `cdn.yaml` downloaded from API; user credentials in memory only
**Key derivation**:
- Per-user/per-machine keys: `SHA-384(email + password + hardware_hash + salt)` → used for API resource downloads
- Shared resource key: `SHA-384(fixed_salt)` → used for big/small resource split encryption
- Hardware binding: `SHA-384("Azaion_" + hardware_fingerprint + salt)` → ties decryption to specific hardware
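
The three derivations above can be sketched with `hashlib`. This is an illustration under assumptions: the `+` is read as plain string concatenation, inputs are UTF-8 encoded, and the raw 48-byte digest is returned. The function names and all salt values are hypothetical.

```python
import hashlib


def _sha384(*parts: str) -> bytes:
    """SHA-384 over the UTF-8 encoding of the concatenated parts (assumed encoding)."""
    return hashlib.sha384("".join(parts).encode("utf-8")).digest()


def hardware_hash(hardware_fingerprint: str, salt: str) -> bytes:
    """Hardware binding: SHA-384("Azaion_" + hardware_fingerprint + salt)."""
    return _sha384("Azaion_", hardware_fingerprint, salt)


def user_machine_key(email: str, password: str, hw_hash: str, salt: str) -> bytes:
    """Per-user/per-machine key: SHA-384(email + password + hardware_hash + salt)."""
    return _sha384(email, password, hw_hash, salt)


def shared_resource_key(fixed_salt: str) -> bytes:
    """Shared resource key: SHA-384(fixed_salt)."""
    return _sha384(fixed_salt)
```

A usage sketch with placeholder values: compute `hardware_hash("fingerprint", "salt").hex()` first, then pass the result to `user_machine_key` together with the email and password. Whether the hardware hash is fed in as hex, raw bytes, or another encoding is not stated in this document, so the hex form here is an assumption.
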
**Audit logging**: Application-level logging via Loguru (file + stdout/stderr). No structured audit trail.
## 8. Key Architectural Decisions
### ADR-001: Cython for IP Protection
**Context**: The loader handles encryption keys and security-sensitive logic that should not be trivially readable.
**Decision**: Core modules (api_client, security, cdn_manager, hardware_service, credentials, user, constants) are written in Cython and compiled to native `.so` extensions.
**Alternatives considered**:
1. Pure Python with obfuscation — rejected because obfuscation is reversible
2. Compiled language (Rust/Go) — rejected because the loader needs tight integration with the Python ecosystem (FastAPI, boto3)
**Consequences**: Build step required (`setup.py build_ext --inplace`); `cdef` methods not callable from pure Python; debugging compiled extensions is harder.
### ADR-002: Binary-Split Resource Scheme
**Context**: Large model files need secure distribution. Storing entire encrypted files on one server creates a single point of compromise.
**Decision**: Resources are encrypted, then split into a small part (uploaded to the authenticated API) and a large part (uploaded to CDN). Decryption requires both parts.
**Alternatives considered**:
1. Single encrypted download from API — rejected because of bandwidth/cost for large files
2. Unencrypted CDN with signed URLs — rejected because CDN compromise would expose models
**Consequences**: More complex download/upload logic; local caching of big parts for performance; CDN credentials managed separately from API credentials.
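
The split-and-rejoin idea behind this decision can be illustrated on an already encrypted byte string. This sketch assumes the small part is a fixed-size prefix of the ciphertext; the actual split position and sizing rule are not specified in this document, and `split_resource` and `join_resource` are hypothetical names.

```python
def split_resource(encrypted: bytes, small_size: int) -> tuple[bytes, bytes]:
    """Split ciphertext into (small_part, big_part); small_size is an assumed parameter."""
    if not 0 < small_size < len(encrypted):
        raise ValueError("small_size must leave both parts non-empty")
    return encrypted[:small_size], encrypted[small_size:]


def join_resource(small_part: bytes, big_part: bytes) -> bytes:
    """Reassemble the ciphertext; decryption can only start once both parts are present."""
    return small_part + big_part
```

In this model the small part would go to the authenticated API and the big part to the CDN, so an attacker who obtains only the CDN object holds an incomplete ciphertext.
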
### ADR-003: Docker-in-Docker for Image Loading
**Context**: The loader needs to inject Docker images into the host Docker daemon on edge devices.
**Decision**: Mount Docker socket into the loader container; use Docker CLI (`docker load`, `docker image inspect`) via subprocess.
**Alternatives considered**:
1. Docker API via Python library — rejected because Docker CLI is simpler and universally available
2. Image loading outside the loader — rejected because the unlock workflow needs to be self-contained
**Consequences**: Container requires a Docker socket mount, which gives the loader broad control over the host Docker daemon and is therefore a security-sensitive privilege; Docker CLI must be installed in the container image.
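
Driving the Docker CLI through `subprocess` could look like the sketch below. The helper names are hypothetical; `docker load -i` and `docker image inspect` are the standard CLI commands, and `check=True` yields the `CalledProcessError` behaviour described in the integration table.

```python
import subprocess


def docker_load_cmd(archive_path: str) -> list[str]:
    """Build the argument list for loading an image archive from a file."""
    return ["docker", "load", "-i", archive_path]


def docker_load(archive_path: str) -> None:
    """Load the decrypted archive; raises CalledProcessError on a non-zero exit."""
    subprocess.run(docker_load_cmd(archive_path), check=True)


def docker_image_exists(image: str) -> bool:
    """Return True if `docker image inspect` succeeds for the given image reference."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Passing the command as a list, rather than a shell string, avoids shell interpretation of the archive path or image name.
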