Oleksandr Bezdieniezhnykh 8f7deb3fca Add E2E tests, fix bugs
Made-with: Cursor
2026-04-13 05:17:48 +03:00

Azaion.Loader — Architecture

1. System Context

Problem being solved: Azaion's suite of AI/drone services ships as encrypted Docker images. Edge devices need a secure way to authenticate, download encryption keys, decrypt the image archive, and load it into Docker. They also need an ongoing mechanism to download and upload encrypted model resources, which are split into a small part and a big part for security and CDN offloading.

System boundaries:

  • Inside: FastAPI service handling auth, resource management, and Docker image unlock
  • Outside: Azaion Resource API, S3-compatible CDN, Docker daemon, external HTTP clients

External systems:

| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| Azaion Resource API | REST (HTTPS) | Both | Authentication, resource download/upload, key fragment retrieval |
| S3-compatible CDN | S3 API (boto3) | Both | Large resource part storage |
| Docker daemon | CLI (subprocess) | Outbound | Load decrypted image archives, inspect images |
| Host OS | CLI (subprocess) | Inbound | Hardware fingerprint collection |

2. Technology Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | Python + Cython | 3.11 / 3.1.3 | Cython for IP protection (compiled .so) + performance |
| Framework | FastAPI + Uvicorn | latest | Async HTTP, auto-generated OpenAPI docs |
| Database | None | | Stateless service; all persistence is external |
| Cache | In-memory (module globals) | | JWT token, hardware fingerprint, CDN config |
| Message Queue | None | | Synchronous request-response only |
| Container | Docker (python:3.11-slim) | | Docker CLI installed inside container for docker load |
| CI/CD | Woodpecker CI | | ARM64 Docker builds pushed to local registry |

Key constraints:

  • Must run on ARM64 edge devices
  • Requires a Docker socket mount so the container can load images into the host Docker daemon (referred to as Docker-in-Docker in this document)
  • Cython compilation at build time — .pyx files compiled to native extensions for IP protection

3. Deployment Model

Environments: Development (local), Production (edge devices)

Infrastructure:

  • Containerized via Docker (single container)
  • Runs on edge devices with Docker socket access
  • No orchestration layer — standalone container

Environment-specific configuration:

| Config | Development | Production |
|---|---|---|
| RESOURCE_API_URL | https://api.azaion.com | https://api.azaion.com (same) |
| IMAGES_PATH | /opt/azaion/images.enc | /opt/azaion/images.enc |
| Secrets | Env vars / cdn.yaml | Env vars / cdn.yaml (encrypted) |
| Logging | stdout + stderr | File (Logs/) + stdout + stderr |
| Docker socket | Mounted from host | Mounted from host |

4. Data Model Overview

Core entities:

| Entity | Description | Owned By Component |
|---|---|---|
| Credentials | Email + password pair | 01 Core Models |
| User | Authenticated user with role | 01 Core Models |
| RoleEnum | Authorization role hierarchy | 01 Core Models |
| UnlockState | State machine for unlock workflow | 01 Core Models |
| CDNCredentials | S3 endpoint + read/write key pairs | 03 Resource Mgmt |

Key relationships:

  • Credentials → User: login produces a User from JWT claims
  • Credentials → CDNCredentials: credentials enable downloading the encrypted cdn.yaml config

Data flow summary:

  • Client → Loader → Resource API: authentication, encrypted resource download (small part)
  • Client → Loader → CDN: large resource part upload/download
  • Client → Loader → Docker: decrypted image archive loading

5. Integration Points

Internal Communication

| From | To | Protocol | Pattern |
|---|---|---|---|
| HTTP API (04) | Resource Mgmt (03) | Direct call | Request-Response |
| Resource Mgmt | Security (02) | Direct call | Request-Response |
| Resource Mgmt | Core Models (01) | Direct call | Read constants |

External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|---|---|---|---|---|
| Azaion Resource API | REST/HTTPS | JWT Bearer | Unknown | Retry once on 401/403; raise on 500/409 |
| S3-compatible CDN | S3 API/HTTPS | Access key pair | Unknown | Return False, log error |
| Docker daemon | CLI/socket | Docker socket | | Raise CalledProcessError |
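
The Resource API failure mode listed above (retry once on 401/403, raise otherwise) can be sketched as follows. This is an illustration, not the Loader's actual client: send, login, and ApiError are hypothetical names, the transport is passed in as a callable so no HTTP library is needed, and raising on every remaining error status (not only 500/409) is an assumption.

```python
from typing import Callable, Tuple


class ApiError(Exception):
    """Hypothetical error type for API responses that are not retried."""


# send(token) performs one request and returns (status_code, body).
Send = Callable[[str], Tuple[int, bytes]]


def request_with_reauth(send: Send, login: Callable[[], str], token: str) -> bytes:
    """Call the API; on 401/403 log in again and retry exactly once."""
    status, body = send(token)
    if status in (401, 403):
        # The cached token was rejected: obtain a fresh one and retry once.
        token = login()
        status, body = send(token)
    if status >= 400:
        # Assumption: any remaining error status (including 500 and 409) raises.
        raise ApiError(f"API request failed with HTTP {status}")
    return body
```

A second 401/403 after the retry also raises, so a bad password cannot cause a login loop.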

6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|---|---|---|---|
| Availability | Service uptime | /health endpoint | High |
| Latency (p95) | Varies by resource size | Per-request timing | Medium |
| Data retention | 30 days (logs) | Loguru rotation config | Low |

No explicit SLAs, throughput targets, or recovery objectives are defined in the codebase.

7. Security Architecture

Authentication: JWT Bearer tokens issued by the Azaion Resource API. The Loader decodes tokens without verifying their signature, trusting the API server that issued them.
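
Decoding a token without signature verification amounts to reading its middle segment. The following is a minimal standard-library sketch of that step; decode_jwt_claims is a hypothetical helper name, and the Loader may well use a JWT library instead.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the JWT payload as a dict WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because the signature segment is ignored, the claims are only as trustworthy as the HTTPS channel to the API that delivered the token.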

Authorization: Role-based (RoleEnum: NONE → Operator → Validator → CompanionPC → Admin → ResourceUploader → ApiAdmin). Roles are parsed from the JWT but are not enforced by Loader endpoints.
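
An ordered enum is one way to model this hierarchy. In the sketch below the member names and their order come from the list above, while the numeric values and the parse_role and has_at_least helpers are assumptions added for illustration.

```python
from enum import IntEnum


class RoleEnum(IntEnum):
    # Order follows the hierarchy in the text; the numeric values are assumed.
    NONE = 0
    Operator = 1
    Validator = 2
    CompanionPC = 3
    Admin = 4
    ResourceUploader = 5
    ApiAdmin = 6


def parse_role(claim: str) -> RoleEnum:
    """Map a role claim to RoleEnum, falling back to NONE for unknown names."""
    return RoleEnum.__members__.get(claim, RoleEnum.NONE)


def has_at_least(role: RoleEnum, required: RoleEnum) -> bool:
    """A check an endpoint could apply if enforcement were added."""
    return role >= required
```

Since the Loader does not currently enforce roles, has_at_least only shows what such a check might look like.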

Data protection:

  • At rest: AES-256-CBC encrypted resources on disk; Docker images stored as encrypted .enc archive
  • In transit: HTTPS for API calls; S3 HTTPS for CDN
  • Secrets management: CDN credentials stored in encrypted cdn.yaml downloaded from API; user credentials in memory only

Key derivation:

  • Per-user/per-machine keys: SHA-384(email + password + hardware_hash + salt) → used for API resource downloads
  • Shared resource key: SHA-384(fixed_salt) → used for big/small resource split encryption
  • Hardware binding: SHA-384("Azaion_" + hardware_fingerprint + salt) → ties decryption to specific hardware
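
The three derivations above can be sketched with hashlib. Several details are assumptions: the inputs are concatenated as plain strings in the order listed, encoded as UTF-8, and returned as hex. The function names are hypothetical, and how the 48-byte SHA-384 digest is turned into a 32-byte AES-256 key is not stated here, so that step is left out.

```python
import hashlib


def sha384_hex(*parts: str) -> str:
    """Concatenate the parts (assumed UTF-8) and return the SHA-384 hex digest."""
    return hashlib.sha384("".join(parts).encode("utf-8")).hexdigest()


def hardware_hash(fingerprint: str, salt: str) -> str:
    """Hardware binding: SHA-384("Azaion_" + hardware_fingerprint + salt)."""
    return sha384_hex("Azaion_", fingerprint, salt)


def user_machine_key(email: str, password: str, hw_hash: str, salt: str) -> str:
    """Per-user/per-machine key: SHA-384(email + password + hardware_hash + salt)."""
    return sha384_hex(email, password, hw_hash, salt)


def shared_resource_key(fixed_salt: str) -> str:
    """Shared resource key: SHA-384(fixed_salt)."""
    return sha384_hex(fixed_salt)
```

Under these assumptions, changing either the credentials or the hardware fingerprint yields a different per-user key, which is what ties decryption to one user on one machine.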

Audit logging: Application-level logging via Loguru (file + stdout/stderr). No structured audit trail.

8. Key Architectural Decisions

ADR-001: Cython for IP Protection

Context: The loader handles encryption keys and security-sensitive logic that should not be trivially readable.

Decision: Core modules (api_client, security, cdn_manager, hardware_service, credentials, user, constants) are written in Cython and compiled to native .so extensions.

Alternatives considered:

  1. Pure Python with obfuscation — rejected because obfuscation is reversible
  2. Compiled language (Rust/Go) — rejected because the loader needs tight integration with the Python ecosystem (FastAPI, boto3)

Consequences: Build step required (setup.py build_ext --inplace); cdef methods not callable from pure Python; debugging compiled extensions is harder.

ADR-002: Binary-Split Resource Scheme

Context: Large model files need secure distribution. Storing entire encrypted files on one server creates a single point of compromise.

Decision: Resources are encrypted, then split into a small part (uploaded to the authenticated API) and a large part (uploaded to CDN). Decryption requires both parts.

Alternatives considered:

  1. Single encrypted download from API — rejected because of bandwidth/cost for large files
  2. Unencrypted CDN with signed URLs — rejected because CDN compromise would expose models

Consequences: More complex download/upload logic; local caching of big parts for performance; CDN credentials managed separately from API credentials.
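
The split in ADR-002 can be sketched as a byte-level cut of the already-encrypted data. This is an assumption-laden illustration: the small part is taken to be a fixed-size prefix, SMALL_PART_BYTES is a made-up value, and split_resource and join_resource are hypothetical names. The actual split rule is not documented in this file.

```python
from typing import Tuple

# Hypothetical size of the small part; the real split rule is not documented here.
SMALL_PART_BYTES = 3 * 1024


def split_resource(encrypted: bytes, small_size: int = SMALL_PART_BYTES) -> Tuple[bytes, bytes]:
    """Split already-encrypted bytes into (small, big) parts."""
    if small_size < 0:
        raise ValueError("small_size must be non-negative")
    # Assumption: the small part is the prefix and the big part is the remainder.
    return encrypted[:small_size], encrypted[small_size:]


def join_resource(small: bytes, big: bytes) -> bytes:
    """Reassemble the ciphertext; decryption happens only after this step."""
    return small + big
```

In this scheme the small part would go to the authenticated API and the big part to the CDN, so neither store alone holds a complete ciphertext.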

ADR-003: Docker-in-Docker for Image Loading

Context: The loader needs to inject Docker images into the host Docker daemon on edge devices.

Decision: Mount Docker socket into the loader container; use Docker CLI (docker load, docker image inspect) via subprocess.

Alternatives considered:

  1. Docker API via Python library — rejected because Docker CLI is simpler and universally available
  2. Image loading outside the loader — rejected because the unlock workflow needs to be self-contained

Consequences: Container requires Docker socket mount (security implication); Docker CLI must be installed in the container image.
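
The CLI-via-subprocess approach in ADR-003 can be sketched as below. The function names are hypothetical and the run parameter exists only so the sketch can be exercised without a Docker daemon. The docker load -i and docker image inspect invocations are standard Docker CLI commands.

```python
import subprocess
from typing import Callable, List


def docker_load_command(archive_path: str) -> List[str]:
    """Build the CLI call that loads a decrypted image archive."""
    return ["docker", "load", "-i", archive_path]


def docker_inspect_command(image: str) -> List[str]:
    """Build the CLI call that checks whether an image is present."""
    return ["docker", "image", "inspect", image]


def load_image(archive_path: str, run: Callable = subprocess.run) -> None:
    # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
    run(docker_load_command(archive_path), check=True)


def image_exists(image: str, run: Callable = subprocess.run) -> bool:
    # docker image inspect exits non-zero when the image is not present.
    result = run(docker_inspect_command(image), capture_output=True)
    return result.returncode == 0
```

Passing the command as a list rather than a shell string avoids shell interpolation of the archive path.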