mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 18:56:33 +00:00
Add E2E tests, fix bugs
Made-with: Cursor
@@ -0,0 +1,98 @@
# Core Models

## 1. High-Level Overview

**Purpose**: Provides shared constants, data models (Credentials, User, UnlockState), and the application-wide logging facility used by all other components.

**Architectural Pattern**: Shared kernel — foundational types and utilities with no business logic.

**Upstream dependencies**: None (leaf component)

**Downstream consumers**: Security, Resource Management, HTTP API

## 2. Internal Interfaces

### Interface: Constants

| Symbol                  | Type | Value / Signature          |
|-------------------------|------|----------------------------|
| `CONFIG_FILE`           | str  | `"config.yaml"`            |
| `QUEUE_CONFIG_FILENAME` | str  | `"secured-config.json"`    |
| `AI_ONNX_MODEL_FILE`    | str  | `"azaion.onnx"`            |
| `CDN_CONFIG`            | str  | `"cdn.yaml"`               |
| `MODELS_FOLDER`         | str  | `"models"`                 |
| `SMALL_SIZE_KB`         | int  | `3`                        |
| `ALIGNMENT_WIDTH`       | int  | `32`                       |
| `log(str)`              | cdef | INFO-level log via Loguru  |
| `logerror(str)`         | cdef | ERROR-level log via Loguru |

### Interface: Credentials

| Method     | Input                     | Output      | Async | Error Types |
|------------|---------------------------|-------------|-------|-------------|
| `__init__` | `str email, str password` | Credentials | No    | —           |

**Fields**: `email: str (public)`, `password: str (public)`

### Interface: User

| Method     | Input                              | Output | Async | Error Types |
|------------|------------------------------------|--------|-------|-------------|
| `__init__` | `str id, str email, RoleEnum role` | User   | No    | —           |

**Enum: RoleEnum** — NONE(0), Operator(10), Validator(20), CompanionPC(30), Admin(40), ResourceUploader(50), ApiAdmin(1000)

### Interface: UnlockState

Python `str` enum: idle, authenticating, downloading_key, decrypting, loading_images, ready, error.
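
The data models listed above can be approximated in plain Python as follows. This is a minimal sketch, not the project's code: the real types are Cython classes, `dataclass` is used here only as a stand-in, and the assumption that each `UnlockState` value equals its member name is not confirmed by the source.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class RoleEnum(IntEnum):
    """Role values as listed in the User interface section."""
    NONE = 0
    Operator = 10
    Validator = 20
    CompanionPC = 30
    Admin = 40
    ResourceUploader = 50
    ApiAdmin = 1000


class UnlockState(str, Enum):
    """String enum; values are assumed to match the member names."""
    idle = "idle"
    authenticating = "authenticating"
    downloading_key = "downloading_key"
    decrypting = "decrypting"
    loading_images = "loading_images"
    ready = "ready"
    error = "error"


@dataclass
class Credentials:
    # Both fields are public, as in the documented interface.
    email: str
    password: str


@dataclass
class User:
    id: str
    email: str
    role: RoleEnum
```

Because `UnlockState` mixes in `str`, its members compare equal to plain strings, which is convenient when the state is returned in a JSON body.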

## 3. External API Specification

N/A — internal-only component.

## 4. Data Access Patterns

N/A — no persistent storage. All data is in-memory.

## 5. Implementation Details

**State Management**: Stateless — pure data definitions and a configured logger singleton.

**Key Dependencies**:

| Library | Version | Purpose                          |
|---------|---------|----------------------------------|
| loguru  | 0.7.3   | Structured logging with rotation |

**Error Handling Strategy**: Logging functions never throw; they are the error-reporting mechanism.

## 6. Extensions and Helpers

None.

## 7. Caveats & Edge Cases

**Known limitations**:
- `QUEUE_MAXSIZE`, `COMMANDS_QUEUE`, `ANNOTATIONS_QUEUE` are declared in `constants.pxd` but never defined — dead declarations
- Log directory `Logs/` is hardcoded; not configurable via env var
- `psutil` is in `requirements.txt` but not used by any module

## 8. Dependency Graph

**Must be implemented after**: —

**Can be implemented in parallel with**: — (every other component depends on this one)

**Blocks**: Security (02), Resource Management (03), HTTP API (04)

## 9. Logging Strategy

| Log Level | When                         | Example                       |
|-----------|------------------------------|-------------------------------|
| ERROR     | `logerror()` calls           | Forwarded from caller modules |
| INFO      | `log()` calls                | Forwarded from caller modules |
| DEBUG     | Stdout filter includes DEBUG | Available for development     |

**Log format**: `[HH:mm:ss LEVEL] message`

**Log storage**: File (`Logs/log_loader_{date}.txt`) + stdout (INFO/DEBUG) + stderr (WARNING+)
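
The stdout/stderr routing described above can be illustrated with the standard library. Note that this is a stand-in: the component itself uses Loguru, and the sketch below swaps in Python's `logging` module purely to show the level split and the `[HH:mm:ss LEVEL] message` format. The function name `build_logger` is hypothetical, and the file sink is omitted.

```python
import logging
import sys


def build_logger(name="loader_sketch", out=sys.stdout, err=sys.stderr):
    """Route DEBUG/INFO to `out` and WARNING and above to `err`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()      # keep repeated calls from stacking handlers
    logger.propagate = False
    fmt = logging.Formatter("[%(asctime)s %(levelname)s] %(message)s",
                            datefmt="%H:%M:%S")

    out_handler = logging.StreamHandler(out)
    out_handler.setLevel(logging.DEBUG)
    # Only records below WARNING reach stdout.
    out_handler.addFilter(lambda record: record.levelno < logging.WARNING)
    out_handler.setFormatter(fmt)

    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.WARNING)
    err_handler.setFormatter(fmt)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger
```

Passing the streams as parameters keeps the sketch easy to exercise with in-memory buffers.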
@@ -0,0 +1,102 @@
# Security

## 1. High-Level Overview

**Purpose**: Provides AES-256-CBC encryption/decryption, multiple key derivation strategies, and OS-specific hardware fingerprinting for machine-bound access control.

**Architectural Pattern**: Utility / Strategy — stateless static methods for crypto operations; hardware fingerprinting with caching.

**Upstream dependencies**: Core Models (01) — uses `Credentials` type, `constants.log()`

**Downstream consumers**: Resource Management (03) — `ApiClient` uses all Security and HardwareService methods

## 2. Internal Interfaces

### Interface: Security

| Method                        | Input                                  | Output | Async | Error Types         |
|-------------------------------|----------------------------------------|--------|-------|---------------------|
| `encrypt_to`                  | `bytes input_bytes, str key`           | bytes  | No    | cryptography errors |
| `decrypt_to`                  | `bytes ciphertext_with_iv, str key`    | bytes  | No    | cryptography errors |
| `get_hw_hash`                 | `str hardware`                         | str    | No    | —                   |
| `get_api_encryption_key`      | `Credentials creds, str hardware_hash` | str    | No    | —                   |
| `get_resource_encryption_key` | —                                      | str    | No    | —                   |
| `calc_hash`                   | `str key`                              | str    | No    | —                   |

All methods are `@staticmethod cdef` (Cython-only visibility).

### Interface: HardwareService

| Method              | Input | Output | Async | Error Types       |
|---------------------|-------|--------|-------|-------------------|
| `get_hardware_info` | —     | str    | No    | subprocess errors |

`@staticmethod cdef` with module-level caching in `_CACHED_HW_INFO`.

## 3. External API Specification

N/A — internal-only component.

## 4. Data Access Patterns

### Caching Strategy

| Data          | Cache Type                | TTL              | Invalidation            |
|---------------|---------------------------|------------------|-------------------------|
| Hardware info | In-memory (module global) | Process lifetime | Never (static hardware) |

## 5. Implementation Details

**Algorithmic Complexity**: All crypto operations are O(n) in input size.

**State Management**: HardwareService has one cached string; Security is fully stateless.

**Key Dependencies**:

| Library      | Version | Purpose                           |
|--------------|---------|-----------------------------------|
| cryptography | 44.0.2  | AES-256-CBC cipher, PKCS7 padding |

**Error Handling Strategy**:
- Crypto errors propagate to caller (no catch)
- `subprocess.check_output` in HardwareService raises `CalledProcessError` on failure

**Key Derivation Hierarchy**:
1. Hardware hash: `SHA-384("Azaion_{hw_string}_%$$$)0_")` → base64
2. API encryption key: `SHA-384("{email}-{password}-{hw_hash}-#%@AzaionKey@%#---")` → base64 (per-user, per-machine)
3. Resource encryption key: `SHA-384("-#%@AzaionKey@%#---234sdfklgvhjbnn")` → base64 (fixed, shared)
4. AES key expansion: `SHA-256(string_key)` → 32-byte AES key (inside encrypt/decrypt)
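
The derivation steps above can be sketched with `hashlib` and `base64` from the standard library. This is an illustration under assumptions: the function names are hypothetical (they only loosely mirror the documented method names), and UTF-8 is assumed as the string encoding, which the source does not state.

```python
import base64
import hashlib


def sha384_b64(text: str) -> str:
    """SHA-384 digest of the UTF-8 text (encoding assumed), base64-encoded."""
    digest = hashlib.sha384(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def hw_hash(hardware: str) -> str:
    # Step 1: hardware hash.
    return sha384_b64(f"Azaion_{hardware}_%$$$)0_")


def api_encryption_key(email: str, password: str, hardware_hash: str) -> str:
    # Step 2: per-user, per-machine key.
    return sha384_b64(f"{email}-{password}-{hardware_hash}-#%@AzaionKey@%#---")


def resource_encryption_key() -> str:
    # Step 3: fixed key shared by all users.
    return sha384_b64("-#%@AzaionKey@%#---234sdfklgvhjbnn")


def aes_key(string_key: str) -> bytes:
    # Step 4: expand any string key to the 32 bytes AES-256 requires.
    return hashlib.sha256(string_key.encode("utf-8")).digest()
```

A SHA-384 digest is 48 bytes, so every base64 key string is 64 characters long; step 4 is what turns those strings into a 32-byte AES-256 key.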

## 6. Extensions and Helpers

None.

## 7. Caveats & Edge Cases

**Known limitations**:
- `get_resource_encryption_key()` returns a fixed key — all users share the same resource encryption key
- Hardware detection uses `shell=True` subprocess — injection risk if inputs were user-controlled (they are not)
- Linux hardware detection may fail on systems without `lscpu`, `lspci`, or `/sys/block/sda`
- Multiple GPUs: only the first GPU line is captured

**Potential race conditions**:
- `_CACHED_HW_INFO` is a module global written without locking — concurrent first calls could race, but the result is idempotent
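
If the unlocked write to `_CACHED_HW_INFO` ever became a concern, one option would be a double-checked lock around the first probe. The sketch below is a suggestion only, not the current implementation; `_HW_LOCK` and the injected `probe` callable are hypothetical names introduced for illustration.

```python
import threading

_CACHED_HW_INFO = None
_HW_LOCK = threading.Lock()  # hypothetical; the documented module has no lock


def get_hardware_info(probe):
    """Return cached hardware info, running `probe` at most once."""
    global _CACHED_HW_INFO
    if _CACHED_HW_INFO is None:          # fast path once the cache is warm
        with _HW_LOCK:
            if _CACHED_HW_INFO is None:  # re-check while holding the lock
                _CACHED_HW_INFO = probe()
    return _CACHED_HW_INFO
```

Since the probe result is idempotent, the lock mainly avoids running the subprocess calls more than once on concurrent first use.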

## 8. Dependency Graph

**Must be implemented after**: Core Models (01)

**Can be implemented in parallel with**: — (Resource Management (03) depends on this component, so Security must be ready first)

**Blocks**: Resource Management (03)

## 9. Logging Strategy

| Log Level | When                   | Example                                                               |
|-----------|------------------------|-----------------------------------------------------------------------|
| INFO      | Hardware info gathered | `"Gathered hardware: CPU: ... GPU: ... Memory: ... DriveSerial: ..."` |
| INFO      | Cached hardware reuse  | `"Using cached hardware info"`                                        |

**Log format**: Via `constants.log()` — `[HH:mm:ss INFO] message`

**Log storage**: Same as Core Models logging configuration
@@ -0,0 +1,131 @@
# Resource Management

## 1. High-Level Overview

**Purpose**: Orchestrates authenticated resource download/upload using a binary-split scheme (small encrypted part via API, large part via CDN), CDN storage operations, and Docker image archive decryption/loading.

**Architectural Pattern**: Facade — `ApiClient` coordinates CDN, Security, and API calls behind a unified interface.

**Upstream dependencies**: Core Models (01) — constants, Credentials, User, RoleEnum; Security (02) — encryption, key derivation, hardware fingerprinting

**Downstream consumers**: HTTP API (04) — `main.py` uses `ApiClient` for all resource operations and `binary_split` for Docker unlock

## 2. Internal Interfaces

### Interface: ApiClient

| Method                      | Input                                           | Output | Async | Error Types                   |
|-----------------------------|-------------------------------------------------|--------|-------|-------------------------------|
| `set_credentials_from_dict` | `str email, str password`                       | —      | No    | API errors, YAML parse errors |
| `login`                     | —                                               | —      | No    | HTTPError, Exception          |
| `load_big_small_resource`   | `str resource_name, str folder`                 | bytes  | No    | Exception (API, CDN, decrypt) |
| `upload_big_small_resource` | `bytes resource, str resource_name, str folder` | —      | No    | Exception (API, CDN, encrypt) |
| `upload_to_cdn`             | `str bucket, str filename, bytes file_bytes`    | —      | No    | Exception                     |
| `download_from_cdn`         | `str bucket, str filename`                      | bytes  | No    | Exception                     |

Cython-only methods (cdef): `set_credentials`, `set_token`, `get_user`, `request`, `list_files`, `check_resource`, `load_bytes`, `upload_file`, `load_big_file_cdn`

### Interface: CDNManager

| Method     | Input                                        | Output | Async | Error Types      |
|------------|----------------------------------------------|--------|-------|------------------|
| `upload`   | `str bucket, str filename, bytes file_bytes` | bool   | No    | boto3 exceptions |
| `download` | `str folder, str filename`                   | bool   | No    | boto3 exceptions |

### Interface: binary_split (module-level functions)

| Function                | Input                                                     | Output | Async | Error Types                   |
|-------------------------|-----------------------------------------------------------|--------|-------|-------------------------------|
| `download_key_fragment` | `str resource_api_url, str token`                         | bytes  | No    | requests.HTTPError            |
| `decrypt_archive`       | `str encrypted_path, bytes key_fragment, str output_path` | —      | No    | crypto/IO errors              |
| `docker_load`           | `str tar_path`                                            | —      | No    | subprocess.CalledProcessError |
| `check_images_loaded`   | `str version`                                             | bool   | No    | —                             |

## 3. External API Specification

N/A — this component is consumed by HTTP API (04), not directly exposed.

## 4. Data Access Patterns

### Caching Strategy

| Data                  | Cache Type             | TTL                    | Invalidation               |
|-----------------------|------------------------|------------------------|----------------------------|
| CDN config (cdn.yaml) | In-memory (CDNManager) | Process lifetime       | On re-authentication       |
| JWT token             | In-memory              | Until 401/403          | Auto-refresh on auth error |
| Big file parts        | Local filesystem       | Until version mismatch | Overwritten on new upload  |

### Storage Estimates

| Location              | Description                 | Growth Rate                       |
|-----------------------|-----------------------------|-----------------------------------|
| `{folder}/{name}.big` | Cached large resource parts | Per resource upload               |
| Logs/                 | Loguru log files            | ~daily rotation, 30-day retention |

## 5. Implementation Details

**State Management**: `ApiClient` is a stateful singleton (token, credentials, CDN manager). `binary_split` is stateless.

**Key Dependencies**:

| Library      | Version | Purpose                              |
|--------------|---------|--------------------------------------|
| requests     | 2.32.4  | HTTP client for API calls            |
| pyjwt        | 2.10.1  | JWT token decoding (no verification) |
| boto3        | 1.40.9  | S3-compatible CDN operations         |
| pyyaml       | 6.0.2   | CDN config parsing                   |
| cryptography | 44.0.2  | AES-256-CBC for archive decryption   |

**Error Handling Strategy**:
- `request()` auto-retries on 401/403 (re-login then retry once)
- 500 errors raise `Exception` with response text
- 409 (Conflict) errors raise with parsed ErrorCode/Message
- CDN operations return bool (True/False) — swallow exceptions, log error
- `binary_split` functions propagate all errors to caller
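
The re-login-then-retry-once rule for `request()` can be sketched independently of any HTTP library. In this illustration `send` and `login` are injected callables and `ApiError` is a hypothetical exception type; the real method raises a plain `Exception` and additionally parses 409 bodies, which is simplified here to raising on any remaining error status.

```python
class ApiError(Exception):
    """Hypothetical error type used only in this sketch."""


def request_with_relogin(send, login):
    """Call `send()`; on 401/403, call `login()` and retry exactly once.

    `send` returns a `(status_code, body)` tuple.
    """
    status, body = send()
    if status in (401, 403):
        login()                  # refresh the token
        status, body = send()    # single retry, never a loop
    if status >= 400:
        # Simplification: the documented client distinguishes 500 and 409.
        raise ApiError(body)
    return body
```

Limiting the retry to one attempt means a persistent authentication failure surfaces as an error instead of looping.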

**Big/Small Resource Split Protocol**:
- **Download**: small part (encrypted with the per-user, per-machine API key) from the API + big part from local cache or CDN → concatenate → decrypt with the shared resource key
- **Upload**: encrypt the entire resource with the shared resource key → split at `min(SMALL_SIZE_KB, 30% of the encrypted payload)`, where `SMALL_SIZE_KB` = 3 → small part to API, big part to CDN + local copy
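
The split point can be illustrated on raw bytes. This is a minimal sketch that leaves out encryption and transport; `split_resource` and `join_resource` are hypothetical helper names, and taking the 30% relative to the already-encrypted payload is an assumption drawn from the order of the upload steps.

```python
SMALL_SIZE_KB = 3  # value taken from the Core Models constants table


def split_resource(encrypted: bytes):
    """Split an encrypted payload into (small, big) parts."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    small_size = min(SMALL_SIZE_KB * 1024, len(encrypted) * 30 // 100)
    return encrypted[:small_size], encrypted[small_size:]


def join_resource(small: bytes, big: bytes) -> bytes:
    """Reassemble the payload before decryption."""
    return small + big
```

For payloads above roughly 10 KB the small part is capped at 3 KB; below that, the 30% rule keeps the small part from swallowing most of the file.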

## 6. Extensions and Helpers

None.

## 7. Caveats & Edge Cases

**Known limitations**:
- JWT token decoded without signature verification — trusts the API server
- CDN manager initialization requires a successful encrypted download (bootstrapping: credentials must already work for the login call that precedes CDN config download)
- `load_big_small_resource` attempts local cache first; on decrypt failure (version mismatch), silently falls through to CDN download — the error is logged but not surfaced to caller
- `API_SERVICES` list in `binary_split` is hardcoded — adding a new service requires code change
- `docker_load` and `check_images_loaded` shell out to Docker CLI — requires Docker CLI in the container
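
The cache-first behaviour of `load_big_small_resource` noted in the list above can be sketched with injected callables. All parameter names here are hypothetical, and the real method also handles the API download of the small part, which is taken as an input in this illustration.

```python
def load_with_cache_fallback(small, read_local_big, download_cdn_big,
                             decrypt, log_error):
    """Try the locally cached big part first, then fall back to the CDN."""
    local_big = read_local_big()  # expected to return None when nothing is cached
    if local_big is not None:
        try:
            return decrypt(small + local_big)
        except Exception as exc:
            # A stale cache (version mismatch) fails to decrypt; the error
            # is logged and the CDN copy is tried instead.
            log_error(f"Local cache decrypt failed: {exc}")
    return decrypt(small + download_cdn_big())
```

The caller only sees an error if the CDN path fails as well, which matches the caveat that cache failures are logged but not surfaced.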

**Potential race conditions**:
- `api_client` singleton in `main.py` is initialized without locking; concurrent first requests could create multiple instances (only one is kept)

**Performance bottlenecks**:
- Large resource encryption/decryption is synchronous and in-memory
- CDN downloads are synchronous (blocking the thread)

## 8. Dependency Graph

**Must be implemented after**: Core Models (01), Security (02)

**Can be implemented in parallel with**: —

**Blocks**: HTTP API (04)

## 9. Logging Strategy

| Log Level | When            | Example                                                            |
|-----------|-----------------|--------------------------------------------------------------------|
| INFO      | File downloaded | `"Downloaded file: cdn.yaml, 1234 bytes"`                          |
| INFO      | File uploaded   | `"Uploaded model.bin to api.azaion.com/models successfully: 200."` |
| INFO      | CDN operation   | `"downloaded model.big from the models"`                           |
| INFO      | Big file check  | `"checking on existence for models/model.big"`                     |
| ERROR     | Upload failure  | `"Upload fail: ConnectionError(...)"`                              |
| ERROR     | API error       | `"{'ErrorCode': 409, 'Message': '...'}"`                           |

**Log format**: Via `constants.log()` / `constants.logerror()`

**Log storage**: Same as Core Models logging configuration
@@ -0,0 +1,144 @@
# HTTP API

## 1. High-Level Overview

**Purpose**: FastAPI application that exposes HTTP endpoints for health monitoring, user authentication, encrypted resource loading/uploading, and a background Docker image unlock workflow.

**Architectural Pattern**: Thin controller — delegates all business logic to Resource Management (03) and binary_split.

**Upstream dependencies**: Core Models (01) — UnlockState enum; Resource Management (03) — ApiClient, binary_split functions

**Downstream consumers**: None — this is the system entry point, consumed by external HTTP clients.

## 2. Internal Interfaces

### Interface: Module-level Functions

| Function         | Input                     | Output    | Description                       |
|------------------|---------------------------|-----------|-----------------------------------|
| `get_api_client` | —                         | ApiClient | Lazy singleton accessor           |
| `_run_unlock`    | `str email, str password` | —         | Background task: full unlock flow |

## 3. External API Specification

| Endpoint             | Method | Auth     | Rate Limit | Description                           |
|----------------------|--------|----------|------------|---------------------------------------|
| `/health`            | GET    | Public   | —          | Liveness probe                        |
| `/status`            | GET    | Public   | —          | Auth status + model cache dir         |
| `/login`             | POST   | Public   | —          | Set user credentials                  |
| `/load/{filename}`   | POST   | Implicit | —          | Download + decrypt resource           |
| `/upload/{filename}` | POST   | Implicit | —          | Encrypt + upload resource (big/small) |
| `/unlock`            | POST   | Public   | —          | Start background Docker unlock        |
| `/unlock/status`     | GET    | Public   | —          | Poll unlock workflow progress         |

"Implicit" auth = credentials must have been set via `/login` first; enforced by ApiClient's auto-login on token absence.

### Request/Response Schemas

**POST /login**
```json
// Request
{"email": "user@example.com", "password": "secret"}
// Response 200
{"status": "ok"}
// Response 401
{"detail": "error message"}
```

**POST /load/{filename}**
```json
// Request
{"filename": "model.bin", "folder": "models"}
// Response 200 — binary octet-stream
// Response 500
{"detail": "error message"}
```

**POST /upload/{filename}**
```
// Request — multipart/form-data
data: <file>
folder: "models" (form field, default "models")
// Response 200
{"status": "ok"}
```

**POST /unlock**
```json
// Request
{"email": "user@example.com", "password": "secret"}
// Response 200
{"state": "authenticating"}
// Response 404
{"detail": "Encrypted archive not found"}
```

**GET /unlock/status**
```json
// Response 200
{"state": "decrypting", "error": null}
```
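
A client would typically start the workflow with `POST /unlock` and then poll `GET /unlock/status` until a terminal state is reached. The loop below is a hedged sketch of that polling side only: `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON body, and the interval and timeout defaults are arbitrary choices, not values from the service.

```python
import time

TERMINAL_STATES = {"ready", "error"}


def wait_for_unlock(fetch_status, interval=1.0, timeout=600.0, sleep=time.sleep):
    """Poll until the unlock state is `ready` or `error`, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()  # e.g. {"state": "decrypting", "error": None}
        if status["state"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"unlock still in state {status['state']!r}")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of the HTTP client in use and easy to exercise without a running server.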

## 4. Data Access Patterns

### Caching Strategy

| Data         | Cache Type          | TTL               | Invalidation             |
|--------------|---------------------|-------------------|--------------------------|
| ApiClient    | In-memory singleton | Process life      | Never                    |
| unlock_state | Module global       | Until next unlock | State machine transition |

## 5. Implementation Details

**State Management**: Module-level globals (`api_client`, `unlock_state`, `unlock_error`) protected by `threading.Lock` for unlock state mutations.

**Key Dependencies**:

| Library          | Version       | Purpose                 |
|------------------|---------------|-------------------------|
| fastapi          | latest        | HTTP framework          |
| uvicorn          | latest        | ASGI server             |
| pydantic         | (via fastapi) | Request/response models |
| python-multipart | latest        | File upload support     |

**Error Handling Strategy**:
- `/login` — catches all exceptions, returns 401
- `/load`, `/upload` — catches all exceptions, returns 500
- `/unlock` — checks preconditions (archive exists, not already in progress), then delegates to background task
- Background task (`_run_unlock`) catches all exceptions, sets `unlock_state = error` with error message

## 6. Extensions and Helpers

None.

## 7. Caveats & Edge Cases

**Known limitations**:
- No authentication middleware — endpoints rely on prior `/login` call having set credentials on the singleton
- `get_api_client()` uses a global without locking — race on first concurrent access
- `/load/{filename}` has a path parameter `filename` but also takes `req.filename` from the body — the path param is unused
- `_run_unlock` silently ignores `OSError` when removing tar file (acceptable cleanup behavior)

**Potential race conditions**:
- `unlock_state` mutations are lock-protected, but `api_client` singleton creation is not
- Concurrent `/unlock` calls: the lock check prevents duplicate starts, but there's a small TOCTOU window between the check and the `background_tasks.add_task` call

**Performance bottlenecks**:
- `/load` and `/upload` are synchronous — large files block the worker thread
- `_run_unlock` runs as a background task (single thread) — only one unlock can run at a time

## 8. Dependency Graph

**Must be implemented after**: Core Models (01), Resource Management (03)

**Can be implemented in parallel with**: —

**Blocks**: — (entry point)

## 9. Logging Strategy

No direct logging in this component. All logging is handled by the components it delegates to (Resource Management, Security) via `constants.log()` / `constants.logerror()`.

**Log format**: N/A (delegates)

**Log storage**: N/A (delegates)