mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 08:06:34 +00:00
Add E2E tests, fix bugs
Made-with: Cursor
@@ -0,0 +1,38 @@
# Acceptance Criteria

## Functional Criteria

| # | Criterion | Measurable Target | Source |
|---|-----------|-------------------|--------|
| AC-1 | Health endpoint responds | GET `/health` returns `{"status": "healthy"}` with HTTP 200 | `main.py:54-55` |
| AC-2 | Login sets credentials | POST `/login` with valid email/password returns `{"status": "ok"}` | `main.py:69-75` |
| AC-3 | Login rejects invalid credentials | POST `/login` with bad credentials returns HTTP 401 | `main.py:74-75` |
| AC-4 | Resource download returns decrypted bytes | POST `/load/{filename}` returns binary content (application/octet-stream) | `main.py:79-85` |
| AC-5 | Resource upload succeeds | POST `/upload/{filename}` with file returns `{"status": "ok"}` | `main.py:89-100` |
| AC-6 | Unlock starts background workflow | POST `/unlock` with credentials returns `{"state": "authenticating"}` | `main.py:158-181` |
| AC-7 | Unlock detects already-loaded images | POST `/unlock` when images are loaded returns `{"state": "ready"}` | `main.py:163-164` |
| AC-8 | Unlock status reports progress | GET `/unlock/status` returns current state and error | `main.py:184-187` |
| AC-9 | Unlock completes full cycle | Background task transitions: authenticating → downloading_key → decrypting → loading_images → ready | `main.py:103-155` |
| AC-10 | Unlock handles missing archive | POST `/unlock` when archive missing and images not loaded returns HTTP 404 | `main.py:168-174` |
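
The state sequence in AC-9 can be pictured as a small state machine. The sketch below is illustrative only: the state names come from this document (`idle` and `error` appear in the expected results for `/unlock/status`), while `UnlockState`, `HAPPY_PATH`, and `next_state` are hypothetical names, not taken from `main.py`.

```python
from enum import Enum


class UnlockState(str, Enum):
    IDLE = "idle"
    AUTHENTICATING = "authenticating"
    DOWNLOADING_KEY = "downloading_key"
    DECRYPTING = "decrypting"
    LOADING_IMAGES = "loading_images"
    READY = "ready"
    ERROR = "error"


# Happy-path order as listed in AC-9.
HAPPY_PATH = [
    UnlockState.AUTHENTICATING,
    UnlockState.DOWNLOADING_KEY,
    UnlockState.DECRYPTING,
    UnlockState.LOADING_IMAGES,
    UnlockState.READY,
]


def next_state(current: UnlockState, failed: bool = False) -> UnlockState:
    """Return the state that follows `current`; any failure moves to ERROR."""
    if failed:
        return UnlockState.ERROR
    if current in (UnlockState.READY, UnlockState.ERROR):
        return current  # terminal states do not advance
    if current is UnlockState.IDLE:
        return HAPPY_PATH[0]
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1]
```

A test for AC-9 could poll `/unlock/status` and check that the observed states form a subsequence of `HAPPY_PATH`, since a poller may miss short-lived intermediate states.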

## Security Criteria

| # | Criterion | Measurable Target | Source |
|---|-----------|-------------------|--------|
| AC-11 | Resources encrypted at rest | AES-256-CBC encryption with per-user or shared key | `security.pyx` |
| AC-12 | Hardware-bound key derivation | API download key incorporates hardware fingerprint | `security.pyx:54-55` |
| AC-13 | Binary split prevents single-source compromise | Small part on API + big part on CDN required for decryption | `api_client.pyx:166-186` |
| AC-14 | JWT token obtained from trusted API | Login via POST to Azaion Resource API with credentials | `api_client.pyx:43-55` |
| AC-15 | Auto-retry on expired token | 401/403 triggers re-login and retry | `api_client.pyx:140-146` |

## Operational Criteria

| # | Criterion | Measurable Target | Source |
|---|-----------|-------------------|--------|
| AC-16 | Docker images verified | All 7 API_SERVICES images checked via `docker image inspect` | `binary_split.py:60-69` |
| AC-17 | Logs rotate daily | File sink rotates every 1 day, retains 30 days | `constants.pyx:19-26` |
| AC-18 | Container builds on ARM64 | Woodpecker CI produces `loader:arm` image | `.woodpecker/build-arm.yml` |
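
One way the AC-16 check could work is sketched below. It relies on `docker image inspect` exiting non-zero when an image is not present locally. The service names and the `<service>:<version>` tag format are placeholders, since the real `API_SERVICES` list in `binary_split.py` is not shown in this document; the `run` parameter is an assumption added so the check can be tested without a Docker daemon.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical service names; the real API_SERVICES list has 7 entries not shown here.
API_SERVICES = ["example-service-a", "example-service-b"]


def image_present(image: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """`docker image inspect` exits non-zero when the image is not present locally."""
    result = run(["docker", "image", "inspect", image], capture_output=True)
    return result.returncode == 0


def check_images_loaded(version: str, services: Sequence[str] = API_SERVICES, run=subprocess.run) -> bool:
    """True only if every `<service>:<version>` image is present (tag format assumed)."""
    return all(image_present(f"{name}:{version}", run) for name in services)
```

With the default runner, `subprocess.run` raises `FileNotFoundError` if the Docker CLI is not installed, so a caller may want to treat that case as "not loaded".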

## Non-Functional Criteria

No explicit performance targets (latency, throughput, concurrency) are defined in the codebase. Resource download/upload latency depends on file size and network conditions.
@@ -0,0 +1,44 @@
# Input Data Parameters

## API Request Schemas

### Login

- `email`: string — user email address
- `password`: string — user password (plaintext)

### Load Resource

- `filename`: string — resource name (without `.big`/`.small` suffix)
- `folder`: string — resource folder/bucket name

### Upload Resource

- `data`: binary file (multipart upload)
- `filename`: string — resource name (path parameter)
- `folder`: string — destination folder (form field, defaults to `"models"`)

### Unlock

- `email`: string — user email
- `password`: string — user password
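
The JSON request bodies above can be modelled as typed records. The project validates them with Pydantic models (`LoginRequest`, `LoadRequest`); the sketch below is a standard-library stand-in using dataclasses, and the `parse` helper is hypothetical. It is meant only to show which fields each body requires, not how the service implements validation.

```python
from dataclasses import dataclass, fields


@dataclass
class LoginRequest:
    email: str
    password: str


@dataclass
class LoadRequest:
    filename: str
    folder: str


def parse(model, payload: dict):
    """Build `model` from a JSON-like dict, rejecting missing or non-string fields."""
    values = {}
    for f in fields(model):
        if not isinstance(payload.get(f.name), str):
            raise ValueError(f"field '{f.name}' is missing or not a string")
        values[f.name] = payload[f.name]
    return model(**values)
```

For example, `parse(LoginRequest, {})` raises `ValueError`, which corresponds to the case the service answers with HTTP 422.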

## Configuration Files

### cdn.yaml (downloaded encrypted from API)

- `host`: string — S3 endpoint URL
- `downloader_access_key`: string — read-only S3 access key
- `downloader_access_secret`: string — read-only S3 secret key
- `uploader_access_key`: string — write S3 access key
- `uploader_access_secret`: string — write S3 secret key

## JWT Token Claims

- `nameid`: string — user GUID
- `unique_name`: string — user email
- `role`: string — one of: ApiAdmin, Admin, ResourceUploader, Validator, Operator
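
To show where these claims live in a token, here is a minimal standard-library sketch that reads the payload segment of a JWT without checking its signature. It is not the loader's code (the loader uses a JWT library with signature verification turned off); `decode_jwt_claims` is a hypothetical helper.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because nothing is verified, the returned `role` value is only as trustworthy as the channel the token arrived on.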

## External Data Sources

| Source | Data | Format | Direction |
|--------|------|--------|-----------|
| Azaion Resource API | JWT tokens, encrypted resources (small parts), CDN config, key fragments | JSON / binary | Download |
| S3 CDN | Large resource parts (.big files) | Binary | Upload / Download |
| Local filesystem | Encrypted Docker archive (`images.enc`), cached `.big` files | Binary | Read / Write |
| Docker daemon | Image loading, image inspection | CLI stdout | Read |
| Host OS | Hardware fingerprint (CPU, GPU, RAM, drive serial) | Text (subprocess) | Read |
@@ -0,0 +1,80 @@
# Expected Results

Maps every input data item to its quantifiable expected result.
Tests use this mapping to compare actual system output against known-correct answers.

## Result Format Legend

| Result Type | When to Use | Example |
|-------------|-------------|---------|
| Exact value | Output must match precisely | `status_code: 200`, `key: "healthy"` |
| Threshold | Output must exceed or stay below a limit | `latency < 2000ms` |
| Pattern match | Output must match a string/regex pattern | `error contains "invalid"` |
| Schema match | Output structure must conform to a schema | `response has keys: status, authenticated, modelCacheDir` |
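
The four result types in the legend map naturally onto small predicate functions. The helpers below are a hypothetical sketch, not part of the test suite in this commit; the function names are chosen to echo the comparison labels used in the mapping tables (`exact`, `threshold_min`, `pattern`, `schema`).

```python
import re
from typing import Any, Iterable


def exact(actual: Any, expected: Any) -> bool:
    """Exact value: output must match precisely."""
    return actual == expected


def threshold_max(actual: float, limit: float) -> bool:
    """Pass when the value stays strictly below the limit, e.g. latency < 2000 ms."""
    return actual < limit


def threshold_min(actual: float, limit: float) -> bool:
    """Pass when the value strictly exceeds the limit, e.g. body length > 0."""
    return actual > limit


def pattern(actual: str, regex: str) -> bool:
    """Pattern match: the regex must be found somewhere in the output."""
    return re.search(regex, actual) is not None


def schema(actual: dict, required_keys: Iterable[str]) -> bool:
    """Schema match, reduced here to 'all required keys are present'."""
    return all(key in actual for key in required_keys)
```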

## Input → Expected Result Mapping

### Health & Status Endpoints

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `GET /health` | Liveness probe, no auth needed | HTTP 200, body: `{"status": "healthy"}` | exact | N/A | N/A |
| 2 | `GET /status` (no prior login) | Status before authentication | HTTP 200, body: `{"status": "healthy", "authenticated": false, "modelCacheDir": "models"}` | exact | N/A | N/A |
| 3 | `GET /status` (after login) | Status after valid authentication | HTTP 200, body has `"authenticated": true` | exact (status), exact (authenticated field) | N/A | N/A |

### Authentication

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 4 | `POST /login {"email": "valid@test.com", "password": "validpass"}` | Valid credentials | HTTP 200, body: `{"status": "ok"}` | exact | N/A | N/A |
| 5 | `POST /login {"email": "bad@test.com", "password": "wrongpass"}` | Invalid credentials | HTTP 401, body has `"detail"` key with error string | exact (status), schema (body has detail) | N/A | N/A |
| 6 | `POST /login {}` | Missing fields | HTTP 422 (validation error) | exact (status) | N/A | N/A |

### Resource Download

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 7 | `POST /load/testfile {"filename": "testfile", "folder": "models"}` (after valid login) | Download existing resource | HTTP 200, Content-Type: `application/octet-stream`, body is non-empty bytes | exact (status), exact (content-type), threshold_min (body length > 0) | N/A | N/A |
| 8 | `POST /load/nonexistent {"filename": "nonexistent", "folder": "models"}` (after valid login) | Download missing resource | HTTP 500, body has `"detail"` key | exact (status), schema (body has detail) | N/A | N/A |
| 9 | `POST /load/testfile {"filename": "testfile", "folder": "models"}` (no login) | Download without authentication | HTTP 500, body has `"detail"` key (ApiClient has no credentials) | exact (status), schema (body has detail) | N/A | N/A |

### Resource Upload

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 10 | `POST /upload/testfile` multipart: file=binary, folder="models" (after valid login) | Upload resource | HTTP 200, body: `{"status": "ok"}` | exact | N/A | N/A |
| 11 | `POST /upload/testfile` no file attached | Upload without file | HTTP 422 (validation error) | exact (status) | N/A | N/A |

### Unlock Workflow

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 12 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (archive exists, images not loaded) | Start unlock workflow | HTTP 200, body: `{"state": "authenticating"}` | exact | N/A | N/A |
| 13 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (images already loaded) | Unlock when already ready | HTTP 200, body: `{"state": "ready"}` | exact | N/A | N/A |
| 14 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (no archive, images not loaded) | Unlock without archive | HTTP 404, body has `"detail"` containing "Encrypted archive not found" | exact (status), substring (detail) | N/A | N/A |
| 15 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (unlock already in progress) | Duplicate unlock request | HTTP 200, body has `"state"` field with current in-progress state | exact (status), schema (body has state) | N/A | N/A |
| 16 | `GET /unlock/status` (unlock in progress) | Poll unlock status | HTTP 200, body: `{"state": "<current_state>", "error": null}` | exact (status), schema (body has state + error) | N/A | N/A |
| 17 | `GET /unlock/status` (unlock failed) | Poll after failure | HTTP 200, body has `"state": "error"` and `"error"` is non-null string | exact (state), threshold_min (error string length > 0) | N/A | N/A |
| 18 | `GET /unlock/status` (idle, no unlock started) | Poll before any unlock | HTTP 200, body: `{"state": "idle", "error": null}` | exact | N/A | N/A |

### Security — Encryption Round-Trip

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 19 | encrypt_to(b"hello world", "testkey") then decrypt_to(result, "testkey") | Encrypt/decrypt round-trip | Decrypted output equals original: `b"hello world"` | exact | N/A | N/A |
| 20 | decrypt_to(encrypted_bytes, "wrong_key") | Decrypt with wrong key | Raises exception or returns garbled data ≠ original | pattern (exception raised or output ≠ input) | N/A | N/A |

### Security — Key Derivation

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 21 | get_resource_encryption_key() called twice | Deterministic shared key | Both calls return identical string | exact | N/A | N/A |
| 22 | get_hw_hash("CPU: test") | Hardware hash derivation | Returns non-empty base64 string | threshold_min (length > 0), pattern (base64 charset) | N/A | N/A |
| 23 | get_api_encryption_key(creds1, hw_hash) vs get_api_encryption_key(creds2, hw_hash) | Different credentials produce different keys | key1 ≠ key2 | exact (inequality) | N/A | N/A |

### Binary Split — Archive Decryption

| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 24 | decrypt_archive(test_encrypted_file, known_key, output_path) | Decrypt test archive | Output file matches original plaintext content | exact (file content) | N/A | N/A |
| 25 | check_images_loaded("nonexistent-version") | Check for missing Docker images | Returns `False` | exact | N/A | N/A |
@@ -0,0 +1,27 @@
# Problem Statement

## What is this system?

Azaion.Loader is a secure resource distribution service for Azaion's edge computing platform. It runs on edge devices (ARM64) to manage the lifecycle of encrypted AI model resources and Docker service images.

## What problem does it solve?

Azaion distributes proprietary AI models and Docker-based services to edge devices deployed in the field. These assets must be:

1. **Protected in transit and at rest** — models and service images are intellectual property that must not be extractable if a device is compromised
2. **Bound to authorized hardware** — decryption keys are derived from the device's hardware fingerprint, preventing resource extraction to unauthorized machines
3. **Efficiently distributed** — large model files are split between an authenticated API (small encrypted part) and a CDN (large part), reducing API bandwidth costs while maintaining security
4. **Self-service deployable** — edge devices need to authenticate, download, decrypt, and load Docker images autonomously via a single unlock workflow

## Who are the users?

- **Edge devices** — autonomous ARM64 systems running Azaion services (drones, companion PCs, ground stations)
- **Operators/Admins** — human users who trigger authentication and unlock via HTTP API
- **Other Azaion services** — co-located containers that call the loader API to fetch model resources

## How does it work (high level)?

1. A client authenticates via `/login` with email/password → the loader obtains a JWT from the Azaion Resource API
2. For resource access: the loader downloads an encrypted "small" part from the API (using a per-user, per-machine key) and a "big" part from CDN, reassembles them, and decrypts with a shared resource key
3. For initial deployment: the `/unlock` endpoint triggers a background workflow that downloads a key fragment, decrypts a pre-deployed encrypted Docker image archive, and loads all service images into the local Docker daemon
4. All security-sensitive logic is compiled as Cython native extensions for IP protection
@@ -0,0 +1,37 @@
# Restrictions

## Hardware

| Restriction | Source | Details |
|-------------|--------|---------|
| ARM64 architecture | `.woodpecker/build-arm.yml` | CI builds ARM64-only Docker images |
| Docker daemon access | `Dockerfile`, `main.py` | Requires Docker socket mount for `docker load` and `docker image inspect` |
| Hardware fingerprint availability | `hardware_service.pyx` | Requires `lscpu`, `lspci`, `/sys/block/sda` on Linux; PowerShell on Windows |

## Software

| Restriction | Source | Details |
|-------------|--------|---------|
| Python 3.11 | `Dockerfile` | Base image is `python:3.11-slim` |
| Cython 3.1.3 | `requirements.txt` | Pinned version for compilation |
| GCC compiler | `Dockerfile` | Required at build time for Cython extension compilation |
| Docker CLI | `Dockerfile` | `docker-ce-cli` installed inside the container |

## Environment

| Restriction | Source | Details |
|-------------|--------|---------|
| `RESOURCE_API_URL` env var | `main.py` | Defaults to `https://api.azaion.com` |
| `IMAGES_PATH` env var | `main.py` | Defaults to `/opt/azaion/images.enc` — encrypted archive must be pre-deployed |
| `API_VERSION` env var | `main.py` | Defaults to `latest` — determines expected Docker image tags |
| CDN config file | `api_client.pyx` | `cdn.yaml` downloaded encrypted from API at credential setup time |
| Network access | `api_client.pyx`, `cdn_manager.pyx` | Must reach Azaion Resource API and S3 CDN endpoint |
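
The three environment variables and their defaults could be read as in the sketch below. The variable names and default values are the ones in the table; `DEFAULTS` and `load_settings` are hypothetical names, and how `main.py` actually reads them is not shown in this document.

```python
import os

DEFAULTS = {
    "RESOURCE_API_URL": "https://api.azaion.com",
    "IMAGES_PATH": "/opt/azaion/images.enc",
    "API_VERSION": "latest",
}


def load_settings(env=os.environ) -> dict:
    """Read each variable from `env`, falling back to the documented default."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Passing a plain dict as `env` makes the defaults easy to exercise in tests without touching the process environment.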

## Operational

| Restriction | Source | Details |
|-------------|--------|---------|
| Single instance | `main.py` | Module-level singleton `api_client` — not designed for multi-process deployment |
| Synchronous I/O | `api_client.pyx` | Large file operations block the worker thread |
| No horizontal scaling | Architecture | Stateful singleton pattern prevents running multiple replicas |
| Log directory | `constants.pyx` | Hardcoded to `Logs/` — requires writable filesystem at that path |
@@ -0,0 +1,68 @@
# Security Approach

## Authentication

- **Mechanism**: JWT Bearer tokens issued by Azaion Resource API
- **Token handling**: Decoded without signature verification (`options={"verify_signature": False}`) — trusts the API server
- **Token refresh**: Automatic re-login on 401/403 responses (single retry)
- **Credential storage**: In-memory only (Credentials object); not persisted to disk
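
The single-retry token refresh can be sketched as follows. This is an illustration of the pattern only: `send` and `login` are hypothetical callables standing in for the HTTP request and the re-login call in `api_client.pyx`, and the `(status, body)` tuple is a stand-in for a real response object.

```python
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status, body); stand-in for a real response object


def request_with_relogin(send: Callable[[str], Response], login: Callable[[], str], token: str) -> Response:
    """Send once; on 401/403 obtain a fresh token and retry exactly one time."""
    status, body = send(token)
    if status in (401, 403):
        token = login()
        status, body = send(token)
    return status, body
```

Capping the retry at one attempt avoids a login loop when the credentials themselves are wrong: the second 401/403 is returned to the caller as-is.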

## Authorization

- **Model**: Role-based (RoleEnum with 7 levels: NONE through ApiAdmin)
- **Enforcement**: Roles are parsed from JWT and stored on the User object, but **no endpoint-level authorization is enforced** by the loader. All endpoints are accessible once credentials are set.

## Encryption

### Resource Encryption (binary-split scheme)

- **Algorithm**: AES-256-CBC with PKCS7 padding
- **Key expansion**: SHA-256 hash of string key → 32-byte AES key
- **IV**: Random 16-byte IV prepended to ciphertext

### Key Derivation

| Key Type | Derivation | Scope |
|----------|------------|-------|
| API download key | `SHA-384(email + password + hw_hash + salt)` | Per-user, per-machine |
| Hardware hash | `SHA-384("Azaion_" + hardware_fingerprint + salt)` | Per-machine |
| Resource encryption key | `SHA-384(fixed_salt_string)` | Global (shared across all users) |
| Archive decryption key | `SHA-256(key_fragment_from_api)` | Per-unlock operation |
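
The derivations in the table can be sketched with `hashlib`. Several details below are assumptions rather than facts from `security.pyx`: the salt values are placeholders (the real ones are in compiled code), strings are assumed to be UTF-8 encoded before hashing, and the digests are assumed to be base64-encoded (the expected results only state this for the hardware hash).

```python
import base64
import hashlib

# Placeholder salts; the real values live in compiled code and are not shown here.
HW_SALT = "example-hw-salt"
API_SALT = "example-api-salt"


def _sha384_b64(text: str) -> str:
    """SHA-384 of the UTF-8 text, base64-encoded (encoding choices are assumptions)."""
    return base64.b64encode(hashlib.sha384(text.encode("utf-8")).digest()).decode("ascii")


def hw_hash(fingerprint: str) -> str:
    return _sha384_b64("Azaion_" + fingerprint + HW_SALT)


def api_download_key(email: str, password: str, hardware_hash: str) -> str:
    return _sha384_b64(email + password + hardware_hash + API_SALT)


def aes_key(string_key: str) -> bytes:
    """Key expansion: SHA-256 of the string key gives the 32-byte AES-256 key."""
    return hashlib.sha256(string_key.encode("utf-8")).digest()
```

Because the hardware hash feeds into the API download key, the same credentials on a different machine yield a different key, which is what makes the key per-user and per-machine.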

### Binary Split

- Resources encrypted with shared resource key, then split into:
  - **Small part** (≤3KB or 30%): uploaded to authenticated API
  - **Big part** (remainder): uploaded to CDN
- Decryption requires both parts — compromise of either storage alone is insufficient
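
A minimal sketch of the split and reassembly follows. It reads "≤3KB or 30%" as "whichever is smaller", treats 3KB as 3072 bytes, and takes the small part from the start of the ciphertext; all three are assumptions, and the actual rule in `api_client.pyx` may differ. `split` and `join` are hypothetical names.

```python
SMALL_MAX_BYTES = 3 * 1024  # "3KB" read as 3072 bytes (assumption)
SMALL_FRACTION = 0.3        # "30%" of the encrypted payload


def split(encrypted: bytes) -> tuple[bytes, bytes]:
    """Split ciphertext into (small, big); the small part is assumed to lead."""
    small_len = min(SMALL_MAX_BYTES, int(len(encrypted) * SMALL_FRACTION))
    return encrypted[:small_len], encrypted[small_len:]


def join(small: bytes, big: bytes) -> bytes:
    """Reassemble the ciphertext; both parts are required before decryption."""
    return small + big
```

If the small part does lead, it would hold the prepended 16-byte IV and the first ciphertext blocks whenever it is at least 16 bytes long, so the big part on the CDN could not be fully decrypted on its own even with the shared key.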

## Hardware Binding

- Hardware fingerprint: CPU model, GPU, memory size, drive serial number
- Used to derive per-machine encryption keys for API resource downloads
- Prevents extraction of downloaded resources to different hardware

## IP Protection

- Security-sensitive modules (security, api_client, credentials, etc.) are Cython `.pyx` files compiled to native `.so` extensions
- Key derivation salts and logic are in compiled code, not readable Python

## Secrets Management

- CDN credentials stored in `cdn.yaml`, downloaded encrypted from the API
- User credentials exist only in memory
- JWT tokens exist only in memory
- No `.env` file or secrets manager — environment variables for runtime config

## Input Validation

- Pydantic models validate request structure (LoginRequest, LoadRequest)
- No additional input sanitization beyond Pydantic type checking
- No rate limiting on any endpoint

## Known Security Gaps

1. JWT decoded without signature verification
2. No endpoint-level authorization enforcement
3. No rate limiting
4. Resource encryption key is static/shared — not per-user
5. `subprocess` with `shell=True` in hardware_service (not user-input-driven, but still a risk pattern)
6. No HTTPS termination within the service (assumes reverse proxy or direct Docker network)