Add E2E tests, fix bugs

Made-with: Cursor
2026-04-22 08:06:34 +00:00 · 2026-04-13 05:17:48 +03:00
parent 1f98b5e958
commit 8f7deb3fca
71 changed files with 4740 additions and 29 deletions
@@ -0,0 +1,38 @@
+# Acceptance Criteria
+
+## Functional Criteria
+
+| # | Criterion | Measurable Target | Source |
+|---|-----------|-------------------|--------|
+| AC-1 | Health endpoint responds | GET `/health` returns `{"status": "healthy"}` with HTTP 200 | `main.py:54-55` |
+| AC-2 | Login sets credentials | POST `/login` with valid email/password returns `{"status": "ok"}` | `main.py:69-75` |
+| AC-3 | Login rejects invalid credentials | POST `/login` with bad credentials returns HTTP 401 | `main.py:74-75` |
+| AC-4 | Resource download returns decrypted bytes | POST `/load/{filename}` returns binary content (application/octet-stream) | `main.py:79-85` |
+| AC-5 | Resource upload succeeds | POST `/upload/{filename}` with file returns `{"status": "ok"}` | `main.py:89-100` |
+| AC-6 | Unlock starts background workflow | POST `/unlock` with credentials returns `{"state": "authenticating"}` | `main.py:158-181` |
+| AC-7 | Unlock detects already-loaded images | POST `/unlock` when images are loaded returns `{"state": "ready"}` | `main.py:163-164` |
+| AC-8 | Unlock status reports progress | GET `/unlock/status` returns the current `state` and `error` fields | `main.py:184-187` |
+| AC-9 | Unlock completes full cycle | Background task transitions: authenticating → downloading_key → decrypting → loading_images → ready | `main.py:103-155` |
+| AC-10 | Unlock handles missing archive | POST `/unlock` when archive missing and images not loaded returns HTTP 404 | `main.py:168-174` |
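+
+AC-9's sequence can be pictured as a background task. The task publishes each state before it runs the matching step, and it switches to `error` if any step raises. The following is a minimal sketch under stated assumptions. `UnlockStatus`, `run_unlock`, and the step callables are hypothetical names, not the code in `main.py:103-155`.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Optional, Sequence, Tuple
+
+# Happy-path states in the order AC-9 lists them ("idle" and "error" sit outside this path).
+UNLOCK_STATES = ("authenticating", "downloading_key", "decrypting", "loading_images", "ready")
+
+
+@dataclass
+class UnlockStatus:
+    """Hypothetical holder for what GET /unlock/status reports."""
+    state: str = "idle"
+    error: Optional[str] = None
+
+
+def run_unlock(status: UnlockStatus, steps: Sequence[Tuple[str, Callable[[], None]]]) -> None:
+    """Run each (state, step) pair in order, publishing the state before the step runs."""
+    try:
+        for state, step in steps:
+            status.state = state
+            step()
+        status.state = "ready"
+        status.error = None
+    except Exception as exc:  # any failing step ends the workflow in the error state
+        status.state = "error"
+        status.error = str(exc) or exc.__class__.__name__
+```
+
+The status object is what `GET /unlock/status` would serialize. A failed step therefore surfaces as `{"state": "error", "error": "<message>"}`.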
+
+## Security Criteria
+
+| # | Criterion | Measurable Target | Source |
+|---|-----------|-------------------|--------|
+| AC-11 | Resources encrypted at rest | AES-256-CBC encryption with per-user or shared key | `security.pyx` |
+| AC-12 | Hardware-bound key derivation | API download key incorporates hardware fingerprint | `security.pyx:54-55` |
+| AC-13 | Binary split prevents single-source compromise | Decryption requires both the small part (from the API) and the big part (from the CDN) | `api_client.pyx:166-186` |
+| AC-14 | JWT token obtained from trusted API | Login via POST to Azaion Resource API with credentials | `api_client.pyx:43-55` |
+| AC-15 | Auto-retry on expired token | A 401/403 response triggers one re-login and a single retry | `api_client.pyx:140-146` |
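+
+The retry rule in AC-15 amounts to one extra attempt, not a loop. The following sketch assumes hypothetical `send` and `login` callables. `send` wraps the HTTP call and `login` re-authenticates with the stored credentials. The actual logic in `api_client.pyx:140-146` is compiled Cython and may differ.
+
+```python
+from typing import Callable, Tuple
+
+Response = Tuple[int, bytes]  # hypothetical (status_code, body) pair
+
+
+def request_with_relogin(send: Callable[[], Response], login: Callable[[], None]) -> Response:
+    """Send once; on 401/403, re-login and retry exactly once."""
+    status, body = send()
+    if status in (401, 403):
+        login()  # refresh the JWT using the in-memory credentials
+        status, body = send()
+    return status, body  # a second 401/403 is returned to the caller as-is
+```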
+
+## Operational Criteria
+
+| # | Criterion | Measurable Target | Source |
+|---|-----------|-------------------|--------|
+| AC-16 | Docker images verified | All 7 API_SERVICES images checked via `docker image inspect` | `binary_split.py:60-69` |
+| AC-17 | Logs rotate daily | File sink rotates every 1 day, retains 30 days | `constants.pyx:19-26` |
+| AC-18 | Container builds on ARM64 | Woodpecker CI produces `loader:arm` image | `.woodpecker/build-arm.yml` |
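+
+AC-17 does not name the logging library. As a stdlib analogue only, `logging.handlers.TimedRotatingFileHandler` can express the same policy: rotate every day and keep 30 rotated files. The `make_file_sink` helper and the `loader.log` file name are hypothetical. The `Logs/` directory comes from the restrictions below.
+
+```python
+import logging
+import os
+from logging.handlers import TimedRotatingFileHandler
+
+
+def make_file_sink(log_dir: str = "Logs", name: str = "loader.log") -> TimedRotatingFileHandler:
+    """Stdlib analogue of AC-17: time-based rotation every day, 30 backups kept."""
+    os.makedirs(log_dir, exist_ok=True)  # Logs/ must be writable
+    handler = TimedRotatingFileHandler(
+        os.path.join(log_dir, name),
+        when="D",        # rotation interval is measured in days
+        interval=1,      # rotate every 1 day
+        backupCount=30,  # keep at most 30 rotated files (about 30 days)
+        encoding="utf-8",
+    )
+    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
+    return handler
+```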
+
+## Non-Functional Criteria
+
+No explicit performance targets (latency, throughput, concurrency) are defined in the codebase. Resource download/upload latency depends on file size and network conditions.
@@ -0,0 +1,44 @@
+# Input Data Parameters
+
+## API Request Schemas
+
+### Login
+- `email`: string — user email address
+- `password`: string — user password (plaintext)
+
+### Load Resource
+- `filename`: string — resource name (without `.big`/`.small` suffix)
+- `folder`: string — resource folder/bucket name
+
+### Upload Resource
+- `data`: binary file (multipart upload)
+- `filename`: string — resource name (path parameter)
+- `folder`: string — destination folder (form field, defaults to `"models"`)
+
+### Unlock
+- `email`: string — user email
+- `password`: string — user password
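+
+The JSON schemas above can be exercised with the standard library alone. The following sketch only builds `urllib.request.Request` objects. `BASE_URL` and the helper names are hypothetical. Sending the requests is left to `urllib.request.urlopen`, and the multipart upload is omitted.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+BASE_URL = "http://localhost:8080"  # hypothetical; the real host/port depend on deployment
+
+
+def json_post(path: str, payload: dict) -> urllib.request.Request:
+    """Build a POST request carrying a JSON body."""
+    return urllib.request.Request(
+        BASE_URL + path,
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def login_request(email: str, password: str) -> urllib.request.Request:
+    return json_post("/login", {"email": email, "password": password})
+
+
+def load_request(filename: str, folder: str) -> urllib.request.Request:
+    # filename appears both as the path parameter and in the body, as in the expected-results tables
+    return json_post(f"/load/{urllib.parse.quote(filename)}", {"filename": filename, "folder": folder})
+
+
+def unlock_request(email: str, password: str) -> urllib.request.Request:
+    return json_post("/unlock", {"email": email, "password": password})
+```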
+
+## Configuration Files
+
+### cdn.yaml (downloaded encrypted from API)
+- `host`: string — S3 endpoint URL
+- `downloader_access_key`: string — read-only S3 access key
+- `downloader_access_secret`: string — read-only S3 secret key
+- `uploader_access_key`: string — write S3 access key
+- `uploader_access_secret`: string — write S3 secret key
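+
+After `cdn.yaml` is decrypted and parsed (for example with PyYAML's `yaml.safe_load`, not shown), the five keys fit a small record that keeps the read-only and write key pairs apart. `CdnConfig` and its methods are hypothetical names used for illustration.
+
+```python
+from dataclasses import dataclass
+from typing import Any, Mapping, Tuple
+
+REQUIRED_KEYS = (
+    "host",
+    "downloader_access_key",
+    "downloader_access_secret",
+    "uploader_access_key",
+    "uploader_access_secret",
+)
+
+
+@dataclass(frozen=True)
+class CdnConfig:
+    host: str
+    downloader_access_key: str
+    downloader_access_secret: str
+    uploader_access_key: str
+    uploader_access_secret: str
+
+    @classmethod
+    def from_mapping(cls, raw: Mapping[str, Any]) -> "CdnConfig":
+        # Treat missing or empty values as a configuration error.
+        missing = [k for k in REQUIRED_KEYS if not raw.get(k)]
+        if missing:
+            raise ValueError(f"cdn.yaml is missing keys: {', '.join(missing)}")
+        return cls(**{k: str(raw[k]) for k in REQUIRED_KEYS})
+
+    def credentials(self, write: bool) -> Tuple[str, str]:
+        """Read-only pair for downloads, write pair for uploads."""
+        if write:
+            return self.uploader_access_key, self.uploader_access_secret
+        return self.downloader_access_key, self.downloader_access_secret
+```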
+
+## JWT Token Claims
+- `nameid`: string — user GUID
+- `unique_name`: string — user email
+- `role`: string — one of: ApiAdmin, Admin, ResourceUploader, Validator, Operator
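+
+These claims can be read by base64url-decoding the middle segment of the token. This is roughly what decoding with `options={"verify_signature": False}` amounts to. The following is a stdlib sketch, and `read_claims_unverified` is a hypothetical helper. Like the loader, the helper trusts the token source, because the signature is never checked.
+
+```python
+import base64
+import json
+from typing import Any, Dict
+
+
+def read_claims_unverified(token: str) -> Dict[str, Any]:
+    """Decode the JWT payload segment WITHOUT verifying the signature."""
+    try:
+        payload_b64 = token.split(".")[1]
+    except IndexError:
+        raise ValueError("not a JWT: expected header.payload.signature") from None
+    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
+    return json.loads(base64.urlsafe_b64decode(padded))
+```
+
+A caller would then read `claims["nameid"]`, `claims["unique_name"]`, and `claims["role"]`.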
+
+## External Data Sources
+
+| Source | Data | Format | Direction |
+|--------|------|--------|-----------|
+| Azaion Resource API | JWT tokens, encrypted resources (small parts), CDN config, key fragments | JSON / binary | Download |
+| S3 CDN | Large resource parts (.big files) | Binary | Upload / Download |
+| Local filesystem | Encrypted Docker archive (`images.enc`), cached `.big` files | Binary | Read / Write |
+| Docker daemon | Image loading, image inspection | CLI stdout | Read |
+| Host OS | Hardware fingerprint (CPU, GPU, RAM, drive serial) | Text (subprocess) | Read |
@@ -0,0 +1,80 @@
+# Expected Results
+
+Maps every input data item to its quantifiable expected result.
+Tests use this mapping to compare actual system output against known-correct answers.
+
+## Result Format Legend
+
+| Result Type | When to Use | Example |
+|-------------|-------------|---------|
+| Exact value | Output must match precisely | `status_code: 200`, `status: "healthy"` |
+| Threshold | Output must exceed or stay below a limit | `latency < 2000ms` |
+| Pattern match | Output must match a string/regex pattern | `error contains "invalid"` |
+| Schema match | Output structure must conform to a schema | `response has keys: status, authenticated, modelCacheDir` |
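+
+The four result types translate into small predicate helpers. In the following sketch the function names are hypothetical, except `threshold_min`, which reuses the label from the tables below. `schema` is reduced to a key-presence check rather than full JSON Schema validation.
+
+```python
+import re
+from typing import Any, Iterable, Mapping
+
+
+def exact(actual: Any, expected: Any) -> bool:
+    return actual == expected
+
+
+def threshold_max(actual: float, limit: float) -> bool:
+    """Value must stay below the limit (e.g. latency < 2000 ms)."""
+    return actual < limit
+
+
+def threshold_min(actual: float, floor: float) -> bool:
+    """Value must exceed the floor (e.g. body length > 0)."""
+    return actual > floor
+
+
+def pattern(actual: str, regex: str) -> bool:
+    return re.search(regex, actual) is not None
+
+
+def schema(actual: Mapping[str, Any], keys: Iterable[str]) -> bool:
+    """Every listed key must be present in the response body."""
+    return all(k in actual for k in keys)
+```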
+
+## Input → Expected Result Mapping
+
+### Health & Status Endpoints
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 1 | `GET /health` | Liveness probe, no auth needed | HTTP 200, body: `{"status": "healthy"}` | exact | N/A | N/A |
+| 2 | `GET /status` (no prior login) | Status before authentication | HTTP 200, body: `{"status": "healthy", "authenticated": false, "modelCacheDir": "models"}` | exact | N/A | N/A |
+| 3 | `GET /status` (after login) | Status after valid authentication | HTTP 200, body has `"authenticated": true` | exact (status), exact (authenticated field) | N/A | N/A |
+
+### Authentication
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 4 | `POST /login {"email": "valid@test.com", "password": "validpass"}` | Valid credentials | HTTP 200, body: `{"status": "ok"}` | exact | N/A | N/A |
+| 5 | `POST /login {"email": "bad@test.com", "password": "wrongpass"}` | Invalid credentials | HTTP 401, body has `"detail"` key with error string | exact (status), schema (body has detail) | N/A | N/A |
+| 6 | `POST /login {}` | Missing fields | HTTP 422 (validation error) | exact (status) | N/A | N/A |
+
+### Resource Download
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 7 | `POST /load/testfile {"filename": "testfile", "folder": "models"}` (after valid login) | Download existing resource | HTTP 200, Content-Type: `application/octet-stream`, body is non-empty bytes | exact (status), exact (content-type), threshold_min (body length > 0) | N/A | N/A |
+| 8 | `POST /load/nonexistent {"filename": "nonexistent", "folder": "models"}` (after valid login) | Download missing resource | HTTP 500, body has `"detail"` key | exact (status), schema (body has detail) | N/A | N/A |
+| 9 | `POST /load/testfile {"filename": "testfile", "folder": "models"}` (no login) | Download without authentication | HTTP 500, body has `"detail"` key (ApiClient has no credentials) | exact (status), schema (body has detail) | N/A | N/A |
+
+### Resource Upload
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 10 | `POST /upload/testfile` multipart: file=binary, folder="models" (after valid login) | Upload resource | HTTP 200, body: `{"status": "ok"}` | exact | N/A | N/A |
+| 11 | `POST /upload/testfile` no file attached | Upload without file | HTTP 422 (validation error) | exact (status) | N/A | N/A |
+
+### Unlock Workflow
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 12 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (archive exists, images not loaded) | Start unlock workflow | HTTP 200, body: `{"state": "authenticating"}` | exact | N/A | N/A |
+| 13 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (images already loaded) | Unlock when already ready | HTTP 200, body: `{"state": "ready"}` | exact | N/A | N/A |
+| 14 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (no archive, images not loaded) | Unlock without archive | HTTP 404, body has `"detail"` containing "Encrypted archive not found" | exact (status), substring (detail) | N/A | N/A |
+| 15 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (unlock already in progress) | Duplicate unlock request | HTTP 200, body has `"state"` field with current in-progress state | exact (status), schema (body has state) | N/A | N/A |
+| 16 | `GET /unlock/status` (unlock in progress) | Poll unlock status | HTTP 200, body: `{"state": "<current_state>", "error": null}` | exact (status), schema (body has state + error) | N/A | N/A |
+| 17 | `GET /unlock/status` (unlock failed) | Poll after failure | HTTP 200, body has `"state": "error"` and `"error"` is non-null string | exact (state), threshold_min (error string length > 0) | N/A | N/A |
+| 18 | `GET /unlock/status` (idle, no unlock started) | Poll before any unlock | HTTP 200, body: `{"state": "idle", "error": null}` | exact | N/A | N/A |
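+
+A client that starts the unlock workflow has to poll `/unlock/status` (rows 16-18) until the state becomes `ready` or `error`. The following sketch assumes a hypothetical `get_status` callable that returns the decoded JSON body. The 300 s timeout and 2 s interval are illustrative defaults, not values taken from the loader.
+
+```python
+import time
+from typing import Callable, Mapping, Optional
+
+TERMINAL_STATES = {"ready", "error"}
+
+
+def wait_for_unlock(
+    get_status: Callable[[], Mapping[str, Optional[str]]],
+    timeout_s: float = 300.0,
+    interval_s: float = 2.0,
+    sleep: Callable[[float], None] = time.sleep,
+    clock: Callable[[], float] = time.monotonic,
+) -> Mapping[str, Optional[str]]:
+    """Poll until the state is 'ready' or 'error'; raise TimeoutError past the deadline."""
+    deadline = clock() + timeout_s
+    while True:
+        status = get_status()
+        if status.get("state") in TERMINAL_STATES:
+            return status
+        if clock() >= deadline:
+            raise TimeoutError(f"unlock still in state {status.get('state')!r}")
+        sleep(interval_s)
+```
+
+Injecting `sleep` and `clock` keeps the helper testable without real waiting.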
+
+### Security — Encryption Round-Trip
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 19 | encrypt_to(b"hello world", "testkey") then decrypt_to(result, "testkey") | Encrypt/decrypt round-trip | Decrypted output equals original: `b"hello world"` | exact | N/A | N/A |
+| 20 | decrypt_to(encrypted_bytes, "wrong_key") | Decrypt with wrong key | Raises an exception or returns garbled data that differs from the original | pattern (exception raised or output ≠ input) | N/A | N/A |
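+
+Rows 19 and 20 follow from the scheme in the security notes:
+
+- SHA-256 of the string key serves as the AES-256 key.
+- A random 16-byte IV is prepended to the ciphertext.
+- PKCS7 padding is applied.
+
+The following sketch uses the `cryptography` package, which is an assumption. The function names match the table, but the real signatures in `security.pyx` may differ.
+
+```python
+import hashlib
+import os
+
+from cryptography.hazmat.primitives import padding
+from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
+
+
+def _aes_key(key: str) -> bytes:
+    # Key expansion: SHA-256 of the string key yields exactly 32 bytes (AES-256).
+    return hashlib.sha256(key.encode("utf-8")).digest()
+
+
+def encrypt_to(plaintext: bytes, key: str) -> bytes:
+    iv = os.urandom(16)  # fresh random IV per message
+    padder = padding.PKCS7(128).padder()  # AES block size is 128 bits
+    padded = padder.update(plaintext) + padder.finalize()
+    encryptor = Cipher(algorithms.AES(_aes_key(key)), modes.CBC(iv)).encryptor()
+    return iv + encryptor.update(padded) + encryptor.finalize()  # IV is prepended
+
+
+def decrypt_to(blob: bytes, key: str) -> bytes:
+    iv, ciphertext = blob[:16], blob[16:]
+    decryptor = Cipher(algorithms.AES(_aes_key(key)), modes.CBC(iv)).decryptor()
+    padded = decryptor.update(ciphertext) + decryptor.finalize()
+    unpadder = padding.PKCS7(128).unpadder()
+    # A wrong key usually produces invalid padding, which raises ValueError here.
+    return unpadder.update(padded) + unpadder.finalize()
+```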
+
+### Security — Key Derivation
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 21 | get_resource_encryption_key() called twice | Deterministic shared key | Both calls return identical string | exact | N/A | N/A |
+| 22 | get_hw_hash("CPU: test") | Hardware hash derivation | Returns non-empty base64 string | threshold_min (length > 0), pattern (base64 charset) | N/A | N/A |
+| 23 | get_api_encryption_key(creds1, hw_hash) vs get_api_encryption_key(creds2, hw_hash) | Different credentials produce different keys | key1 ≠ key2 | exact (inequality) | N/A | N/A |
+
+### Binary Split — Archive Decryption
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 24 | decrypt_archive(test_encrypted_file, known_key, output_path) | Decrypt test archive | Output file matches original plaintext content | exact (file content) | N/A | N/A |
+| 25 | check_images_loaded("nonexistent-version") | Check for missing Docker images | Returns `False` | exact | N/A | N/A |
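+
+Row 25 (and AC-16) can be checked by running `docker image inspect` for each expected image and treating any non-zero exit as "not loaded". In the following sketch, the two entries in `API_SERVICES` and the `<service>:<version>` tag format are hypothetical placeholders for the seven images listed in `binary_split.py`. The injectable `run` parameter lets tests stub out Docker.
+
+```python
+import subprocess
+from typing import Callable, Sequence
+
+# Hypothetical placeholders; the real list has 7 entries in binary_split.py.
+API_SERVICES: Sequence[str] = ("azaion/service-a", "azaion/service-b")
+
+
+def check_images_loaded(
+    version: str,
+    services: Sequence[str] = API_SERVICES,
+    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
+) -> bool:
+    """True only if `docker image inspect <service>:<version>` succeeds for every service."""
+    for service in services:
+        try:
+            result = run(
+                ["docker", "image", "inspect", f"{service}:{version}"],
+                stdout=subprocess.DEVNULL,
+                stderr=subprocess.DEVNULL,
+            )
+        except FileNotFoundError:  # docker CLI not installed
+            return False
+        if result.returncode != 0:  # image missing or daemon unreachable
+            return False
+    return True
+```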
@@ -0,0 +1,27 @@
+# Problem Statement
+
+## What is this system?
+
+Azaion.Loader is a secure resource distribution service for Azaion's edge computing platform. It runs on edge devices (ARM64) to manage the lifecycle of encrypted AI model resources and Docker service images.
+
+## What problem does it solve?
+
+Azaion distributes proprietary AI models and Docker-based services to edge devices deployed in the field. These assets must be:
+
+1. **Protected in transit and at rest** — models and service images are intellectual property that must not be extractable if a device is compromised
+2. **Bound to authorized hardware** — decryption keys are derived from the device's hardware fingerprint, preventing resource extraction to unauthorized machines
+3. **Efficiently distributed** — large model files are split between an authenticated API (small encrypted part) and a CDN (large part), reducing API bandwidth costs while maintaining security
+4. **Self-service deployable** — edge devices need to authenticate, download, decrypt, and load Docker images autonomously via a single unlock workflow
+
+## Who are the users?
+
+- **Edge devices** — autonomous ARM64 systems running Azaion services (drones, companion PCs, ground stations)
+- **Operators/Admins** — human users who trigger authentication and unlock via HTTP API
+- **Other Azaion services** — co-located containers that call the loader API to fetch model resources
+
+## How does it work (high level)?
+
+1. A client authenticates via `/login` with email/password → the loader obtains a JWT from the Azaion Resource API
+2. For resource access: the loader downloads an encrypted "small" part from the API (using a per-user, per-machine key) and a "big" part from CDN, reassembles them, and decrypts with a shared resource key
+3. For initial deployment: the `/unlock` endpoint triggers a background workflow that downloads a key fragment, decrypts a pre-deployed encrypted Docker image archive, and loads all service images into the local Docker daemon
+4. All security-sensitive logic is compiled as Cython native extensions for IP protection
@@ -0,0 +1,37 @@
+# Restrictions
+
+## Hardware
+
+| Restriction | Source | Details |
+|-------------|--------|---------|
+| ARM64 architecture | `.woodpecker/build-arm.yml` | CI builds ARM64-only Docker images |
+| Docker daemon access | `Dockerfile`, `main.py` | Requires Docker socket mount for `docker load` and `docker image inspect` |
+| Hardware fingerprint availability | `hardware_service.pyx` | Requires `lscpu`, `lspci`, `/sys/block/sda` on Linux; PowerShell on Windows |
+
+## Software
+
+| Restriction | Source | Details |
+|-------------|--------|---------|
+| Python 3.11 | `Dockerfile` | Base image is `python:3.11-slim` |
+| Cython 3.1.3 | `requirements.txt` | Pinned version for compilation |
+| GCC compiler | `Dockerfile` | Required at build time for Cython extension compilation |
+| Docker CLI | `Dockerfile` | `docker-ce-cli` installed inside the container |
+
+## Environment
+
+| Restriction | Source | Details |
+|-------------|--------|---------|
+| `RESOURCE_API_URL` env var | `main.py` | Defaults to `https://api.azaion.com` |
+| `IMAGES_PATH` env var | `main.py` | Defaults to `/opt/azaion/images.enc` — encrypted archive must be pre-deployed |
+| `API_VERSION` env var | `main.py` | Defaults to `latest` — determines expected Docker image tags |
+| CDN config file | `api_client.pyx` | `cdn.yaml` downloaded encrypted from API at credential setup time |
+| Network access | `api_client.pyx`, `cdn_manager.pyx` | Must reach Azaion Resource API and S3 CDN endpoint |
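+
+The three environment variables and their defaults can be gathered in one place. In the following sketch, `LoaderSettings` and `load_settings` are hypothetical names. `main.py` may read the variables inline instead.
+
+```python
+import os
+from dataclasses import dataclass
+from typing import Mapping, Optional
+
+
+@dataclass(frozen=True)
+class LoaderSettings:
+    resource_api_url: str
+    images_path: str
+    api_version: str
+
+
+def load_settings(env: Optional[Mapping[str, str]] = None) -> LoaderSettings:
+    """Read runtime config from the environment, falling back to the documented defaults."""
+    env = os.environ if env is None else env
+    return LoaderSettings(
+        resource_api_url=env.get("RESOURCE_API_URL", "https://api.azaion.com"),
+        images_path=env.get("IMAGES_PATH", "/opt/azaion/images.enc"),  # must be pre-deployed
+        api_version=env.get("API_VERSION", "latest"),  # selects expected Docker image tags
+    )
+```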
+
+## Operational
+
+| Restriction | Source | Details |
+|-------------|--------|---------|
+| Single instance | `main.py` | Module-level singleton `api_client` — not designed for multi-process deployment |
+| Synchronous I/O | `api_client.pyx` | Large file operations block the worker thread |
+| No horizontal scaling | Architecture | Stateful singleton pattern prevents running multiple replicas |
+| Log directory | `constants.pyx` | Hardcoded to `Logs/` — requires writable filesystem at that path |
@@ -0,0 +1,68 @@
+# Security Approach
+
+## Authentication
+
+- **Mechanism**: JWT Bearer tokens issued by Azaion Resource API
+- **Token handling**: Decoded without signature verification (`options={"verify_signature": False}`) — trusts the API server
+- **Token refresh**: Automatic re-login on 401/403 responses (single retry)
+- **Credential storage**: In-memory only (Credentials object); not persisted to disk
+
+## Authorization
+
+- **Model**: Role-based (RoleEnum with 7 levels: NONE through ApiAdmin)
+- **Enforcement**: Roles are parsed from JWT and stored on the User object, but **no endpoint-level authorization is enforced** by the loader. All endpoints are accessible once credentials are set.
+
+## Encryption
+
+### Resource Encryption (binary-split scheme)
+- **Algorithm**: AES-256-CBC with PKCS7 padding
+- **Key expansion**: SHA-256 hash of string key → 32-byte AES key
+- **IV**: Random 16-byte IV prepended to ciphertext
+
+### Key Derivation
+
+| Key Type | Derivation | Scope |
+|----------|------------|-------|
+| API download key | `SHA-384(email + password + hw_hash + salt)` | Per-user, per-machine |
+| Hardware hash | `SHA-384("Azaion_" + hardware_fingerprint + salt)` | Per-machine |
+| Resource encryption key | `SHA-384(fixed_salt_string)` | Global (shared across all users) |
+| Archive decryption key | `SHA-256(key_fragment_from_api)` | Per-unlock operation |
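+
+The table translates into a few hash calls. The following sketch rests on these assumptions:
+
+- The salt constants are placeholders; the real ones stay in compiled code.
+- Inputs are UTF-8 strings concatenated without separators.
+- Every SHA-384 key is base64-encoded. The expected results state this only for the hardware hash.
+
+The function signatures are hypothetical.
+
+```python
+import base64
+import hashlib
+
+# Hypothetical placeholders; the real salts are not reproduced here.
+HW_SALT = "<hw-salt>"
+API_SALT = "<api-salt>"
+RESOURCE_SALT = "<resource-salt>"
+
+
+def _sha384_b64(text: str) -> str:
+    return base64.b64encode(hashlib.sha384(text.encode("utf-8")).digest()).decode("ascii")
+
+
+def get_hw_hash(hardware_fingerprint: str) -> str:
+    # Per-machine: SHA-384("Azaion_" + fingerprint + salt)
+    return _sha384_b64("Azaion_" + hardware_fingerprint + HW_SALT)
+
+
+def get_api_encryption_key(email: str, password: str, hw_hash: str) -> str:
+    # Per-user, per-machine: SHA-384(email + password + hw_hash + salt)
+    return _sha384_b64(email + password + hw_hash + API_SALT)
+
+
+def get_resource_encryption_key() -> str:
+    # Global: SHA-384(fixed salt), identical for every user and machine
+    return _sha384_b64(RESOURCE_SALT)
+
+
+def archive_key(key_fragment: str) -> bytes:
+    # Per-unlock: SHA-256 of the fragment from the API gives a 32-byte AES-256 key
+    return hashlib.sha256(key_fragment.encode("utf-8")).digest()
+```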
+
+### Binary Split
+- Resources encrypted with shared resource key, then split into:
+  - **Small part** (≤3KB or 30%): uploaded to authenticated API
+  - **Big part** (remainder): uploaded to CDN
+- Decryption requires both parts — compromise of either storage alone is insufficient
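+
+The following sketches the split and reassembly. Reading "≤3KB or 30%" as whichever is smaller is an assumption, and `split_encrypted` and `join_parts` are hypothetical names. Because the IV is prepended, the small part also carries the IV for all but very small payloads, so the parts must be joined before decryption.
+
+```python
+from typing import Tuple
+
+SMALL_PART_LIMIT = 3 * 1024  # assumption: cap of 3 KB ...
+SMALL_PART_RATIO = 0.30      # ... or 30% of the ciphertext, whichever is smaller
+
+
+def split_encrypted(blob: bytes) -> Tuple[bytes, bytes]:
+    """Split ciphertext into (small, big): small goes to the API, big to the CDN."""
+    cut = min(SMALL_PART_LIMIT, int(len(blob) * SMALL_PART_RATIO))
+    return blob[:cut], blob[cut:]
+
+
+def join_parts(small: bytes, big: bytes) -> bytes:
+    """Reassemble the full ciphertext; decryption runs on the joined bytes."""
+    return small + big
+```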
+
+## Hardware Binding
+
+- Hardware fingerprint: CPU model, GPU, memory size, drive serial number
+- Used to derive per-machine encryption keys for API resource downloads
+- Prevents extraction of downloaded resources to different hardware
+
+## IP Protection
+
+- Security-sensitive modules (security, api_client, credentials, etc.) are Cython `.pyx` files compiled to native `.so` extensions
+- Key derivation salts and logic are in compiled code, not readable Python
+
+## Secrets Management
+
+- CDN credentials stored in `cdn.yaml`, downloaded encrypted from the API
+- User credentials exist only in memory
+- JWT tokens exist only in memory
+- No `.env` file or secrets manager; runtime configuration comes from environment variables
+
+## Input Validation
+
+- Pydantic models validate request structure (LoginRequest, LoadRequest)
+- No additional input sanitization beyond Pydantic type checking
+- No rate limiting on any endpoint
+
+## Known Security Gaps
+
+1. JWT decoded without signature verification
+2. No endpoint-level authorization enforcement
+3. No rate limiting
+4. Resource encryption key is static/shared — not per-user
+5. `subprocess` with `shell=True` in hardware_service (not user-input-driven, but still a risk pattern)
+6. No HTTPS termination within the service (assumes reverse proxy or direct Docker network)