Add E2E tests, fix bugs

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-04-13 05:17:48 +03:00
parent 1f98b5e958
commit 8f7deb3fca
71 changed files with 4740 additions and 29 deletions
@@ -0,0 +1,38 @@
# Acceptance Criteria
## Functional Criteria
| # | Criterion | Measurable Target | Source |
|---|-----------|-------------------|--------|
| AC-1 | Health endpoint responds | GET `/health` returns `{"status": "healthy"}` with HTTP 200 | `main.py:54-55` |
| AC-2 | Login sets credentials | POST `/login` with valid email/password returns `{"status": "ok"}` | `main.py:69-75` |
| AC-3 | Login rejects invalid credentials | POST `/login` with bad credentials returns HTTP 401 | `main.py:74-75` |
| AC-4 | Resource download returns decrypted bytes | POST `/load/{filename}` returns binary content (application/octet-stream) | `main.py:79-85` |
| AC-5 | Resource upload succeeds | POST `/upload/{filename}` with file returns `{"status": "ok"}` | `main.py:89-100` |
| AC-6 | Unlock starts background workflow | POST `/unlock` with credentials returns `{"state": "authenticating"}` | `main.py:158-181` |
| AC-7 | Unlock detects already-loaded images | POST `/unlock` when images are loaded returns `{"state": "ready"}` | `main.py:163-164` |
| AC-8 | Unlock status reports progress | GET `/unlock/status` returns current state and error | `main.py:184-187` |
| AC-9 | Unlock completes full cycle | Background task transitions: authenticating → downloading_key → decrypting → loading_images → ready | `main.py:103-155` |
| AC-10 | Unlock handles missing archive | POST `/unlock` when archive missing and images not loaded returns HTTP 404 | `main.py:168-174` |
## Security Criteria
| # | Criterion | Measurable Target | Source |
|---|-----------|-------------------|--------|
| AC-11 | Resources encrypted at rest | AES-256-CBC encryption with per-user or shared key | `security.pyx` |
| AC-12 | Hardware-bound key derivation | API download key incorporates hardware fingerprint | `security.pyx:54-55` |
| AC-13 | Binary split prevents single-source compromise | Small part on API + big part on CDN required for decryption | `api_client.pyx:166-186` |
| AC-14 | JWT token obtained from trusted API | Login via POST to Azaion Resource API with credentials | `api_client.pyx:43-55` |
| AC-15 | Auto-retry on expired token | 401/403 triggers re-login and retry | `api_client.pyx:140-146` |
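
AC-15 describes a single re-login and retry on 401/403. A minimal sketch of that control flow follows, assuming hypothetical `send` and `login` callables that stand in for the real HTTP and login calls in `api_client.pyx`, which are not shown here.

```python
from typing import Callable, Tuple

# Hypothetical shape: a request callable returns (status_code, body).
Response = Tuple[int, bytes]


def request_with_relogin(
    send: Callable[[str], Response],
    login: Callable[[], str],
    token: str,
) -> Response:
    """Send once; on 401/403 re-login and retry exactly one more time."""
    status, body = send(token)
    if status in (401, 403):
        token = login()              # obtain a fresh JWT
        status, body = send(token)   # single retry, no loop
    return status, body
```

Because the retry happens only once, a second 401/403 is returned to the caller rather than triggering another login.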
## Operational Criteria
| # | Criterion | Measurable Target | Source |
|---|-----------|-------------------|--------|
| AC-16 | Docker images verified | All 7 API_SERVICES images checked via `docker image inspect` | `binary_split.py:60-69` |
| AC-17 | Logs rotate daily | File sink rotates every 1 day, retains 30 days | `constants.pyx:19-26` |
| AC-18 | Container builds on ARM64 | Woodpecker CI produces `loader:arm` image | `.woodpecker/build-arm.yml` |
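
The image check in AC-16 relies on `docker image inspect`, which exits non-zero when an image is absent. The sketch below illustrates the idea; the service names, the `name:version` tag format, and the injectable `run` parameter are assumptions for illustration, not the contents of `binary_split.py`.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical service list; the real API_SERVICES entries are not shown here.
API_SERVICES = ["example-service-a", "example-service-b"]


def image_exists(image: str, run: Callable = subprocess.run) -> bool:
    """Return True when `docker image inspect <image>` exits with code 0."""
    try:
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        return False  # docker CLI is not installed
    return result.returncode == 0


def check_images_loaded(
    version: str,
    services: Sequence[str] = API_SERVICES,
    run: Callable = subprocess.run,
) -> bool:
    """All services must have an image tagged with the given version."""
    return all(image_exists(f"{name}:{version}", run) for name in services)
```

Passing `run` explicitly lets a test substitute a fake runner, so the check can be exercised without a Docker daemon.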
## Non-Functional Criteria
No explicit performance targets (latency, throughput, concurrency) are defined in the codebase. Resource download/upload latency depends on file size and network conditions.
@@ -0,0 +1,44 @@
# Input Data Parameters
## API Request Schemas
### Login
- `email`: string — user email address
- `password`: string — user password (plaintext)
### Load Resource
- `filename`: string — resource name (without `.big`/`.small` suffix)
- `folder`: string — resource folder/bucket name
### Upload Resource
- `data`: binary file (multipart upload)
- `filename`: string — resource name (path parameter)
- `folder`: string — destination folder (form field, defaults to `"models"`)
### Unlock
- `email`: string — user email
- `password`: string — user password
## Configuration Files
### cdn.yaml (downloaded encrypted from API)
- `host`: string — S3 endpoint URL
- `downloader_access_key`: string — read-only S3 access key
- `downloader_access_secret`: string — read-only S3 secret key
- `uploader_access_key`: string — write S3 access key
- `uploader_access_secret`: string — write S3 secret key
## JWT Token Claims
- `nameid`: string — user GUID
- `unique_name`: string — user email
- `role`: string — one of: ApiAdmin, Admin, ResourceUploader, Validator, Operator
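
To illustrate how these claims can be read, here is a standard-library sketch that decodes a JWT payload without checking the signature. It is a stand-in for whatever JWT library the loader actually uses; the function name `decode_jwt_claims` is hypothetical.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

The returned dictionary would carry `nameid`, `unique_name`, and `role`. Since nothing is verified, the values are only as trustworthy as the channel the token arrived on.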
## External Data Sources
| Source | Data | Format | Direction |
|--------|------|--------|-----------|
| Azaion Resource API | JWT tokens, encrypted resources (small parts), CDN config, key fragments | JSON / binary | Download |

| S3 CDN | Large resource parts (.big files) | Binary | Upload / Download |
| Local filesystem | Encrypted Docker archive (`images.enc`), cached `.big` files | Binary | Read / Write |
| Docker daemon | Image loading, image inspection | CLI stdout | Read |
| Host OS | Hardware fingerprint (CPU, GPU, RAM, drive serial) | Text (subprocess) | Read |
@@ -0,0 +1,80 @@
# Expected Results
Maps every input data item to its quantifiable expected result.
Tests use this mapping to compare actual system output against known-correct answers.
## Result Format Legend
| Result Type | When to Use | Example |
|-------------|-------------|---------|
| Exact value | Output must match precisely | `status_code: 200`, `key: "healthy"` |
| Threshold | Output must exceed or stay below a limit | `latency < 2000ms` |
| Pattern match | Output must match a string/regex pattern | `error contains "invalid"` |
| Schema match | Output structure must conform to a schema | `response has keys: status, authenticated, modelCacheDir` |
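
The four result types in the legend map naturally onto small predicate helpers. The sketch below is one possible shape; the function names are hypothetical and are not taken from the test suite in this commit.

```python
import re
from typing import Any, Iterable, Mapping


def exact(actual: Any, expected: Any) -> bool:
    """Exact value: output must match precisely."""
    return actual == expected


def threshold_max(actual: float, limit: float) -> bool:
    """Threshold: output must stay below a limit (e.g. latency < 2000 ms)."""
    return actual < limit


def threshold_min(actual: float, limit: float) -> bool:
    """Threshold: output must exceed a limit (e.g. body length > 0)."""
    return actual > limit


def pattern(actual: str, regex: str) -> bool:
    """Pattern match: output must contain a match for the regex."""
    return re.search(regex, actual) is not None


def schema(actual: Mapping[str, Any], required_keys: Iterable[str]) -> bool:
    """Schema match: output must contain every required key."""
    return all(key in actual for key in required_keys)
```

For example, `schema(body, ["status", "authenticated", "modelCacheDir"])` expresses the schema-match row of the legend.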
## Input → Expected Result Mapping
### Health & Status Endpoints
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `GET /health` | Liveness probe, no auth needed | HTTP 200, body: `{"status": "healthy"}` | exact | N/A | N/A |
| 2 | `GET /status` (no prior login) | Status before authentication | HTTP 200, body: `{"status": "healthy", "authenticated": false, "modelCacheDir": "models"}` | exact | N/A | N/A |
| 3 | `GET /status` (after login) | Status after valid authentication | HTTP 200, body has `"authenticated": true` | exact (status), exact (authenticated field) | N/A | N/A |
### Authentication
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 4 | `POST /login {"email": "valid@test.com", "password": "validpass"}` | Valid credentials | HTTP 200, body: `{"status": "ok"}` | exact | N/A | N/A |
| 5 | `POST /login {"email": "bad@test.com", "password": "wrongpass"}` | Invalid credentials | HTTP 401, body has `"detail"` key with error string | exact (status), schema (body has detail) | N/A | N/A |
| 6 | `POST /login {}` | Missing fields | HTTP 422 (validation error) | exact (status) | N/A | N/A |
### Resource Download
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 7 | `POST /load/testfile {"filename": "testfile", "folder": "models"}` (after valid login) | Download existing resource | HTTP 200, Content-Type: `application/octet-stream`, body is non-empty bytes | exact (status), exact (content-type), threshold_min (body length > 0) | N/A | N/A |
| 8 | `POST /load/nonexistent {"filename": "nonexistent", "folder": "models"}` (after valid login) | Download missing resource | HTTP 500, body has `"detail"` key | exact (status), schema (body has detail) | N/A | N/A |
| 9 | `POST /load/testfile {"filename": "testfile", "folder": "models"}` (no login) | Download without authentication | HTTP 500, body has `"detail"` key (ApiClient has no credentials) | exact (status), schema (body has detail) | N/A | N/A |
### Resource Upload
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 10 | `POST /upload/testfile` multipart: file=binary, folder="models" (after valid login) | Upload resource | HTTP 200, body: `{"status": "ok"}` | exact | N/A | N/A |
| 11 | `POST /upload/testfile` no file attached | Upload without file | HTTP 422 (validation error) | exact (status) | N/A | N/A |
### Unlock Workflow
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 12 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (archive exists, images not loaded) | Start unlock workflow | HTTP 200, body: `{"state": "authenticating"}` | exact | N/A | N/A |
| 13 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (images already loaded) | Unlock when already ready | HTTP 200, body: `{"state": "ready"}` | exact | N/A | N/A |
| 14 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (no archive, images not loaded) | Unlock without archive | HTTP 404, body has `"detail"` containing "Encrypted archive not found" | exact (status), substring (detail) | N/A | N/A |
| 15 | `POST /unlock {"email": "valid@test.com", "password": "validpass"}` (unlock already in progress) | Duplicate unlock request | HTTP 200, body has `"state"` field with current in-progress state | exact (status), schema (body has state) | N/A | N/A |
| 16 | `GET /unlock/status` (unlock in progress) | Poll unlock status | HTTP 200, body: `{"state": "<current_state>", "error": null}` | exact (status), schema (body has state + error) | N/A | N/A |
| 17 | `GET /unlock/status` (unlock failed) | Poll after failure | HTTP 200, body has `"state": "error"` and `"error"` is non-null string | exact (state), threshold_min (error string length > 0) | N/A | N/A |
| 18 | `GET /unlock/status` (idle, no unlock started) | Poll before any unlock | HTTP 200, body: `{"state": "idle", "error": null}` | exact | N/A | N/A |
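
The status payloads in rows 16 to 18 suggest a small state holder shared between the request handlers and the background task. A minimal sketch follows, assuming a hypothetical `UnlockTracker` class; the state names come from the tables above, while the locking is an illustrative choice rather than a description of `main.py`.

```python
import threading
from enum import Enum
from typing import Optional


class UnlockState(str, Enum):
    IDLE = "idle"
    AUTHENTICATING = "authenticating"
    DOWNLOADING_KEY = "downloading_key"
    DECRYPTING = "decrypting"
    LOADING_IMAGES = "loading_images"
    READY = "ready"
    ERROR = "error"


class UnlockTracker:
    """Thread-safe holder for the current unlock state and last error."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = UnlockState.IDLE
        self._error: Optional[str] = None

    def advance(self, new_state: UnlockState) -> None:
        with self._lock:
            self._state = new_state
            self._error = None

    def fail(self, message: str) -> None:
        with self._lock:
            self._state = UnlockState.ERROR
            self._error = message

    def status(self) -> dict:
        """Shape of the GET /unlock/status response body."""
        with self._lock:
            return {"state": self._state.value, "error": self._error}
```

A fresh tracker reports `{"state": "idle", "error": None}`, which serializes to the JSON body expected in row 18.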
### Security — Encryption Round-Trip
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 19 | encrypt_to(b"hello world", "testkey") then decrypt_to(result, "testkey") | Encrypt/decrypt round-trip | Decrypted output equals original: `b"hello world"` | exact | N/A | N/A |
| 20 | decrypt_to(encrypted_bytes, "wrong_key") | Decrypt with wrong key | Raises exception or returns garbled data ≠ original | pattern (exception raised or output ≠ input) | N/A | N/A |
### Security — Key Derivation
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 21 | get_resource_encryption_key() called twice | Deterministic shared key | Both calls return identical string | exact | N/A | N/A |
| 22 | get_hw_hash("CPU: test") | Hardware hash derivation | Returns non-empty base64 string | threshold_min (length > 0), pattern (base64 charset) | N/A | N/A |
| 23 | get_api_encryption_key(creds1, hw_hash) vs get_api_encryption_key(creds2, hw_hash) | Different credentials produce different keys | key1 ≠ key2 | exact (inequality) | N/A | N/A |
### Binary Split — Archive Decryption
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 24 | decrypt_archive(test_encrypted_file, known_key, output_path) | Decrypt test archive | Output file matches original plaintext content | exact (file content) | N/A | N/A |
| 25 | check_images_loaded("nonexistent-version") | Check for missing Docker images | Returns `False` | exact | N/A | N/A |
@@ -0,0 +1,27 @@
# Problem Statement
## What is this system?
Azaion.Loader is a secure resource distribution service for Azaion's edge computing platform. It runs on edge devices (ARM64) to manage the lifecycle of encrypted AI model resources and Docker service images.
## What problem does it solve?
Azaion distributes proprietary AI models and Docker-based services to edge devices deployed in the field. These assets must be:
1. **Protected in transit and at rest** — models and service images are intellectual property that must not be extractable if a device is compromised
2. **Bound to authorized hardware** — decryption keys are derived from the device's hardware fingerprint, preventing resource extraction to unauthorized machines
3. **Efficiently distributed** — large model files are split between an authenticated API (small encrypted part) and a CDN (large part), reducing API bandwidth costs while maintaining security
4. **Self-service deployable** — edge devices need to authenticate, download, decrypt, and load Docker images autonomously via a single unlock workflow
## Who are the users?
- **Edge devices** — autonomous ARM64 systems running Azaion services (drones, companion PCs, ground stations)
- **Operators/Admins** — human users who trigger authentication and unlock via HTTP API
- **Other Azaion services** — co-located containers that call the loader API to fetch model resources
## How does it work (high level)?
1. A client authenticates via `/login` with email/password → the loader obtains a JWT from the Azaion Resource API
2. For resource access: the loader downloads an encrypted "small" part from the API (using a per-user, per-machine key) and a "big" part from CDN, reassembles them, and decrypts with a shared resource key
3. For initial deployment: the `/unlock` endpoint triggers a background workflow that downloads a key fragment, decrypts a pre-deployed encrypted Docker image archive, and loads all service images into the local Docker daemon
4. All security-sensitive logic is compiled as Cython native extensions for IP protection
@@ -0,0 +1,37 @@
# Restrictions
## Hardware
| Restriction | Source | Details |
|-------------|--------|---------|
| ARM64 architecture | `.woodpecker/build-arm.yml` | CI builds ARM64-only Docker images |
| Docker daemon access | `Dockerfile`, `main.py` | Requires Docker socket mount for `docker load` and `docker image inspect` |
| Hardware fingerprint availability | `hardware_service.pyx` | Requires `lscpu`, `lspci`, `/sys/block/sda` on Linux; PowerShell on Windows |
## Software
| Restriction | Source | Details |
|-------------|--------|---------|
| Python 3.11 | `Dockerfile` | Base image is `python:3.11-slim` |
| Cython 3.1.3 | `requirements.txt` | Pinned version for compilation |
| GCC compiler | `Dockerfile` | Required at build time for Cython extension compilation |
| Docker CLI | `Dockerfile` | `docker-ce-cli` installed inside the container |
## Environment
| Restriction | Source | Details |
|-------------|--------|---------|
| `RESOURCE_API_URL` env var | `main.py` | Defaults to `https://api.azaion.com` |
| `IMAGES_PATH` env var | `main.py` | Defaults to `/opt/azaion/images.enc` — encrypted archive must be pre-deployed |
| `API_VERSION` env var | `main.py` | Defaults to `latest` — determines expected Docker image tags |
| CDN config file | `api_client.pyx` | `cdn.yaml` downloaded encrypted from API at credential setup time |
| Network access | `api_client.pyx`, `cdn_manager.pyx` | Must reach Azaion Resource API and S3 CDN endpoint |
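
The three environment variables and their defaults can be read in one place. A minimal sketch, assuming a hypothetical `LoaderConfig` dataclass; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LoaderConfig:
    resource_api_url: str
    images_path: str
    api_version: str


def load_config(env: Mapping[str, str] = os.environ) -> LoaderConfig:
    """Read runtime settings, falling back to the documented defaults."""
    return LoaderConfig(
        resource_api_url=env.get("RESOURCE_API_URL", "https://api.azaion.com"),
        images_path=env.get("IMAGES_PATH", "/opt/azaion/images.enc"),
        api_version=env.get("API_VERSION", "latest"),
    )
```

Accepting the environment as a mapping makes the defaults easy to test by passing a plain dictionary.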
## Operational
| Restriction | Source | Details |
|-------------|--------|---------|
| Single instance | `main.py` | Module-level singleton `api_client` — not designed for multi-process deployment |
| Synchronous I/O | `api_client.pyx` | Large file operations block the worker thread |
| No horizontal scaling | Architecture | Stateful singleton pattern prevents running multiple replicas |
| Log directory | `constants.pyx` | Hardcoded to `Logs/` — requires writable filesystem at that path |
@@ -0,0 +1,68 @@
# Security Approach
## Authentication
- **Mechanism**: JWT Bearer tokens issued by Azaion Resource API
- **Token handling**: Decoded without signature verification (`options={"verify_signature": False}`) — trusts the API server
- **Token refresh**: Automatic re-login on 401/403 responses (single retry)
- **Credential storage**: In-memory only (Credentials object); not persisted to disk
## Authorization
- **Model**: Role-based (RoleEnum with 7 levels: NONE through ApiAdmin)
- **Enforcement**: Roles are parsed from JWT and stored on the User object, but **no endpoint-level authorization is enforced** by the loader. All endpoints are accessible once credentials are set.
## Encryption
### Resource Encryption (binary-split scheme)
- **Algorithm**: AES-256-CBC with PKCS7 padding
- **Key expansion**: SHA-256 hash of string key → 32-byte AES key
- **IV**: Random 16-byte IV prepended to ciphertext
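
The pieces around the cipher can be sketched with the standard library alone: key expansion, PKCS7 padding, and IV framing. AES-CBC itself is deliberately left out because it needs a third-party library; all function names below are hypothetical and do not mirror `security.pyx`.

```python
import hashlib
import os
from typing import Tuple

BLOCK = 16  # AES block size in bytes, also the IV length


def expand_key(key: str) -> bytes:
    """SHA-256 of the string key gives a 32-byte AES-256 key."""
    return hashlib.sha256(key.encode("utf-8")).digest()


def pkcs7_pad(data: bytes) -> bytes:
    """Append n bytes of value n so the length is a multiple of BLOCK."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes) -> bytes:
    if not data or len(data) % BLOCK:
        raise ValueError("invalid padded length")
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return data[:-n]


def new_iv() -> bytes:
    return os.urandom(BLOCK)


def frame(iv: bytes, ciphertext: bytes) -> bytes:
    """Prepend the IV to the ciphertext."""
    return iv + ciphertext


def unframe(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a stored blob back into (iv, ciphertext)."""
    return blob[:BLOCK], blob[BLOCK:]
```

Decrypting with a wrong key usually surfaces as a padding error in `pkcs7_unpad`, although a wrong key can occasionally yield valid-looking padding and garbled output instead.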
### Key Derivation
| Key Type | Derivation | Scope |
|----------|------------|-------|
| API download key | `SHA-384(email + password + hw_hash + salt)` | Per-user, per-machine |
| Hardware hash | `SHA-384("Azaion_" + hardware_fingerprint + salt)` | Per-machine |
| Resource encryption key | `SHA-384(fixed_salt_string)` | Global (shared across all users) |
| Archive decryption key | `SHA-256(key_fragment_from_api)` | Per-unlock operation |
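
The derivations in the table can be sketched with `hashlib`. The salt values below are placeholders (the real salts live in compiled code and are not shown here), the base64 output encoding is an assumption for every key except the hardware hash, and `get_api_encryption_key` takes email and password directly instead of a credentials object to keep the example self-contained.

```python
import base64
import hashlib

# Placeholder salts, NOT the real values.
HW_SALT = "example-hw-salt"
API_SALT = "example-api-salt"
RESOURCE_SALT = "example-resource-salt"


def _sha384_b64(text: str) -> str:
    digest = hashlib.sha384(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def get_hw_hash(hardware_fingerprint: str) -> str:
    """Per-machine: SHA-384("Azaion_" + fingerprint + salt)."""
    return _sha384_b64("Azaion_" + hardware_fingerprint + HW_SALT)


def get_api_encryption_key(email: str, password: str, hw_hash: str) -> str:
    """Per-user, per-machine: SHA-384(email + password + hw_hash + salt)."""
    return _sha384_b64(email + password + hw_hash + API_SALT)


def get_resource_encryption_key() -> str:
    """Global: SHA-384 of a fixed salt string."""
    return _sha384_b64(RESOURCE_SALT)


def get_archive_key(key_fragment: str) -> bytes:
    """Per-unlock: SHA-256 of the key fragment, used directly as 32 key bytes."""
    return hashlib.sha256(key_fragment.encode("utf-8")).digest()
```

Every function is deterministic, so the same inputs always yield the same key; changing the credentials or the hardware fingerprint changes the API download key.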
### Binary Split
- Resources encrypted with shared resource key, then split into:
  - **Small part** (≤3KB or 30%): uploaded to authenticated API
  - **Big part** (remainder): uploaded to CDN
- Decryption requires both parts — compromise of either storage alone is insufficient
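
The split can be sketched as follows. Two points are assumptions, not confirmed by this document: the small part is taken as the smaller of 3 KB and 30% of the encrypted payload, and it is cut from the start of the byte stream.

```python
from typing import Tuple

SMALL_PART_MAX_BYTES = 3 * 1024   # 3 KB cap (assumed interpretation)
SMALL_PART_MAX_PERCENT = 30       # 30% cap (assumed interpretation)


def split_resource(encrypted: bytes) -> Tuple[bytes, bytes]:
    """Return (small_part, big_part); small goes to the API, big to the CDN."""
    by_percent = len(encrypted) * SMALL_PART_MAX_PERCENT // 100
    small_size = min(SMALL_PART_MAX_BYTES, by_percent)
    return encrypted[:small_size], encrypted[small_size:]


def join_resource(small: bytes, big: bytes) -> bytes:
    """Reassemble the encrypted payload before decryption."""
    return small + big
```

With these assumptions, a 1,000-byte payload yields a 300-byte small part, while a 100,000-byte payload is capped at 3,072 bytes.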
## Hardware Binding
- Hardware fingerprint: CPU model, GPU, memory size, drive serial number
- Used to derive per-machine encryption keys for API resource downloads
- Prevents extraction of downloaded resources to different hardware
## IP Protection
- Security-sensitive modules (security, api_client, credentials, etc.) are Cython `.pyx` files compiled to native `.so` extensions
- Key derivation salts and logic are in compiled code, not readable Python
## Secrets Management
- CDN credentials stored in `cdn.yaml`, downloaded encrypted from the API
- User credentials exist only in memory
- JWT tokens exist only in memory
- No `.env` file or secrets manager — environment variables for runtime config
## Input Validation
- Pydantic models validate request structure (LoginRequest, LoadRequest)
- No additional input sanitization beyond Pydantic type checking
- No rate limiting on any endpoint
## Known Security Gaps
1. JWT decoded without signature verification
2. No endpoint-level authorization enforcement
3. No rate limiting
4. Resource encryption key is static/shared — not per-user
5. `subprocess` with `shell=True` in hardware_service (not user-input-driven, but still a risk pattern)
6. No HTTPS termination within the service (assumes reverse proxy or direct Docker network)