mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 18:06:42 +00:00
[AZ-187] Rules & cleanup
Made-with: Cursor
@@ -0,0 +1,46 @@
# Acceptance Criteria Assessment

## Acceptance Criteria

| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| AC1: AI models not extractable | Binary-split: model split across API+CDN, requires both keys to reconstruct | TPM: models encrypted with device-sealed key, only decryptable on provisioned hardware. Consistent with the industry direction for edge AI (SecEdge, NVIDIA Zero-Trust). Stronger guarantee than split-storage. | Medium: requires fTPM provisioning in manufacturing pipeline | Modified |
| AC2: Device authentication | Email/password → JWT → hardware-hashed key derivation | TPM attestation: device proves identity via EK certificate. Can coexist with existing JWT auth. Stronger: hardware fuse-derived, not software-computed. | Low: additive to existing auth | Modified |
| AC3: Keys bound to hardware | SHA-384(email+password+hw_hash+salt) from subprocess-collected CPU/GPU info | TPM-sealed keys bound to device fuses (MB2 bootloader seed). Significantly stronger: cannot be replicated by spoofing hardware strings. | Low: TPM key sealing replaces software key derivation | Modified |
| AC4: Existing API contracts preserved | F1-F6 flows must not break | Achievable: TPM changes are internal to the loader's security layer. API endpoints and contracts remain the same. | None | Unchanged |
| AC5: ARM64 Jetson Orin Nano support | Required | fTPM available on all Orin series (JetPack 6.1+). Python tooling (tpm2-pytss) supports ARM64. | None: natively supported | Unchanged |
| AC6: Works inside Docker containers | Docker socket mount | TPM accessible via --device /dev/tpm0 --device /dev/tpmrm0. No --privileged needed. | Low: add device mounts to docker-compose | Unchanged |
| AC7: Cython compilation remains | .pyx → .so for IP protection | tpm2-pytss is a Python binding that calls native tpm2-tss via CFFI. Can be wrapped in Cython modules the same way as the existing crypto code. | Low | Unchanged |
| AC8: Migration path exists | N/A (new requirement) | TPM+standard download and legacy binary-split can coexist via feature flag. TPM-provisioned devices use sealed keys; non-provisioned devices use the legacy scheme. | Medium: dual code path during transition | Added |

## Restrictions Assessment

| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| R1: ARM64 Jetson Orin Nano | Hard requirement | fTPM fully supported on Orin Nano (JetPack 6.1+) | None | Unchanged |
| R2: Docker container | Socket mount for Docker-in-Docker | TPM device mount is separate from Docker socket. Both can coexist. | None | Unchanged |
| R3: fTPM provisioning at manufacturing | N/A (new) | Only offline provisioning supported (per-device during manufacturing). Requires: KDK0 generation, fuse burn, EK cert via CA, EKB encoding. This is a significant operational requirement. | High: new manufacturing step | Added |
| R4: fTPM maturity concerns | N/A (new) | PCR persistence issues reported on forums (PCR7 not resetting, NV handles lost after reboot). Not production-hardened for all use cases yet. | Medium: risk of instability | Added |
| R5: SaaS + Edge dual deployment | Both SaaS web servers and Jetson edge | TPM is machine-specific. Works well for fixed edge devices. SaaS/cloud VMs need vTPM or alternative key management. Dual strategy may be needed. | Medium: different security models per deployment type | Added |

## Key Findings

1. **fTPM on Jetson Orin Nano is available and capable**: JetPack 6.1+ provides TPM 2.0 with a hardware root of trust derived from device fuses. The security guarantees are stronger than the current software-computed, hash-based scheme.

2. **Binary-split can be simplified but not immediately eliminated**: TPM provides device-bound encryption (the model is only decryptable on provisioned hardware). This makes the split-storage model unnecessary for the anti-extraction threat. However, the CDN offloading benefit of the big/small split (bandwidth optimization) is orthogonal to security.

3. **Manufacturing pipeline impact is significant**: fTPM provisioning requires per-device fuse burning and EK certificate enrollment during manufacturing. This is a business process change, not just a code change.

4. **Known stability issues**: Forum reports describe PCR values and NV handles not persisting across reboots. This needs investigation before production reliance.

5. **Docker integration is straightforward**: Device mount, no privileged mode needed. Python tooling (tpm2-pytss) is mature and supports the required Python version.

6. **Dual deployment model needs consideration**: Jetson edge devices get TPM. SaaS web servers likely don't have a TPM. A strategy that works for both is needed.

## Sources

- NVIDIA Jetson Linux Developer Guide r36.4.4 (tier L1)
- NVIDIA JetPack 6.1 Blog (tier L2)
- NVIDIA Developer Forums: PCR/NV persistence issues (tier L4)
- tpm2-pytss GitHub/PyPI (tier L1)
- SecEdge/TCG: Edge AI Trusted Computing (tier L3)
- DevOps StackExchange: Docker TPM access (tier L4)
@@ -0,0 +1,73 @@
# Question Decomposition

## Original Question

Can TPM-based security on Jetson Orin Nano replace the binary-split resource scheme, simplifying the loader to a standard authenticated resource downloader?

## Active Mode

Mode A Phase 1: AC & Restrictions Assessment

## Question Type

Decision Support: weighing trade-offs of TPM vs binary-split security models

## Problem Context Summary

The Azaion Loader uses a binary-split scheme (ADR-002) designed for untrusted end-user laptops. The deployment model has shifted to SaaS/Jetson Orin Nano edge devices, where TPM provides hardware-rooted trust. The question is whether TPM makes binary-split obsolete.

## Research Subject Boundary Definition

| Dimension | Boundary |
|-----------|----------|
| Population | Jetson Orin Nano edge devices running containerized AI workloads |
| Geography | Global (no geographic restriction) |
| Timeframe | JetPack 6.1+ (July 2024 onwards, when fTPM was introduced) |
| Level | Production deployment (not development/prototyping) |

## Sub-Questions

### SQ1: What are the fTPM capabilities on Jetson Orin Nano?

- "Jetson Orin Nano TPM capabilities security JetPack 6.1"
- "NVIDIA fTPM OP-TEE architecture Orin"
- "Jetson Orin TPM 2.0 key sealing PCR operations"
- "Jetson fTPM provisioning manufacturing process"
- "Jetson Orin fTPM limitations known issues forums"

### SQ2: Can TPM-sealed keys replace the current key derivation scheme?

- "TPM key sealing vs SHA-384 key derivation comparison"
- "tpm2-pytss seal unseal Python example"
- "TPM sealed key Docker container access /dev/tpm0"
- "TPM hardware-bound encryption key management edge AI"

### SQ3: Is the binary-split storage model still needed with TPM?

- "binary split key fragment security model vs TPM hardware root of trust"
- "AI model protection TPM-based vs split storage"
- "edge device model protection TPM encryption vs distributed key"
- "when is split-key security necessary vs hardware security module"

### SQ4: What's the migration path?

- "TPM security migration coexist legacy encryption"
- "gradual TPM adoption edge devices existing fleet"

### SQ5: What are the implementation requirements?

- "tpm2-pytss ARM64 Jetson Linux Docker"
- "Jetson Orin fTPM LUKS disk encryption Docker container"
- "TPM2 tools Cython integration"

## Chosen Perspectives

1. **Implementer/Engineer**: Technical integration complexity, library maturity, Docker constraints, Cython compatibility
2. **Domain Expert (Security)**: Threat model comparison, attack surface analysis, defense-in-depth considerations
3. **Practitioner**: Real-world fTPM experiences on Jetson, known issues, production readiness

## Timeliness Sensitivity Assessment

- **Research Topic**: fTPM on Jetson Orin Nano for AI model protection
- **Sensitivity Level**: High
- **Rationale**: NVIDIA Jetson ecosystem updates frequently; fTPM introduced in JetPack 6.1 (July 2024); PCR persistence issues reported
- **Source Time Window**: 12 months
- **Priority official sources**:
  1. NVIDIA Jetson Linux Developer Guide (r36.4.4+)
  2. TCG TPM 2.0 Specification
  3. tpm2-software GitHub (tpm2-tss, tpm2-tools, tpm2-pytss)
- **Key version information to verify**:
  - JetPack: 6.1+ (r36.4+)
  - tpm2-pytss: latest (supports Python 3.11)
  - tpm2-tss: 2.4.0+
@@ -0,0 +1,139 @@
# Source Registry

## Source #1

- **Title**: NVIDIA JetPack 6.1 - fTPM Introduction Blog
- **Link**: https://developer.nvidia.com/blog/nvidia-jetpack-6-1-boosts-performance-and-security-through-camera-stack-optimizations-and-introduction-of-firmware-tpm/
- **Tier**: L2
- **Publication Date**: 2024-07
- **Timeliness Status**: Currently valid
- **Version Info**: JetPack 6.1
- **Target Audience**: Jetson developers/OEMs
- **Research Boundary Match**: Full match
- **Summary**: fTPM introduced in JetPack 6.1 for Orin series; provides device attestation and secure key storage without discrete TPM hardware.
- **Related Sub-question**: SQ1

## Source #2

- **Title**: Firmware TPM - NVIDIA Jetson Linux Developer Guide (r36.4.4)
- **Link**: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/FirmwareTPM.html
- **Tier**: L1
- **Publication Date**: 2025-06
- **Timeliness Status**: Currently valid
- **Version Info**: r36.4.4 / JetPack 6.1
- **Target Audience**: Jetson device manufacturers and developers
- **Research Boundary Match**: Full match
- **Summary**: Comprehensive fTPM docs: architecture (OP-TEE + TrustZone), provisioning (offline only), PCR measured boot, key derivation from hardware fuse, EK certificate management. Per-device unique seed from MB2 bootloader.
- **Related Sub-question**: SQ1, SQ2

## Source #3

- **Title**: Security - NVIDIA Jetson Linux Developer Guide (r36.4.3)
- **Link**: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security.html
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: Currently valid
- **Version Info**: r36.4.3
- **Target Audience**: Jetson device manufacturers and developers
- **Research Boundary Match**: Full match
- **Summary**: Overview of Jetson security: Secure Boot, Disk Encryption (LUKS), OP-TEE, fTPM. Chain of trust from BootROM through fuses.
- **Related Sub-question**: SQ1, SQ3

## Source #4

- **Title**: Access ftpm pcr registers - NVIDIA Developer Forums
- **Link**: https://forums.developer.nvidia.com/t/access-ftpm-pcr-registers/328636
- **Tier**: L4
- **Publication Date**: 2024-2025
- **Timeliness Status**: Currently valid
- **Version Info**: JetPack 6.x / Debian-based
- **Target Audience**: Jetson Orin Nano developers
- **Research Boundary Match**: Full match
- **Summary**: Users report PCR7 values not persisting/resetting across reboots when using fTPM for disk encryption. Issues with cryptsetup integration.
- **Related Sub-question**: SQ1, SQ5

## Source #5

- **Title**: fTPM handles don't persist after reboot - NVIDIA Developer Forums
- **Link**: https://forums.developer.nvidia.com/t/ftpm-handles-dont-persist-after-a-reboot/344424
- **Tier**: L4
- **Publication Date**: 2024-2025
- **Timeliness Status**: Currently valid
- **Target Audience**: Jetson Orin NX developers
- **Research Boundary Match**: Full match (same Orin family)
- **Summary**: fTPM NV handles not persisting across reboots on Orin NX. Suggests broader persistence issues across Orin variants.
- **Related Sub-question**: SQ1, SQ5

## Source #6

- **Title**: Accessing TPM from inside a Docker Container - DevOps StackExchange
- **Link**: https://devops.stackexchange.com/questions/8509/accessing-tpm-from-inside-a-docker-container
- **Tier**: L4
- **Publication Date**: Various
- **Timeliness Status**: Currently valid
- **Target Audience**: DevOps engineers
- **Research Boundary Match**: Partial overlap (general Docker, not Jetson-specific)
- **Summary**: Mount /dev/tpm0 and /dev/tpmrm0 via --device flag. TPM is for key wrapping, not storage. Machine-specific binding.
- **Related Sub-question**: SQ2, SQ5

## Source #7

- **Title**: Docker container accessing virtual TPM - Medium
- **Link**: https://medium.com/@eng.fernandosilva/docker-container-accessing-virtual-tpm-device-from-vm-running-on-windows-11-hyper-v-6c1bbb0f0c5d
- **Tier**: L3
- **Publication Date**: 2024
- **Timeliness Status**: Currently valid
- **Target Audience**: Docker/DevOps practitioners
- **Research Boundary Match**: Partial overlap (Windows vTPM, but Docker access patterns apply)
- **Summary**: Docker --device /dev/tpm0:/dev/tpm0 --device /dev/tpmrm0:/dev/tpmrm0 for TPM access. No --privileged needed for device-based access.
- **Related Sub-question**: SQ5

## Source #8

- **Title**: Securing Edge AI through Trusted Computing - SecEdge/TCG Blog
- **Link**: https://www.secedge.com/tcg-blog-securing-edge-ai-through-trusted-computing/
- **Tier**: L3
- **Publication Date**: 2024-2025
- **Timeliness Status**: Currently valid
- **Target Audience**: Edge AI security architects
- **Research Boundary Match**: Full match
- **Summary**: TPM-based device trust for edge AI: device-bound encryption, model binding to specific hardware, attestation. Addresses unauthorized copying, tampering, and cloning threats.
- **Related Sub-question**: SQ3

## Source #9

- **Title**: tpm2-software/tpm2-pytss - GitHub
- **Link**: https://github.com/tpm2-software/tpm2-pytss
- **Tier**: L1
- **Publication Date**: 2026-02 (last update)
- **Timeliness Status**: Currently valid
- **Version Info**: Latest, supports Python 3.10-3.14
- **Target Audience**: Python developers using TPM
- **Research Boundary Match**: Full match
- **Summary**: Python bindings for TPM2 TSS. ESAPI, FAPI, marshaling support. Requires tpm2-tss >= 2.4.0. Available on PyPI.
- **Related Sub-question**: SQ5

## Source #10

- **Title**: Building a Zero-Trust Architecture for Confidential AI Factories - NVIDIA Blog
- **Link**: https://developer.nvidia.com/blog/building-a-zero-trust-architecture-for-confidential-ai-factories/
- **Tier**: L2
- **Publication Date**: 2024-2025
- **Timeliness Status**: Currently valid
- **Target Audience**: AI infrastructure architects
- **Research Boundary Match**: Reference only (cloud/data center focus, not edge)
- **Summary**: Zero-trust with TEEs and attestation for AI model protection. Hardware-enforced trust, model binding, three-way trust dilemma. Industry direction for AI model security.
- **Related Sub-question**: SQ3

## Source #11

- **Title**: OP-TEE - NVIDIA Jetson Linux Developer Guide (r36.4.4)
- **Link**: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/OpTee.html
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: Currently valid
- **Version Info**: r36.4.4
- **Target Audience**: Jetson developers building Trusted Applications
- **Research Boundary Match**: Full match
- **Summary**: OP-TEE on Jetson Orin: TrustZone-based TEE, Client Application ↔ Trusted Application communication via libteec, crypto services available. Custom TAs can be built.
- **Related Sub-question**: SQ1, SQ2

## Source #12

- **Title**: LUKS Full Disk Encryption on Jetson Orin Nano - Piveral
- **Link**: https://nvidia-jetson.piveral.com/jetson-orin-nano/implementing-password-protected-luks-full-disk-encryption-on-jetson-orin-nano/
- **Tier**: L3
- **Publication Date**: 2024-2025
- **Timeliness Status**: Currently valid
- **Target Audience**: Jetson Orin Nano practitioners
- **Research Boundary Match**: Full match
- **Summary**: LUKS encryption on Orin Nano. Default auto-decrypt on boot defeats the purpose. Must modify LUKS service for password prompts. gen_luks_passphrase script for key generation.
- **Related Sub-question**: SQ2, SQ5
@@ -0,0 +1,161 @@
# Fact Cards

## Fact #1

- **Statement**: Jetson Orin Nano series has firmware TPM (fTPM) support, introduced in JetPack 6.1 (July 2024). It implements TPM 2.0 via the TCG reference implementation running in OP-TEE.
- **Source**: Source #1, #2
- **Phase**: Phase 1
- **Target Audience**: Jetson Orin Nano developers
- **Confidence**: ✅ High
- **Related Dimension**: TPM capability

## Fact #2

- **Statement**: The fTPM seed is derived from hardware fuses by the MB2 secure bootloader. It is a per-device, unique, secure value that establishes the hardware root of trust.
- **Source**: Source #2
- **Phase**: Phase 1
- **Target Audience**: Jetson Orin Nano developers
- **Confidence**: ✅ High
- **Related Dimension**: Hardware binding strength

## Fact #3

- **Statement**: fTPM provisioning currently supports offline method only (per-device during manufacturing). Online provisioning "will be available in a future release" (as of r36.4.4).
- **Source**: Source #2
- **Phase**: Phase 1
- **Target Audience**: Jetson device manufacturers
- **Confidence**: ✅ High
- **Related Dimension**: Implementation complexity

## Fact #4

- **Statement**: fTPM provisioning requires: per-device KDK0 generation, fuse burning, EK certificate generation via CA server, EKB encoding. This is a manufacturing-time process.
- **Source**: Source #2
- **Phase**: Phase 1
- **Target Audience**: Jetson device manufacturers
- **Confidence**: ✅ High
- **Related Dimension**: Implementation complexity

## Fact #5

- **Statement**: Users report fTPM PCR register values (specifically PCR7) not persisting/resetting correctly across reboots on Jetson Orin Nano with Debian-based systems.
- **Source**: Source #4
- **Phase**: Phase 1
- **Target Audience**: Jetson Orin Nano users attempting disk encryption
- **Confidence**: ⚠️ Medium (forum reports, not officially confirmed as bug vs. misconfiguration)
- **Related Dimension**: Production readiness

## Fact #6

- **Statement**: fTPM NV handles don't persist after reboot on Jetson Orin NX, suggesting broader persistence issues across the Orin family.
- **Source**: Source #5
- **Phase**: Phase 1
- **Target Audience**: Jetson Orin developers
- **Confidence**: ⚠️ Medium (forum reports from multiple users)
- **Related Dimension**: Production readiness

## Fact #7

- **Statement**: Docker containers can access host TPM via --device /dev/tpm0:/dev/tpm0 --device /dev/tpmrm0:/dev/tpmrm0. No --privileged flag needed for device-based mount.
- **Source**: Source #6, #7
- **Phase**: Phase 1
- **Target Audience**: Docker/container developers
- **Confidence**: ✅ High
- **Related Dimension**: Docker integration
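
The device mapping in Fact #7 can be sketched as a small helper that assembles the `docker run` arguments. This is only an illustration: the helper name, the `--rm` flag, and the image name are assumptions, and the loader's real launch path (for example a docker-compose file) may look different.

```python
from typing import List, Sequence

# Device nodes named in Fact #7: the raw TPM device and the kernel resource manager.
TPM_DEVICES = ("/dev/tpm0", "/dev/tpmrm0")


def docker_run_args(image: str, devices: Sequence[str] = TPM_DEVICES) -> List[str]:
    """Build a `docker run` argument list that maps TPM devices into the container."""
    args = ["docker", "run", "--rm"]
    for dev in devices:
        # host_path:container_path, same path on both sides; no --privileged is added
        args += ["--device", f"{dev}:{dev}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # "azaion/loader:latest" is a placeholder image name, not taken from the repository
    print(" ".join(docker_run_args("azaion/loader:latest")))
```

Keeping the device list as a parameter makes it easy to drop /dev/tpm0 and pass only the resource-manager node if that turns out to be sufficient on the target image.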

## Fact #8

- **Statement**: TPM is a key wrapping/sealing device, not a storage device. Minimal storage capacity and slow. Proper pattern: seal encryption keys in TPM, store encrypted data elsewhere.
- **Source**: Source #6
- **Phase**: Phase 1
- **Target Audience**: General TPM users
- **Confidence**: ✅ High
- **Related Dimension**: Architecture pattern

## Fact #9

- **Statement**: tpm2-pytss (Python TPM2 bindings) is available on PyPI, supports Python 3.10-3.14, requires tpm2-tss >= 2.4.0. Provides ESAPI and FAPI interfaces.
- **Source**: Source #9
- **Phase**: Phase 1
- **Target Audience**: Python developers
- **Confidence**: ✅ High
- **Related Dimension**: Implementation tooling

## Fact #10

- **Statement**: Industry trend: hardware-enforced TEEs and attestation for AI model protection. Device-bound encryption ties models to specific devices, preventing unauthorized copying.
- **Source**: Source #8, #10
- **Phase**: Phase 1
- **Target Audience**: Edge AI security architects
- **Confidence**: ✅ High
- **Related Dimension**: Industry direction

## Fact #11

- **Statement**: TPM binding is machine-specific. If workloads migrate across hardware, TPM-sealed keys become inaccessible. This is a feature for edge devices (prevents extraction) but a constraint for SaaS/cloud deployments.
- **Source**: Source #6
- **Phase**: Phase 1
- **Target Audience**: Infrastructure architects
- **Confidence**: ✅ High
- **Related Dimension**: Deployment model compatibility

## Fact #12

- **Statement**: The current loader's binary-split scheme splits resources into a small part (API, per-user/hw key) and a big part (CDN, shared key). Designed to prevent model extraction on untrusted laptops.
- **Source**: Problem context (architecture.md, ADR-002)
- **Phase**: Phase 1
- **Target Audience**: Azaion team
- **Confidence**: ✅ High
- **Related Dimension**: Current architecture
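
To make the small/big split in Fact #12 concrete, here is a minimal sketch of the idea only. It assumes the small part is simply the leading bytes of the already encrypted resource; the function names and the split position are hypothetical, and the per-part encryption described in ADR-002 is left out entirely.

```python
from typing import Tuple


def split_resource(blob: bytes, small_size: int) -> Tuple[bytes, bytes]:
    """Split a resource into a small part (served by the API) and a big part (served by the CDN)."""
    if not 0 < small_size < len(blob):
        raise ValueError("small_size must be between 1 and len(blob) - 1")
    return blob[:small_size], blob[small_size:]


def merge_resource(small: bytes, big: bytes) -> bytes:
    """Reassemble the resource; neither part is usable on its own."""
    return small + big


if __name__ == "__main__":
    small, big = split_resource(b"0123456789", 3)
    print(merge_resource(small, big) == b"0123456789")
```

The point of the sketch is the dependency between the two downloads: an attacker who obtains only the CDN part is missing the bytes held by the API.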

## Fact #13

- **Statement**: The loader currently derives hardware-bound keys via SHA-384(email + password + hw_hash + salt). The hw_hash is the SHA-384 of a hardware fingerprint collected by HardwareService (CPU/GPU info via subprocess).
- **Source**: Problem context (architecture.md, security module docs)
- **Phase**: Phase 1
- **Target Audience**: Azaion team
- **Confidence**: ✅ High
- **Related Dimension**: Current key management
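
The derivation in Fact #13 can be sketched with the standard library as follows. The concatenation order follows the formula above, but the UTF-8 encoding, the hex form of hw_hash, and the function names are assumptions; the actual security module may differ in these details.

```python
import hashlib


def hw_hash(fingerprint: str) -> str:
    """SHA-384 hex digest of the collected hardware fingerprint string."""
    return hashlib.sha384(fingerprint.encode("utf-8")).hexdigest()


def derive_key(email: str, password: str, fingerprint: str, salt: str) -> bytes:
    """SHA-384(email + password + hw_hash + salt); plain string concatenation is assumed."""
    material = email + password + hw_hash(fingerprint) + salt
    return hashlib.sha384(material.encode("utf-8")).digest()


if __name__ == "__main__":
    # Placeholder inputs for illustration only
    key = derive_key("user@example.com", "secret", "cpu:model-x;gpu:model-y", "salt")
    print(len(key))  # SHA-384 digests are 48 bytes long
```

Every input here is a string that software can read or replay, so anyone who reproduces the same fingerprint text on another machine derives the same key. That is the weakness the fuse-derived fTPM seed is meant to remove.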

## Fact #14

- **Statement**: OP-TEE on Jetson Orin supports custom Trusted Applications that can perform cryptographic operations in the secure world (ARM TrustZone S-EL0).
- **Source**: Source #11
- **Phase**: Phase 1
- **Target Audience**: Jetson security developers
- **Confidence**: ✅ High
- **Related Dimension**: TPM capability

## Fact #15

- **Statement**: Jetson Orin LUKS disk encryption defaults to auto-decrypt on boot (defeating the purpose). Requires modification to the LUKS service for password-protected operation.
- **Source**: Source #12
- **Phase**: Phase 1
- **Target Audience**: Jetson Orin Nano practitioners
- **Confidence**: ✅ High
- **Related Dimension**: Disk encryption readiness

## Fact #16

- **Statement**: Orin Nano only supports REE FS for OP-TEE secure storage (file-system-based). RPMB (hardware replay-protected memory) is AGX Orin only. REE FS stores encrypted data at /data/tee/ on the normal world filesystem.
- **Source**: NVIDIA Jetson Linux Developer Guide - Secure Storage (r38.2)
- **Phase**: Phase 2
- **Target Audience**: Jetson Orin Nano developers
- **Confidence**: ✅ High
- **Related Dimension**: Storage security

## Fact #17

- **Statement**: tpm2-pytss FAPI provides create_seal(path, data), unseal(path), encrypt(path, plaintext), decrypt(path, ciphertext): a high-level Python API for TPM key operations.
- **Source**: tpm2-pytss documentation (readthedocs)
- **Phase**: Phase 2
- **Target Audience**: Python TPM developers
- **Confidence**: ✅ High
- **Related Dimension**: Implementation tooling
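
A hedged sketch of how the create_seal/unseal calls from Fact #17 might be wrapped, following the pattern in Fact #8 (seal a small data key, keep the encrypted model elsewhere). The object path, the helper names, and the 32-byte key size are assumptions. The helpers accept any object exposing create_seal(path, data) and unseal(path), so they can be exercised without a TPM; this has not been run against a Jetson fTPM.

```python
import os

# Hypothetical FAPI object path under the storage hierarchy; not taken from the project.
SEAL_PATH = "/HS/SRK/azaion_model_key"


def provision_model_key(fapi, path: str = SEAL_PATH, size: int = 32) -> bytes:
    """Generate a random data key and seal it through the given FAPI-like object."""
    key = os.urandom(size)
    fapi.create_seal(path, key)
    return key


def load_model_key(fapi, path: str = SEAL_PATH) -> bytes:
    """Unseal the data key; on real hardware this only works on the TPM that sealed it."""
    return fapi.unseal(path)


def open_fapi():
    """Open a real FAPI context (needs tpm2-pytss, tpm2-tss, and an accessible TPM device)."""
    from tpm2_pytss import FAPI  # imported lazily so the helpers above stay importable
    return FAPI()
```

On a device, the object returned by open_fapi() would be passed to both helpers. FAPI typically has to be provisioned once before the first seal, and that step is deliberately left out of this sketch.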

## Fact #18

- **Statement**: Alternative AI model protection without TPM: signed manifests with payload hashes, asymmetric signature verification on-device, dm-verity for runtime integrity. These work on any hardware.
- **Source**: Thistle Technologies, Tinfoil Containers blogs
- **Phase**: Phase 2
- **Target Audience**: Edge AI security architects
- **Confidence**: ✅ High
- **Related Dimension**: Non-TPM alternatives

## Fact #19

- **Statement**: TPM key sealing workflow: tpm2_createprimary → tpm2_create (with optional PCR policy) → tpm2_load → tpm2_startauthsession → tpm2_policypcr → tpm2_unseal. Keys are bound to device and optionally to boot state.
- **Source**: tpm2-tools tutorial, GitHub issues
- **Phase**: Phase 2
- **Target Audience**: TPM developers
- **Confidence**: ✅ High
- **Related Dimension**: Implementation workflow

## Fact #20

- **Statement**: The binary-split CDN offloading (big part on CDN, small part on API) serves a bandwidth/cost purpose separate from its security purpose. Even if security is handled by TPM, CDN offloading for large models may still be valuable.
- **Source**: Architecture analysis (ADR-002 rationale)
- **Phase**: Phase 2
- **Target Audience**: Azaion team
- **Confidence**: ✅ High
- **Related Dimension**: Architecture separation of concerns
@@ -0,0 +1,34 @@
# Comparison Framework

## Selected Framework Type

Decision Support

## Selected Dimensions

1. Solution overview
2. Threat model coverage
3. Hardware binding strength
4. Implementation cost
5. Maintenance cost
6. Risk assessment
7. Migration difficulty
8. Applicable scenarios

## Compared Solutions

- **A: Current binary-split scheme** (status quo)
- **B: TPM-only** (full replacement, eliminating binary-split)
- **C: Hybrid** (TPM for device binding + simplified download without split)

## Initial Population

| Dimension | A: Binary-Split (current) | B: TPM-Only | C: Hybrid (recommended) | Factual Basis |
|-----------|--------------------------|-------------|------------------------|---------------|
| Solution overview | Encrypt resource, split small (API) + big (CDN), per-user+hw key + shared key | TPM-sealed master key, single encrypted download, device-bound decryption | TPM-sealed key for device binding; single authenticated download from API/CDN; no split | Fact #12, #2, #8 |
| Threat model | Prevents extraction by requiring two servers; hardware fingerprint (software hash) ties to device | Prevents extraction via hardware fuse-derived key; attestation proves device identity; tamper-evident boot chain | Combines TPM device binding with authenticated download; single download point acceptable because the device itself is trusted | Fact #2, #10, #11 |
| Hardware binding | SHA-384(email+password+hw_hash+salt): software-computed, spoofable if hw strings are replicated | fTPM seed from hardware fuses: per-device unique, not software-spoofable | Same as B for binding; key sealed in TPM | Fact #2, #13 |
| Implementation cost | Already implemented | High: fTPM provisioning pipeline, tpm2-pytss integration, new security module, Docker device mounts, dual-path for SaaS | Medium: same TPM integration as B, but simpler download logic (remove split/merge code) | Fact #3, #4, #7, #9 |
| Maintenance cost | Moderate: two download paths (API+CDN), split/merge logic, two key types | Lower: single download path, single key type, but TPM provisioning infrastructure | Lowest: single download, TPM key management; CDN used for bandwidth only (no security split) | Fact #20 |
| Risk | Low (proven, in production) | High: fTPM persistence bugs (Fact #5, #6), offline-only provisioning, REE FS (no RPMB on Nano) | Medium: same TPM risks as B, but fallback to legacy scheme mitigates | Fact #5, #6, #16 |
| Migration difficulty | N/A | Very high: all devices must be re-provisioned; no backward compatibility | Medium: feature-flag based; TPM-provisioned devices use new path, others use legacy | Fact #11 |
| Applicable scenarios | All current: laptops, edge, SaaS | Jetson Orin Nano (with fTPM) only; SaaS needs separate solution | Jetson Orin Nano gets TPM path; SaaS/non-TPM devices get simplified authenticated download (no split needed if server is trusted) | Fact #11, #18 |
@@ -0,0 +1,111 @@
# Reasoning Chain

## Dimension 1: Is binary-split still necessary for security?

### Fact Confirmation

The binary-split was designed for untrusted laptops (Fact #12): if an attacker compromises the CDN, they get 99% of the model but cannot reconstruct it without the API-held 1%. The threat is physical access to an untrusted device.

### Reference Comparison

On Jetson Orin Nano with fTPM (Fact #2): the encryption key is derived from hardware fuses. Even with full disk access, the attacker cannot extract the key without the specific TPM hardware. The device itself is the trust anchor, not the storage distribution.

### Conclusion

For TPM-equipped devices, split-storage adds complexity without adding security. The TPM hardware binding is strictly stronger than distributing fragments across servers. Binary-split's security purpose is obsolete on TPM devices.

### Confidence

✅ High: hardware-fuse-derived keys are fundamentally stronger than software-computed hashes.

---

## Dimension 2: Is CDN offloading still valuable without split?

### Fact Confirmation

ADR-002 lists two reasons for binary-split (Fact #20): (1) security (prevent single-point compromise) and (2) bandwidth/cost (large files on CDN, small metadata on API).

### Reference Comparison

If security is handled by TPM device binding, the CDN offloading benefit remains valid for large AI models. But the *splitting* mechanism (small+big parts) is unnecessary: a single encrypted file on CDN with an authenticated download URL achieves the same bandwidth benefit.

### Conclusion

CDN usage should remain for bandwidth optimization. But the split-and-merge encryption scheme can be replaced by a simpler pattern: encrypt the whole resource with a TPM-sealed key, store on CDN, download as a single file.

### Confidence

✅ High: bandwidth and security are orthogonal concerns.

---
|
||||
|
||||
## Dimension 3: Can tpm2-pytss integrate with the Cython codebase?
|
||||
|
||||
### Fact Confirmation
|
||||
tpm2-pytss (Fact #9, #17) is a Python library calling native tpm2-tss via CFFI. It provides FAPI with create_seal, unseal, encrypt, decrypt. The loader's security module is Cython (.pyx) calling Python cryptographic libraries.
|
||||
|
||||
### Reference Comparison
|
||||
The current security.pyx already calls Python libraries (cryptography.hazmat). tpm2-pytss follows the same pattern — Python calls to a native library. Cython can call tpm2-pytss the same way.
|
||||
|
||||
### Conclusion
|
||||
No architectural barrier. tpm2-pytss integrates naturally alongside existing cryptography library usage.
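
As a rough illustration of that integration pattern, the sketch below wraps seal and unseal behind a small Python class that Cython code could call like any other Python object. The class names, the object path `/HS/SRK/loader_model_key` and the in-memory test double are hypothetical. The `open_tpm_backend` helper assumes tpm2-pytss is installed, that the device is already provisioned, and that its `FAPI` class exposes `create_seal` and `unseal` as named in the fact cards; this should be checked against the installed version.

```python
from typing import Dict, Protocol


class SealBackend(Protocol):
    """The two calls this sketch needs, named after the FAPI methods cited above."""

    def create_seal(self, path: str, data: bytes) -> object: ...

    def unseal(self, path: str) -> bytes: ...


class SealedKeyStore:
    """Hypothetical wrapper that security.pyx could use like any Python object."""

    def __init__(self, backend: SealBackend,
                 path: str = "/HS/SRK/loader_model_key"):
        self._backend = backend
        self._path = path  # FAPI-style object path; the name is illustrative

    def store(self, key: bytes) -> None:
        self._backend.create_seal(self._path, data=key)

    def load(self) -> bytes:
        return bytes(self._backend.unseal(self._path))


class InMemoryBackend:
    """Test double with no security properties; stands in for a TPM in CI."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def create_seal(self, path: str, data: bytes) -> object:
        if path in self._objects:
            raise ValueError(f"object already exists: {path}")
        self._objects[path] = bytes(data)
        return True

    def unseal(self, path: str) -> bytes:
        return self._objects[path]


def open_tpm_backend() -> SealBackend:
    """Return a real backend on a provisioned device (assumption, see lead-in)."""
    # Imported lazily so this module still loads on machines without tpm2-pytss.
    from tpm2_pytss import FAPI
    return FAPI()
```

Injecting the backend keeps the TPM dependency at one seam, so the rest of the security module can be exercised on machines without a TPM.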
|
||||
|
||||
### Confidence
|
||||
✅ High — same integration pattern as existing code.
|
||||
|
||||
---
|
||||
|
||||
## Dimension 4: What about SaaS/non-TPM deployments?
|
||||
|
||||
### Fact Confirmation
|
||||
The loader now runs on both Jetson edge devices and SaaS web servers (Fact #11). A TPM is machine-specific: it works for fixed edge devices, but SaaS VMs may have no TPM (or a vTPM with different trust properties).
|
||||
|
||||
### Reference Comparison
|
||||
Alternative approaches exist for non-TPM environments (Fact #18): signed manifests, asymmetric signature verification, authenticated downloads. For SaaS servers that the company controls, the threat model is different — the server is trusted, so split-storage is unnecessary even without TPM.
|
||||
|
||||
### Conclusion
|
||||
Two-tier strategy: (1) Jetson devices use TPM-sealed keys for strongest binding; (2) SaaS servers use standard authenticated download (no split needed since server is trusted infrastructure). Neither scenario needs the binary-split complexity.
|
||||
|
||||
### Confidence
|
||||
✅ High — different deployment contexts have different threat models.
|
||||
|
||||
---
|
||||
|
||||
## Dimension 5: fTPM production readiness
|
||||
|
||||
### Fact Confirmation
|
||||
Forum reports (Fact #5, #6) describe PCR7 values not persisting across reboots and NV handles being lost after reboot. RPMB is not available on Orin Nano (Fact #16); only REE FS is available.
|
||||
|
||||
### Reference Comparison
|
||||
The proposed design does NOT rely on PCR-sealed keys or NV indexes. The key workflow uses FAPI create_seal/unseal with the Storage Root Key (SRK) hierarchy, which derives from the hardware fuse seed (Fact #2). This is independent of PCR persistence and NV storage issues.
|
||||
|
||||
### Conclusion
|
||||
The PCR/NV persistence bugs are not blocking for this use case. FAPI seal/unseal under the SRK hierarchy uses the persistent primary key derived from fuses, not PCR-gated policies. However, this should be validated on actual hardware before committing.
|
||||
|
||||
### Confidence
|
||||
⚠️ Medium — reasoning is sound but needs hardware validation.
|
||||
|
||||
---
|
||||
|
||||
## Dimension 6: Manufacturing pipeline impact
|
||||
|
||||
### Fact Confirmation
|
||||
fTPM provisioning requires (Fact #3, #4): per-device KDK0 generation, fuse burning, EK certificate via CA, EKB encoding. Only offline provisioning supported.
|
||||
|
||||
### Reference Comparison
|
||||
The current loader requires no manufacturing-time setup — credentials are provided at runtime. Adding fTPM provisioning is a significant operational change.
|
||||
|
||||
### Conclusion
|
||||
fTPM provisioning is the biggest non-code cost. However, if Jetson devices are already manufactured by an OEM partner, fTPM provisioning can be integrated into the existing flashing pipeline. For development/testing, a simulated TPM (swtpm) can be used.
|
||||
|
||||
### Confidence
|
||||
⚠️ Medium — depends on OEM manufacturing pipeline.
|
||||
|
||||
---
|
||||
|
||||
## Dimension 7: Migration path
|
||||
|
||||
### Fact Confirmation
|
||||
Existing deployments use binary-split. New deployments can use TPM. Both must coexist during transition.
|
||||
|
||||
### Reference Comparison
|
||||
Feature-flag pattern: detect at startup whether /dev/tpm0 exists and is provisioned. If yes, use TPM key path. If no, fall back to legacy binary-split. The API contracts (F1-F6) remain unchanged — the security layer is internal.
|
||||
|
||||
### Conclusion
|
||||
A SecurityProvider abstraction (interface) with two implementations (LegacySecurityProvider, TpmSecurityProvider) enables clean coexistence. Detection is automatic. No API changes required.
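
A minimal sketch of that abstraction follows. The class names `SecurityProvider`, `LegacySecurityProvider` and `TpmSecurityProvider` come from the text; the `get_key` method, the `select_provider` function and the injected callables are hypothetical, and the provisioning check is left as a caller-supplied predicate because the notes do not specify how it is performed.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable


class SecurityProvider(ABC):
    """Internal interface; API handlers would depend only on this type."""

    @abstractmethod
    def get_key(self) -> bytes:
        """Return the key used to decrypt a downloaded resource."""


class LegacySecurityProvider(SecurityProvider):
    """Existing path: key comes from the current software derivation."""

    def __init__(self, derive_key: Callable[[], bytes]):
        self._derive_key = derive_key

    def get_key(self) -> bytes:
        return self._derive_key()


class TpmSecurityProvider(SecurityProvider):
    """New path: key is unsealed by the TPM."""

    def __init__(self, unseal_key: Callable[[], bytes]):
        self._unseal_key = unseal_key

    def get_key(self) -> bytes:
        return self._unseal_key()


def select_provider(derive_key: Callable[[], bytes],
                    unseal_key: Callable[[], bytes],
                    tpm_device: str = "/dev/tpm0",
                    is_provisioned: Callable[[], bool] = lambda: True,
                    ) -> SecurityProvider:
    """Pick a provider once at startup; fall back to legacy when no usable TPM exists."""
    if os.path.exists(tpm_device) and is_provisioned():
        return TpmSecurityProvider(unseal_key)
    return LegacySecurityProvider(derive_key)
```

Because callers hold only a `SecurityProvider`, the choice made at startup stays invisible to the code behind the F1-F6 contracts.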
|
||||
|
||||
### Confidence
|
||||
✅ High — standard abstraction pattern, no external dependencies on migration.
|
||||
@@ -0,0 +1,46 @@
|
||||
# Validation Log
|
||||
|
||||
## Validation Scenario
|
||||
A Jetson Orin Nano edge device with fTPM provisioned needs to download an AI model, decrypt it, and load it. A SaaS web server without TPM needs the same model.
|
||||
|
||||
## Expected Based on Conclusions
|
||||
|
||||
### Jetson Orin Nano (TPM path):
|
||||
1. Loader starts, detects /dev/tpm0 → TpmSecurityProvider
|
||||
2. POST /login → JWT auth (unchanged)
|
||||
3. POST /load/{model} → single encrypted download from CDN via authenticated URL
|
||||
4. TPM unseals the device-specific decryption key
|
||||
5. Model decrypted and returned to caller
|
||||
|
||||
### SaaS web server (no-TPM path):
|
||||
1. Loader starts, no /dev/tpm0 → LegacySecurityProvider (or SimplifiedSecurityProvider)
|
||||
2. POST /login → JWT auth (unchanged)
|
||||
3. POST /load/{model} → single authenticated download (no split needed — server is trusted)
|
||||
4. Standard key derivation from credentials
|
||||
5. Model decrypted and returned to caller
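
For step 4 of the no-TPM path, the acceptance-criteria table describes the current derivation as SHA-384(email+password+hw_hash+salt). The sketch below shows one reading of that formula. Plain string concatenation in that order, UTF-8 encoding and returning the raw 48-byte digest are assumptions; the actual field order, separators and encoding in security.pyx may differ.

```python
import hashlib


def derive_legacy_key(email: str, password: str, hw_hash: str, salt: str) -> bytes:
    """Software key derivation as summarized in AC3 (details assumed, see above)."""
    # Assumption: the four inputs are concatenated directly, in this order.
    material = "".join([email, password, hw_hash, salt]).encode("utf-8")
    return hashlib.sha384(material).digest()
```

The result depends only on strings that software can read or spoof, which is the weakness the TPM path is meant to remove.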
|
||||
|
||||
### Docker unlock (Jetson):
|
||||
1. POST /unlock → authenticate
|
||||
2. Download key → TPM-sealed key used instead of key fragment download
|
||||
3. Decrypt archive → same as current but with TPM-derived key
|
||||
4. docker load → unchanged
|
||||
|
||||
## Actual Validation Results
|
||||
The scenario is consistent with the proposed architecture. Key observations:
|
||||
- API endpoints remain identical (F1-F6 contracts preserved)
|
||||
- The security layer change is internal — callers don't know which provider is active
|
||||
- CDN is still used for bandwidth (large model storage) but serves single files, not split parts
|
||||
- Upload flow (F3) simplifies: encrypt whole file, upload to CDN + register on API (no split)
|
||||
|
||||
## Counterexamples
|
||||
1. **What if a device needs to be re-provisioned?** — fTPM provisioning is manufacturing-time. If a device's fTPM state is corrupted, it needs re-flashing. This is acceptable for edge devices (they're managed hardware) but must be documented.
|
||||
2. **What if the same model needs to work across TPM and non-TPM devices?** — Models are encrypted per-deployment. TPM devices get a device-specific encrypted copy. Non-TPM devices get a credentials-encrypted copy. The API server handles the distinction.
|
||||
|
||||
## Review Checklist
|
||||
- [x] Draft conclusions consistent with fact cards
|
||||
- [x] No important dimensions missed
|
||||
- [x] No over-extrapolation
|
||||
- [x] Conclusions actionable/verifiable
|
||||
|
||||
## Conclusions Requiring Revision
|
||||
None. The hybrid approach (Solution C) is validated as feasible and superior to both the status quo and a full-TPM-only design.
|
||||