mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 10:16:32 +00:00
[AZ-187] Rules & cleanup
Made-with: Cursor
@@ -0,0 +1,66 @@
|
||||
# Security Analysis: TPM-Based Security Replacing Binary-Split
|
||||
|
||||
## Threat Model
|
||||
|
||||
### Asset Inventory
|
||||
|
||||
| Asset | Value | Current Protection | Proposed Protection (TPM) |
|-------|-------|--------------------|--------------------------|
| AI model files | High (core IP) | AES-256-CBC, split storage (API+CDN), per-user+hw key | AES-256-CBC, TPM-sealed device key, single encrypted storage |
| Docker image archive | High (service IP) | AES-256-CBC, key fragment from API | AES-256-CBC, TPM-sealed key (no network key download) |
| User credentials | Medium | In-memory only | In-memory only (unchanged) |
| JWT tokens | Medium | In-memory, no signature verification | In-memory (unchanged; signature verification is a separate concern) |
| CDN credentials | Medium | Encrypted cdn.yaml from API | Same (unchanged) |
| Encryption keys | Critical | SHA-384 derived, in memory | TPM-sealed, never in user-space memory in plaintext |
|
||||
|
||||
### Threat Actors
|
||||
|
||||
| Actor | Capability | Motivation |
|-------|-----------|-----------|
| Physical attacker (edge) | Physical access to Jetson device, can extract storage | Steal AI models |
| Network attacker | MITM, API/CDN compromise | Intercept models in transit |
| Insider (compromised server) | Access to API or CDN backend | Extract stored model fragments |
| Reverse engineer | Access to loader binary (.so files) | Extract key derivation logic, salts |
|
||||
|
||||
### Attack Vectors — Current vs Proposed
|
||||
|
||||
| Attack Vector | Current (Binary-Split) | Proposed (TPM) | Delta |
|--------------|----------------------|----------------|-------|
| **Extract model from disk** | Must obtain both CDN big part + API small part. If attacker has disk, big part is local. Need API access for small part. | Model encrypted with TPM-sealed key. Key cannot be extracted without the specific TPM hardware. | **Stronger**: hardware binding vs. server-side fragmentation |
| **Clone device** | Replicate hardware fingerprint strings (CPU model, GPU, etc.) → derive same SHA-384 key | Cannot clone fTPM: seed derived from hardware fuses, unique per chip | **Stronger**: fuse-based vs. string-based identity |
| **Compromise CDN** | Get big parts only; useless without small parts from API | Get encrypted files; useless without TPM-sealed key on target device | **Equivalent**: both require a second factor |
| **Compromise API** | Get small parts + key fragments. Combined with CDN data = full model | Get encrypted metadata. Key is TPM-sealed, not on API server | **Stronger**: API no longer holds key material |
| **Reverse-engineer loader binary** | Extract salt strings from .so → reconstruct SHA-384 key derivation → derive keys for any known email+password+hw combo | TPM key derivation is in hardware. Even with full .so source, keys are not reconstructable | **Stronger**: hardware vs. software key protection |
| **Memory dump at runtime** | Keys exist in Python process memory during encrypt/decrypt operations | With FAPI, encryption happens via TPM; key never enters user-space memory | **Stronger**: key stays in TPM |
| **Stolen credentials** | Attacker with email+password can derive all keys if they also know hw fingerprint | Credentials alone are insufficient; TPM-sealed key requires the physical device | **Stronger**: credentials are not sufficient |
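
The "Clone device" and "Stolen credentials" rows rest on the fact that the legacy key is a pure function of strings. The sketch below illustrates that property only. The field order, the `|` separator, and the truncation to 32 bytes are assumptions made for illustration, not the loader's actual derivation.

```python
import hashlib


def legacy_key(email: str, password: str, hw_fingerprint: str, salt: str) -> bytes:
    """Derive a 32-byte key from string inputs via SHA-384 (hypothetical layout)."""
    hw_hash = hashlib.sha384(hw_fingerprint.encode()).hexdigest()
    material = "|".join([email, password, hw_hash, salt]).encode()
    # Truncate the 48-byte SHA-384 digest to an AES-256-sized key (assumption).
    return hashlib.sha384(material).digest()[:32]


# Anyone who knows the same four strings derives the same key on any machine.
device_key = legacy_key("user@example.com", "pw", "Cortex-A78AE;Ampere;8GB", "salt-from-so")
clone_key = legacy_key("user@example.com", "pw", "Cortex-A78AE;Ampere;8GB", "salt-from-so")
other_key = legacy_key("user@example.com", "pw", "Cortex-A78AE;Ampere;16GB", "salt-from-so")

print(device_key == clone_key)
print(device_key == other_key)
```

Nothing in this derivation depends on a secret held by the hardware, which is the gap the TPM-sealed key is meant to close.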
|
||||
|
||||
## Per-Component Security Requirements
|
||||
|
||||
| Component | Requirement | Risk Level | Proposed Control |
|-----------|------------|------------|-----------------|
| SecurityProvider detection | Must correctly identify TPM availability; false positive → crash; false negative → weaker security | Medium | Check /dev/tpm0 existence + attempt TPM connection; fall back to legacy on any failure |
| TPM key sealing | Sealed key must only be unsealable on the provisioned device | High | Use FAPI create_seal under SRK hierarchy; no PCR policy (avoids persistence bugs); auth password optional |
| Docker device mount | /dev/tpm0 and /dev/tpmrm0 must be accessible in container | Medium | `devices:` entries in docker-compose.yml (the Compose equivalent of `docker run --device`); no `privileged: true` |
| Legacy fallback | Must remain fully functional for non-TPM devices | High | Existing security module unchanged; SecurityProvider delegates to it |
| Key rotation | TPM-sealed keys should be rotatable without re-provisioning | Medium | Seal a wrapping key in TPM; actual resource keys wrapped by it; rotate resource keys independently |
| CDN authenticated download | Single-file download must use authenticated URLs (not public) | High | Signed S3 URLs with expiration; existing CDN auth mechanism |
|
||||
|
||||
## Security Controls Summary
|
||||
|
||||
### Authentication
|
||||
- **Unchanged**: JWT Bearer tokens from Azaion Resource API
|
||||
- **Enhanced (TPM path)**: Device attestation possible via EK certificate (future enhancement, not in initial scope)
|
||||
|
||||
### Data Protection
|
||||
- **At rest**: AES-256-CBC encrypted resources. Key sealed in TPM (Jetson) or derived from credentials (legacy).
|
||||
- **In transit**: HTTPS for all API/CDN calls (unchanged)
|
||||
- **In TPM**: Encryption key never enters user-space memory. FAPI handles encrypt/decrypt within TPM boundary.
|
||||
|
||||
### Key Management
|
||||
- **TPM path**: Master key sealed at provisioning time → stored in TPM NV or as sealed blob in REE FS → unsealed at runtime via FAPI → used to derive/unwrap resource-specific keys
|
||||
- **Legacy path**: SHA-384 key derivation from email+password+hw_hash+salt (unchanged)
|
||||
- **Key rotation**: Wrap resource keys with TPM-sealed master key; rotate resource keys without re-provisioning TPM
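
The wrap-and-rotate idea in the last bullet can be sketched with the standard library alone. The `wrap`/`unwrap` helpers below are a stand-in built from HMAC-SHA256, chosen only so the example runs without extra packages. They are not the loader's scheme, and a real implementation would more likely use a standard AES key wrap (for example the `keywrap` module of the `cryptography` package). `master_key` stands in for whatever the TPM unseals.

```python
import hashlib
import hmac
import os


def _subkey(master: bytes, label: bytes) -> bytes:
    # Separate subkeys for encryption and authentication.
    return hmac.new(master, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = []
    for counter in range((length + 31) // 32):
        msg = nonce + counter.to_bytes(4, "big")
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def wrap(master: bytes, resource_key: bytes) -> bytes:
    """Return nonce || masked key || tag (illustrative stand-in, not AES key wrap)."""
    nonce = os.urandom(16)
    stream = _keystream(_subkey(master, b"enc"), nonce, len(resource_key))
    body = bytes(a ^ b for a, b in zip(resource_key, stream))
    tag = hmac.new(_subkey(master, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unwrap(master: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(master, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrapped key failed integrity check")
    stream = _keystream(_subkey(master, b"enc"), nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


master_key = os.urandom(32)  # stands in for the key unsealed from the TPM
wrapped = {"model.bin": wrap(master_key, os.urandom(32))}

# Rotation: issue a new resource key and re-wrap it; the master key is untouched.
new_key = os.urandom(32)
wrapped["model.bin"] = wrap(master_key, new_key)
print(unwrap(master_key, wrapped["model.bin"]) == new_key)
```

Because only the wrapped blob changes, rotating a resource key requires re-encrypting that resource but no TPM re-provisioning.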
|
||||
|
||||
### Logging & Monitoring
|
||||
- **Unchanged**: Loguru file + stdout/stderr logging
|
||||
- **Addition**: Log SecurityProvider selection at startup (which path was chosen and why)
|
||||
@@ -0,0 +1,112 @@
|
||||
# Solution Draft: TPM-Based Security Replacing Binary-Split
|
||||
|
||||
## Product Solution Description
|
||||
|
||||
Replace the binary-split resource scheme with a TPM-aware security architecture that uses hardware-rooted keys on Jetson Orin Nano devices and simplified authenticated downloads elsewhere. The loader gains a `SecurityProvider` abstraction with two implementations: `TpmSecurityProvider` (fTPM-based, for provisioned Jetson devices) and `LegacySecurityProvider` (current scheme, for backward compatibility). The binary-split upload/download logic is simplified to single-file encrypted resources stored on CDN, with the split mechanism retained only in the legacy path.
|
||||
|
||||
```
┌─────────────────────────────────────────────┐
│              Loader (FastAPI)               │
│  ┌────────────┐    ┌─────────────────────┐  │
│  │ HTTP API   │───▶│ SecurityProvider    │  │
│  │ (F1-F6)    │    │ (interface)         │  │
│  └────────────┘    └──────┬──────────────┘  │
│                     ┌─────┴──────┐          │
│              ┌──────┴──┐  ┌──────┴───────┐  │
│              │ TpmSec  │  │ LegacySec    │  │
│              │ Provider│  │ Provider     │  │
│              └────┬────┘  └──────┬───────┘  │
│                   │              │          │
│               /dev/tpm0     SHA-384 keys    │
│                (fTPM)     (current scheme)  │
└─────────────────────────────────────────────┘
```
|
||||
|
||||
## Existing/Competitor Solutions Analysis
|
||||
|
||||
| Solution | Approach | Applicability |
|----------|----------|---------------|
| SecEdge SEC-TPM | Firmware TPM for edge AI device trust, model binding, attestation | Directly applicable (same problem space) |
| Tinfoil Containers | TEE-based (Intel TDX / AMD SEV-SNP) with attestation | Cloud/data center focus; not applicable to Jetson ARM64 |
| Thistle OTA | Signed manifests + asymmetric verification, no hardware binding | Weaker than TPM but works without hardware support |
| Amulet (TEE-shielded inference) | OP-TEE based model obfuscation for ARM TrustZone | Interesting for inference protection; complementary to our approach |
| NVIDIA Confidential Computing | H200/B200 GPU TEEs | Data center only; not applicable to Orin Nano |
|
||||
|
||||
## Architecture
|
||||
|
||||
### Component: Security Provider Abstraction
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Python ABC + runtime detection | abc module, os.path.exists("/dev/tpm0") | Simple, no deps, auto-selects at startup | Detection is binary (TPM or not) | None | N/A | Zero | Best |
| Config-file based selection | YAML/env var SECURITY_PROVIDER=tpm\|legacy | Explicit control, testable | Manual configuration per device | Config management | N/A | Zero | Good |
|
||||
|
||||
**Recommendation**: Runtime detection with config override. Check /dev/tpm0 by default; allow SECURITY_PROVIDER env var to force a specific provider.
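
A minimal sketch of that recommendation follows. The class names, the `SECURITY_PROVIDER` variable, and `/dev/tpm0` come from this draft. The `name()` method, the `select_provider` function, and the injectable `exists`/`probe` hooks are hypothetical and exist only to keep the example testable without a TPM.

```python
import os
from abc import ABC, abstractmethod


class SecurityProvider(ABC):
    """Interface both providers implement; `name` is an illustrative method."""

    @abstractmethod
    def name(self) -> str:
        ...


class TpmSecurityProvider(SecurityProvider):
    def name(self) -> str:
        return "tpm"


class LegacySecurityProvider(SecurityProvider):
    def name(self) -> str:
        return "legacy"


def select_provider(env=None, exists=os.path.exists, probe=lambda: True):
    """Return (provider, reason). An explicit override wins over detection."""
    env = os.environ if env is None else env
    forced = env.get("SECURITY_PROVIDER", "").strip().lower()
    if forced == "tpm":
        return TpmSecurityProvider(), "forced by SECURITY_PROVIDER"
    if forced == "legacy":
        return LegacySecurityProvider(), "forced by SECURITY_PROVIDER"
    if forced:
        raise ValueError(f"unknown SECURITY_PROVIDER value: {forced!r}")
    if not exists("/dev/tpm0"):
        return LegacySecurityProvider(), "/dev/tpm0 not present"
    try:
        usable = probe()  # hypothetical hook: attempt a real TPM connection
    except Exception as exc:
        return LegacySecurityProvider(), f"TPM probe failed: {exc}"
    if usable:
        return TpmSecurityProvider(), "/dev/tpm0 detected"
    return LegacySecurityProvider(), "TPM probe reported unusable"


provider, reason = select_provider(env={}, exists=lambda path: False)
print(provider.name(), "-", reason)
```

Returning the reason alongside the provider makes it straightforward to log which path was chosen and why at startup.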
|
||||
|
||||
### Component: TPM Key Management
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| tpm2-pytss FAPI | tpm2-pytss (PyPI), tpm2-tss native lib | High-level Python API (create_seal, unseal, encrypt, decrypt); mature project | Requires tpm2-tss native lib installed; FAPI config needed | tpm2-tss >= 2.4.0, Python 3.11 | Hardware-rooted keys from device fuses | Low (open source) | Best |
| tpm2-tools via subprocess | tpm2-tools CLI, subprocess calls | No Python bindings needed; well-documented CLI | Subprocess overhead; harder to test; string parsing | tpm2-tools installed in container | Same | Low | Acceptable |
| Custom OP-TEE TA | C TA in OP-TEE, Python CA via libteec | Maximum control; no dependency on TPM stack | Very high development effort; C code in secure world | OP-TEE dev environment, ARM toolchain | Strongest (code runs in TrustZone) | High | Overkill |
|
||||
|
||||
**Recommendation**: tpm2-pytss FAPI. High-level API, Python-native, same pattern as existing cryptography library usage.
|
||||
|
||||
### Component: Resource Download (simplified)
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Single encrypted file on CDN | boto3 (existing), CDN signed URLs | Removes split/merge complexity; single download | Larger download per request (no partial caching) | CDN config | Encrypted at rest + in transit | Same CDN cost | Best |
| Keep CDN big + API small (current) | Existing code | No migration needed | Unnecessary complexity for TPM path | Both API and CDN | Split-key defense | Same | Legacy only |
|
||||
|
||||
**Recommendation**: Single-file download for TPM path. Legacy path retains split for backward compatibility.
|
||||
|
||||
### Component: Docker Unlock (TPM-enhanced)
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| TPM-sealed archive key | fTPM, tpm2-pytss | Key never leaves TPM; no network download needed for key | Requires provisioned fTPM | fTPM provisioned with sealed key | Strongest (offline decryption possible) | Low | Best |
| Key fragment from API (current) | HTTPS download | Works without TPM | Requires network; key fragment in memory | API reachable | Current level | Zero | Legacy only |
|
||||
|
||||
**Recommendation**: TPM-sealed archive key for provisioned devices. The key can be sealed into the TPM during device provisioning, eliminating the need to download a key fragment at unlock time.
|
||||
|
||||
### Component: Migration/Coexistence
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Feature flag + SecurityProvider abstraction | ABC, env var, /dev/tpm0 detection | Clean separation; zero risk to existing deployments | Two code paths to maintain during transition | None | Both paths maintain security | Low | Best |
| Hard cutover | N/A | Simple (one path) | Breaks non-TPM devices | All devices must have TPM | N/A | High risk | Poor |
|
||||
|
||||
**Recommendation**: Feature flag with auto-detection. Gradual rollout.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Integration / Functional Tests
|
||||
- SecurityProvider auto-detection: with and without /dev/tpm0
|
||||
- TpmSecurityProvider: seal/unseal round-trip (requires TPM simulator — swtpm)
|
||||
- LegacySecurityProvider: all existing tests pass unchanged
|
||||
- Single-file download: encrypt → upload → download → decrypt round-trip
|
||||
- Docker unlock with TPM-sealed key: decrypt archive without network key download
|
||||
- Migration: same resource accessible via both providers (different encryption)
|
||||
|
||||
### Non-Functional Tests
|
||||
- Performance: TPM seal/unseal latency vs current SHA-384 key derivation
|
||||
- Performance: single-file download vs split download (expect improvement)
|
||||
- Security: verify TPM-sealed key cannot be extracted without hardware
|
||||
- Security: verify legacy path still works identically to current behavior
|
||||
|
||||
## References
|
||||
- NVIDIA Jetson Linux Developer Guide r36.4.4 — Firmware TPM: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/FirmwareTPM.html
|
||||
- NVIDIA JetPack 6.1 Blog: https://developer.nvidia.com/blog/nvidia-jetpack-6-1-boosts-performance-and-security-through-camera-stack-optimizations-and-introduction-of-firmware-tpm/
|
||||
- tpm2-pytss: https://github.com/tpm2-software/tpm2-pytss
|
||||
- tpm2-pytss FAPI docs: https://tpm2-pytss.readthedocs.io/en/latest/fapi.html
|
||||
- SecEdge — Securing Edge AI through Trusted Computing: https://www.secedge.com/tcg-blog-securing-edge-ai-through-trusted-computing/
|
||||
- Thistle Technologies — Securing AI Models on Edge Devices: https://thistle.tech/blog/securing-ai-models-on-edge-devices
|
||||
- NVIDIA Developer Forums — fTPM PCR issues: https://forums.developer.nvidia.com/t/access-ftpm-pcr-registers/328636
|
||||
- Docker TPM access: https://devops.stackexchange.com/questions/8509/accessing-tpm-from-inside-a-docker-container
|
||||
|
||||
## Related Artifacts
|
||||
- AC Assessment: `_docs/02_task_plans/tpm-replaces-binary-split/00_research/00_ac_assessment.md`
|
||||
- Fact Cards: `_docs/02_task_plans/tpm-replaces-binary-split/00_research/02_fact_cards.md`
|
||||
- Reasoning Chain: `_docs/02_task_plans/tpm-replaces-binary-split/00_research/04_reasoning_chain.md`
|
||||
@@ -0,0 +1,798 @@
|
||||
# Solution Draft 02: TPM Security Implementation Guide
|
||||
|
||||
## Overview
|
||||
|
||||
This document is a comprehensive implementation guide for replacing the binary-split resource scheme with TPM-based hardware-rooted security on Jetson Orin Nano devices. It covers fTPM provisioning, full-disk encryption, OS hardening, tamper-responsive enclosures, the simplified loader architecture, and a phased implementation plan.
|
||||
|
||||
Prerequisite reading: `solution_draft01.md` (architecture overview), `security_analysis.md` (threat model).
|
||||
|
||||
---
|
||||
|
||||
## 1. fTPM Fusing and Provisioning
|
||||
|
||||
### 1.1 Hardware Required
|
||||
|
||||
|
||||
| Item | Purpose | Cost |
| --------------------------------------- | ----------------------------------------- | -------- |
| x86 Ubuntu host PC (20.04 or 22.04 LTS) | Runs NVIDIA flashing/fusing tools | Existing |
| USB-C cable (data-capable) | Connects host to Jetson in recovery mode | ~$10 |
| Jetson Orin Nano dev kit (expendable) | First fuse target; fusing is irreversible | ~$250 |
| Jetson Orin Nano dev kit (kept unfused) | Ongoing development and debugging | ~$250 |
|
||||
|
||||
|
||||
No specialized lab equipment, JTAG probes, or custom tooling is required. The entire fusing and provisioning process runs on a standard PC.
|
||||
|
||||
### 1.2 Roles: ODM vs OEM
|
||||
|
||||
NVIDIA's fTPM docs describe two separate entities:
|
||||
|
||||
- **ODM (Original Design Manufacturer)**: Designs the fTPM integration, generates KDK0 per device, runs the CA server, signs EK certificates, creates firmware packages.
|
||||
- **OEM (Original Equipment Manufacturer)**: Adds disk encryption keys, assembles hardware, burns fuses at the factory, ships the final product.
|
||||
|
||||
In large-scale manufacturing these are different companies with a formal key handoff. **In our case, we are both ODM and OEM** — we design, provision, flash, and deploy ourselves. NVIDIA covers this in their fTPM guide Appendix B with a **simplified single-entity flow** that eliminates the cross-company handoff and roughly halves the provisioning complexity.
|
||||
|
||||
### 1.3 Key Derivation Chain
|
||||
|
||||
The full derivation from hardware fuses to usable keys:
|
||||
|
||||
```
KDK0 (256-bit random, burned into SoC fuses at manufacturing)
  │
  ├── Silicon_ID = KDF(key=KDK0, info=Device_SN)
  │       Device_SN = OEM_ID || SN  (unique per device)
  │
  ├── fTPM_Seed = KDF(key=Silicon_ID, constant_str1)
  │       Passed from MB2 bootloader to OP-TEE via encrypted TrustZone memory
  │
  ├── fTPM_Root_Seed = KDF(key=fTPM_Seed, constant_str)
  │
  ├── EPS = KDF(key=fTPM_Root_Seed, info=Device_SN, salt=EPS_Seed)
  │       EPS_Seed is a 256-bit random number from odm_ekb_gen.py, stored in EKB
  │       EPS (Endorsement Primary Seed) is the root identity of the fTPM entity
  │
  ├── SRK = TPM2_CreatePrimary(EPS)
  │       Deterministic: re-derived from EPS on every boot
  │       Never stored persistently, never leaves the secure world
  │
  └── Sealed blobs (your encryption keys)
          Encrypted under SRK, stored as files on disk
          Only unsealable on this specific device
```
|
||||
|
||||
Every KDF step is one-way. Knowing a derived value does not reveal its parent. Two devices with different KDK0 values produce entirely different key trees.
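
The chain can be mimicked in a few lines to show both properties: determinism on one device and divergence across devices. This is a sketch under stated assumptions. HMAC-SHA256 stands in for the KDF (the actual construction used by NVIDIA's firmware is not specified here), the constant strings are placeholders taken from the diagram, and the salt is simply folded into the info input.

```python
import hashlib
import hmac


def kdf(key: bytes, info: bytes) -> bytes:
    """One-way derivation step; HMAC-SHA256 is a stand-in, not NVIDIA's KDF."""
    return hmac.new(key, info, hashlib.sha256).digest()


def derive_chain(kdk0: bytes, oem_id: bytes, sn: bytes, eps_seed: bytes) -> dict:
    device_sn = oem_id + sn  # Device_SN = OEM_ID || SN
    silicon_id = kdf(kdk0, device_sn)
    ftpm_seed = kdf(silicon_id, b"constant_str1")
    ftpm_root_seed = kdf(ftpm_seed, b"constant_str")
    # The salt is appended to the info input here purely for brevity.
    eps = kdf(ftpm_root_seed, device_sn + eps_seed)
    return {
        "Silicon_ID": silicon_id,
        "fTPM_Seed": ftpm_seed,
        "fTPM_Root_Seed": ftpm_root_seed,
        "EPS": eps,
    }


# Same serial number and EPS seed, different fused KDK0 values.
device_a = derive_chain(b"\x01" * 32, b"OEM1", b"0001", b"\xaa" * 32)
device_b = derive_chain(b"\x02" * 32, b"OEM1", b"0001", b"\xaa" * 32)
device_a_again = derive_chain(b"\x01" * 32, b"OEM1", b"0001", b"\xaa" * 32)

print(device_a == device_a_again)
print(all(device_a[name] != device_b[name] for name in device_a))
```

The first comparison reflects why nothing below KDK0 needs to be stored: the same fuses always reproduce the same tree.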
|
||||
|
||||
### 1.4 Provisioning Process (Single-Entity / ODM+OEM Flow)
|
||||
|
||||
#### Step 1: Install BSP and FSKP Packages
|
||||
|
||||
```
mkdir ${BSP_TOP} && cd ${BSP_TOP}
tar jvxf jetson_linux_${rel_ver}_aarch64.tbz2
tar jvxf public_sources.tbz2
cd Linux_for_Tegra/rootfs
sudo tar jvxpf tegra_linux_sample-root-filesystem_${rel_ver}_aarch64.tbz2
cd ${BSP_TOP}/Linux_for_Tegra
sudo ./apply_binaries.sh
cd ${BSP_TOP}
tar jvxf fskp_partner_t234_${rel_ver}_aarch64.tbz2
```
|
||||
|
||||
#### Step 2: Generate PKC and SBK Keys (Secure Boot)
|
||||
|
||||
```
openssl genrsa -out pkc.pem 3072
python3 gen_sbk_key.py --out sbk.key
```
|
||||
|
||||
PKC (Public Key Cryptography) key signs all boot chain images. SBK (Secure Boot Key) encrypts them. Both are burned into fuses and used for every subsequent flash.
|
||||
|
||||
#### Step 3: Generate Per-Device KDK0 and Silicon_ID
|
||||
|
||||
```
python3 kdk_gen.py \
    --oem-id ${OEM_ID} \
    --sn ${DEVICE_SN} \
    --output-dir ${KDK_DB}
```
|
||||
|
||||
Outputs per device: KDK0 (256-bit), Device_SN, Silicon_ID public key. **KDK0 must be discarded after the fuseblob and EKB are generated** — keeping it in storage risks leaks.
|
||||
|
||||
#### Step 4: Generate Fuseblob
|
||||
|
||||
```
python3 fskp_fuseburn.py \
    --kdk-db ${KDK_DB} \
    --pkc-key pkc.pem \
    --sbk-key sbk.key \
    --fuse-xml fuse_config.xml \
    --output-dir ${FUSEBLOB_DB}
```
|
||||
|
||||
The fuse config XML specifies which fuses to burn: KDK0, PKC hash, SBK, OEM_K1, SECURITY_MODE, ARM_JTAG_DISABLE, etc.
|
||||
|
||||
#### Step 5: Generate fTPM EKB (EK Certificates + EPS Seed)
|
||||
|
||||
```
python3 odm_ekb_gen.py \
    --kdk-db ${KDK_DB} \
    --output-dir ${EKB_FTPM_DB}
```
|
||||
|
||||
This generates EK CSRs, signs them with your CA, and packages the EPS Seed + EK certificates into per-device EKB images. In the single-entity flow, you run your own CA:
|
||||
|
||||
```
bash ftpm_manufacturer_ca_simulator.sh  # Replace with real CA in production
```
|
||||
|
||||
Then merge with disk encryption keys:
|
||||
|
||||
```
python3 oem_ekb_gen.py \
    --ekb-ftpm-db ${EKB_FTPM_DB} \
    --user-keys sym2_t234.key \
    --oem-k1 oem_k1.key \
    --output-dir ${EKB_FINAL_DB}
```
|
||||
|
||||
#### Step 6: Burn Fuses (IRREVERSIBLE)
|
||||
|
||||
Put the device in USB recovery mode:
|
||||
|
||||
- If powered off: connect DC power (device enters recovery automatically on some carrier boards)
|
||||
- If powered on: `sudo reboot --force forced-recover`
|
||||
- Verify: `lsusb` shows NVIDIA device
|
||||
|
||||
Test first (dry run):
|
||||
|
||||
```
sudo ./odmfuse.sh --test -X fuse_config.xml -i 0x23 jetson-orin-nano-devkit
```
|
||||
|
||||
Burn for real:
|
||||
|
||||
```
sudo ./odmfuse.sh -X fuse_config.xml -i 0x23 jetson-orin-nano-devkit
```
|
||||
|
||||
After `SECURITY_MODE` fuse is burned (value 0x1), **all further fuse writes are blocked permanently** (except a few ODM-reserved fuses).
|
||||
|
||||
#### Step 7: Flash Signed + Encrypted Images
|
||||
|
||||
```
sudo ROOTFS_ENC=1 ./flash.sh \
    -u pkc.pem \
    -v sbk.key \
    -i ./sym2_t234.key \
    --ekb ${EKB_FINAL_DB}/ekb-${DEVICE_SN}.signed \
    jetson-orin-nano-devkit \
    nvme0n1p1
```
|
||||
|
||||
#### Step 8: On-Device fTPM Provisioning (One-Time)
|
||||
|
||||
After first boot, run the provisioning script on the device:
|
||||
|
||||
```
sudo ./ftpm_provisioning.sh
```
|
||||
|
||||
This queries EK certificates from the EKB, stores them in fTPM NV memory, takes fTPM ownership, and creates EK handles. Only needs to run once per device.
|
||||
|
||||
### 1.5 Difficulty Assessment
|
||||
|
||||
|
||||
| Aspect | Difficulty | Notes |
| ----------------------------- | ------------------- | ---------------------------------------------------- |
| First device (learning curve) | Medium-High | NVIDIA docs are detailed but dense. Budget 2-3 days. |
| Subsequent devices (scripted) | Low | Same pipeline, different KDK0/SN per device. |
| Risk | High (irreversible) | Always test on expendable dev board first. |
| Automation potential | High | Entire pipeline is scriptable for factory floor. |
|
||||
|
||||
|
||||
### 1.6 Known Issues
|
||||
|
||||
- `odmfuseread.sh` fails under Python 3.9 and later because `getiterator()` was removed from `xml.etree.ElementTree`. Fix: replace the `getiterator()` call on line 1946 of `tegraflash_impl_t234.py` with `xml_tree.iter('file')`.
|
||||
- Forum reports of PCR7 values not persisting across reboots. Our design deliberately avoids PCR-sealed keys — we use FAPI seal/unseal under SRK hierarchy only.
|
||||
- Forum reports of NV handle loss after reboot on some Orin devices. Not blocking for our use case (SRK is re-derived from fuses, not stored in NV).
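
The replacement call named in the first item above behaves as shown below. The XML snippet is invented for the example and does not reflect the real layout file; only `ElementTree.iter()` itself is the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the real file parsed by tegraflash_impl_t234.py differs.
xml_tree = ET.ElementTree(ET.fromstring(
    "<layout>"
    "<file name='a.bin'/>"
    "<group><file name='b.bin'/></group>"
    "</layout>"
))

# iter() walks the whole tree in document order, as getiterator() used to.
names = [node.get("name") for node in xml_tree.iter("file")]
print(names)
```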
|
||||
|
||||
---
|
||||
|
||||
## 2. Storage Encryption
|
||||
|
||||
### 2.1 Recommendation: Full-Disk Encryption
|
||||
|
||||
Encrypt the entire NVMe rootfs partition, not just selected model files.
|
||||
|
||||
**Why full disk instead of selective encryption:**
|
||||
|
||||
|
||||
| Approach | Protects models | Protects logs/config/temp files | Custom code needed | Performance |
| ---------------------------- | --------------- | ------------------------------------------------ | --------------------------------------- | ---------------------------- |
| Selective (model files only) | Yes | No: metadata, logs, decrypted artifacts exposed | Yes: application-level encrypt/decrypt | Minimal |
| Full disk (LUKS) | Yes | Yes: everything on disk is ciphertext | No: kernel handles it transparently | Minimal (HW-accelerated AES) |
|
||||
|
||||
|
||||
Full-disk encryption is built into NVIDIA's Jetson Linux stack. No application code changes needed for the disk layer.
|
||||
|
||||
### 2.2 How Full-Disk Encryption Works
|
||||
|
||||
```
Flashing (host PC):
  gen_ekb → sym2_t234.key (DEK) + eks_t234.img (EKB image)
  ROOTFS_ENC=1 flash.sh → rootfs encrypted with DEK, DEK packaged in EKB

Boot (on device):
  MB2 reads KDK0 from fuses
    → derives K1
    → decrypts EKB
    → extracts DEK
    → passes DEK to dm-crypt kernel module
  dm-crypt + LUKS mounts rootfs transparently
  Application sees a normal filesystem; encryption is invisible
```
|
||||
|
||||
The application never touches the disk encryption key. It's handled entirely in the kernel, initialized before the OS starts.
|
||||
|
||||
### 2.3 Double Encryption (Defense in Depth)
|
||||
|
||||
For AI model files, two independent encryption layers:
|
||||
|
||||
1. **Layer 1 — Full Disk LUKS** (kernel): Protects everything on disk. Key derived from fuses via EKB. Transparent to applications.
|
||||
2. **Layer 2 — Application-level TPM-sealed encryption**: Model files encrypted with a key sealed in the fTPM. Decrypted by the loader at runtime.
|
||||
|
||||
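An attacker who bypasses disk encryption (e.g., via a cold boot attack while the filesystem is mounted) still faces the application-level encryption. Conversely, an attacker who defeats the application-level layer still cannot read a powered-off disk without the LUKS key.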
|
||||
|
||||
### 2.4 Setup Steps
|
||||
|
||||
1. Generate encryption keys from OP-TEE source:

   ```
   cd ${BSP_TOP}/Linux_for_Tegra/source/nvidia-jetson-optee-source
   cd optee/samples/hwkey-agent/host/tool/gen_ekb/
   sudo chmod +x example.sh && ./example.sh
   ```

   Outputs: `sym2_t234.key` (DEK) and `eks_t234.img` (EKB image).

2. Place keys:

   ```
   cp sym2_t234.key ${BSP_TOP}/Linux_for_Tegra/
   cp eks_t234.img ${BSP_TOP}/Linux_for_Tegra/bootloader/
   ```

3. Verify EKB integrity:

   ```
   hexdump -C -n 4 -s 0x24 eks_t234.img
   # Must show magic bytes "EEKB"
   ```

4. Configure NVMe partition size in `flash_l4t_t234_nvme_rootfs_enc.xml`:
   - Set `NUM_SECTORS` based on NVMe capacity (e.g., 900000000 for 500GB)
   - Set `encrypted="true"` for the rootfs partition

5. Flash with encryption:

   ```
   sudo ROOTFS_ENC=1 ./tools/kernel_flash/l4t_initrd_flash.sh \
       --external-device nvme0n1p1 \
       -c flash_l4t_t234_nvme_rootfs_enc.xml \
       -i ./sym2_t234.key \
       -u pkc.pem -v sbk.key \
       jetson-orin-nano-devkit \
       nvme0n1p1
   ```
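
Steps 3 and 4 above are easy to script. The sketch below checks the `EEKB` magic at offset 0x24 and sanity-checks a `NUM_SECTORS` value against drive capacity. The helper names are hypothetical, and the 512-byte sector size is an assumption that should be confirmed for the actual NVMe device.

```python
import io

EKB_MAGIC = b"EEKB"
EKB_MAGIC_OFFSET = 0x24
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def has_ekb_magic(stream) -> bool:
    """Equivalent of `hexdump -C -n 4 -s 0x24` followed by an eyeball check."""
    stream.seek(EKB_MAGIC_OFFSET)
    return stream.read(len(EKB_MAGIC)) == EKB_MAGIC


def sectors_to_gb(num_sectors: int) -> float:
    """Size in decimal gigabytes covered by NUM_SECTORS."""
    return num_sectors * SECTOR_SIZE / 1e9


def max_sectors(capacity_bytes: int) -> int:
    """Upper bound for NUM_SECTORS on a drive of the given raw capacity."""
    return capacity_bytes // SECTOR_SIZE


# In-memory stand-in for eks_t234.img; a real check would open the file in "rb" mode.
fake_image = io.BytesIO(b"\x00" * EKB_MAGIC_OFFSET + EKB_MAGIC + b"\x00" * 16)
print(has_ekb_magic(fake_image))

# The example value from step 4 covers about 460.8 GB, below the raw 500 GB limit.
print(sectors_to_gb(900_000_000), max_sectors(500 * 10**9))
```

Running the size check before flashing is cheap insurance, since a `NUM_SECTORS` value larger than the drive only fails once the flash is under way.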
|
||||
|
||||
---
|
||||
|
||||
## 3. Debug Access Strategy
|
||||
|
||||
### 3.1 The Problem
|
||||
|
||||
After Secure Boot fusing, JTAG disabling, and OS hardening, the device has no interactive access. How do you develop, debug, and perform field maintenance?
|
||||
|
||||
### 3.2 Solution: Dual-Image Approach
|
||||
|
||||
Standard embedded Linux practice: maintain two OS images, both signed with the same PKC key.
|
||||
|
||||
|
||||
| Property | Development Image | Production Image |
| ---------------------------------- | --------------------------------- | ----------------------- |
| Secure Boot signature | Signed with PKC key | Signed with PKC key |
| Boots on fused device | Yes | Yes |
| SSH access | Yes (key-based only, no password) | No (sshd not installed) |
| Serial console | Enabled | Disabled |
| ptrace / /dev/mem | Allowed | Blocked (lockdown mode) |
| Debug tools (gdb, strace, tcpdump) | Installed | Not present |
| Getty on TTY | Running | Not spawned |
| Desktop environment | Optional | Not installed |
| Application | Your loader + inference | Your loader + inference |
|
||||
|
||||
|
||||
Secure Boot verifies the **signature**, not the **contents** of the image. Both images are valid as long as they're signed with your PKC key. An attacker cannot create either image without the private key.
|
||||
|
||||
### 3.3 Workflow
|
||||
|
||||
**During development:**
|
||||
|
||||
1. Flash the dev image to a fused device
|
||||
2. SSH in via key-based authentication
|
||||
3. Develop, debug, iterate
|
||||
4. When done, flash the prod image for deployment
|
||||
|
||||
**Production deployment:**
|
||||
|
||||
1. Flash the prod image at the factory
|
||||
2. Device boots directly into your application
|
||||
3. No shell, no SSH, no serial — only your FastAPI endpoints
|
||||
|
||||
**Field debug (emergency):**
|
||||
|
||||
1. Connect host PC via USB-C
|
||||
2. Put device in USB recovery mode (silicon ROM, always available)
|
||||
3. Reflash with the dev image (requires PKC private key to sign)
|
||||
4. SSH in, diagnose, fix
|
||||
5. Reflash with prod image, redeploy
|
||||
|
||||
USB recovery mode is hardwired in silicon. It always works regardless of what OS is installed. But after Secure Boot fusing, it **only accepts images signed with your PKC key**. An attacker who enters recovery mode but lacks the signing key is stuck.
|
||||
|
||||
### 3.4 Optional: Hardware Debug Jumper
|
||||
|
||||
A physical GPIO pin on the carrier board that, when shorted at boot, tells the init system to start SSH:
|
||||
|
||||
```
Boot → systemd reads GPIO pin → if HIGH: start sshd.service
                              → if LOW:  sshd not started (production behavior)
```
|
||||
|
||||
Opening the case to access the jumper triggers the tamper enclosure → keys are zeroized. So this is only useful during controlled maintenance with the tamper system temporarily disarmed.
|
||||
|
||||
### 3.5 PKC Key Security
|
||||
|
||||
The PKC private key is the crown jewel. Whoever holds it can create signed images that boot on any of your fused devices. Protect it accordingly:
|
||||
|
||||
- Store on an air-gapped machine or HSM (Hardware Security Module)
|
||||
- Never store in git, CI/CD pipelines, or cloud storage
|
||||
- Limit access to 1-2 people
|
||||
- Consider splitting with Shamir's Secret Sharing for key ceremonies
|
||||
|
||||
---
|
||||
|
||||
## 4. Tamper Enclosure
|
||||
|
||||
### 4.1 Threat Model for Physical Access
|
||||
|
||||
|
||||
| Attack | Without enclosure | With tamper-responsive enclosure |
| ---------------------------------- | ----------------------------- | ------------------------------------------------ |
| Unscrew case, desolder eMMC/NVMe | Easy (minutes) | Mesh breaks → key destroyed → data irrecoverable |
| Probe DRAM bus with logic analyzer | Moderate (requires soldering) | Case opening triggers zeroization first |
| Cold boot (freeze RAM) | Moderate | Temperature sensor triggers zeroization |
| Connect to board debug headers | Easy | Case must be opened → zeroization |
|
||||
|
||||
|
||||
### 4.2 Option A: Zymkey HSM4 + Custom Enclosure (~$150-250/unit)
|
||||
|
||||
**Recommended for initial production runs (up to ~500 units).**
|
||||
|
||||
**Bill of Materials:**
|
||||
|
||||
|
||||
| Component | Unit Cost | Source |
| -------------------------------------- | ------------- | ----------------------------- |
| Zymkey HSM4 (I2C security module) | ~$71 | zymbit.com |
| Custom aluminum enclosure | ~$30-80 | CNC shop / Alibaba at volume |
| Flex PCB tamper mesh panels (set of 6) | ~$10-30 | JLCPCB / PCBWay |
| CR2032 coin cell battery | ~$2 | Standard electronics supplier |
| 30 AWG perimeter wire (~2 ft) | ~$1 | Standard electronics supplier |
| Assembly labor + connectors | ~$20-40 | n/a |
| **Total** | **~$134-224** | n/a |
|
||||
|
||||
|
||||
**How it works:**

```
┌──────── Aluminum Enclosure ─────────┐
│                                     │
│  All inner walls lined with flex    │
│  PCB tamper mesh (conductive traces │
│  in space-filling curve pattern)    │
│                                     │
│  Mesh traces connect to Zymkey      │
│  HSM4's 2 perimeter circuits        │
│                                     │
│  ┌───────────┐  ┌───────────────┐   │
│  │ Zymkey    │  │ Jetson Orin   │   │
│  │ HSM4      │  │ Nano          │   │
│  │           │  │               │   │
│  │ I2C ◄─────┤  │ GPIO header   │   │
│  │ GPIO4 ◄───┤  │               │   │
│  │           │  │               │   │
│  │ [CR2032]  │  │               │   │
│  │ (battery  │  │               │   │
│  │  backup)  │  │               │   │
│  └───────────┘  └───────────────┘   │
│                                     │
│  Tamper event (mesh broken,         │
│  temperature anomaly, power loss    │
│  without battery):                  │
│  → Zymkey destroys stored keys      │
│  → Master encryption key is gone    │
│  → Encrypted disk is permanently    │
│    unrecoverable                    │
└─────────────────────────────────────┘
```

**Zymkey HSM4 features:**

- 2 independent perimeter breach detection circuits (connect to mesh)
- Accelerometer (shock/orientation tamper detection)
- Main power monitor
- Battery-backed RTC (36-60 months on CR2032)
- Secure key storage (ECC P-256, AES-256, SHA-256)
- I2C interface (fits Jetson's 40-pin GPIO header)
- Configurable tamper response: notify host, or destroy keys on breach

**Flex PCB tamper mesh design:**

- Use the KiCad anti-tamper mesh plugin to generate space-filling curve trace patterns
- Order from JLCPCB or PCBWay as flex PCBs (~$5-15 per panel)
- Attach to enclosure inner walls with adhesive
- Wire to Zymkey's perimeter circuit connectors (Hirose DF40HC)
- Any cut, drill, or peel that breaks a trace triggers the tamper event

### 4.3 Option B: Full DIY (~$80-150/unit)

**For higher volumes (500+ units) where per-unit cost matters.**

| Component                                         | Unit Cost    |
| ------------------------------------------------- | ------------ |
| STM32G4 microcontroller                           | ~$5          |
| Flex PCB tamper mesh (KiCad plugin)               | ~$10-30      |
| Battery-backed SRAM (Cypress CY14B101 or similar) | ~$5          |
| Custom PCB for STM32 monitor circuit              | ~$10-20      |
| Aluminum enclosure                                | ~$30-80      |
| Coin cell + holder                                | ~$3          |
| **Total**                                         | **~$63-143** |

The STM32G4's high-resolution timer (sub-200 ps resolution) enables Time-Domain Reflectometry (TDR) monitoring of the mesh: the firmware sends pulses into the trace and detects the echoes that appear when the trace is damaged. This is more sensitive than simple resistance monitoring.

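
To give a feel for what a sub-200 ps timer means for TDR, the sketch below converts an echo's round-trip time into a distance along the trace. The velocity factor of 0.5 is an assumed placeholder for a trace on flex substrate, not a measured value for any specific panel, and the function name is hypothetical.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum


def tdr_fault_distance_m(round_trip_s: float, velocity_factor: float) -> float:
    """Distance to an impedance discontinuity, given the echo's round-trip time.

    The pulse travels to the fault and back, hence the division by two.
    """
    return velocity_factor * C_M_PER_S * round_trip_s / 2


# One 200 ps timer tick at an ASSUMED velocity factor of 0.5 corresponds to
# roughly 15 mm of trace, which bounds how precisely a fault can be located.
resolution_m = tdr_fault_distance_m(200e-12, 0.5)
print(f"spatial resolution per tick: {resolution_m * 1000:.1f} mm")
```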
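The master encryption key is stored in battery-backed SRAM (not in the Jetson's fTPM). On tamper detection, the STM32 cuts power to the SRAM and the key vanishes in microseconds. This requires a purely volatile part: an nvSRAM that automatically saves its contents to non-volatile cells on power loss would preserve the key instead of destroying it.

This option needs more engineering effort upfront (STM32 firmware, PCB design, integration testing) but has a lower per-unit BOM.
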
### 4.4 Option C: Epoxy Potting (~$30-50/unit)

**Minimum viable physical protection.**

- Encapsulate the entire Jetson board + carrier in hardened epoxy resin
- Physical extraction requires grinding/dissolving the epoxy, which destroys the board and traces
- No active zeroization: a sufficiently patient and skilled attacker can still extract components
- Best combined with Options A or B: epoxy + active tamper mesh

### 4.5 Recommendation

| Production volume    | Recommendation                            | Per-unit cost |
| -------------------- | ----------------------------------------- | ------------- |
| Prototype / first 10 | Option A (Zymkey HSM4) + Option C (epoxy) | ~$180-300     |
| 10-500 units         | Option A (Zymkey HSM4)                    | ~$150-250     |
| 500+ units           | Option B (custom STM32)                   | ~$80-150      |

All options fit within the $300/unit budget.

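
The BOM totals in sections 4.2 and 4.3 can be re-derived with a few lines of arithmetic, which is handy when a supplier quote changes. The dictionaries below copy the low/high unit costs from the two tables; the variable and function names are illustrative only. The section headings quote slightly higher, rounded ranges than these raw sums.

```python
# (low, high) unit cost in USD, copied from the BOM tables in 4.2 and 4.3.
BOM_OPTION_A = {
    "Zymkey HSM4": (71, 71),
    "Custom aluminum enclosure": (30, 80),
    "Flex PCB tamper mesh panels (set of 6)": (10, 30),
    "CR2032 coin cell battery": (2, 2),
    "30 AWG perimeter wire": (1, 1),
    "Assembly labor + connectors": (20, 40),
}
BOM_OPTION_B = {
    "STM32G4 microcontroller": (5, 5),
    "Flex PCB tamper mesh": (10, 30),
    "Battery-backed SRAM": (5, 5),
    "Custom PCB for STM32 monitor circuit": (10, 20),
    "Aluminum enclosure": (30, 80),
    "Coin cell + holder": (3, 3),
}
BUDGET_PER_UNIT = 300


def bom_total(bom: dict) -> tuple:
    """Sum the low and high ends of every line item."""
    low = sum(cost[0] for cost in bom.values())
    high = sum(cost[1] for cost in bom.values())
    return low, high


for name, bom in (("A", BOM_OPTION_A), ("B", BOM_OPTION_B)):
    low, high = bom_total(bom)
    print(f"Option {name}: ${low}-{high}, within budget: {high <= BUDGET_PER_UNIT}")
```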

---

## 5. Simplified Loader Architecture

### 5.1 Current Architecture

```
main.py (FastAPI)
│
├── POST /login
│     → api_client.pyx: set_credentials, login()
│     → credentials.pyx: email, password
│     → security.pyx: get_hw_hash(hardware_info)
│     → hardware_service.pyx: CPU/GPU/RAM/serial strings
│
├── POST /load/{filename}
│     → api_client.pyx: load_big_small_resource(filename, folder)
│         1. Fetch SMALL part from API (POST /resources/get/{folder})
│            → Decrypt with get_api_encryption_key(email+password+hw_hash+salt)
│         2. Fetch BIG part from CDN (S3 download) or local cache
│         3. Concatenate small + big
│         4. Decrypt merged blob with get_resource_encryption_key() (fixed internal string)
│         → Return decrypted bytes
│
├── POST /upload/{filename}
│     → api_client.pyx: upload_big_small_resource(file, folder)
│         1. Encrypt full resource with get_resource_encryption_key()
│         2. Split at min(3KB, 30% of ciphertext)
│         3. Upload big part to CDN
│         4. Upload small part to API
│
└── POST /unlock
      → binary_split.py:
          1. download_key_fragment(RESOURCE_API_URL, token) (HTTP GET from API)
          2. decrypt_archive(images.enc, SHA256(key_fragment)) (AES-CBC stream)
          3. docker load -i result.tar
```

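
The split rule in the upload path, `min(3KB, 30% of ciphertext)`, and the matching "concatenate small + big" step in the load path can be sketched as follows. This is not the code from `api_client.pyx`; the function names are hypothetical, and it assumes that "3KB" means 3 × 1024 bytes and that the small part is the leading slice of the ciphertext.

```python
SMALL_PART_MAX = 3 * 1024  # ASSUMPTION: "3KB" is interpreted as 3072 bytes


def split_ciphertext(ciphertext: bytes) -> tuple:
    """Return (small, big): small goes to the API, big goes to the CDN."""
    thirty_percent = len(ciphertext) * 30 // 100  # integer math, no float rounding
    split_at = min(SMALL_PART_MAX, thirty_percent)
    return ciphertext[:split_at], ciphertext[split_at:]


def merge_parts(small: bytes, big: bytes) -> bytes:
    """Inverse of split_ciphertext, as done on the load path before decryption."""
    return small + big


# A large model file is capped at a 3 KB small part; a tiny file gives up 30%.
small, big = split_ciphertext(bytes(100_000))
print(len(small), len(big))
small, big = split_ciphertext(bytes(1_000))
print(len(small), len(big))
```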
**Security dependencies in current architecture:**

- `security.pyx`: SHA-384 key derivation from `email + password + hw_hash + salt`
- `hardware_service.pyx`: String-based hardware fingerprint (spoofable)
- `binary_split.py`: Key fragment downloaded from API server
- Split storage: security depends on attacker not having both API and CDN access

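
Why a string-based fingerprint is spoofable becomes clearer with a small sketch of a SHA-384 derivation over concatenated strings. This is not the actual `security.pyx` logic: the concatenation order, the UTF-8 encoding, and the function name are assumptions, and how the 48-byte digest is reduced to an AES-256 key is not described in this document, so it is left out.

```python
import hashlib


def derive_legacy_key(email: str, password: str, hw_hash: str, salt: str) -> bytes:
    """ASSUMED derivation: SHA-384 over the plain concatenation of the inputs."""
    material = (email + password + hw_hash + salt).encode("utf-8")
    return hashlib.sha384(material).digest()


# Every input is an ordinary string. Anyone who learns the credentials, reads
# the salt out of the binary, and replays the hardware strings on a different
# machine obtains the same digest; nothing ties it to the physical device.
on_device = derive_legacy_key("user@example.com", "pw", "cpu|gpu|ram|serial", "salt")
on_clone = derive_legacy_key("user@example.com", "pw", "cpu|gpu|ram|serial", "salt")
print(on_device == on_clone)
```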
### 5.2 Proposed TPM Architecture

```
main.py (FastAPI): routes and request/response contracts unchanged
│
├── POST /login
│     → api_client.pyx: set_credentials, login()
│     → credentials.pyx: email, password (unchanged; still needed for API auth)
│     → security_provider.pyx: auto-detect TPM or legacy
│
├── POST /load/{filename}
│     → api_client.pyx: load_resource(filename, folder)
│       [TPM path]:
│         1. Fetch single encrypted file from CDN (S3 download)
│         2. security_provider.decrypt(data)
│            → tpm_security_provider.pyx: FAPI.unseal() → master key → AES decrypt
│         3. Return decrypted bytes
│       [Legacy path]:
│         (unchanged; load_big_small_resource as before)
│
├── POST /upload/{filename}
│     → api_client.pyx: upload_resource(file, folder)
│       [TPM path]:
│         1. security_provider.encrypt(data)
│            → tpm_security_provider.pyx: AES encrypt with TPM-derived key
│         2. Upload single file to CDN
│       [Legacy path]:
│         (unchanged; upload_big_small_resource as before)
│
└── POST /unlock
      [TPM path]:
        1. security_provider.get_archive_key()
           → tpm_security_provider.pyx: FAPI.unseal() → archive key (no network call)
        2. decrypt_archive(images.enc, archive_key)
        3. docker load -i result.tar
      [Legacy path]:
        (unchanged; download_key_fragment from API)
```

### 5.3 SecurityProvider Interface

```python
from abc import ABC, abstractmethod


class SecurityProvider(ABC):
    """Key-source abstraction shared by the TPM and legacy paths."""

    @abstractmethod
    def encrypt(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decrypt(self, data: bytes) -> bytes: ...

    @abstractmethod
    def get_archive_key(self) -> bytes: ...
```

Two implementations:

- **TpmSecurityProvider**: Calls `tpm2-pytss` FAPI to unseal the master key from the TPM and uses it for AES-256-CBC encrypt/decrypt. The archive key is also TPM-sealed (no network download).
- **LegacySecurityProvider**: Wraps the existing `security.pyx` logic unchanged. Key derivation from `email+password+hw_hash+salt`. Archive key downloaded from API.

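
As a rough illustration of the TPM side, the sketch below covers only the `get_archive_key()` half of a `TpmSecurityProvider`. It assumes that `tpm2_pytss.FAPI` can be used as a context manager and that `unseal(path)` returns the sealed bytes; the FAPI object path, the class name, and the 32-byte length check are hypothetical. The FAPI factory is injectable so the logic can be exercised without a TPM.

```python
from typing import Callable

# HYPOTHETICAL FAPI object path; the real path depends on how the key was sealed.
ARCHIVE_KEY_PATH = "/HS/SRK/loader_archive_key"


def _open_fapi():
    # Imported lazily so legacy devices do not need tpm2-pytss installed.
    from tpm2_pytss import FAPI
    return FAPI()


class TpmArchiveKey:
    """Unseals the Docker archive key from the TPM instead of downloading it."""

    def __init__(self, open_fapi: Callable = _open_fapi, path: str = ARCHIVE_KEY_PATH):
        self._open_fapi = open_fapi
        self._path = path

    def get_archive_key(self) -> bytes:
        # ASSUMPTION: the FAPI object supports `with` and exposes unseal(path).
        with self._open_fapi() as fapi:
            key = bytes(fapi.unseal(self._path))
        if len(key) != 32:  # AES-256 key length
            raise ValueError("unsealed archive key has unexpected length")
        return key


class FakeFapi:
    """Stand-in for a TPM, useful in unit tests alongside swtpm-based tests."""

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def unseal(self, path):
        return b"\x11" * 32


print(len(TpmArchiveKey(open_fapi=FakeFapi).get_archive_key()))
```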
### 5.4 Auto-Detection Logic

At startup:

```
1. Check env var SECURITY_PROVIDER
   → if "tpm": use TpmSecurityProvider (fail hard if TPM unavailable)
   → if "legacy": use LegacySecurityProvider
   → if unset: auto-detect (step 2)

2. Check os.path.exists("/dev/tpm0")
   → if True: attempt TPM connection via FAPI
       → if success: use TpmSecurityProvider
       → if failure: log warning, fall back to LegacySecurityProvider
   → if False: use LegacySecurityProvider

3. Log which provider was selected and why
```

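
The three steps above could translate into a selection function along these lines. It is a sketch, not the loader's implementation: the function name and the two injected callables are hypothetical, and rejecting an unrecognized `SECURITY_PROVIDER` value is an assumption, since the steps do not say what should happen in that case. In production the callables would be something like `lambda: os.path.exists("/dev/tpm0")` and a function that attempts a FAPI connection.

```python
import logging
from typing import Callable, Mapping

log = logging.getLogger("loader.security")


def select_provider(
    env: Mapping[str, str],
    tpm_device_exists: Callable[[], bool],
    tpm_connects: Callable[[], bool],
) -> str:
    """Return "tpm" or "legacy" following the startup steps."""
    choice = env.get("SECURITY_PROVIDER", "").strip().lower()
    if choice == "tpm":
        # Step 1: explicit TPM request fails hard when no TPM is usable.
        if not (tpm_device_exists() and tpm_connects()):
            raise RuntimeError("SECURITY_PROVIDER=tpm but no usable TPM was found")
        selected, reason = "tpm", "forced by SECURITY_PROVIDER"
    elif choice == "legacy":
        selected, reason = "legacy", "forced by SECURITY_PROVIDER"
    elif choice == "":
        # Step 2: auto-detect.
        if not tpm_device_exists():
            selected, reason = "legacy", "TPM device node not present"
        elif tpm_connects():
            selected, reason = "tpm", "TPM device present and FAPI connected"
        else:
            log.warning("TPM device present but FAPI connection failed")
            selected, reason = "legacy", "fallback after failed TPM connection"
    else:
        # ASSUMPTION: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unknown SECURITY_PROVIDER value: {choice!r}")
    # Step 3: record the decision.
    log.info("security provider: %s (%s)", selected, reason)
    return selected


print(select_provider({}, lambda: False, lambda: False))
```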
### 5.5 What Changes, What Stays

| Component                      | TPM path                          | Legacy path               | Notes                                |
| ------------------------------ | --------------------------------- | ------------------------- | ------------------------------------ |
| `main.py` routes               | Unchanged                         | Unchanged                 | F1-F6 API contract preserved         |
| JWT authentication             | Unchanged                         | Unchanged                 | Still needed for API access          |
| CDN download                   | Single file                       | Big/small split           | CDN still used for bandwidth         |
| AES-256-CBC encryption         | Unchanged algorithm               | Unchanged                 | Only the key source changes          |
| Key source                     | TPM-sealed master key             | SHA-384(email+pw+hw+salt) | Core difference                      |
| `hardware_service.pyx`         | Not used                          | Used                      | TPM replaces string fingerprinting   |
| `binary_split.py` key download | Eliminated                        | Used                      | TPM-sealed key is local              |
| `security.pyx`                 | Wrapped in LegacySecurityProvider | Active                    | Not deleted; legacy devices need it  |

### 5.6 Docker Container Changes

The loader runs in Docker. For TPM access:

```yaml
# docker-compose.yml additions for TPM path
services:
  loader:
    devices:
      - /dev/tpm0:/dev/tpm0
      - /dev/tpmrm0:/dev/tpmrm0
    environment:
      - SECURITY_PROVIDER=tpm   # or leave unset for auto-detect
```

No `--privileged` flag is needed. Device mounts are sufficient.

The container image needs additional packages:

- `tpm2-tss` (native library, >= 2.4.0)
- `tpm2-pytss` (Python bindings from PyPI)
- FAPI configuration file (`/etc/tpm2-tss/fapi-config.json`)

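
A container can confirm at startup that the device mounts actually arrived before it tries FAPI. The helper below is a hypothetical preflight check, not part of the loader; it also tests read/write permission, on the assumption that the container user must be allowed to open the nodes (typically root or a member of a `tss`-style group, depending on the host setup).

```python
import os

TPM_DEVICES = ("/dev/tpm0", "/dev/tpmrm0")


def missing_tpm_devices(paths=TPM_DEVICES) -> list:
    """Return the device nodes that are absent or not readable and writable."""
    return [
        path
        for path in paths
        if not (os.path.exists(path) and os.access(path, os.R_OK | os.W_OK))
    ]


# Usage: log the result at startup; an empty list means both mounts are usable.
problems = missing_tpm_devices()
print("TPM device nodes OK" if not problems else f"unusable: {problems}")
```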

---

## 6. Implementation Phases

### Phase 0: Preparation (1 week)

| Task                     | Details                                                                                          |
| ------------------------ | ------------------------------------------------------------------------------------------------ |
| Order hardware           | Second Jetson Orin Nano dev kit (expendable for fusing experiments)                              |
| Order Zymkey HSM4        | For tamper enclosure evaluation                                                                  |
| Download NVIDIA packages | BSP (`jetson_linux_*_aarch64.tbz2`), sample rootfs, public sources, FSKP partner package         |
| Set up host              | Ubuntu 22.04 LTS on x86 machine, install `libftdi-dev`, `openssh-server`, `python3-cryptography` |
| Study NVIDIA docs        | `r36.4.3` Security section: Secure Boot, Disk Encryption, Firmware TPM, FSKP                     |

### Phase 1: Secure Boot + Disk Encryption (1-2 weeks)

| Task                          | Details                                                    | Validation                                       |
| ----------------------------- | ---------------------------------------------------------- | ------------------------------------------------ |
| Generate PKC + SBK keys       | `openssl genrsa` + `gen_sbk_key.py`                        | Keys exist, correct format                       |
| Dry-run fuse burning          | `odmfuse.sh --test` on expendable dev board                | No errors, fuse values logged                    |
| Burn Secure Boot fuses        | `odmfuse.sh` for real (PKC, SBK, SECURITY_MODE)            | Device only boots signed images                  |
| Generate disk encryption keys | `gen_ekb/example.sh`                                       | `sym2_t234.key` + `eks_t234.img` with EEKB magic |
| Flash encrypted rootfs        | `ROOTFS_ENC=1 l4t_initrd_flash.sh`                         | Device boots, `lsblk` shows LUKS partition       |
| Validate Secure Boot          | Attempt to flash unsigned image → must fail                | Unsigned flash rejected                          |
| Validate disk encryption      | Remove NVMe, mount on another machine → must be ciphertext | Cannot read filesystem                           |

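
The "Validate disk encryption" step can be backed by a quick header check on the removed NVMe: a LUKS container starts with the magic bytes `LUKS\xba\xbe` followed by a big-endian 16-bit version. The helper below is a sketch with a hypothetical name; reading the first bytes of the partition (for example with `open(device, "rb").read(8)`) is left to the caller.

```python
from typing import Optional

LUKS_MAGIC = b"LUKS\xba\xbe"


def luks_version(header: bytes) -> Optional[int]:
    """Return the LUKS version (1 or 2) if `header` begins a LUKS container."""
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    return int.from_bytes(header[6:8], "big")


# Usage with synthetic headers: an encrypted rootfs versus a plain partition.
print(luks_version(LUKS_MAGIC + b"\x00\x02" + bytes(504)))
print(luks_version(bytes(512)))
```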
### Phase 2: fTPM Provisioning (1-2 weeks)

| Task                                      | Details                                  | Validation                                  |
| ----------------------------------------- | ---------------------------------------- | ------------------------------------------- |
| Generate KDK0 + Silicon_ID                | `kdk_gen.py` per device                  | KDK_DB populated                            |
| Generate fuseblob                         | `fskp_fuseburn.py`                       | Signed fuseblob files                       |
| Generate fTPM EKB                         | `odm_ekb_gen.py` + `oem_ekb_gen.py`      | Per-device EKB images                       |
| Burn fTPM fuses                           | `odmfuse.sh` with KDK0 fuses             | Fuses burned                                |
| Flash with fTPM EKB                       | `flash.sh` with EKB                      | Device boots with fTPM                      |
| On-device provisioning                    | `ftpm_provisioning.sh`                   | EK certificates in NV memory                |
| Validate fTPM                             | `tpm2_getcap properties-fixed`           | Shows manufacturer, firmware version        |
| Test seal/unseal                          | `tpm2_create` + `tpm2_unseal` round-trip | Data sealed → unsealed correctly            |
| Test seal on device A, unseal on device B | Copy sealed blob between devices         | Unseal fails on device B (correct behavior) |

### Phase 3: OS Hardening (1 week)

| Task                         | Details                                                          | Validation                              |
| ---------------------------- | ---------------------------------------------------------------- | --------------------------------------- |
| Create dev image recipe      | SSH (key-only), serial console, ptrace allowed, debug tools      | Can SSH in, run gdb                     |
| Create prod image recipe     | No SSH, no serial, no ptrace, no shell, no desktop               | No interactive access possible          |
| Kernel config: lockdown mode | `CONFIG_SECURITY_LOCKDOWN_LSM=y`, `lockdown=confidentiality`     | `/dev/mem` access denied, kexec blocked |
| Kernel config: disable debug | `CONFIG_STRICT_DEVMEM=y`, no `/dev/kmem`                         | Cannot read physical memory             |
| Sysctl hardening             | `kernel.yama.ptrace_scope=3`, `kernel.core_pattern=\|/bin/false` | ptrace attach fails, no core dumps      |
| Disable serial console       | Remove `console=ttyTCU0` from kernel cmdline                     | No output on serial                     |
| Disable getty                | Mask `getty@.service`, `serial-getty@.service`                   | No login prompt on any TTY              |
| Sign both images             | `flash.sh -u pkc.pem` for dev and prod images                    | Both boot on fused device               |
| Validate prod image          | Plug in keyboard, monitor, USB, Ethernet → no access             | Device is a black box                   |
| Validate dev image           | Flash dev image → SSH works                                      | Can debug on fused device               |

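
The sysctl row of the table lends itself to an automated check in the prod image test suite. The sketch below compares the two values from the table against what the kernel reports under `/proc/sys`; the function names are hypothetical, and the reader is injectable so the comparison can be tested off-device.

```python
import os
from typing import Callable, Dict

# Expected values, taken from the "Sysctl hardening" row.
EXPECTED_SYSCTLS = {
    "kernel.yama.ptrace_scope": "3",
    "kernel.core_pattern": "|/bin/false",
}


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl by mapping dots to path separators under /proc/sys."""
    path = os.path.join(root, *name.split("."))
    with open(path, encoding="ascii") as handle:
        return handle.read().strip()


def sysctl_mismatches(
    read: Callable[[str], str] = read_sysctl,
    expected: Dict[str, str] = EXPECTED_SYSCTLS,
) -> Dict[str, str]:
    """Return {name: actual_value} for every sysctl that differs from `expected`."""
    actual = {name: read(name) for name in expected}
    return {name: value for name, value in actual.items() if value != expected[name]}


# Usage with a fake reader simulating an image where ptrace is still allowed.
fake_values = {"kernel.yama.ptrace_scope": "1", "kernel.core_pattern": "|/bin/false"}
print(sysctl_mismatches(read=fake_values.__getitem__))
```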
### Phase 4: Loader Code Changes (2-3 weeks)

| Task                                         | Details                                              | Tests                                            |
| -------------------------------------------- | ---------------------------------------------------- | ------------------------------------------------ |
| Add `tpm2-tss`, `tpm2-pytss` to requirements | Match versions available in Jetson BSP               | Imports work                                     |
| Add `swtpm` to dev dependencies              | TPM simulator for CI/testing                         | Simulator starts, `/dev/tpm0` available          |
| Implement `SecurityProvider` ABC             | `security_provider.pxd` + `.pyx`                     | Interface compiles                               |
| Implement `TpmSecurityProvider`              | FAPI `create_seal`, `unseal`, AES encrypt/decrypt    | Seal/unseal round-trip with swtpm                |
| Implement `LegacySecurityProvider`           | Wrap existing `security.pyx`                         | All existing tests pass unchanged                |
| Add auto-detection logic                     | `/dev/tpm0` check + env var override                 | Correct provider selected in both cases          |
| Refactor `load_resource` (TPM path)          | Single file download + TPM decrypt                   | Download → decrypt → correct bytes               |
| Refactor `upload_resource` (TPM path)        | TPM encrypt + single file upload                     | Encrypt → upload → download → decrypt round-trip |
| Refactor Docker unlock (TPM path)            | TPM unseal archive key, no API download              | Unlock works without network key fragment        |
| Update `docker-compose.yml`                  | Add `/dev/tpm0`, `/dev/tpmrm0` device mounts         | Container can access TPM                         |
| Update `Dockerfile`                          | Install `tpm2-tss` native lib + `tpm2-pytss`         | Build succeeds                                   |
| Integration tests                            | Full flow with swtpm: login → load → upload → unlock | All paths work                                   |
| Legacy regression tests                      | All existing e2e tests pass without TPM              | No regression                                    |

### Phase 5: Tamper Enclosure (2-4 weeks, parallel with Phase 4)

| Task                      | Details                                                         | Validation                  |
| ------------------------- | --------------------------------------------------------------- | --------------------------- |
| Evaluate Zymkey HSM4      | Connect to Orin Nano GPIO header, test I2C communication        | Zymkey detected, LED blinks |
| Test perimeter circuits   | Wire perimeter inputs, break wire → verify detection            | Tamper event logged         |
| Test key zeroization      | Enable production mode, trigger tamper → verify key destruction | Key gone, device bricked    |
| Design tamper mesh panels | KiCad anti-tamper mesh plugin, space-filling curves             | Gerber files ready          |
| Order flex PCBs           | JLCPCB or PCBWay                                                | Panels received             |
| Design/source enclosure   | Aluminum case, dimensions for Jetson + Zymkey + mesh panels     | Enclosure received          |
| Assemble prototype        | Mount boards, wire mesh to Zymkey perimeter circuits            | Physical prototype complete |
| Test tamper scenarios     | Open case, drill, probe → all trigger zeroization               | All breach paths detected   |
| Temperature test          | Cool enclosure below threshold → verify trigger                 | Cold boot attack prevented  |

### Phase 6: Integration Testing (1-2 weeks)

| Test Scenario                                                                     | Expected Result                                          |
| --------------------------------------------------------------------------------- | -------------------------------------------------------- |
| Full stack: fused device + encrypted disk + fTPM + hardened OS + tamper enclosure | Device boots, runs inference, all security layers active |
| Attempt USB boot                                                                  | Rejected (Secure Boot)                                   |
| Attempt JTAG                                                                      | No response (fused off)                                  |
| Attempt SSH on prod image                                                         | Connection refused (no sshd)                             |
| Attempt serial console                                                            | No output                                                |
| Remove NVMe, read on another machine                                              | Ciphertext only                                          |
| Copy sealed blob to different device                                              | Unseal fails                                             |
| Open tamper enclosure                                                             | Keys destroyed, device permanently bricked               |
| Legacy device (no TPM) loads resources                                            | Works via LegacySecurityProvider                         |
| Fused device loads resources                                                      | Works via TpmSecurityProvider                            |
| Docker unlock on TPM device                                                       | Works without network key download                       |
| Docker unlock on legacy device                                                    | Works via API key fragment (unchanged)                   |

### Timeline Summary

```
Week 1      Phase 0: Preparation (order hardware, download BSP)
Week 2-3    Phase 1: Secure Boot + Disk Encryption
Week 4-5    Phase 2: fTPM Provisioning
Week 6      Phase 3: OS Hardening
Week 7-9    Phase 4: Loader Code Changes
Week 7-10   Phase 5: Tamper Enclosure (parallel with Phase 4)
Week 11-12  Phase 6: Integration Testing
```

Total estimated duration: **10-12 weeks** (Phases 4 and 5 overlap).

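
The schedule arithmetic can be checked by summing the per-phase estimates along the critical path, counting the parallel Phases 4 and 5 once. The durations are the (min, max) week ranges from the phase headings; the names are illustrative. The pessimistic sum reproduces the 12-week end of the timeline, while the purely optimistic sum is shorter, so the quoted 10-12 week range assumes most phases run close to their upper estimates.

```python
# (min_weeks, max_weeks) per phase, taken from the phase headings.
PHASE_WEEKS = {
    0: (1, 1),
    1: (1, 2),
    2: (1, 2),
    3: (1, 1),
    4: (2, 3),
    5: (2, 4),
    6: (1, 2),
}
SERIAL_PHASES = (0, 1, 2, 3, 6)
PARALLEL_PHASES = (4, 5)  # run side by side, so only the longer one counts


def total_weeks(bound: int) -> int:
    """bound=0 gives the optimistic total, bound=1 the pessimistic one."""
    serial = sum(PHASE_WEEKS[phase][bound] for phase in SERIAL_PHASES)
    parallel = max(PHASE_WEEKS[phase][bound] for phase in PARALLEL_PHASES)
    return serial + parallel


print("optimistic:", total_weeks(0), "weeks; pessimistic:", total_weeks(1), "weeks")
```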

---

## References

- NVIDIA Jetson Linux Developer Guide r36.4.3, Firmware TPM: [https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html](https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html)
- NVIDIA Jetson Linux Developer Guide, Secure Boot: [https://docs.nvidia.com/jetson/archives/r36.2/DeveloperGuide/SD/Security/SecureBoot.html](https://docs.nvidia.com/jetson/archives/r36.2/DeveloperGuide/SD/Security/SecureBoot.html)
- NVIDIA Jetson Linux Developer Guide, Disk Encryption: [https://docs.nvidia.com/jetson/archives/r38.2.1/DeveloperGuide/SD/Security/DiskEncryption.html](https://docs.nvidia.com/jetson/archives/r38.2.1/DeveloperGuide/SD/Security/DiskEncryption.html)
- NVIDIA Jetson Linux Developer Guide, FSKP: [https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/Security/FSKP.html](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/Security/FSKP.html)
- tpm2-pytss: [https://github.com/tpm2-software/tpm2-pytss](https://github.com/tpm2-software/tpm2-pytss)
- tpm2-pytss FAPI docs: [https://tpm2-pytss.readthedocs.io/en/latest/fapi.html](https://tpm2-pytss.readthedocs.io/en/latest/fapi.html)
- Zymbit HSM4: [https://www.zymbit.com/HSM4/](https://www.zymbit.com/HSM4/)
- Zymbit HSM4 perimeter detect: [https://docs.zymbit.com/tutorials/perimeter-detect/hsm4](https://docs.zymbit.com/tutorials/perimeter-detect/hsm4)
- KiCad anti-tamper mesh plugin: [https://hackaday.com/2021/03/14/an-anti-tamper-mesh-plugin-for-kicad/](https://hackaday.com/2021/03/14/an-anti-tamper-mesh-plugin-for-kicad/)
- Microchip PolarFire security mesh: [https://www.microchip.com/en-us/about/media-center/blog/2026/security-mesh-distributed-defense-across-your-design](https://www.microchip.com/en-us/about/media-center/blog/2026/security-mesh-distributed-defense-across-your-design)
- DoD GUARD Secure GPU Module: [https://www.cto.mil/wp-content/uploads/2025/04/Secure-Edge.pdf](https://www.cto.mil/wp-content/uploads/2025/04/Secure-Edge.pdf)
- Forecr MILBOX-ORNX (rugged enclosure): [https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx](https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx)

## Related Artifacts

- Solution Draft 01: `_docs/02_task_plans/tpm-replaces-binary-split/01_solution/solution_draft01.md`
- Security Analysis: `_docs/02_task_plans/tpm-replaces-binary-split/01_solution/security_analysis.md`
- Fact Cards: `_docs/02_task_plans/tpm-replaces-binary-split/00_research/02_fact_cards.md`
- Reasoning Chain: `_docs/02_task_plans/tpm-replaces-binary-split/00_research/04_reasoning_chain.md`
- Problem Statement: `_docs/02_task_plans/tpm-replaces-binary-split/problem.md`