03 Trust and Identity
The Problem
The preboot UKI is loaded and running (02 Finding the OS). Now it needs to talk to the org -- but how does the org know this machine is legitimate?
This is the machine identity problem. In web development, you solve identity with usernames, passwords, and session tokens. But there's no human sitting at this machine typing a password during boot. The machine needs to prove its identity autonomously, every boot, without human intervention.
Three things need to happen:
- First boot: The machine has no identity yet. It needs to enroll -- get credentials that prove it belongs to the org.
- Subsequent boots: The machine presents its stored credentials to the org. The org verifies them and grants access.
- Revocation: If a machine is stolen or compromised, the org needs to reject its credentials permanently.
All of this must work without a central authority that could be a single point of failure.
The Building Blocks
Hardware Identity: TPM
A TPM (Trusted Platform Module) is a small chip on the motherboard (or firmware running on a dedicated security processor in the CPU or chipset) that stores cryptographic keys and secrets in tamper-resistant hardware. The key properties:
- Secrets never leave the chip. The TPM performs crypto operations internally. You can ask it to sign something, but you can't extract the private key.
- Small persistent storage. 6-8 KB of non-volatile memory for storing secrets, certificates, and counters.
- Measured boot. Each boot component is hashed (measured) and extended into special registers (PCRs) before it runs. You can seal a secret so the TPM only releases it if the PCR values, and therefore the boot chain, match exactly.
- Unique identity. Each TPM has an Endorsement Key, derived from a seed provisioned at manufacture and usually accompanied by a manufacturer-issued certificate, that uniquely identifies the physical hardware.
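Measured boot rests on the PCR extend operation. A PCR cannot be set directly. It can only be extended, so its final value is a hash chain over every measurement in order. The sketch below models that chain with SHA-256 in plain Python and reduces sealing to "release the secret only if the PCR equals the value recorded at seal time." It is only a model: a real TPM enforces this inside the chip through a policy session. The function names and component labels are hypothetical.

```python
import hashlib
from typing import Optional

PCR_SIZE = 32  # a SHA-256 PCR bank; registers reset to all zeros


def extend(pcr: bytes, component: bytes) -> bytes:
    """Model of measure-then-extend: new PCR = SHA256(old PCR || SHA256(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_chain(components: list[bytes]) -> bytes:
    """Extend a freshly reset PCR with each boot component, in boot order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


def unseal(secret: bytes, sealed_pcr: bytes, current_pcr: bytes) -> Optional[bytes]:
    """Model of a PCR policy: release the secret only on an exact match."""
    return secret if current_pcr == sealed_pcr else None
```

Order matters. Extending with the same components in a different order produces a different PCR value, so a reordered or substituted boot chain fails to unseal.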
For FortrOS, the TPM stores two critical secrets:
- The preboot_secret -- a random 32-byte value delivered during enrollment, used to derive the LUKS disk encryption key
- An Ed25519 keypair -- the preboot's signing identity, used to authenticate to the org on every boot
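Both are stored in TPM NV (non-volatile) indices and persist across reboots. Unlike the keys described above, NV contents can be read out by whoever satisfies the index's access policy. Their protection therefore comes from that policy (for example, binding reads to the preboot's PCR state, or locking reads until the next reboot), not from the key never leaving the chip. The intent is the same: even if the main OS is compromised, it cannot read the secrets.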
Network Trust: TLS and Certificates
TLS and Certificates provide the encrypted channel and identity verification between machines. If you've used HTTPS, you understand the core idea: the server presents a certificate, the client verifies it against a trusted CA, and the connection is encrypted.
mTLS (mutual TLS) goes further: both sides present certificates and both verify the other. In normal HTTPS, only the server proves its identity (the client is anonymous at the TLS layer). In mTLS, the server knows cryptographically which client is connecting. This is the standard mechanism for machine-to-machine authentication in clusters.
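In Python's standard ssl module, most of the difference between HTTPS and mTLS comes down to one server-side setting: requiring a client certificate. The sketch below builds both ends so that each trusts only the org CA and no system roots, which is also what pinning a CA amounts to. The function names and file-path parameters are hypothetical placeholders. Loading is skipped when a path is omitted, so the settings can be inspected without real certificates.

```python
import ssl
from typing import Optional


def org_server_context(org_ca: Optional[str] = None,
                       cert: Optional[str] = None,
                       key: Optional[str] = None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # The "mutual" part: refuse any client that does not present a valid cert.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if org_ca:
        ctx.load_verify_locations(cafile=org_ca)  # trust only the org CA
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # server identity
    return ctx


def org_client_context(org_ca: Optional[str] = None,
                       cert: Optional[str] = None,
                       key: Optional[str] = None) -> ssl.SSLContext:
    # PROTOCOL_TLS_CLIENT already enables CERT_REQUIRED and hostname checking,
    # and, unlike ssl.create_default_context(), loads no system CA roots.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if org_ca:
        ctx.load_verify_locations(cafile=org_ca)
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # client identity
    return ctx
```

With plain HTTPS, the server would leave verify_mode at its default (no client certificate requested), and the client would skip load_cert_chain.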
For FortrOS, the org has its own Certificate Authority (CA). Every node gets a short-lived certificate signed by the org CA. Certificates are renewed via the gossip mesh -- any org member can renew another member's cert.
Revocation is immediate rather than deferred until certificates expire. When a node is revoked, the revocation takes effect at the ZTNA/WireGuard conn_auth layer: every connection attempt from that node is rejected. This is why org VMs communicate through WireGuard loopback: revocation cuts a compromised node off from all services on its next connection, not after a cert expires. Short-lived certs (hours) add defense in depth -- if conn_auth somehow misses a revocation, the cert expires anyway.
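This layering can be expressed as an ordered admission check. Revocation state is consulted on every connection attempt, and certificate expiry serves only as a backstop. The sketch below is minimal. PeerCert, issue, admit, and the four-hour lifetime are all hypothetical. A real conn_auth would also verify the peer's Ed25519 challenge response, which is omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CERT_LIFETIME = timedelta(hours=4)  # hypothetical "short-lived" value


@dataclass
class PeerCert:
    node_id: str
    not_after: datetime


def issue(node_id: str, now: datetime) -> PeerCert:
    return PeerCert(node_id, now + CERT_LIFETIME)


def admit(cert: PeerCert, revoked: set[str], now: datetime) -> tuple[bool, str]:
    """Decide a single connection attempt."""
    # Primary control: revocation is checked on every connection attempt,
    # so a revoked node is refused even while its cert is still valid.
    if cert.node_id in revoked:
        return False, "revoked"
    # Backstop: a missed revocation still ends when the short-lived cert expires.
    if now >= cert.not_after:
        return False, "expired"
    return True, "ok"
```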
Physical Presence: YubiKey
A YubiKey (or CAC/smart card in government orgs) proves a human was physically present. FortrOS uses them for several purposes:
- Enrollment credential: The one-time secret that bootstraps trust for a new node. YubiKey HMAC-SHA1 challenge-response generates a value only someone holding the physical key could produce.
- Hibernate lock/unlock: The YubiKey + PIN is the authority for hibernate (something you have + something you know). When hibernating, the main OS derives a key from the YubiKey HMAC and the node's identity: HKDF(yubikey_hmac(PIN), "hibernate", node_id), where node_id = H(preboot_secret). The user touches the YubiKey and enters the PIN to authorize hibernation. On resume, the preboot finds the hibernate partition, prompts for the same YubiKey + PIN, derives the same key, and kexecs into the resumed kernel. The preboot_secret itself is NOT part of the hibernate key derivation -- the main OS doesn't know it. Instead, H(preboot_secret) serves as the node identity (the preboot can compute it from the preboot_secret in TPM NV). This means: same YubiKey + same PIN + same node = same key. Wrong PIN, different YubiKey, or different node (swapped preboot) = can't resume. Stolen laptop while hibernated = paperweight without the YubiKey AND the PIN.
- Disaster recovery: Admin YubiKeys hold LUKS keyslots for /persist. When the org is unreachable, an admin YubiKey can unlock /persist directly. The first nodes recovered with admin YubiKeys form the seed of the org -- gossip is peer-to-peer, so recovered nodes mesh automatically.
- Secrets manager bootstrap: The org's secrets manager (which other services depend on for runtime secrets) can be unlocked by either another secrets manager instance or by an admin YubiKey. In a total org failure, the recovery sequence is: admin YubiKey brings up nodes -> nodes mesh over WireGuard -> once enough nodes with the right storage shards are up, the secrets manager can start -> once the secrets manager is up, the reconciliation loop brings up everything else.
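The hibernate derivation can be sketched with the standard library alone. The text gives only the shape HKDF(yubikey_hmac(PIN), "hibernate", node_id), so several details below are assumptions:

- The PIN is the challenge sent to the YubiKey's HMAC-SHA1 slot.
- The YubiKey is simulated by an HMAC-SHA1 over a hypothetical slot secret.
- node_id is the HKDF salt.
- "hibernate" is the HKDF info string.

HKDF itself follows RFC 5869 with SHA-256.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def simulated_yubikey(slot_secret: bytes, challenge: bytes) -> bytes:
    """Hypothetical stand-in for a YubiKey HMAC-SHA1 challenge-response slot."""
    return hmac.new(slot_secret, challenge, hashlib.sha1).digest()


def node_id(preboot_secret: bytes) -> bytes:
    """node_id = H(preboot_secret); only this hash is used, never the secret."""
    return hashlib.sha256(preboot_secret).digest()


def hibernate_key(yubikey_response: bytes, node: bytes) -> bytes:
    # Assumption: node_id is the HKDF salt and "hibernate" is the info string.
    return hkdf_sha256(ikm=yubikey_response, salt=node, info=b"hibernate")
```

Changing any one input (the PIN, the YubiKey's slot secret, or the preboot_secret behind node_id) yields an unrelated key. This is the "same YubiKey + same PIN + same node = same key" property described above.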
There are two classes of YubiKeys:
- Admin YubiKeys: Enrollment, LUKS keyslots, secrets manager bootstrap. Admin YubiKey material is disseminated through the org while it's healthy (adding/revoking an admin YubiKey = adding/removing LUKS keyslots on nodes via gossip propagation).
- User YubiKeys (or CAC cards): Hibernate lock/unlock, employee identification. No org admin capabilities.
Normal boot authentication uses the TPM (no human needed). YubiKeys are for high-trust operations where physical presence matters.
How Others Do It
Kubernetes: Bootstrap Tokens
A new Kubernetes node joins by presenting a bootstrap token -- a short-lived
bearer token that the admin generates (kubeadm token create). The node uses
this token to request a TLS client certificate from the API server's CSR
(Certificate Signing Request) API. Once approved, the node has a cert and
switches to mTLS for all future communication. The bootstrap token is discarded.
Strength: Simple, well-understood. Weakness: The bootstrap token is a bearer token -- anyone who intercepts it can join a node. It must be transmitted securely. Also requires a functioning API server (single point of failure without HA setup).
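Kubernetes bootstrap tokens have the form <token-id>.<token-secret>: 6 and 16 characters from [a-z0-9]. This is the same shape that kubeadm token create prints. The sketch below generates and validates that format in Python to make the bearer-token weakness concrete: nothing in the string is bound to the machine presenting it. The helper names are hypothetical, and this is not kubeadm's code.

```python
import re
import secrets
import string

# Shape of a Kubernetes bootstrap token: <token-id>.<token-secret>
TOKEN_RE = re.compile(r"([a-z0-9]{6})\.([a-z0-9]{16})")
ALPHABET = string.ascii_lowercase + string.digits


def _random_chars(n: int) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(n))


def new_bootstrap_token() -> str:
    """Generate a random token with the same shape kubeadm uses."""
    return f"{_random_chars(6)}.{_random_chars(16)}"


def token_id(token: str) -> str:
    """Return the public token-id half; the secret half is the actual credential."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError("not a bootstrap token")
    return match.group(1)
```

Any process that reads the full string can present it, which is why the token must be transmitted securely and kept short-lived.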
Talos Linux: Machine Config as Trust Bundle
Talos delivers the entire trust bundle (CA certs, bootstrap tokens, network config) as a signed YAML document during first boot. The machine config is the identity. No SSH, no shell -- all management is via mTLS gRPC.
Strength: The machine's identity is fully declarative and delivered atomically. Weakness: The config must contain the cluster CA key (for control plane nodes), which is a high-value secret to transmit.
Consul: Gossip Key + mTLS
New Consul agents join by contacting any existing member with a pre-shared
gossip encryption key (symmetric, AES-256-GCM). For RPC, Consul uses mTLS
with an internal CA. The auto_encrypt feature lets new agents obtain TLS
certs automatically using just the gossip key and CA cert.
Strength: Any member can onboard new agents (no central authority needed). Weakness: The gossip key is a shared secret -- compromise it and anyone can join. Rotation requires coordination across all members.
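A gossip keyring shows why rotation needs coordination. Each agent sends with a single primary key but accepts messages under any installed key. So a new key must be installed everywhere before any agent switches to it. The old key can be removed only after every agent has switched. Consul exposes these steps as the consul keyring -install, -use, and -remove operations. The toy model below uses a hypothetical Keyring class and tags messages with a key fingerprint instead of performing AES-GCM, so it captures only the ordering constraint.

```python
import hashlib


class Keyring:
    """Toy model of a gossip keyring; tags stand in for real encryption."""

    def __init__(self, key: bytes) -> None:
        self.installed = {key}
        self.primary = key

    def install(self, key: bytes) -> None:
        self.installed.add(key)

    def use(self, key: bytes) -> None:
        if key not in self.installed:
            raise ValueError("install the key before making it primary")
        self.primary = key

    def remove(self, key: bytes) -> None:
        if key == self.primary:
            raise ValueError("cannot remove the primary key")
        self.installed.discard(key)

    def tag_message(self, message: bytes) -> tuple[bytes, bytes]:
        # Outgoing traffic always uses the primary key.
        return hashlib.sha256(self.primary).digest(), message

    def accepts(self, tagged: tuple[bytes, bytes]) -> bool:
        # Incoming traffic is accepted under any installed key.
        tag, _ = tagged
        return any(hashlib.sha256(k).digest() == tag for k in self.installed)
```

If one agent switches primaries before another has installed the new key, the two can no longer talk. That is the cluster-wide coordination cost described above.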
SPIFFE/SPIRE: Platform Attestation
SPIFFE assigns identities based on what the platform says about the workload (node attestation via AWS metadata, Kubernetes service accounts, or TPM), not pre-shared secrets. No credentials are deployed -- identity is inferred.
Strength: Zero-secret deployment, strong attestation. Weakness: Requires a supported platform for attestation. Complexity is high.
The Tradeoffs
| Approach | First-boot trust | Revocation | Central authority needed | Human involvement |
|---|---|---|---|---|
| Bearer tokens (K8s) | Token must be transmitted securely | Explicit (delete token) | Yes (API server) | Admin generates token |
| Machine config (Talos) | Config delivered OOB | Regenerate config | Yes (config source) | Admin creates config |
| Pre-shared key (Consul) | Key distributed in advance | Rotate key cluster-wide | No (any member) | Admin distributes key |
| Platform attestation (SPIFFE) | Platform proves identity | Deregister from platform | Yes (SPIRE server) | None |
| TPM + enrollment token (FortrOS) | One-time token + TPM stores result | Remove hash from org state | No (any member enrolls) | Admin provides token at first boot |
The tension: the less human involvement, the more trust you place in the infrastructure. FortrOS wants zero-touch operation after enrollment but requires a deliberate human act (presenting the enrollment credential) to establish initial trust.
How FortrOS Does It
FortrOS uses a two-tier trust model that separates preboot identity from main OS identity. This separation is baked into the architecture, not a toggle you enable. Just as reinstalling the OS on a modern phone wipes user data by design (the encryption keys are tied to the OS state), FortrOS's two tiers ensure that compromising the main OS cannot compromise the preboot -- and a reboot clobbers the compromise.
Preboot Identity (TPM-based)
The preboot runs before the main OS and has its own credentials stored in the TPM:
- preboot_secret: 32 random bytes, delivered over TLS during enrollment and written to a TPM NV index. Used to derive the LUKS encryption key for /persist.
- Ed25519 keypair: Generated by the provisioner, delivered alongside preboot_secret, and stored in a second TPM NV index. The public key is recorded in org state as preboot_signing_pubkey.
On every boot, the preboot authenticates to the org by sending
H(preboot_secret) (SHA-256 hash) as a Bearer token. The org never sees the
actual secret -- only the hash. This means:
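- Intercepting the hash doesn't reveal the secret (the hash is still a bearer credential that could be replayed, which is why it only travels inside the TLS connection)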
- The org can verify identity without holding the secret
- Revocation = remove the hash from org state
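The org-side check can be sketched as a set of stored hashes. This is a minimal model with several assumptions. The names (bearer_token, OrgState, record, authenticate, revoke) are hypothetical. An in-memory set stands in for replicated org state. The token is assumed to be hex-encoded.

```python
import hashlib


def bearer_token(preboot_secret: bytes) -> str:
    """What the preboot sends: hex SHA-256 of the TPM-held secret (encoding assumed)."""
    return hashlib.sha256(preboot_secret).hexdigest()


class OrgState:
    def __init__(self) -> None:
        self.enrolled: set[str] = set()  # H(preboot_secret) values only, never secrets

    def record(self, token: str) -> None:
        self.enrolled.add(token)

    def authenticate(self, token: str) -> bool:
        return token in self.enrolled

    def revoke(self, token: str) -> None:
        # Revocation is just deleting the hash; the node can no longer authenticate.
        self.enrolled.discard(token)
```

The sketch does not protect against one thing: whoever captures the token string can replay it. The transport has to provide that protection.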
Main OS Identity (Ed25519 on /persist)
The main OS generates its own Ed25519 keypair on first boot, stored on the encrypted /persist partition. This is a completely separate credential from the preboot's TPM-stored key. The main OS uses this key for:
- WireGuard overlay identity (the keypair IS the WireGuard identity; WireGuard itself uses X25519 keys, so the Ed25519 key must be mapped to its X25519 form for this)
- conn_auth (Ed25519 challenge-response) with other org members
- Signing gossip messages and CRDT operations
Why two tiers? The chain of trust must extend from the hardware and the org all the way through boot before anything sensitive is unlocked. If the main OS is compromised, the attacker gets the main OS key stored on /persist but not the TPM-stored preboot_secret. They can't impersonate the preboot, can't derive the LUKS key directly, and can't survive a reboot (the preboot re-authenticates independently, using the org and the hardware as trust anchors). A reboot effectively clobbers a main OS compromise, by design rather than by configuration.
Enrollment: The First Boot
A new machine's first boot follows this sequence:
- Physical act: An admin provides an enrollment credential (YubiKey HMAC response or one-time token). This is the deliberate human authorization.
- Preboot connects to the org gateway over TLS. The connection is authenticated by the org's CA certificate (pinned in the preboot UKI's initramfs).
- Preboot sends the enrollment credential to the provisioner via the gateway. The provisioner validates it.
- Provisioner generates identity material: preboot_secret (random 32 bytes), Ed25519 keypair. Returns both over the TLS connection.
- Preboot writes to TPM: preboot_secret to NV index 1, Ed25519 keypair to NV index 2. These persist across reboots.
- Provisioner records in org state: enrollment nonce, source IP, H(preboot_secret), preboot signing pubkey. Status: pending.
- Preboot continues normal boot (LUKS unlock, kexec -- covered in chapters 4 and 5).
- Main OS joins WireGuard mesh and offers the enrollment nonce. Provisioner validates nonce + IP, promotes enrollment from pending -> confirmed. Cert server issues a short-lived certificate.
- Node is a full org member. Confirmed enrollment gossips to all nodes.
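The provisioner's side of this sequence can be sketched as two calls. One runs during preboot enrollment, and the other runs when the main OS presents its nonce. Everything here is hypothetical: the names, the record layout, the credential check, and the assumption that the nonce is returned to the preboot alongside preboot_secret. Ed25519 keypair generation is left out because Python's standard library has no Ed25519 support.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass
class Enrollment:
    nonce: str
    source_ip: str
    secret_hash: str  # H(preboot_secret), hex
    status: str = "pending"


class Provisioner:
    def __init__(self, valid_credentials: set[str]) -> None:
        self.valid_credentials = valid_credentials  # one-time enrollment credentials
        self.records: dict[str, Enrollment] = {}  # keyed by nonce

    def enroll(self, credential: str, source_ip: str) -> tuple[bytes, str]:
        """Preboot phase: validate, mint preboot_secret, record as pending."""
        if credential not in self.valid_credentials:
            raise PermissionError("invalid enrollment credential")
        self.valid_credentials.discard(credential)  # one-time use
        preboot_secret = secrets.token_bytes(32)
        nonce = secrets.token_hex(16)
        self.records[nonce] = Enrollment(
            nonce, source_ip, hashlib.sha256(preboot_secret).hexdigest())
        return preboot_secret, nonce  # returned over the TLS connection

    def confirm(self, nonce: str, source_ip: str) -> bool:
        """Main OS phase: nonce + source IP must match a pending record."""
        record = self.records.get(nonce)
        if record is None or record.status != "pending" or record.source_ip != source_ip:
            return False
        record.status = "confirmed"  # cert issuance and gossip would follow
        return True
```

Only the hash of preboot_secret is kept in the record, which matches the org never holding the secret itself.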
Three-State Enrollment
FortrOS is honest about uncertainty. An enrollment is not instant -- it progresses through states:
| State | Meaning |
|---|---|
| Pending | Preboot authenticated, credentials delivered. Node has WireGuard peer but isn't gossiped yet. |
| Confirmed | Main OS presented nonce, cert issued, enrollment gossiped to org. Full member. |
| Revoked | Hash removed from org state. Node cannot authenticate. |
Pending -> confirmed promotion is automatic (the main OS presents the nonce, the provisioner verifies it). No human approval step for the second phase -- the human act was providing the enrollment credential in step 1.
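The three states form a small state machine. In the sketch below, revoked is terminal, which matches the requirement that a revoked machine be rejected permanently. The pending -> revoked transition (revoking an enrollment that never confirmed) is an assumption; the text does not spell it out. The names are hypothetical.

```python
from enum import Enum


class EnrollState(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REVOKED = "revoked"


ALLOWED = {
    EnrollState.PENDING: {EnrollState.CONFIRMED, EnrollState.REVOKED},
    EnrollState.CONFIRMED: {EnrollState.REVOKED},
    EnrollState.REVOKED: set(),  # terminal: no way back
}


def transition(current: EnrollState, target: EnrollState) -> EnrollState:
    """Apply a state change, refusing anything the table does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```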
No-TPM Mode (Degraded)
If a machine has no TPM (rare, but possible on some VPS providers or old hardware), FortrOS runs in degraded mode:
- preboot_secret is lost on every reboot (not persisted)
- /persist is reformatted from scratch each boot
- Single-boot-cycle operation: the machine joins the org, works until reboot, then re-enrolls
Why this is necessary, not arbitrary: The TPM stores identity and encryption secrets in tamper-resistant hardware. Without it, preboot_secret would have to live on disk or in RAM -- both accessible to any software with root access. Storing secrets in a compromisable location and calling it "secure" would be dishonest. FortrOS chooses not to store them at all rather than store them insecurely. This is different from Windows 11's TPM requirement (which is about DRM and feature gating) -- FortrOS's requirement is structural: the two-tier trust model needs hardware-isolated secret storage to work.
Mitigations for no-TPM environments:
- Sleep/hibernate could extend a single boot cycle by keeping secrets in RAM (sleep) or in a YubiKey-locked hibernate image. The node loses secrets on power loss but not on lid close. This preserves the security property (secrets aren't stored on unprotected disk) while improving usability.
- External hardware security (USB HSM, smart card with persistent storage) could substitute for the TPM's NV storage role. The preboot would need driver support for the specific hardware.
- VPS with vTPM: Many cloud providers offer virtual TPMs implemented by the hypervisor or other provider-managed infrastructure. These aren't hardware TPMs but are better than no TPM -- the secrets are isolated from the guest OS, even if the trust ultimately rests with the cloud provider.
Stage Boundary
What This Stage Produces
After trust establishment:
- preboot_secret is stored in TPM NV (or lost if no TPM)
- Ed25519 keypair is stored in TPM NV
- The org has recorded H(preboot_secret) and the preboot signing pubkey
- The preboot can authenticate to the org on every subsequent boot
What Is Handed Off
The preboot now has:
- Proof of identity (TPM-stored credentials)
- A TLS connection to the org gateway
- Key material for LUKS derivation (preboot_secret)
- Generation information from the org (which kernel to boot)
The next stage -- 04 Disk Encryption -- uses the preboot_secret to derive the LUKS key and unlock /persist.
What This Stage Does NOT Do
- It does not unlock encrypted storage (that's 04 Disk Encryption)
- It does not select or load a kernel (that's 05 Loading the Real OS)
- It does not configure networking beyond the initial TLS connection (that's 07 Overlay Networking)
- It does not make the node an org member (that requires the main OS to join WireGuard and present the enrollment nonce -- 08 Cluster Formation)
Further Reading
Hardware deep dives:
- TPM -- Trusted Platform Module: PCRs, sealing, attestation, NV storage
- YubiKey -- Hardware security key for enrollment and recovery
Concepts:
- TLS and Certificates -- transport TLS, conn_auth (Ed25519), org CA
- Key Derivation -- HKDF and why you derive keys instead of using them raw
- Three-State Pattern -- Pending/confirmed/revoked enrollment tracking
- Portable Node -- YubiKey/CAC auth on untrusted hardware
Services:
- Provisioner -- On-demand enrollment gateway
FortrOS implementation:
- 02-boot-kernel.md -- Preboot identity lifecycle
- 03-networking-security.md -- Two-tier trust model