Merkle Trees
What It Is
A Merkle tree is a tree of hashes. Each leaf node is a hash of a data block. Each non-leaf node is a hash of its children's hashes. The root hash is a fingerprint of the entire dataset.
Named after Ralph Merkle (1979), Merkle trees are used everywhere: Git commits, Bitcoin blocks, BitTorrent piece verification, Amazon Dynamo, and FortrOS state synchronization.
Why It Matters
Merkle trees answer one question extremely efficiently: "Do we have the same data?" Two nodes compare their root hashes (32 bytes). If they match, the datasets are identical. If they differ, the nodes walk down the tree to find exactly which leaves differ -- without transmitting the entire dataset.
For a dataset with 1 million entries where 1 entry changed, comparing root hashes identifies the mismatch in 1 comparison. Walking the tree to find the changed entry takes ~20 comparisons (log2 of 1 million). Without a Merkle tree, you'd need to compare all 1 million entries.
How It Works
Building the Tree
Root: H(H12 || H34)
/ \
H12: H(H1 || H2) H34: H(H3 || H4)
/ \ / \
H1: H(D1) H2: H(D2) H3: H(D3) H4: H(D4)
H() is a hash function (SHA-256). D1-D4 are data blocks. Each leaf is
the hash of its data block. Each internal node is the hash of its children.
The root hash summarizes the entire dataset.
Detecting Changes
If D3 changes:
H3changes (new hash of the new data)H34changes (new hash ofH3 || H4)- Root changes (new hash of
H12 || H34) H12does NOT change (its children didn't change)
By comparing from the root down, you can pinpoint exactly which subtree changed without examining the unchanged half.
Efficient Sync
Two nodes syncing via Merkle trees:
- Compare roots: different -> proceed
- Compare level 1:
H12matches,H34differs -> only sync right subtree - Compare level 2:
H3differs,H4matches -> only transferD3
For FortrOS with 10,000 nodes: syncing 1 changed node's metadata requires ~4KB of Merkle tree exchange instead of ~820KB of full state. The savings grow with cluster size.
How FortrOS Uses It
StateTree<T>: Each replicated state tree wraps its CRDT data with a Merkle tree. The Merkle root is the canonical state hash. When a CRDT is modified, the Merkle tree is rebuilt.
Hash gossip: Nodes broadcast their Merkle root hash (~65 bytes) via gossip. Receivers compare hashes to detect state divergence.
TreeSync: When hashes mismatch, TreeSync uses the Merkle tree to efficiently identify and transfer only the changed leaves. The client sends its top 7 levels (~4KB); the server walks divergent subtrees and returns only changed data.
Generation integrity: The preboot verifies kernel generation images via Ed25519 signatures over a SHA-256 content hash. This is a straightforward hash-and-verify, not a Merkle tree -- there's only one item to verify, not a dataset to sync. Merkle trees are for efficient comparison of large datasets across nodes; generation verification is a simple signature check. If generation images later contain multiple components that need independent verification (kernel, initramfs, microcode, config as separate artifacts), a Merkle tree over those components could make partial updates efficient. Until then, a flat hash is the right tool.
Alternatives
Full-state transfer: Send the entire dataset on every sync. Simple but wasteful. FortrOS's initial gossip implementation did this (650-2400+ byte broadcasts). Replaced with hash gossip + TreeSync.
Vector clocks / version vectors: Track which operations each node has seen. Used by CRDTs internally for causality tracking. Complementary to Merkle trees (vector clocks track causality, Merkle trees detect content divergence).
Bloom filters: Probabilistic data structure for set membership. Can determine "definitely not in set" or "probably in set." Useful for approximate sync but not exact. Merkle trees are exact.
Links
- Merkle tree - Wikipedia
- How Git uses Merkle trees
- Amazon Dynamo Paper (PDF) -- Section 4.7 describes Merkle tree-based anti-entropy