Merkle Trees

What It Is

A Merkle tree is a tree of hashes. Each leaf node is a hash of a data block. Each non-leaf node is a hash of its children's hashes. The root hash is a fingerprint of the entire dataset.

Named after Ralph Merkle (1979), Merkle trees are used everywhere: Git commits, Bitcoin blocks, BitTorrent piece verification, Amazon Dynamo, and FortrOS state synchronization.

Why It Matters

Merkle trees answer one question extremely efficiently: "Do we have the same data?" Two nodes compare their root hashes (32 bytes). If they match, the datasets are identical. If they differ, the nodes walk down the tree to find exactly which leaves differ -- without transmitting the entire dataset.

For a dataset with 1 million entries where 1 entry changed, comparing root hashes identifies the mismatch in 1 comparison. Walking the tree to find the changed entry takes ~20 comparisons (log2 of 1 million). Without a Merkle tree, you'd need to compare all 1 million entries.

How It Works

Building the Tree

        Root: H(H12 || H34)
       /                    \
    H12: H(H1 || H2)     H34: H(H3 || H4)
    /         \           /         \
  H1: H(D1)  H2: H(D2)  H3: H(D3)  H4: H(D4)

H() is a hash function (SHA-256). D1-D4 are data blocks. Each leaf is the hash of its data block. Each internal node is the hash of its children. The root hash summarizes the entire dataset.

Detecting Changes

If D3 changes:

H3 changes (new hash of the new data)
H34 changes (new hash of H3 || H4)
Root changes (new hash of H12 || H34)
H12 does NOT change (its children didn't change)

By comparing from the root down, you can pinpoint exactly which subtree changed without examining the unchanged half.

Efficient Sync

Two nodes syncing via Merkle trees:

Compare roots: different -> proceed
Compare level 1: H12 matches, H34 differs -> only sync right subtree
Compare level 2: H3 differs, H4 matches -> only transfer D3

For FortrOS with 10,000 nodes: syncing 1 changed node's metadata requires ~4KB of Merkle tree exchange instead of ~820KB of full state. The savings grow with cluster size.

How FortrOS Uses It

StateTree<T>: Each replicated state tree wraps its CRDT data with a Merkle tree. The Merkle root is the canonical state hash. When a CRDT is modified, the Merkle tree is rebuilt.

Hash gossip: Nodes broadcast their Merkle root hash (~65 bytes) via gossip. Receivers compare hashes to detect state divergence.

TreeSync: When hashes mismatch, TreeSync uses the Merkle tree to efficiently identify and transfer only the changed leaves. The client sends its top 7 levels (~4KB); the server walks divergent subtrees and returns only changed data.

Generation integrity: The preboot verifies kernel generation images via Ed25519 signatures over a SHA-256 content hash. This is a straightforward hash-and-verify, not a Merkle tree -- there's only one item to verify, not a dataset to sync. Merkle trees are for efficient comparison of large datasets across nodes; generation verification is a simple signature check. If generation images later contain multiple components that need independent verification (kernel, initramfs, microcode, config as separate artifacts), a Merkle tree over those components could make partial updates efficient. Until then, a flat hash is the right tool.

Alternatives

Full-state transfer: Send the entire dataset on every sync. Simple but wasteful. FortrOS's initial gossip implementation did this (650-2400+ byte broadcasts). Replaced with hash gossip + TreeSync.

Vector clocks / version vectors: Track which operations each node has seen. Used by CRDTs internally for causality tracking. Complementary to Merkle trees (vector clocks track causality, Merkle trees detect content divergence).

Bloom filters: Probabilistic data structure for set membership. Can determine "definitely not in set" or "probably in set." Useful for approximate sync but not exact. Merkle trees are exact.