The concept was introduced by Ralph Merkle. Merkle tree is a tree of hash(hash tree). Hash tree are the generalization of hash, in which the leaf node is the data block which is represented has hash, and the non leaf are also labelled hash but by combining the hashes of their child nodes and so on to the parent/root node.
Merkle tree are efficient and secure which allows verification of content across trustless networks. Merkle tree has large number of use cases, such as it can be used in distributed network like peer to peer. Merkle tree are used in blockchain technologies. They have been used in version controlling system such as GitHub and BitBuckets.
Merkle trees are very easy to understand.
Diagram is taken from brilliant org Here, the data the in the leaf node are hashed further their hashes are hashed again till all the non leaf nodes reaches the root nodes. To understand the benefits of doing the hashes is its let us verify the data efficiently in distributed and untrusted networks such as P2P.
Let say we have four chunks of data, which we wanted to distributed on the network, first we will apply the hash function on those data. Further, we continue the hash till the root node.
Root nodes are very important. Let say we wanted check for the validation of files, we only have to perform equality on the root nodes. The root nodes (A) that we received and the root node (B) that we already have. If any change in the leaf (data) node in any of the tree, it will reflects to the top (root node) and thus making the validity check to false. Here, the main advantage becomes that we didn’t need to check for the whole file and perform IO operation for validations. Only thing is needed, is to check for equality of two root node hashes.
Now, when talking about downloading a file from the distributed network, we wanted to make sure that the sender send the file with the root hash first, and then send the associated file with that root hash. Now, we have the files but the problem is that the files came from the untrusted network. Therefore, we have to verify that data.
Remember, we have received the root hash, thus we will reconstruct the Merkle Tree. And if the reconstructed root hash matches the received root hash then file is verified and we are 100% certain that the file received are good.
Second, way to achieve the same verification but this time more precise way is to get the root node, then receive the enough data to generate the non-leaf node hash, and the receive the non-leaf node hashes. Now, match the hash and generated hash to verify.
Lets talk about Git and it’s use of Merkle tree. First initialize git project by
git init Now peek inside object folder in .git folder, you will see only two folders info/ pack/, since we have not committed anything yet. Create any file like ‘something.txt’ and execute command
git add something.txt Now peek again inside the object folder, you will find folder, this folder contains the hash of ‘something.txt’ file. Lets commit the changes, and then again peek inside the objects folder, two folders is created, one contains the tree hash, and another contains hash of the committed with some meta informations.
This can be visualized has
Furthermore, Merkle tree has many other applications. The blockchain technology such as Cryptocurrencies like Bitcoin and Ethereum. In blockchain, all the transaction are store in single block, each blockchain exists on every user’s computer. Thus, Merkle tree are used in Bitcoin are typically hash of single blocks. Same as above, anyone tries to alter the blockchain, will reflects the chain and therefore invalidating the transaction.
In addition, the Merkle tree are also been used in maintaining the consistencies of data in database. Apache Cassandra and other NoSQL database used to detect the inconsistencies.