To fully understand blockchain, it helps to have a good understanding of what is known as a cryptographic hash. It is this hash that is at the very core of how blockchain works.
If you read my introduction to blockchain lesson, you saw that a hash is part of every block. It serves as an effectively unique identifier for the block. That is the general idea behind all hashes, whether cryptographic or not.
Strictly speaking, the hash is the output of a hash function. You pass a hash function text, a document, a picture, or anything else digital, and it returns a fixed-size value that acts as an identifying number for that input. There are far more possible inputs than outputs, so two inputs can in principle share a hash. A good hash function makes this extremely unlikely.
Hash functions were in use long before cryptography was an issue. You may have heard the term hash table. A hash table is a data structure in which programmers use the hash of a key, such as a piece of text, to decide where in a table the related data is stored. The program can then find that data quickly without searching through everything.
A hash table works roughly like this:
You pass some text to the hash function, and it transforms the text into a hash.
- Ben -> 0123
- Data Science -> 4871
- Analytics is a great field of study -> 2580
These hashes are then used to place the data in a table. When the program needs the information again, it hashes the text and uses the result to look it up, much like an index in the back of a book.
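To make the lookup idea concrete, here is a minimal Python sketch of a hash table. The function `toy_hash` (sum of character codes, modulo a table size of 8) and the class `ToyHashTable` are hypothetical choices made for illustration only. Real hash tables, such as Python's built-in `dict`, use much better hash functions, and nothing here is cryptographic.

```python
from typing import List, Optional, Tuple

TABLE_SIZE = 8  # number of buckets; kept tiny so collisions are easy to see


def toy_hash(text: str) -> int:
    """Hypothetical toy hash: sum of character codes, folded into a bucket index."""
    return sum(ord(ch) for ch in text) % TABLE_SIZE


class ToyHashTable:
    def __init__(self) -> None:
        # Each bucket holds (key, value) pairs, so keys that collide
        # (land in the same bucket) can still be stored side by side.
        self.buckets: List[List[Tuple[str, str]]] = [[] for _ in range(TABLE_SIZE)]

    def put(self, key: str, value: str) -> None:
        bucket = self.buckets[toy_hash(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # replace the old value
                return
        bucket.append((key, value))

    def get(self, key: str) -> Optional[str]:
        # The hash says which bucket to search, like a book index
        # pointing to a page, so we never scan the whole table.
        for existing_key, value in self.buckets[toy_hash(key)]:
            if existing_key == key:
                return value
        return None


table = ToyHashTable()
table.put("Ben", "author")
table.put("Data Science", "field of study")
print(table.get("Ben"))           # author
print(table.get("Data Science"))  # field of study
print(table.get("Blockchain"))    # None
```

Two different keys can land in the same bucket, so each bucket holds a small list and the lookup still compares the stored key. That is one reason a hash table does not need its hashes to be perfectly unique.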
The hash functions used in blockchain are cryptographic, so they are a little different from the simple hash functions behind a hash table. For a hash function to be cryptographic, it needs to follow some basic rules.
- It has to be one-way. If you pass the text “Analytics” to a cryptographic hash function, it will return a hash (let’s say 1234). It will return the same hash every time you pass “Analytics” to it (again: 1234), so the function always knows what hash to create for that text. But if someone has only the hash 1234, they must not be able to reverse it to obtain the text (they can’t reverse engineer it).
Think of it like a fingerprint. If I have a person, I can always obtain a fingerprint. However, if all I have is a fingerprint, I cannot produce a person from it. A fingerprint on its own won’t let me derive information like the eye color or hair color of the individual who left it behind.
- The hash function needs to be fast. You will understand this better when we get to mining, but blockchain miners compute millions of hashes a second (modern Bitcoin mining hardware manages trillions). With a slow hash function, the entire concept would fail.
- Similar inputs must not receive similar hashes. The example below shows what we do not want.
When the hashes 1234 and 1235 are this close, and the texts are close as well, the hash leaks information about the input, which makes it possible to reverse engineer. For example, if you knew that Analytics4all hashed to 1236, you might be able to backtrack through the hashes until you hit Analytics at 1234.
Instead, we want something like this:
Notice that similar texts do not get similar hashes. Let’s look at it through numbers.
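The third rule can also be seen in code. Below is a minimal Python sketch that compares a deliberately weak, hypothetical `bad_hash` (the plain sum of character codes) with SHA-256 from Python's standard `hashlib` module. Nudging the last letter of Analytics back one place in the alphabet moves `bad_hash` by exactly one, which is the kind of leak described above. The sketch also checks the determinism part of the first rule: hashing the same text twice gives the same result.

```python
import hashlib


def bad_hash(text: str) -> int:
    """Hypothetical weak hash for illustration: the sum of character codes.
    Similar inputs give similar outputs, so it leaks information."""
    return sum(ord(ch) for ch in text)


def sha256_hex(text: str) -> str:
    """SHA-256 of the UTF-8 encoded text, as 64 hexadecimal digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Determinism: the same input always produces the same hash.
assert sha256_hex("Analytics") == sha256_hex("Analytics")

# "s", "r" and "q" are neighboring letters, so bad_hash moves by 1 each time.
for word in ["Analytics", "Analyticr", "Analyticq"]:
    print(f"{word}: bad_hash={bad_hash(word)}  sha256={sha256_hex(word)[:16]}...")
```

The weak hash prints 936, 935 and 934, consecutive numbers much like the 1234/1235 example. The SHA-256 prefixes should show no such pattern.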
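Different blockchain applications use different cryptographic hash functions. For example, Ethereum uses Keccak-256, while Bitcoin uses SHA-256 (applied twice to each block header), so SHA-256 is what I will focus on.
SHA-256 was designed by the NSA and published by NIST as part of the SHA-2 family. As of this writing, no practical attack that breaks it (for example, by finding two inputs with the same hash) is known. The algorithm is a public standard, so anyone can implement and use it freely.
A SHA-256 hash is 256 bits long and is usually written as 64 hexadecimal digits (64 digits * 4 bits per digit = 256 bits).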
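If you would like to check this arithmetic without a website, here is a minimal sketch using Python's standard `hashlib` module. The input text "Analytics" is the same example used in this lesson; the variable names are just illustrative.

```python
import hashlib

digest = hashlib.sha256("Analytics".encode("utf-8"))

hex_form = digest.hexdigest()  # the 64-character string hash websites display
raw_bytes = digest.digest()    # the same value as 32 raw bytes

print(len(hex_form))           # 64 hexadecimal digits
print(len(hex_form) * 4)       # 256 bits (4 bits per hex digit)
print(len(raw_bytes) * 8)      # 256 bits (8 bits per byte)
print(digest.digest_size)      # 32 bytes
```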
Here is a SHA-256 hash:
If you want to try it out, go to this website
Below, I pass the text Analytics to the hash generator and get a 64-digit hash back.
Now, when I just add the number 4 to the end of my text, the hash completely changes. The two hashes do not look anything alike. This behavior, often called the avalanche effect, is again designed to make it harder for someone to reverse engineer the hash.
Just to drive the point home, here I removed the s from the end of the text. Note that the new hash is again completely different.
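To put a number on how different these hashes are, here is a minimal Python sketch, again using the standard `hashlib` module. It counts how many of the 256 bits change between the hash of Analytics and the hashes of Analytics4 and Analytic. The helper names `sha256_int` and `differing_bits` are my own, for illustration. For a well-behaved cryptographic hash you would typically expect roughly half the bits, around 128, to flip. The exact counts depend on the inputs, so none are listed here.

```python
import hashlib


def sha256_int(text: str) -> int:
    """Return the SHA-256 hash of the UTF-8 encoded text as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ (XOR, then count 1s)."""
    return bin(a ^ b).count("1")


base = sha256_int("Analytics")
for variant in ["Analytics4", "Analytic"]:
    changed = differing_bits(base, sha256_int(variant))
    print(f"Analytics vs {variant}: {changed} of 256 bits differ")
```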
Go to the website. Try out some hashing for yourself.
In the next lesson I will cover the concept of an immutable ledger. This will take us one step closer to understanding blockchain.