Python: Create a Blockchain Hash Function

If you are at all like me, reading about a concept is one thing, but actually practicing it is what helps me understand it. If you have been following my blockchain tutorial, or if you came from an outside tutorial, then you have undoubtedly read enough about cryptographic hashes.

Enough reading, let’s make one:

(If you are unfamiliar with cryptographic hashes, you can reference my tutorial on them here: Blockchain: Cryptographic Hash)

For this example, I am using the Anaconda Python 3 distribution.

Like most things in Python, creating a hash is as simple as importing a library someone has already created for us. In this case, that library is: hashlib

So our first step is to import hashlib

import hashlib

Now let us take a moment to learn the syntax required to create a cryptographic hash with hashlib. In this example, I am using the SHA-256 hashing algorithm because it is the same algorithm used by BitCoin.

Here is the syntax used

hashlib.sha256(string.encode()).hexdigest()

To understand the syntax, we are calling the hashlib function sha256(): hashlib.sha256()

Inside the brackets, we enter the string we want to hash. The hash function itself works on bytes, not text, so the string cannot be passed in directly.

That is why, still inside the brackets, we call the string method .encode(). It converts the string into bytes (UTF-8 by default), which sha256() can then hash.

Finally, I added the method .hexdigest() to have the algorithm return our hash in hexadecimal format. This format will help in understanding future lessons on blockchain mining.

So in the example below, you can see that I assigned the variable x the string ‘doggy’. I then passed x to our hash function. The output can be seen below.

2018-04-19_15-52-05.png
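For anyone who cannot see the screenshot, here is a minimal sketch of the same steps. The variable x and the string 'doggy' come from the text above; the name digest and the length check are my own additions.

```python
import hashlib

# Assign the string we want to hash
x = 'doggy'

# encode() turns the string into bytes, sha256() hashes the bytes,
# and hexdigest() returns the result as a hexadecimal string
digest = hashlib.sha256(x.encode()).hexdigest()

print(digest)
print(len(digest))  # a SHA-256 hex digest is always 64 characters long
```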

Now a hash can hold much more than just a simple word. Below, I have passed the Gettysburg Address to the hashing function.

(** Note the ''' ''' triple quotes. Those are used in Python when your string takes up more than one line. **)

2018-04-19_15-54-59
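A rough sketch of the same idea is below. To keep it short I am only using the opening sentence of the Gettysburg Address, and the variable names speech and digest are my own, not necessarily what the screenshot shows.

```python
import hashlib

# Triple quotes let a string span several lines
# (only the opening sentence of the Gettysburg Address is used here)
speech = '''Four score and seven years ago our fathers brought forth
on this continent, a new nation, conceived in Liberty, and dedicated
to the proposition that all men are created equal.'''

digest = hashlib.sha256(speech.encode()).hexdigest()

# The input is much longer than a single word, but the hash length is fixed
print(len(speech), len(digest))
```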

Now I try passing a number. You will notice I get an error.

2018-04-19_15-55-41.png

To avoid the error, I turn the integer 8 into a string with the str() function

2018-04-19_15-56-17.png
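Here is a small sketch of both the error and the fix. I am assuming the error in the screenshot is the AttributeError Python raises when .encode() is called on an integer; the try/except is only there so the example keeps running.

```python
import hashlib

x = 8

# An integer has no .encode() method, so this raises AttributeError
try:
    hashlib.sha256(x.encode()).hexdigest()
except AttributeError as err:
    print('Error:', err)

# Converting the integer to a string first makes the call work
digest = hashlib.sha256(str(x).encode()).hexdigest()
print(digest)
```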

Below I concatenate a string and an integer.

2018-04-19_15-57-13.png
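A minimal sketch of the concatenation step, assuming the string is 'doggy' and the integer is 8 as in the earlier examples (the screenshot may use different values):

```python
import hashlib

word = 'doggy'
number = 8

# word + number would raise a TypeError, so convert the integer first
combined = word + str(number)

digest = hashlib.sha256(combined.encode()).hexdigest()
print(combined, digest)
```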

Last I want to show the avalanche effect of the hash function.

2018-04-19_15-58-01

By simply changing the first letter from an uppercase T to a lowercase t, the hash changes completely. This is a requirement for cryptographic hash functions. If the hash did not change dramatically after a small change to the string, it would be much easier to work backward from a hash toward the original text. This is known as the avalanche effect.

2018-04-19_15-58-27
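To put a number on the avalanche effect, the sketch below hashes two strings that differ only in the case of the first letter and counts how many of the 256 bits differ. The sample sentence and the helper sha256_hex are my own stand-ins, not the text from the screenshots.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 hash of a string as 64 hex digits."""
    return hashlib.sha256(text.encode()).hexdigest()

a = sha256_hex('The quick brown fox')
b = sha256_hex('the quick brown fox')

# Compare the two hashes bit by bit: XOR leaves a 1 wherever they differ
differing_bits = bin(int(a, 16) ^ int(b, 16)).count('1')

print(a)
print(b)
print(differing_bits, 'of 256 bits differ')
```

With a good hash function you would expect roughly half of the bits to flip, even though only one character changed.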

 

Blockchain: Cryptographic Hash

To fully understand blockchain, it helps to have a good understanding of what is known as a cryptographic hash. It is this hash that is at the very core of how blockchain works.

If you read my introduction to blockchain lesson, you would see that the Hash is part of every block. It serves as a unique identifier for the block. That is the general idea for all Hashes, whether cryptographic or not.

2018-04-04_12-39-34

It would actually be more proper for me to refer to it as a Hash Function. It is a function where you pass in text, a document, a picture, anything digital, and the function returns a fixed-length identifying number that is, for practical purposes, unique.

Hash functions were used even before cryptography was an issue. You may have heard the term Hash Table. This is a structure where programmers use the hash of a piece of text or a document to decide where to store it, so it can be found again quickly without searching through everything in memory.

The Hash tables worked kind of like this:

You would pass some text to the Hash Function and it would transform it to a hash.

Ben -> 0123
Data Science -> 4871
Analytics is a great field of study -> 2580

These hashes were then placed in a table and when the program needed to access the information, it used the Hash to look it up – kind of like an index in the back of a book.
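The lookup idea can be sketched with a toy hash table. Everything here is illustrative: toy_hash is a deliberately simple, non-cryptographic hash I made up, and the stored values are placeholders.

```python
def toy_hash(text, table_size=10):
    """A deliberately simple, non-cryptographic hash: sum of character codes."""
    return sum(ord(ch) for ch in text) % table_size

# The table is a list of buckets; each bucket holds (key, value) pairs
table = [[] for _ in range(10)]

def put(key, value):
    table[toy_hash(key)].append((key, value))

def get(key):
    # Hash the key again to jump straight to the right bucket
    for stored_key, value in table[toy_hash(key)]:
        if stored_key == key:
            return value
    return None

put('Ben', 'author')
put('Data Science', 'field of study')

print(get('Ben'))
print(get('Data Science'))
```

Python's built-in dict works on the same principle, so in practice you would rarely write this yourself.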

Now the Hash Functions used by Blockchain are cryptographic, so they are a little different than just a simple hash table. In order for a hash to be cryptographic, it needs to follow some basic rules.

  1. It has to be one way. If you pass the text “Analytics” to a cryptographic hashing function, it will return a hash (let’s say 1234). It will return the same hash every time you pass the text “Analytics” to it (again: 1234). So the hash function knows what hash to create for that text. But we want to ensure that if someone has the hash 1234, they cannot reverse it to obtain the text (they can’t reverse engineer it).

Think of it like a fingerprint. If I have a person, I can always obtain a fingerprint. However, if all I have is a fingerprint, I cannot produce a person from it. A fingerprint on its own won’t let me derive information like the eye color or hair color of the individual who left the print behind.

  2. The hash function needs to be fast. You will understand better when we get to mining, but blockchain miners are computing millions of hashes a second. With a slow hash function, the entire concept would fail.
  3. The hash needs to ensure that similar items do not receive similar hashes. The example below shows what we do not want.

Text        Hash
Analytics   1234
Analytics4  1235

 

With the hashes 1234 and 1235 being so close, as well as the text being so close, it would be possible to reverse engineer a hash. For example, if you knew that Analytics4all was 1236, you might be able to backtrack through the hashes until you hit Analytics at 1234.

Instead we want something like this:

Text        Hash
Analytics   1234
Analytics4  8476

 

You can see that similar text does not get similar hashes. Let’s look at it through numbers:

Number  Hash
20      8463
21      1258
22      6581
23      0874
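The four-digit hashes above are made up for illustration. Here is a short sketch that runs the same numbers through a real hash function (SHA-256 from Python's hashlib) and also checks the "same input, same hash" rule. The helper name sha256_hex is my own.

```python
import hashlib

def sha256_hex(value):
    """Hash any value by converting it to a string first."""
    return hashlib.sha256(str(value).encode()).hexdigest()

# Consecutive numbers produce hashes with no visible pattern
for number in range(20, 24):
    print(number, sha256_hex(number))

# The same input always returns the same hash
print(sha256_hex(20) == sha256_hex(20))  # True
```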

SHA-256

Now different blockchain applications will use different cryptographic hash functions. For example, Ethereum uses Keccak-256. BitCoin, however, uses SHA-256, so that is what I will focus on.

SHA-256 was designed by the NSA and, as of this writing, it has not been cracked. The algorithm is published openly, so anyone can make use of it.

A SHA-256 HASH is 64 hexadecimal digits long (64 digits * 4 bits per digit = 256 bits).
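If you would like to check that arithmetic yourself, this small sketch uses Python's hashlib. The input text 'Analytics' is just an example; any input gives the same lengths.

```python
import hashlib

h = hashlib.sha256('Analytics'.encode())

hex_digits = len(h.hexdigest())   # number of hexadecimal digits
bits = h.digest_size * 8          # digest_size is given in bytes

print(hex_digits, hex_digits * 4, bits)  # 64 256 256
```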

Here is a SHA-256 HASH

8BC775C7EFAACAD6AFF7CED25E0A793EF7DD2C5B0652EF5F85BA02FF57407A2B

If you want to try it out, go to this website

https://passwordsgenerator.net/sha256-hash-generator/

Below, I pass the text Analytics to the hash generator and I get a 64 digit hash returned

2018-04-06_14-09-56.png

Now when I just add the number 4 to the end of my text, the hash completely changes. The two hashes do not look anything alike. Again, this is designed to help reduce the chance of someone reverse engineering the hash.

2018-04-06_14-10-56.png

Just to drive the point home, here I removed the s from the end of the text. Note the new hash is again completely different.

2018-04-06_14-11-23.png
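The same experiment can also be run in Python instead of on the website. This is a sketch using the hashlib library; I have not compared its output against the screenshots, and the dictionary name hashes is my own.

```python
import hashlib

# Hash the three variations of the text used above
hashes = {}
for text in ['Analytics', 'Analytics4', 'Analytic']:
    digest = hashlib.sha256(text.encode()).hexdigest()
    # hexdigest() returns lowercase; upper() matches the sample hash shown earlier
    hashes[text] = digest.upper()
    print(text, hashes[text])
```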

Go to the website. Try out some hashing for yourself.

In the next lesson I will cover the concept of an immutable ledger. This will take us one step closer to understanding Blockchain.

Blockchain: An Introduction

Unless you’ve been living under a rock, you have undoubtedly heard the term blockchain being batted around. Most likely you’ve heard of it in relation to cryptocurrency like BitCoin, but blockchain is quickly moving into many other areas. Much like the Big Data craze though, my personal experience has been the more you hear someone utter the term blockchain, the less that person actually knows about it. It is the Dunning-Kruger Effect in action.

In this lesson, I am going to introduce you to the concept of blockchain. I have boiled it down to its simplest concepts, and I will be speaking very broadly about the subject. In future lessons I will dive deeper into the more technical aspects of blockchain, providing much more in the way of specifics.

For those already familiar with blockchain, I am aware that I am glossing over some rather important concepts here, but my goal in this lesson is to provide a simple, easily understood tutorial. The goal of my website is to provide an accessible education into many of the complex concepts surrounding analytics and data science to everyone, regardless of past experience or education. I promise a deeper dive in the future, but for now, let us start simply.

What is a blockchain?

At the most basic level, a blockchain is a collection of data kept in a list.

2018-04-04_12-41-14

What makes these lists so interesting is that:

  1. They are connected using cryptography
  2. They are distributed amongst multiple computers, providing a redundant method of protection

To understand how they work, let us start by looking at a block

2018-04-04_12-39-34

Starting from the top

Block #: This is just the number of the block in the chain. The first block is 1, the second is 2, and so on

Nonce: stands for “number used only once”. I am going to cover the Nonce in depth in the lesson on Mining, but for right now, just be aware that it is a number and every block needs a Nonce

Data: This is where the data is stored. In BitCoin this is often filled with transaction information

Previous Hash: The hash of the block before this one

Hash: I will cover hashes in depth in a future lesson too. For now, just know this is the cryptographic part of blockchain. A hash is a code number that identifies the block. An easy way to conceptualize it is to think of it like a VIN on a car. The VIN (or vehicle identification number) can be used to tell you the make, model, color, and many other characteristics of a car. The hash (in blockchain) will tell you everything found in the block (I know I am way oversimplifying here, but hey, we have to start somewhere, and I promise a future lesson on hashes)
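To make the five fields concrete, here is a simplified sketch of a block in Python. The class name Block, the sample transaction text, the way the fields are joined with "|", and the string of zeros used as the first block's previous hash are all my own assumptions for illustration. Real blockchains such as BitCoin lay out and hash their blocks differently.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    number: int          # Block #
    nonce: int           # Nonce
    data: str            # Data
    previous_hash: str   # Previous Hash

    def compute_hash(self):
        """Hash the first four fields together with SHA-256."""
        contents = f'{self.number}|{self.nonce}|{self.data}|{self.previous_hash}'
        return hashlib.sha256(contents.encode()).hexdigest()

# The first block has no predecessor, so a string of zeros stands in
block1 = Block(number=1, nonce=0, data='Ben pays Ann 5', previous_hash='0' * 64)
print(block1.compute_hash())
```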

Now let’s add a second block to our chain

Notice the previous hash in block 2 is the same as the hash in block 1. This, you will soon see, is part of what makes blockchain so secure.

2018-04-04_12-40-26_1.png

In the picture below, you can see that if someone tried to go into Block 1 and make a change, the Hash for block 1 would change. Any change to the first four fields in a block will cause the Hash to change, since the block is no longer the same. The Hash is like a VIN or a fingerprint: it can only represent a single individual block. Once you change any aspect of a block, it is no longer the same individual block.

2018-04-04_13-21-53.png

So as you can see, if the Hash in the first block changes, it will no longer match the Previous Hash in the second block. When this happens the blockchain is broken. So if a hacker tried to alter a transaction in block 1, the chain would break.
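The broken link can be sketched in a few lines of Python. This is a toy model under my own assumptions: blocks are plain dictionaries, the field layout inside block_hash is made up, and the transactions are placeholders.

```python
import hashlib

def block_hash(number, nonce, data, previous_hash):
    """Hash the first four fields of a block (field layout is an assumption)."""
    contents = f'{number}|{nonce}|{data}|{previous_hash}'
    return hashlib.sha256(contents.encode()).hexdigest()

def make_block(number, nonce, data, previous_hash):
    return {'number': number, 'nonce': nonce, 'data': data,
            'previous_hash': previous_hash,
            'hash': block_hash(number, nonce, data, previous_hash)}

def chain_is_valid(chain):
    for i, block in enumerate(chain):
        # Recompute the hash from the block's current contents
        recomputed = block_hash(block['number'], block['nonce'],
                                block['data'], block['previous_hash'])
        if recomputed != block['hash']:
            return False
        # Each block must point at the hash of the block before it
        if i > 0 and block['previous_hash'] != chain[i - 1]['hash']:
            return False
    return True

block1 = make_block(1, 0, 'Ben pays Ann 5', '0' * 64)
block2 = make_block(2, 0, 'Ann pays Cal 2', block1['hash'])
chain = [block1, block2]
print(chain_is_valid(chain))   # True

# A hacker edits block 1 and updates its hash to cover the change
block1['data'] = 'Ben pays Ann 500'
block1['hash'] = block_hash(1, 0, block1['data'], block1['previous_hash'])
print(chain_is_valid(chain))   # False: block 2 still holds the old hash
```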

Okay, so what is to prevent the hacker from just changing the second block? Aside from hashing issues that we will discuss in future lessons? The other deterrent is found in peer to peer sharing of blockchains

In most real-world applications of blockchain, the chain will not reside on only one computer. Instead, it will be replicated across multiple computers.

2018-04-04_12-41-29

Now suppose a hacker tries to change the third block on one computer.

2018-04-04_12-41-57

The third block’s hash will change, breaking its connection to the fourth block

2018-04-04_12-42-17.png

And even more importantly, the chain will no longer look like the one on the second computer. Now in real life, this will be spread across thousands of computers. So to determine which blockchain is correct, they look to see which version the majority of computers say is correct.

So as seen below, 3 of the 4 computers show 4 blue squares, while only one shows a yellow square. So based on the vote of the majority, the final result will be 4 blue squares.

2018-04-04_14-50-00.png
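Here is a very simplified sketch of that majority vote. The computer names and the short made-up hash strings are placeholders of mine, and counting identical copies like this is only a picture of the idea, not how any particular blockchain network actually reaches agreement.

```python
from collections import Counter

# Each computer holds its own copy of the chain, summarized here by the
# list of block hashes it stores (short made-up strings for readability)
copies = {
    'computer_1': ['aa11', 'bb22', 'cc33', 'dd44'],
    'computer_2': ['aa11', 'bb22', 'cc33', 'dd44'],
    'computer_3': ['aa11', 'bb22', 'ee99', 'dd44'],  # third block was altered
    'computer_4': ['aa11', 'bb22', 'cc33', 'dd44'],
}

# Count how many computers agree on each version of the chain
votes = Counter(tuple(chain) for chain in copies.values())
winner, count = votes.most_common(1)[0]

print(list(winner), count)  # ['aa11', 'bb22', 'cc33', 'dd44'] 3
```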

So there you have it, blockchain in its simplest form. In future lessons I’ll dive deeper into the different concepts to show how it works from the inside out.

 

 

 

 
