Blockchain: Unhashing the Hash Functions— Part 1

0
59 Blockchain: Unhashing the Hash Functions — Part 1

Building on the previous post, we will look more closely at the features of Hashing Functions and how they get these features. Since the topics are complicated I have divided it into two posts instead of one. The concepts I am going to cover in this post can honestly be skipped, as long as you are happy to take Hashing Functions on faith. If not, read ahead…

Computers live on 0s and 1s

Us humans use the Decimal System for counting, which uses 10 digits 0–9. This came naturally to us, because most humans tend to have 10 fingers, and it was easy to count using fingers.

Computers run on electrical current, and they use the Binary System, which uses 2 bits 0 & 1. This comes naturally to the computers, because either there is a current (1) or there is no current (0).

Both Decimal & Binary systems are compatible systems, meaning what can be written in Decimal can also be written in Binary. But Binary has a drawback, it becomes very long to even represent small numbers. So we developed a new compatible system, Hexadecimal System, which shortens the Binary representation by 4 times. Hexadecimal uses 16 hex-digits, 0–9 and a-f.

The Hash Outputs in the previous post were the Hexadecimal representations of long Binary Numbers generated by computer based Hashing Functions. To reiterate —

• For any computations we can see or do in the Decimal System, there exists an equivalent computation in the Binary System
• Computers operations generate large Binary numbers which we can shorten using the Hexadecimal equivalent of the output

Features of Hashing — Part 1

The features of Hashing discussed below come from incredibly complex mathematics. My aim is not to uncover the exact inner workings of Hashing, rather I would like to show you that, in principle, it is possible to build these features using simpler concepts. The real world is going to be very different from what we discuss below, but that doesn’t mean it has to be difficult to understand. And that is the goal — to understand!

Author’s Note: I am going to use Decimal System to drive the mathematical concepts, as it is easier to relate to calculations in the Decimal System. Just remember that everything we do here, computers would do in the equivalent Binary System

Feature 1: One Way-ness

Hashing is an irreversible function that takes any input and produces an output; you cannot recreate the input from the output

We have all seen two versions of clocks — the 12 hour format and the 24 hour format. When the wall clock reads 2:00, it could be 2 AM or 2 PM, represented in 24hr format as 02:00 and 14:00 respectively.

Why does both 02:00 and 14:00 in the 24hr format get represented as 02:00 in 12hr? This is the result of using a piece of arithmetic called the Modular Arithmetic, which simply means output the remainder of the division. For clocks, this means divide the time in 24hr format(i.e. 14) by 12 and the remainder (i.e. 2) is output which represents time in 12hr format. This huge sentence can simply be represented as — 14 mod 12.

This creates a conundrum though. If you were stuck in a room with no windows and just a wall clock that looked like below, would you be able to say if it was dark outside with absolute certainty?

Now let’s play a game! I have a number in my mind, x. I tell you that x mod 12 = 5. What is the number is my mind? You have 3 guesses…

If the number on my mind was 5, x = 5, chances are very high that you guessed it correctly. If x = 2249, highly unlikely that you guessed that. If x = 10559585774837, I can be 100% certain you never thought that!

So in a way, Modular Arithmetic is one way. For the same output there are many possible inputs, so you can’t guess the correct input of all the infinite options. And Hash Functions use this to achieve One Way-ness.

Feature 2: Compression

Irrespective of the size of the input, the output is always a random looking alphanumeric string (a series of letters & numbers) of fixed length

This one is quite simple to see. If you x mod 12, for any input x, output will always be between 00 and 11. If you x mod 257, for any x, output will always be between 000 and 256. Of course, real Hash Functions don’t just use mod for compression. But its really easy to see that its entirely possible using mod alone.

Feature 3: Input Sensitivity

Hashing is extremely sensitive to input, even slightest of change to the input results in a wildly different output; output doesn’t change if the input doesn’t change!

It is pretty evident that mod, won’t achieve this. So we have to be a little creative. Let’s add a number to itself, only there is a small trick, we shift the number to left by 1 digit before the addition. This shift-addition process magnifies the change in input by the number of shifts.

Hashing Numbers & Potato Hash Browns

In addition to shift-addition, we can also do chopping, dicing and mixing of the input number, which further increases the input sensitivity. For example, we can swap every digit with their neighbor in the first shift-addition, then swap the first half with the second half of the entire number in the second shift-addition. This is similar to what is done in the hash brown recipe. We boil the potatoes, crush them, mix it all up and then add it all together into the hash browns — we Hash the potatoes. Here we Hash the numbers! The mathematics of creating Hash Browns from the a number input! See how different the outputs are, even though the different in inputs is so small.

Real world Hashing Functions make use of multiple steps of Hash and mod in a predetermined fashion. Every now and then, there is also a step of multiplying with a large prime number, just for additional randomness. Then there is more Hash and mod. Over and over again, until the output looks nothing like the input at all.

In Conclusion

Overall, the Hashing Functions work like this -

• Everything, numbers or text, in Computers is a series of 0s and 1s, it is a Binary System based machine
• Hashing Functions take a Binary Number as an input
• They do steps like swapping the digits internally, adding the number to itself, multiplying it with another large number
• Every now and then, the Hashing Function also take a mod but they use a very large prime number as base
• Then repeat the digit swapping, addition & multiplying operations, take a mod again, repeat the swapping, addition & multiplying, mod again….
• Finally, the output is the Hash Output for the given input

It is complicated, I am not denying that. But in essence, this is what is happening behind the scenes.

For now, we will stop here. Next time, we will cover the fourth feature and perhaps the most important feature of Hash Functions — Collision Resistance. 