### Blockchain: Unhashing the Hash Functions — Part 2

In the second post of this series, we saw the first mention of **Hashing Functions **and in the previous post we took a deeper look into how the functions get some of those features. The features we have seen so far are —

- Irreversible due to
*Modular Arithmetic* - Fixed Length Output partially due to
*Modular Arithmetic* - Pseudo-randomness, meaning Output doesn’t look anything like Input due to
*Hashing*(cut, dice, mix, blend etc.) - Extreme input sensitivity due to
*Hashing*

In this post we will look at how the fourth feature of **Hashing Functions** can ensure that two different inputs don’t produce the same output —** Collision Resistance**. Unlike the last post, I would recommend that you read this one. Its not as complicated, and we will be building on this one in the next post. But first, a new concept…

**Pigeon Hole Principle (PHP)**

*If you have more pigeons than you have holes in the wall to house the pigeons, at least one of the holes will have to house more than one pigeon*

From Feature 2 in the last post we know that **Hash Output** is a fixed length number. Which means there is an inherent limit to the total number of different Hash Outputs we can generate. For example, if the Hash Output is limited to 4-digit number, then we only have 10000 possible Hash Outputs (*0000–9999*). However, there is no such limit on the number of possible inputs, there are more inputs than there are outputs. Thus by **PHP**, some inputs will have to map to the same Hash Output.

This is called **Collision**, two inputs *collide *at the same output. And we do not want collision in our **Hashing Function**.

### Feature 4: Collision Resistance

*Probability of two different inputs producing the same output is practically 0, each input produces a unique output, like a fingerprint*

We can’t really get around **PHP**, its a fact, an unavoidable truth about the world. But we can design things such that it can be avoided for as long as possible. Specifically there are two ways —

#### Method 1: Increase the number of all possible outputs

We saw that if we limit ourselves to 4-digit **Hash Output**, we have 10000 or 10⁴ possible outputs available. If we use a 78 digit long output, we would have 10⁷⁸ possible Hash Outputs,* which is larger than the total number of atoms in the known universe*. This is very doable and that is what **SHA-256** does.

SHA-256 or Secure Hashing Algorithm-256 is the Hashing Function used by the Bitcoin Blockchain.* BAM!* We just connected our make believe ‘blockchain’ with a real world Blockchain!

#### Method 2: Probing

Just because there are many possible **Hash Outputs**, doesn’t mean that we have avoided **Collision**. It is still possible for 2 random inputs to generate the same Hash Output. But it is possible to get around this. Lets store all the Hash Outputs ever generated in a** Hash Storage Table**, **HST**. Then we do the following for every new input —

- If new input generates a new & unseen Hash Output, we add it to the bottom of the
**HST** - If new input generates a Hash Output which is already in the
**HST**(*ah Collision happened!*), we change the input slightly to generate a new Hash Output, which (*hopefully!*) avoids collision - We keep modifying the input slightly until we find a collision free Hash Output

The ** Modifier **is a number appended to the end of the original input. We usually start with 0 and keep repeatedly increasing the modifier by 1 until we are able to find a

**Hash Output**that avoids collision. And this

*modifier*is stored separately for each input, because we want to keep the original input intact. This helps in

*reproducibility*— if the same input is provided again, we know what

*modifier*needs be applied to reproduce the output.

*Storing Modifier in Blocks*

*Storing Modifier in Blocks*

As we have already seen in prior posts, each block in the chain stores 3 different Hash Outputs. The *Data ID*, the *Previous Block ID* and the *Current Block ID*. Which means, for every new block in the chain, we are adding 2 new Hash Outputs — *Data ID* and *Current Block ID*.

We don’t really care much about the collision in *Data ID*. If two block’s have the exact same data, which is entirely possible and allowable, they are going to have the same *Data ID*. That is fine!

We can’t have collision in the *Current Block ID* though, because two blocks cannot have identical *Current Block ID*. So we need to perform ** Probing**, and we need to store the

**somewhere. The current block structure doesn’t have space to store it!**

*modifier*Author’s Note: Why can’t two different blocks have identical Current Block ID?

To start with, ** Modifier **is always empty for a new block. If collision occurs, we put 0 in the

*Modifier*and recalculate the

*Current Block ID*using

*Data ID*,

*Previous Block ID*and

*Modifier*as inputs to the

**Hash Function**. By including

*Modifier*, we have a different input which changes the

**Hash Output**, hopefully this solves collision. If, however, the collision still persists, we will increase the

*Modifier*value to 1. And we will keep on increasing

*Modifier*value by 1 each time until the collision is solved. Thus

*Modifier*increases by 1 until there is no

**Collision**in the block’s

*Current Block ID*. The chain is now

**collision resistant**!

### In Conclusion

Honestly, we do not need **Probing **for real world blockchains. Given the large number of possible Hash outputs (~10⁷⁸ possible outputs) and the smart and complex randomness built in **Hash Functions**, collisions never happen. I do not have space nor the mathematical ability to prove it to you, but it is true! We don’t really need ** Modifier **to prevent collisions in the real world. So why did I even bring it up?

The blocks in the blockchain still have a space for *Modifier*, called ** Nonce **which stands for

*Number Once**—*because each number is tried once! Every block has a

*Nonce*, and a whole lot of time, money and energy is spent on finding the right value of

*Nonce*. In the next article, why

*Nonce*exists and what purpose it really serves. See you then…

### Cliffhanger

What is **Proof-Of-Work**? Why are Bitcoin and Ethereum said to consume a lot of electricity?

Join Coinmonks Telegram Channel and Youtube Channel learn about crypto trading and investing

#### Also, Read

- MXC Exchange Review | Pionex vs Binance | Pionex Arbitrage Bot
- My Experience with Crypto Copy Trading | Coinbase Review
- CoinFLEX Review | AEX Exchange Review | UPbit Review
- AscendEx Margin Trading | Bitfinex Staking | bitFlyer Review
- Sparrow Exchange Review | Nash Exchange Review
- Uphold Card Review | Trust Wallet vs MetaMask
- Exness Review | MoonXBT Vs Bitget Vs Bingbon
- How to Start Earning Passive Income With Crypto Lending
- Cryptocurrency Savings Accounts | Crypto Trading Bots
- BigONE Exchange Review | CEX.IO Review | Swapzone Review
- Best Bitcoin Margin Trading| Bityard Margin Trading

Blockchain: Unhashing the Hash Functions — Part 2 was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.