Hashing and Collisions in Security

December 15, 2017

Hashing

Hashing is the transformation of a string (or any sequence of bytes) into a fixed-length value called a hash. A hashing algorithm performs this transformation. Examples of hashing algorithms are MD5, SHA1, SHA2 and SHA3.

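The core property of a hashing algorithm is that, irrespective of the length of your string, it always generates a hash of the same fixed length for that algorithm: 128 bits for MD5, 160 bits for SHA1 and 256 bits for SHA-256. A hash is also termed a digest. Because the number of possible inputs is unlimited while the output length is fixed, a hash cannot be strictly unique; a good algorithm makes it practically impossible to find two inputs with the same hash. Hashing is also a one-way process: you can generate a hash from a string, but you cannot reverse the process to recover the string from the hash.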
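The fixed-length property is easy to check with Python's standard `hashlib` module (Python is used here only for illustration, and `digest_bit_lengths` is a hypothetical helper). This minimal sketch hashes an empty input, a short input and a 1 MB input with each algorithm and collects the digest sizes it sees.

```python
import hashlib

def digest_bit_lengths(algorithm, inputs):
    """Return the set of digest sizes (in bits) the algorithm produces for the inputs."""
    return {len(hashlib.new(algorithm, data).digest()) * 8 for data in inputs}

# Inputs of very different lengths: empty, short, and about 1 MB.
samples = [b"", b"hello", b"x" * 1_000_000]

for name in ("md5", "sha1", "sha256", "sha3_256"):
    print(name, digest_bit_lengths(name, samples))
# md5 {128}
# sha1 {160}
# sha256 {256}
# sha3_256 {256}
```

Each set holds a single value: the input size changes the hash value, never its length.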

In the real world, hashes/digests are used to transfer files securely, in the sense that tampering can be detected. When the sender transmits a file, they send the original file along with an encrypted hash of it. The hash encrypted with the sender's private key is the digital signature (the equivalent of your signature on paperwork such as contracts). So, you send a file with your signature attached digitally. The digital signature ensures that the sender owns the content of the file, and the sender cannot later deny it.

So, the sender expects that the file cannot be altered in transit by a third person (a hacker) without detection, because the encrypted hash (digital signature) is in place. Suppose a third person (a man in the middle) modifies something in your file. Then the modified file produces a different hash, which no longer matches the hash inside the digital signature. The hacker cannot create a new valid signature for the modified file, because that would require the sender's private key.

When the receiver gets the file, they verify the sender by decrypting the digital signature with the sender's public key, which recovers the hash the sender computed. The receiver then computes the hash of the received file. If the file was modified, this hash does not match the hash recovered from the digital signature, so the receiver knows the file was corrupted in transit by a hacker/third person. This whole process is represented in the diagram below.
(Note: public and private keys are a separate concept. For the time being, assume the sender has a pair of keys, one private and one public. The sender keeps the private key and shares the public key with the receiver.)
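The sign-then-verify flow described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: it uses textbook RSA with no padding and a toy key pair built from two known Mersenne primes, which is NOT secure. Real systems use a vetted cryptography library with schemes such as RSA-PSS or ECDSA. The function names are hypothetical, and `pow(E, -1, ...)` needs Python 3.8+.

```python
import hashlib

# ASSUMPTION: toy textbook RSA for illustration only, NOT secure.
P = 2**127 - 1                     # a known prime (Mersenne prime M127)
Q = 2**521 - 1                     # a known prime (Mersenne prime M521)
N = P * Q                          # public modulus, larger than any SHA-256 value
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the sender

def digest(message: bytes) -> int:
    """SHA-256 hash of the file contents, read as an integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    """Sender: hash the file, then 'encrypt' the hash with the private key."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the hash with the public key and compare it to a fresh hash."""
    return pow(signature, E, N) == digest(message)

original = b"I owe 10rs"
signature = sign(original)

print(verify(original, signature))         # True: the file arrived unchanged
print(verify(b"I owe 1000rs", signature))  # False: the file was modified in transit
```

The man in the middle can change the file, but without `D` they cannot produce a signature that matches the new hash.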

Collisions

Treat the blue text area in the diagram as a file, the yellow text area as the SHA1 algorithm, and the rose text area as the hash, or digest, of that file. If you observe the above diagram carefully, even a small change in the input (blue) makes the hash algorithm (yellow) generate a different hash. If the hash algorithm (yellow) generates the same hash (rose) for two different inputs (blue), that is called a hash collision.
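The "small change, different hash" behaviour can be observed directly. This sketch (helper names are hypothetical) hashes the two texts used later in this post with SHA1 and counts how many of the 160 output bits differ.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """SHA1 digest of a UTF-8 string, as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def differing_bits(a: str, b: str) -> int:
    """Count how many of the 160 output bits differ between the two SHA1 digests."""
    return bin(int(sha1_hex(a), 16) ^ int(sha1_hex(b), 16)).count("1")

a, b = "I owe 1000rs", "I owe 10rs"
print(sha1_hex(a))
print(sha1_hex(b))
print(differing_bits(a, b), "of 160 bits differ")
```

For a well-behaved hash, roughly half of the output bits are expected to flip, so the two digests look unrelated even though the inputs differ by only two characters.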

A strong hashing algorithm should make it practically impossible to find two different files (even almost identical ones) that produce the same hash. If such pairs can be found, the hash algorithm is weak. This is what happened to the SHA1 algorithm. Let me explain this with a simple example.

It is very difficult for hackers to attack the algorithm by creating a forged file that has the same hash as an existing file. Instead, they take an easier approach: through computational research on the algorithm, they generate two files with two different texts ("I owe 1000rs" and "I owe 10rs") that produce the same hash, for example "8080 1231 3131 ...". In 2017, Google, together with CWI Amsterdam, used this approach to break SHA1 and published the first practical SHA1 collision: two different PDF files with the same hash. As a result, shared repositories and storage that identify files by their SHA1 hash, such as Git and Google Drive, can be targeted with malicious files built from SHA1 collisions.
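Why is creating a colliding pair so much easier than forging a copy of an existing file? With an n-bit hash, matching one specific hash takes about 2^n attempts, but because the attacker controls both files, a matching pair turns up after only about 2^(n/2) attempts (the birthday bound). The sketch below demonstrates this on a deliberately weakened, hypothetical hash: the first 24 bits of SHA1. It makes harmless-looking variants of both texts until a "10rs" variant and a "1000rs" variant share a hash. This is the generic brute-force idea only; the real attack on full SHA1 relied on cryptanalytic shortcuts.

```python
import hashlib

def weak_hash(data: bytes, nbytes: int = 3) -> bytes:
    """Hypothetical weak hash for the demo: the first nbytes of SHA1 (24 bits by default)."""
    return hashlib.sha1(data).digest()[:nbytes]

def find_collision(good: bytes, bad: bytes, nbytes: int = 3):
    """Birthday search: vary both texts until one variant of `good` and one
    variant of `bad` share the same weak hash."""
    good_seen = {}  # weak hash -> variant of the good text
    bad_seen = {}   # weak hash -> variant of the bad text
    counter = 0
    while True:
        suffix = b" (ref " + str(counter).encode() + b")"  # harmless-looking change
        g, b = good + suffix, bad + suffix
        hg, hb = weak_hash(g, nbytes), weak_hash(b, nbytes)
        good_seen[hg] = g
        bad_seen[hb] = b
        if hg in bad_seen:
            return g, bad_seen[hg], counter + 1
        if hb in good_seen:
            return good_seen[hb], b, counter + 1
        counter += 1

good_doc, bad_doc, rounds = find_collision(b"I owe 10rs", b"I owe 1000rs")
print(good_doc, bad_doc, weak_hash(good_doc).hex(), "after", rounds, "rounds")
```

With 24 bits, a pair typically appears after a few thousand rounds, while forging a match for one fixed hash would take millions of attempts. The same gap, at 160 bits, is why collisions were the first practical break of SHA1.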

So, organizations that use SHA1 have been migrating to the SHA2 and SHA3 families. These newer algorithms generate longer hashes (256 bits for SHA-256 and SHA3-256, and up to 512 bits), which are much harder for hackers to attack, and they are not affected by the known weaknesses of SHA1.
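In code, the migration is often just a change of algorithm name. Here is a minimal sketch, assuming files are hashed with Python's `hashlib`; `file_digest_hex` is a hypothetical helper that reads the file in chunks (so large files need not fit in memory) and defaults to SHA-256 instead of SHA1.

```python
import hashlib

def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks with the named algorithm (SHA-256 by default)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `file_digest_hex("contract.pdf", "sha3_256")` would use SHA3 instead (the file name is hypothetical). On Python 3.11 and later, `hashlib.file_digest` offers a built-in alternative.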

Meanwhile, to protect against SHA1 collision attacks, a Microsoft Research team has built a collision-detection tool. Organizations that are still using SHA1 can benefit from it: it detects when a file being hashed, for example during an upload, shows the signs of having been crafted for a SHA1 collision attack.

The conclusion is: if two documents generate the same hash, you lose the authenticity/ownership of the document. In general terms, a hash is just like your signature: you cannot have two cheque leaves with the same cheque number carrying two different amounts, both signed by you!
