Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.
A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size, with slight differences in input data producing very big differences in output data.
Hash algorithms are one way functions. They turn any amount of data into a fixed-length "fingerprint" that cannot be reversed.
Collision / Hash Collision
The hash function is used to index the original value or key and then used later each time the data associated with the value or key is to be retrieved. Thus, hashing is always a one-way operation. There's no need to "reverse engineer" the hash function by analysing the hashed values. In fact, the ideal hash function can't be derived by such analysis. A good hash function also should not produce the same hash value from two different inputs. If it does, this is known as a collision.
The MD5 message-digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text format as a 32 digit hexadecimal number. MD5 has been utilized in a wide variety of cryptographic applications, and is also commonly used to verify data integrity.
SHA1 stands for Secure Hash Algorithm makes a larger (160-bit / 20-byte) message digest and is similar to MD4. A SHA-1 hash value is typically rendered as a 40 digits long hexadecimal number.
Hashing is great for protecting passwords, because we want to store passwords in a form that protects them even if the password file itself is compromised, but at the same time, we need to be able to verify that a user's password is correct.
The general workflow for an account registration and authentication in a hash-based account system is as follows:
- The user creates an account.
- Their password is hashed and stored in the database. At no point is the plain-text (unencrypted) password ever written to the hard drive.
- When the user attempts to login, the hash of the password they entered is checked against the hash of their real password (retrieved from the database).
- If the hashes match, the user is granted access. If not, the user is told they entered invalid login credentials.
- Steps 3 and 4 repeat everytime someone tries to login to their account.
In step 4, never tell the user if it was the username or password they got wrong. Always display a generic message like "Invalid username or password." This prevents attackers from enumerating valid usernames without knowing their passwords.
It should be noted that the hash functions used to protect passwords are not the same as the hash functions you may have seen in a data structures course. The hash functions used to implement data structures such as hash tables are designed to be fast, not secure. Only cryptographic hash functions may be used to implement password hashing. Hash functions like SHA256, SHA512, RipeMD, and WHIRLPOOL are cryptographic hash functions.
We can randomize the hashes by appending or prepending a random string, called a salt, to the password before hashing.The salt does not need to be secret. Just by randomizing the hashes, lookup tables, reverse lookup tables, and rainbow tables become ineffective. An attacker won't know in advance what the salt will be, so they can't pre-compute a lookup table or rainbow table. If each user's password is hashed with a different salt, the reverse lookup table attack won't work either.
Salted Password Hashing - Doing it Right