Credentials storage
When it comes to storing credentials such as passwords in a database, it is certainly not a good idea to store them as plain text. If the database is compromised and an unauthorized outsider gains access to it, all of that sensitive data is leaked.
Since plain text is not an option, the alternative is to store some representation of the credentials instead of the credentials themselves.
info
The idea here is that instead of storing the actual password, you store some representation of it. Whenever a user tries to authenticate, the app server transforms the entered password into that same representation before checking it against what is stored in the database. If both representations (the one computed by the app server and the one currently stored in the database) are the same, the entered password is concluded to be correct.
There are two main approaches to creating secure representations of credentials: encryption and hashing.
Encryption
Encryption is the process of scrambling text using a key so that it can only be read by someone who holds the secret decryption key. In other words, encryption encodes plain text into what we call ciphertext, which only authorized parties can decipher back to the original.
tip
It is important to note that encryption is a two-way function: with the correct key, it is possible to decipher the ciphertext and reproduce the original text. This is not necessarily a good thing (and can become a weakness), especially when it comes to storing credentials. The reason is that the app server must hold the decryption key in order to decrypt the encrypted data, so an attacker who gets access to the database and steals the encrypted passwords might very well also steal the key.
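To make the two-way nature concrete, here is a minimal sketch using the JDK's built-in `javax.crypto` classes. The choice of AES-GCM, the class name `TwoWayEncryptionDemo` and its helper methods are assumptions made for illustration; the original text does not prescribe any particular cipher.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class: shows that ciphertext plus key yields the original text.
public class TwoWayEncryptionDemo {
    private static final int IV_BYTES = 12;   // common IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    public static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    public static byte[] newIv() {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    public static byte[] encrypt(SecretKey key, byte[] iv, String plainText) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    public static String decrypt(SecretKey key, byte[] iv, byte[] cipherText) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] iv = newIv();
        byte[] stored = encrypt(key, iv, "s3cret-password");
        // Anyone holding the key (and IV) can recover the original password.
        System.out.println(decrypt(key, iv, stored)); // prints "s3cret-password"
    }
}
```

The point of the sketch is the last line: whoever obtains both the stored ciphertext and the key gets the password back, which is exactly the weakness described above.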
Hashing
Hashing, on the other hand, uses a one-way function (i.e. a hash function) that scrambles plain text to produce a fixed-size message digest called the hashed message or simply the hash.
tip
Hashing is the preferred approach when it comes to storing credentials in a database. The primary reason is that the actual credentials cannot be directly recovered from the hashes even if the hashes are leaked, which leads to better security.
It is important to note the following about hashing [1]:
The same input text always generates the very same hash value; this is the only way the value can function as a checksum. Is the entered password identical to what’s saved in the database? The system may only grant access if both hash values are the same.
Ideally, a hash value is unique, so that different inputs don’t generate the same hash value. Only in this way can the function make sure that the correct password was entered. Since the number of possible hash values is limited but the number of possible inputs isn’t, collisions can’t be ruled out entirely. Modern hash functions and hashes of sufficient length minimize the risk as much as possible.
Hash values can’t be reversed: the original content can’t be derived from the hash value itself. This is why hash values can’t be decrypted either, as is sometimes vaguely claimed. Instead, a hash can only be checked by hashing a candidate input and comparing the two results.
Hash functions have to be relatively complex, but not too complex: to ensure security, the algorithm shouldn’t run too quickly, because that would also make the work easier for attackers. The conversion also shouldn’t be too expensive, as it still needs to be applied in practice.
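The first properties above can be observed with a few lines of Java. The sketch below uses SHA-256 from the JDK's `MessageDigest` purely to illustrate determinism and fixed output size; it is a fast general-purpose hash and is not meant as a recommendation for password storage. The class name `HashPropertiesDemo` and the helper `sha256Hex` are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class illustrating basic hash properties.
public class HashPropertiesDemo {

    // Hashes the input with SHA-256 and returns the digest as lowercase hex.
    public static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Same input, same hash.
        System.out.println(sha256Hex("password").equals(sha256Hex("password"))); // prints "true"
        // A one-character change yields a different hash.
        System.out.println(sha256Hex("password").equals(sha256Hex("Password"))); // prints "false"
        // Output size is fixed (64 hex characters) regardless of input length.
        System.out.println(sha256Hex("a").length());                        // prints "64"
        System.out.println(sha256Hex("a much longer input text").length()); // prints "64"
    }
}
```

Note that nothing in the class offers a way back from the hex string to the input: verification is always done by hashing the candidate and comparing.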
Attackers can still go after hashes with a variety of attacks, such as dictionary attacks or rainbow tables [2]:
Dictionary attack
A dictionary attack is the simplest form of attack on a hash function. The attacker precomputes and stores the hash of each candidate input (for example, a list of common passwords). Then, given a leaked hash, the attacker looks it up in this dictionary and finds the matching input.
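As a rough illustration of why unsalted hashes of common passwords are weak, the following sketch precomputes a small hash-to-password map and looks up a leaked hash in it. SHA-256 and the three-entry word list are assumptions made for the example, as are the names `DictionaryAttackDemo`, `buildDictionary` and `lookup`; a real attacker would use far larger lists.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical demo class: precomputed hash -> password lookup.
public class DictionaryAttackDemo {

    public static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Precompute the hash of every candidate once, keyed by hash.
    public static Map<String, String> buildDictionary(List<String> candidates) {
        Map<String, String> hashToPassword = new HashMap<>();
        for (String candidate : candidates) {
            hashToPassword.put(sha256Hex(candidate), candidate);
        }
        return hashToPassword;
    }

    // Given a leaked hash, find the password that produced it, if it is in the dictionary.
    public static Optional<String> lookup(Map<String, String> dictionary, String leakedHash) {
        return Optional.ofNullable(dictionary.get(leakedHash));
    }

    public static void main(String[] args) {
        Map<String, String> dictionary =
                buildDictionary(Arrays.asList("123456", "password", "letmein"));

        String leaked = sha256Hex("letmein"); // stands in for a hash stolen from a database
        System.out.println(lookup(dictionary, leaked).orElse("not found")); // prints "letmein"

        String uncommon = sha256Hex("correct horse battery staple");
        System.out.println(lookup(dictionary, uncommon).orElse("not found")); // prints "not found"
    }
}
```

The expensive part (hashing the candidates) is done once up front; each subsequent lookup is a single map access, which is what makes the attack cheap against many leaked hashes at once.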
Rainbow Table
A rainbow table is a table of precomputed hash chains used for reversing cryptographic hash functions in order to crack password hashes. Only the start and end of each chain are stored, which trades lookup time for storage space. Once such a table has been generated, an attack is carried out by searching the table for a chain that contains the given hash and recovering the corresponding plaintext from it.
Solution
So, how can we prevent/minimize such attacks? One standard approach is to use a so-called salt in our hashing process.
Salt
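A salt is a random string (it can also be user-defined) that is used as an additional input to the one-way hashing function to produce the hash. Typically, a new salt is randomly generated for each password/user. Because the salt is stored alongside the hash and differs per user, identical passwords yield different hashes, and a precomputed table would have to be rebuilt for every salt.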
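A minimal sketch of salting, using only the JDK: a fresh random salt is generated per password, fed into the hash function together with the password, and kept so the same computation can be repeated at login. PBKDF2 (`PBKDF2WithHmacSHA256`) is used here only because it ships with the JDK and is deliberately slow; it is a stand-in, not the algorithm this text recommends. The class name `SaltedHashDemo`, its methods, and the iteration count are illustrative assumptions.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical demo class: per-user random salt + slow hash + verification.
public class SaltedHashDemo {
    private static final int SALT_BYTES = 16;
    private static final int ITERATIONS = 100_000; // illustrative value, not a tuned recommendation
    private static final int HASH_BITS = 256;

    // Generate a new random salt; one per password/user.
    public static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derive the hash from the password and the salt.
    public static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, HASH_BITS);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        }
    }

    // Re-hash the candidate with the stored salt and compare in constant time.
    public static boolean verify(char[] candidate, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), storedHash);
    }

    public static void main(String[] args) {
        char[] password = "hunter2".toCharArray();

        byte[] saltA = newSalt();
        byte[] saltB = newSalt();
        byte[] hashA = hash(password, saltA);
        byte[] hashB = hash(password, saltB);

        // Same password, different salts: the stored hashes differ.
        System.out.println(Arrays.equals(hashA, hashB));                  // prints "false"
        // Verification repeats the computation with the stored salt.
        System.out.println(verify(password, saltA, hashA));               // prints "true"
        System.out.println(verify("wrong".toCharArray(), saltA, hashA));  // prints "false"
    }
}
```

In a database, the salt would be stored in its own column (or encoded together with the hash) next to each user's record; it does not need to be secret, it only needs to be different for each password.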
Hashing algorithms
There is a wide variety of hashing algorithms out there. MD5, the SHA family (e.g. SHA-1, SHA-2), Bcrypt and Scrypt are just a few examples. Some of these are considered outdated now (e.g. MD5, SHA-1) and it is advised not to use them. Bcrypt is a safe bet and provides good, well-rounded password hashing these days. There are Java implementations of Bcrypt, such as this, that you can use in your project if needed.
Let us assume you want to utilize Bcrypt to handle passwords in your app. First, add the following dependency to your build.gradle:
Once this is added to your project, you can use Bcrypt to hash any string (e.g. a password) and later verify a candidate string against that hash. The following is a very simple example: