Following on from our last talk, the Santander branch in Stockton reached out to me through the local police force and asked me to come in and talk to some of their business customers about staying safe online and avoiding phishing attacks. The shape of the talk was very similar to my last one, but a few more questions came up that I thought were worth writing up and addressing here.

Password Hashing

This question came up after I had finished presenting, to more than a few affirming murmurs from the attendees:

"You mentioned 'password hashing', what's that?"

This really drove home that password hashing, to people coming from outside the field of computing, is still largely a mystery. It needn't be this way.

Understanding how your passwords are (or should be) stored by websites is absolutely vital to staying vigilant about how this sensitive piece of data is (or is not) protected from theft. Barely a week goes by these days without a data breach being announced in the news, in which customer data (usually including passwords in some form) is stolen by criminals who breach a company's IT infrastructure and gain access to its customer database. Once this data has been stolen, however, there is still one last line of defence to keep your password safe. Enter the hash function.

Hash Function? What?

If you take nothing else away from this post, I'd like this to stick in your mind:

Websites (or any digital system) should never, ever under any circumstances store your password in a way that allows them to know what it is. Ever.

There is no excuse for this. End of story. Only you should know what your password is. A website needs to have a way to check your password is correct when you enter it, but that doesn't mean that it needs to know your password.

A hash function is what we call in computer science a one-way function. It is very easy to put something into it and get something out the other side, but much more difficult to look at what comes out and figure out what went into it originally.

For some context, think about mixing paints. A certain combination of paints gives a unique colour, but can we go the other way and get the original colours back? Of course we can't. The recipe for making that colour is now secure: only we know how to make it, and even if a thief steals some of our paint, they can't make it themselves because they can't work back to the original recipe. All they can do is guess over and over again to try to make the exact same colour, wasting time and lots and lots of paint on an almost impossible task. This is the principle of the hash function.

Here's the popular hash function SHA-256 (standing for Secure Hash Algorithm, 256-bit) being used to irreversibly "mangle" or "hash" the password "myVerySecurePassword". The human-readable form of the password is what we call the plaintext, and what comes out the other side is the hash of that password. Our password is to the hash below what the recipe is to the final paint colour in our analogy above. Going forwards is easy; going backwards is next to impossible.

"myVerySecurePassword" -> [SHA-256] -> "e86fa9b3b7fe815ce7ea227eef4c0e7b767e1b202c41e28de36239e2ed55514c"

You can see the hash at the end here: that big long string of numbers and letters. The important thing is that we can't take that hash and get the original password back. All anyone can do is keep guessing, putting password after password through SHA-256 and seeing if the hash that comes out matches. Even the owner of the website can't recover the plaintext of our password, however much they might want to.
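For anyone who wants to try this at home, here is a minimal sketch in Python using its built-in hashlib module. The helper name sha256_hex is my own invention for illustration, and plain SHA-256 is used only because it mirrors the example above.

```python
import hashlib


def sha256_hex(plaintext: str) -> str:
    """Return the SHA-256 hash of a string as 64 hexadecimal characters."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


# Going forwards is easy: one call turns the plaintext into its hash.
password_hash = sha256_hex("myVerySecurePassword")
print(password_hash)

# The same plaintext always gives the same hash...
print(sha256_hex("myVerySecurePassword") == password_hash)

# ...but changing a single character gives a completely different one.
print(sha256_hex("myVerySecurePassworD"))
```

Notice that hashlib offers nothing for going the other way. There is no "unhash" to call, which is exactly the point.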

"But the how does the website check my password is correct if it doesn't know my password?"

That's simpler than you might think! It just puts the password you typed in through that same hash function and checks whether the hash that comes out is the same as the one it has in the database, then immediately forgets your plaintext password without storing it anywhere. If the hash is stolen, that's not good news, but your password is still safe as long as it isn't easily guessable and the website uses hashing properly.
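The check described above can be sketched in a few lines of Python. Treat this as a rough illustration rather than how any particular website does it: the stored_hashes dictionary is a hypothetical stand-in for a real database, and the username "alice" and the function names are made up for the example.

```python
import hashlib
import hmac


def sha256_hex(plaintext: str) -> str:
    """Return the SHA-256 hash of a string as 64 hexadecimal characters."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the website's database: it maps a username
# to the hash of that user's password, never to the password itself.
stored_hashes = {"alice": sha256_hex("myVerySecurePassword")}


def check_password(username: str, entered_password: str) -> bool:
    """Hash what the user typed and compare it with the stored hash."""
    stored_hash = stored_hashes.get(username)
    if stored_hash is None:
        return False  # no such user
    entered_hash = sha256_hex(entered_password)
    # compare_digest compares the two hashes in constant time.
    return hmac.compare_digest(entered_hash, stored_hash)


print(check_password("alice", "myVerySecurePassword"))  # True
print(check_password("alice", "notMyPassword"))         # False
```

The plaintext only exists inside check_password for as long as it takes to hash it. Nothing here ever writes it down.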

I Get It! So I'm Safe, Right?

If only it were that simple. The fact remains that many, many websites use old, broken hash functions whose hashes attackers have found ways to crack and recover the plaintext from (a function called "MD5" is particularly famous for this weakness), or even commit the cardinal sin of storing your password in plain text rather than hashed.

If a company does this, they are not taking your security seriously and are being negligent with your information. There is no excuse for this bad practice. If a website has ever e-mailed you your plaintext password (rather than asking you to reset it) or asked for your password (or part of it) over the phone, they are storing your password in plain text and do not deserve your business until they improve their security practices. A website called Plain Text Offenders is devoted to calling out websites that are negligent in this regard.

You also have a responsibility, for your own security, to make sure your password isn't easy to guess. Passwords like "charlie", "password", "P@$$w0rd!" or even the one used earlier, "myVerySecurePassword", are easily guessable choices. We'll cover choosing and re-using (or rather, not re-using) passwords in the next part of this multi-part post.

Note: To keep this article accessible to as wide an audience as possible, I have not covered topics like salting or peppering hashes or hashing multiple times, and I have not mentioned a wider variety of hash functions. If you think I've missed something vital, please let me know and I'll get to it as soon as I can.