Brief Analysis of the Gawker Password Dump
UPDATE 1: We've updated our analysis with approximately 200k additional cracked passwords.
UPDATE 2: We've launched a site that allows you to easily check if your username or email address was included in the Gawker password dump: http://didigetgawkered.com
UPDATE 3: Due to popular demand, we've posted the top 250 most common cracked passwords.
If you haven't heard yet, the Gawker Media network, which includes popular websites such as Lifehacker, Gizmodo, Jezebel, io9, Jalopnik, Kotaku, Deadspin, Fleshbot, and of course Gawker, was compromised yesterday. The hacker group Gnosis posted a torrent containing a full dump of Gawker's source code as well as the entire user database consisting of ~1.3 million usernames, email addresses, and DES-based crypt(3) password hashes. While this dump is not nearly on the scale of the RockYou incident, it is certainly a serious exposure.
As a two-factor authentication provider, we see situations like the Gawker hack as key illustrations of why strong authentication is a necessity. While users may not care about an attacker having access to their Gawker account, password sharing across websites and services poses a much bigger threat. Services that lack strong secondary authentication and host users who share passwords (which, let's be honest, most users probably do) face the greatest risk. Attackers will undoubtedly be testing the cracked passwords against both personal and corporate services such as email accounts, online banking sites, and VPN remote access logins.
As it's not very often that we get a glimpse into the human psychology of password selection, let's dig deeper into the password dump!
John the Ripper
The de facto tool for cracking password hashes is John the Ripper (also known as JtR), written by Solar Designer. If possible, I'd highly recommend using the available patches for JtR that parallelize the cracking process with OpenMP. I ran our cracking session on an 8-core Xeon box:
b0x ~ # uname -a
Linux b0x 2.6.36-gentoo Sat Dec 4 20:11:03 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5460 @ 3.16GHz GenuineIntel GNU/Linux
This puppy can crank out a decent number of cracks/second:
b0x ~ # john -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 20465K c/s real, 2562K c/s virtual
Only one salt: 16003K c/s real, 1999K c/s virtual
I'd also recommend using a comprehensive wordlist to assist in the cracking process. I compiled a wordlist of ~19M entries from a number of different sources (including datasets from Openwall and Skull Security):
b0x ~ # wc -l wordlist.txt
18966068 wordlist.txt
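If you want to build a combined wordlist of your own, the usual approach is to concatenate the sources and remove duplicates. Here is a minimal sketch; the helper name merge_wordlists and the file names in the usage note are hypothetical, and it assumes each source has one candidate password per line.

```shell
# Hypothetical helper: merge several wordlists into one de-duplicated file.
# Usage: merge_wordlists OUTPUT INPUT...
# OUTPUT must not be one of the INPUT files.
merge_wordlists() {
    out=$1
    shift
    # Strip Windows line endings, drop empty lines, then sort and
    # de-duplicate. LC_ALL=C makes the tools compare raw bytes.
    cat "$@" | tr -d '\r' | LC_ALL=C sed '/^$/d' | LC_ALL=C sort -u > "$out"
}
```

For example, merge_wordlists wordlist.txt openwall.txt skullsecurity.txt would write the combined list to wordlist.txt (the two input names are placeholders for whatever sources you have on hand).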
Cracking Results
Before getting started, we filtered the ~1.3M entries in the database dump down to ~748k crackable password hashes:
Loaded 748039 password hashes with 3844 different salts (Traditional DES [128/128 BS SSE2-16])
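The salt count matters because a traditional DES crypt(3) hash is 13 characters long, drawn from the set [./0-9A-Za-z], and its first two characters are the salt. The sketch below shows one plausible way to sanity-check a list of hashes and count its distinct salts. It is not necessarily how we did the filtering above: the helper names are hypothetical, and it assumes a file with one hash per line rather than the actual layout of the dump.

```shell
# Hypothetical helpers; both assume FILE holds one hash per line.

# Keep only strings shaped like traditional DES crypt(3) hashes:
# exactly 13 characters from the set [./0-9A-Za-z].
valid_des_hashes() {
    LC_ALL=C grep -E '^[./0-9A-Za-z]{13}$' "$1"
}

# Count distinct salts (the first two characters of each valid hash).
count_salts() {
    valid_des_hashes "$1" | cut -c1-2 | LC_ALL=C sort -u | wc -l | tr -d ' '
}
```

Fewer salts means less work per candidate password, since JtR only has to hash each guess once per distinct salt rather than once per account.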
In just under an hour of cracking on a single 8-core machine, we had successfully cracked 190k passwords. Allowing JtR to continue to run yielded an additional 200k cracked passwords, resulting in a total of almost 400k cracked passwords representing over 50% of the total password hashes:
b0x ~ # wc -l output.txt
399380 output.txt
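The arithmetic behind these figures is easy to reproduce. The sketch below uses two hypothetical helpers: one computes the cracked percentage, the other gives a rough estimate of how long a single pass over a wordlist takes. The estimate assumes plain wordlist mode with no mangling rules and that the "many salts" benchmark rate is sustained; this post does not say which JtR modes were actually used, so treat it as a back-of-the-envelope number.

```shell
# Percentage of PART in TOTAL, to one decimal place.
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Rough seconds for one pass over a wordlist: every candidate word has to
# be hashed once per distinct salt.
# Arguments: WORDS SALTS HASHES_PER_SECOND (integer division).
wordlist_pass_seconds() {
    echo $(( $1 * $2 / $3 ))
}
```

With the numbers from this post, percent 399380 748039 prints 53.4, and wordlist_pass_seconds 18966068 3844 20465000 prints 3562, or a little over 59 minutes. That lines up with the "just under an hour" figure above, though the match may be partly coincidental.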
Password Analysis
As with any password dump, one of the most interesting outcomes is the most popular/common passwords chosen by users. The top 25 most common passwords from our cracking results were:
2516 123456
2188 password
1205 12345678
696 qwerty
498 abc123
459 12345
441 monkey
413 111111
385 consumer
376 letmein
351 1234
318 dragon
307 trustno1
303 baseball
302 gizmodo
300 whatever
297 superman
276 1234567
266 sunshine
266 iloveyou
262 fuckyou
256 starwars
255 shadow
241 princess
234 cheese
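A ranking like this is a standard sort and count pipeline. Here is a minimal sketch, assuming a file with one cracked password per line (like the pws.txt used in the commands below); the helper name top_n is hypothetical.

```shell
# Hypothetical helper: print the N most common lines of FILE as "count value".
# Usage: top_n FILE N
top_n() {
    LC_ALL=C sort "$1" | uniq -c | LC_ALL=C sort -k1,1nr | head -n "$2" |
        # uniq -c pads the count with spaces; strip the padding but keep
        # any spaces that are part of the password itself.
        awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count, $0 }'
}
```

Running top_n pws.txt 25 would produce a list in the same "count password" shape as the one above.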
The vast majority (99.45%) of the cracked passwords were alphanumeric and did not contain any special characters or symbols:
b0x ~ # cat pws.txt | egrep "^[a-zA-Z0-9]+$" | wc -l
397198
Of the passwords that were alphanumeric, about 61% were composed of strictly lowercase alphabetic characters, 9% were strictly numeric, less than 1% were strictly uppercase alphabetic characters, and the rest were mixed alphanumeric:
b0x ~ # cat pws.txt | egrep "^[a-z]+$" | wc -l
241208
b0x ~ # cat pws.txt | egrep "^[0-9]+$" | wc -l
34703
b0x ~ # cat pws.txt | egrep "^[A-Z]+$" | wc -l
2868
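The same breakdown can be computed in a single pass instead of one grep per character class. This is a sketch of an alternative, not the commands we ran; the helper name charset_breakdown is hypothetical, and it assumes one password per line.

```shell
# Hypothetical helper: classify each line of FILE into exactly one class
# and print the count for each class.
charset_breakdown() {
    LC_ALL=C awk '
        /^[a-z]+$/       { lower++;  next }   # strictly lowercase letters
        /^[0-9]+$/       { digits++; next }   # strictly numeric
        /^[A-Z]+$/       { upper++;  next }   # strictly uppercase letters
        /^[a-zA-Z0-9]+$/ { mixed++;  next }   # mixed alphanumeric
                         { other++ }          # contains symbols (or is empty)
        END {
            printf "lower %d\ndigits %d\nupper %d\nmixed %d\nother %d\n",
                   lower, digits, upper, mixed, other
        }' "$1"
}
```

Because every line lands in exactly one bucket, the five counts add up to the total number of passwords, which makes the percentages easy to check.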
One interesting property of the dataset is that there are a large number of unique passwords. There are a total of 202k unique passwords in the set of 400k cracked passwords. Of those unique passwords, approximately 155k (77%) are used by only a single user (i.e. they've selected a password that no one else has). Similarly, 24k (12%) are passwords that are shared by only two users and 8k (4%) are shared by only three users. The occurrence of unique passwords observed here will surely decrease as more passwords are cracked by JtR and the odds of collisions between users increase.
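The sharing figures above (how many passwords are used by exactly one, two, or three accounts) amount to a count of counts. A minimal sketch, assuming one cracked password per line; the helper name sharing_histogram is hypothetical.

```shell
# Hypothetical helper: for each k, print how many distinct passwords in
# FILE are used by exactly k accounts, as "k distinct_passwords".
sharing_histogram() {
    LC_ALL=C sort "$1" | uniq -c |
        awk '{ hist[$1]++ } END { for (k in hist) print k, hist[k] }' |
        sort -n
}
```

The first line of the output (k = 1) is the number of passwords chosen by only a single user, and the sum of the second column is the number of unique passwords overall.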
Domain Analysis
Besides the cracked passwords, we can also take a look at the email addresses contained in the database dump. The top 25 most common email domains are as follows:
173942 gmail.com
101959 yahoo.com
72847 hotmail.com
20551 aol.com
8106 comcast.net
6078 msn.com
5835 mac.com
4341 sbcglobal.net
3397 hotmail.co.uk
2531 verizon.net
2204 cox.net
2174 live.com
2113 yahoo.co.uk
2050 earthlink.net
1939 yahoo.co.in
1851 aim.com
1626 mail.ru
1619 bellsouth.net
1490 googlemail.com
1045 charter.net
995 optonline.net
990 yahoo.ca
892 me.com
888 rediffmail.com
806 att.net
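Counting domains works the same way as counting passwords, with one extra step to pull the domain out of each address. Here is a sketch that assumes a file with one email address per line (the actual dump layout differs); the helper name top_domains is hypothetical.

```shell
# Hypothetical helper: print the N most common email domains in FILE.
# Usage: top_domains FILE N
top_domains() {
    # Keep only lines with exactly one "@" and a non-empty domain part,
    # and lowercase the domain so Gmail.com and gmail.com count together.
    awk -F'@' 'NF == 2 && $2 != "" { print tolower($2) }' "$1" |
        LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr | head -n "$2" |
        awk '{ print $1, $2 }'
}
```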
Perhaps more interesting are some of the accounts that belong to government officials with domains ending in .gov. The following are some of the .gov domains contained in the Gawker dump and the number of occurrences of each:
15 nasa.gov
9 va.gov
9 mail.house.gov
7 usps.gov
7 irs.gov
7 cdc.gov
6 ssa.gov
6 dhs.gov
5 michigan.gov
5 mail.nih.gov
4 usdoj.gov
4 panynj.gov
4 edd.ca.gov
4 boe.ca.gov
4 bls.gov
3 ky.gov
3 fnal.gov
3 ed.gov
3 dol.gov
3 dc.gov
3 cabq.gov
2 wisconsin.gov
2 whitehouse.gov
2 utah.gov
2 state.gov
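If you're an administrator and want to check for addresses under a particular suffix (your own company's domain, say, rather than .gov), the filter is a small variation on the domain count. Again a sketch under assumptions: one email address per line, and a hypothetical helper name.

```shell
# Hypothetical helper: count domains in FILE that end with SUFFIX.
# Usage: domains_with_suffix FILE SUFFIX    (e.g. SUFFIX = ".gov")
domains_with_suffix() {
    awk -F'@' -v suffix="$2" '
        NF == 2 {
            domain = tolower($2)
            n = length(domain) - length(suffix)
            # Require at least one character before the suffix, then
            # compare the tail of the domain against it.
            if (n > 0 && substr(domain, n + 1) == suffix) print domain
        }' "$1" |
        LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr |
        awk '{ print $1, $2 }'
}
```

Note that the suffix is matched literally and should be given in lowercase, since the domains are lowercased before comparison.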
Wrap-Up
We'll be continuing to update this post with more statistics and analysis as the results come in!
If you're an end user and think you may have registered an account with Gawker or one of its affiliated sites, be sure to change your passwords on any sites that may have the same or similar password as your Gawker account. In general, incidents like these are a good time to revisit your existing password schemes and ensure you are protecting your online accounts adequately.
If you're an administrator who runs a website or service where your users are logging in with only a password, now is the time to beef up your security with some strong two-factor authentication. If your users happen to be sharing a password contained in the Gawker dump, their accounts could be at risk. Feel free to drop us a line at Duo to learn how easy it is to integrate two-factor authentication into your website, server, or remote access VPN!