Gravatars Can Leak Users' Email Addresses
Posted by kdawson from the chatty-little-things dept.
abell writes "Gravatar offers a global avatar service, using an MD5 hash of the user's email as avatar ID. This piece of information in some cases is enough to retrieve the original email address. Testing a simple attack on stackoverflow.com, I was able to determine the email addresses of more than 10% of the site's users."
Re:So let's change the algorithm. (Score:4, Insightful)
No, it's not related to MD5 itself. Period.
No need (Score:3, Insightful)
It would have been trivial for them to just add a secret salt string to the email before hashing, and that would have solved most of the problem. It is possible that they wanted to be "nice", in that if they go out of business, anyone can regenerate the IDs without them. But, as this guy has shown, that's not a great idea.
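A rough sketch of the salting idea, in Python. The salt value and the function names are invented for illustration and are not part of any real Gravatar API. It also assumes the address is trimmed and lowercased before hashing.

```python
import hashlib

# Hypothetical server-side secret; invented for this sketch.
SECRET_SALT = b"replace-with-a-long-random-secret"


def plain_avatar_id(email: str) -> str:
    """Unsalted MD5 of the normalized address, as the story summary describes."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.md5(normalized).hexdigest()


def salted_avatar_id(email: str, salt: bytes = SECRET_SALT) -> str:
    """MD5 of secret salt + normalized address, as the comment suggests."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.md5(salt + normalized).hexdigest()


print(plain_avatar_id("someone@example.com"))
print(salted_avatar_id("someone@example.com"))
```

Without the salt, an outsider cannot precompute IDs for guessed addresses. The trade-off is the one the comment mentions: nobody else can regenerate the IDs either.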
Possible workaround (Score:3, Insightful)
Can anyone tell me if the "you can add extra stuff after a +" that GMail lets you do is standard in the RFC for all email addresses? If it is, then to "fix" this you could sign up to Gravatar with an email address that has a random string after an added "+", and the brute force search on hashes would be much, much harder. (Assuming that your email provider is implementing that part of the standard.)
Re:So let's change the algorithm. (Score:5, Insightful)
I [benramsey.com] disagree. [gromweb.com]
Granted, those are basically very unsophisticated databases that just store lookup values, but it's relatively easy to brute-force an MD5 hash down into one of the possible original strings. (Obviously, with any algorithm like MD5 that has a fixed output size and limitless inputs, there are infinitely many inputs that will hash down to a single md5sum, but when you're trying to get a valid email address out of a hash it's easy to pick the right one.) Couple that with the fact that in this situation you know that the entire string is lowercased, and that probably 60% of the gravatar emails (probably more like 90%, actually) are going to come from one of four or five domains, and reversal becomes quite easy. If you're bored, you could spin up a few Amazon EC2 or Rackspace Cloud Server instances to dump out some large tables. One each for gmail, yahoo, msn, aol, whatever else; it'd be a very simple script to make. You could probably cover every alphanumeric email address under 12 characters overnight, at a cost of about a dollar and ten minutes of scripting.
The thing to realize here is that gravatar doesn't MD5 emails to protect people who want to obscure their identity, just to obscure the addresses from spambots. So it's really a non-issue. If you're that concerned, leave your blog comments with a fake email address.
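A minimal sketch of the lookup-table idea from the comment above, in Python. It assumes avatar IDs are the MD5 of the trimmed, lowercased address. The names and domains are made-up placeholders, and the function names are hypothetical.

```python
import hashlib
from itertools import product


def gravatar_style_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address (assumed normalization)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def build_lookup(local_parts, domains):
    """Map hash -> address for every local-part/domain combination."""
    table = {}
    for local, domain in product(local_parts, domains):
        address = f"{local}@{domain}"
        table[gravatar_style_hash(address)] = address
    return table


# Placeholder candidates; a real table would be far larger.
local_parts = ["alice", "bob", "carol.smith"]
domains = ["example.com", "example.org"]
lookup = build_lookup(local_parts, domains)

# A hash as it might appear in an avatar URL.
target = gravatar_style_hash("Bob@Example.org")
recovered = lookup.get(target)  # None if the address is not in the table
print(recovered)
```

The table only recovers addresses that were among the guesses, which is why the choice of candidate names and domains matters more than the hash function.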
Why is this a problem? (Score:2, Insightful)
use email+whatever@domain.com (Score:3, Insightful)
Use your email address with "+randomsequence"@
The random sequence has to be consistent across the sites where the user wants the gravatar to work, but it generates an MD5 hash different from that of their actual address; and if a site sends email to the tagged address, the user will still receive it.
Not A Bug (Score:3, Insightful)
Email addresses are usernames. They are not secret information. If somebody can be bothered enough to find your email address by brute-forcing its MD5 hash, you've got bigger problems.
Far more than "10% of stackoverflow.com's users" can have their email addresses GUESSED far faster. Likely your email address is also FAR easier to establish through a simple Google search on your pseudonyms.
If for some odd reason you want your email address to be secret, do what you would do if you wanted a secret pseudonym or a false name when signing up: register a fake email address instead (and set it up for forwarding). You're giving your email address in clear text to the site's owner and all the internet hops in between him and you ANYWAY.
It's important to learn to distinguish between what is a secret and what is not; and if you want to make things secret, at what level you should put your trust.
Re:So let's change the algorithm. (Score:4, Insightful)
Doubt it. There are 26 letters and 10 digits, and in addition "." is very common in email addresses. Thus you get 37 possibilities for each position. 37 to the 12th power is 6582952005840035281 hashes to run, and even if you manage 10^9 hashes per second (i.e. one giga-hash a second, which would require on the order of a few hundred cores), you'd still need 208 years to do that many hashes -- then you need to look up each of them in gravatar, and analyze the result for a hit-or-miss.
"every alphanumeric email address under 12 characters" is in fact much too large a keyspace to reasonably cover overnight with a "very simple script".
It's not a large enough keyspace to be cryptographically secure, but it's large enough to not be trivially exhaustible.
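The arithmetic above is easy to check. A quick sketch in Python, using the comment's own figures (37 symbols, 12 positions, 10^9 hashes per second). The 365-day year is an assumption made here.

```python
ALPHABET_SIZE = 26 + 10 + 1        # letters, digits, and "."
LENGTH = 12                        # positions in the local part
HASHES_PER_SECOND = 10**9          # the comment's assumed hash rate
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

keyspace = ALPHABET_SIZE ** LENGTH
years = keyspace / HASHES_PER_SECOND / SECONDS_PER_YEAR

print(keyspace)  # 6582952005840035281
print(years)     # a little under 209 years
```

This counts only strings of exactly 12 characters. Adding the shorter ones raises the total by under 3%, so the conclusion stands.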
Re:So let's change the algorithm. (Score:3, Insightful)
That's assuming email addresses are random sequences of letters, digits and dots.
If you're a spammer and don't mind missing the email of Mr. q9x7.3f.1zzp@hotmail.com, a phone book would probably provide an effective dictionary for narrowing that keyspace considerably.
Re:So let's change the algorithm. (Score:3, Insightful)
Simple way to protect yourself (Score:2, Insightful)
Some email providers have a simple way of giving you a throwaway id. E.g., mail to example+slashdotnospam@gmail.com is delivered to example@gmail.com.
Say my name is Lary Page. If my email id is lary.page@gmail.com, I can still protect myself so that you will never get my email id.
MD5 (lary.page@gmail.com) = "1b8dbe98e2b1138fd3ba34e26fc55107".
So I provide my email id as lary.page+1b8dbe98e2b1138fd3ba34e26fc55107@gmail.com. If I gave you the md5 of that id, you'll find it hard to get back to lary.page@gmail.com.
Try it: the MD5 hash of the above email id is 803efbc80ead933f28d0704d43d1f63b.
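A small sketch of the tagging trick in Python, for anyone who wants to try it. The helper names are made up for illustration, and it assumes the mail provider delivers "+tag" addresses to the base mailbox. No digest values are claimed here; run it to see them.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def tagged_address(email: str, tag: str) -> str:
    """Insert "+tag" before the "@" of an address."""
    local, _, domain = email.rpartition("@")
    return f"{local}+{tag}@{domain}"


base = "lary.page@gmail.com"
tag = md5_hex(base)                # the comment uses the base hash as the tag
tagged = tagged_address(base, tag)

print(tagged)
print(md5_hex(tagged))             # differs from md5_hex(base)
```

Note that a tag derived from the base address can be recomputed by anyone who guesses that address. A random tag, as other comments suggest, avoids that.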
Re:So let's change the algorithm. (Score:3, Insightful)
Or, use john -incremental -stdout. This will test reasonable names first, while not being restricted to RL names only.
Re:So let's change the algorithm. (Score:2, Insightful)
Correct: the attack here is:
Take a big site with thousands of users, many using their (sorta) "real names".
Permute these names with some known big email provider hostnames.
Send them all some spam.
It does not really matter if 90% of those email addresses are incorrect; the rest will hit.
I would not do the MD5 validation thing, why should I?
Re:So let's change the algorithm. (Score:2, Insightful)