Gmail Now Rejects Emails With Misleading Combinations of Unicode Characters
An anonymous reader writes: Google today announced it is implementing a new effort to thwart spammers and scammers: the open standard known as the Unicode Consortium's "Highly Restricted" specification. In short, Gmail now rejects emails from domains that use what the Unicode community has identified as potentially misleading combinations of letters. Today's news follows Google's announcement last week that Gmail has gained support for accented and non-Latin characters. The company is clearly okay with international domains, as long as they aren't abused to trick its users.
FÜÇK ÿèàh (Score:5, Funny)
...
Re: (Score:2)
Could've sworn Slashdot had zero support for unicode characters.
(I appear to be unable to paste in a 'Trademark' symbol. What is this magic, AC?!)
Re: (Score:1)
iso-8859-1
Slashdot supports the first 256 unicode characters (except some of C0 and C1)
Comment removed (Score:5, Interesting)
Re:Good that this applies to from: and not the bod (Score:4, Funny)
If this spells death to those ridiculous smilies then it's ok with me.
Re: (Score:2)
They're reason enough for me to almost believe whoever designed ASCII was a genius.
Re:Good that this applies to from: and not the bod (Score:4, Interesting)
I routinely substitute Cyrillic letters for Latin on Disqus and other forums to get around their filters (which block for more than mere "profanity").
Slashdot does not allow non-ASCII characters — although it does not attempt to screen out profanity either.
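For anyone wondering why the Cyrillic-for-Latin swap works, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the blocked word, the two filter functions, and the five-entry look-alike table (real confusables data is far larger). It shows a plain substring filter missing a disguised word and a folded comparison catching it.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table: Cyrillic letter -> Latin twin.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

BLOCKED = {"spam"}  # stand-in for whatever a forum filter blocks


def naive_filter(text):
    """Return True if a blocked word appears, comparing code point by code point."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED)


def folded_filter(text):
    """Same check, after folding the known look-alikes back to Latin."""
    folded = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())
    return any(word in folded for word in BLOCKED)


disguised = "sp\u0430m"  # Cyrillic letter in place of the Latin 'a'
print(naive_filter(disguised))         # False
print(folded_filter(disguised))        # True
print(unicodedata.name(disguised[2]))  # CYRILLIC SMALL LETTER A
```

The two strings render identically in most fonts, which is the whole trick; only the code points differ.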
Re: (Score:3)
...unless they're in code page 1252.
Re: (Score:2)
Slashdot does not allow non-ASCII characters...
Óh réällý?
Re: (Score:2)
That's pretty cool. I guess, the entire ISO-8859-15 is Ok? But not Cyrillics :-( Or else, you would've seen some Ukrainian-Russian conflict right here...
Re: (Score:2)
Or pe
Re: (Score:2)
Heuristics could pretty easily determine if someone communicates only in English in their e-mails, and as such, any supposedly legitimate e-mails that contain large amounts of non-English words or characters should be viewed with greater suspicion. For those who routinely communicate in more than one language and use non-ASCII sets, the heuristic should be able to account for that fact.
These sorts of rules are always fuzzy by nature. Obviously, whether an e-mail is determined to be legitimate or not is due to many d
Re: (Score:2)
Good that this applies to from: and not the body of the e-mail.
That's not at all good, and filtering the body is exactly what I want.
Spammers already spoof the from: domain and then link you out to exactly the type of domain that Gmail is now filtering.
There's no reason Gmail can't flag [body] links to domains that use mixed character sets.
Homoglyph protection at last, sort of. (Score:2)
OK, good. Now if ICANN applied that tougher standard to domain name registrars, we'd make progress. But no, ICANN still allows registrars to register domain names without forcing them to comply with the most restrictive profile.
all of them then? (Score:2)
This looks like fun, I probably wouldn't catch that bank example and family certainly wouldn't. Looks like pretty much any word could substitute one letter.
No idea exactly what these "combinations" are. The example used one letter substitution. Using this example and the little display of new letters there would appear to be billions of potentially misleading combinations.
Re: (Score:3)
The "restrictive profile" that Google is using for the filtering is defined in Unicode as any combination of the Latin character set with another set or sets, with the exception of very specific combinations (selected legitimate combinations of Asian sets that contain radically different letter forms and thus are unlikely to cause confusion).
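A rough Python sketch of that kind of check, for the curious. This is not Gmail's implementation. The three permitted mixes are the ones the Unicode "Highly Restricted" level lists (Latin + Han + Hiragana + Katakana, Latin + Han + Bopomofo, Latin + Han + Hangul). Python's standard library has no Script property, so the first word of each character name is used as a stand-in, which is an assumption that breaks on some characters. The function names are made up for this example.

```python
import unicodedata

# Assumption: the first word of the Unicode character name approximates the script.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "CJK": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "BOPOMOFO": "Bopomofo",
    "HANGUL": "Hangul",
}

# The script mixes that the Highly Restricted level permits.
ALLOWED_MIXES = [
    {"Latin", "Han", "Hiragana", "Katakana"},
    {"Latin", "Han", "Bopomofo"},
    {"Latin", "Han", "Hangul"},
]


def scripts_in(label):
    """Collect the approximate scripts of the letters in a label."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits, dots and hyphens are treated as script-neutral
        first_word = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        scripts.add(SCRIPT_PREFIXES.get(first_word, "Other:" + first_word))
    return scripts


def looks_highly_restricted(label):
    """True if the label is single-script or fits one of the permitted mixes."""
    scripts = scripts_in(label)
    if len(scripts) <= 1:
        return True
    return any(scripts <= allowed for allowed in ALLOWED_MIXES)


print(looks_highly_restricted("google"))              # True
print(looks_highly_restricted("g\u043e\u043egle"))    # False (Latin + Cyrillic)
print(looks_highly_restricted("tokyo\u6771\u4eac"))   # True (Latin + Han)
```

A production check would use real script data (including the Common and Inherited scripts) rather than name prefixes.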
Re: (Score:2)
I'd like to see the precise rules (but too lazy to RTFA now). There are many non-English words that can be highly confusing. In French "telephone" is "téléphone", which could be thought of as a way to trick users. Also, Turkish has a dotless i; I would not be surprised if it appears in words with similar spelling in English.
Re: (Score:1)
ITYM téléphone.
Sounds bad (Score:2, Insightful)
If I start a business with a unicode domain, and if later a scammer registers an ascii domain that is similar looking, then Gmail will blackhole my business, not the scammer, because I'm the one using unicode.
Re: (Score:2)
Probably a bad idea - what exactly is the legitimate point of having a cool web address if *everyone* will *always* mistype it and go to the original site instead?
Re: (Score:2)
The Russian word "radio" should be in (the specific Russian Cyrillic subset of) Unicode/UTF, while the English word "radio" should be in Unicode/ASCII.
Mixing and matching character sets in URLs or email address typically indicates "intent to confuse". Within text, it usually just confuses translators and spell checkers.
whack-a-mole 3.0 (Score:2, Insightful)
And the latest round of whack-a-mole begins...
This is going to do... (Score:2)
Re: (Score:2)
You kidding? Thanks to allowing these new email addresses, I have an entirely new category of auto-deletable spam. These won't "confuse" me because I'll never see them. Win/Win!
Go ahead, spammers, get cute. Just makes my life thaaat much easier.
Re: (Score:2)
It looks after other interested parties looking for expected keywords.
Re: (Score:2)
Re: (Score:1)
Finally! (Score:2)
Damn, now I see it's just domains. I thought they killed all my German and French spammers.
Re: (Score:1)
Why are we still blocking spam ? (Score:3, Interesting)
90% of the population would be better off with a white listed email account, i.e. if you are not on their list the email does not get through. END OF STORY.
It would seem to be more efficient to filter mail IN than to filter it out. Most people would have 20 or so people they actually want mail from.
I have mail accounts strictly for family and my local email rules enforce this
I have mail accounts for "sign up" sessions for competitions that I know are going to get spammed to hell
I have a mail account for work, another for my business, etc., all with differing contacts.
Whitelisting would pretty much kill off spam: if there is zero chance of it getting through, what is the point? Currently spammers get through because of outdated spam lists, new tricks to get around Bayesian filters, etc. Whitelists would negate the need.
Google, if you set up a white listed email system, my friends and family will happily sign up.
... because we make new friends (Score:1)
Seriously,
most filters are now "very good". And I make new acquaintances, connections and friends. They have new email addresses that aren't in the whitelist. But the filters pretty much just work.
Re: (Score:1)
One way you could make whitelists work is to have a "secret handshake", a word that you require in the subject of mail from addresses that aren't whitelisted yet. You would regularly change that word and give it to new acquaintances along with your email address.
The problem with the whitelist approach is something else: A lot of spam already pretends to be from someone you know. Spammers don't just collect individual email addresses anymore. They collect email address pairs: Who knows who.
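The handshake idea is simple enough to sketch in a few lines of Python. The function name, arguments and addresses below are all made up for illustration, and the sketch takes the sender address at face value, so it does nothing about the forged-sender problem just described.

```python
def accept(sender, subject, whitelist, handshake):
    """Decide whether to accept a message under a whitelist-plus-handshake rule.

    sender    -- claimed From: address (assumed to be authenticated elsewhere)
    subject   -- subject line of the message
    whitelist -- mutable set of lowercase addresses already trusted
    handshake -- the current secret word handed out to new acquaintances
    """
    sender = sender.strip().lower()
    if sender in whitelist:
        return True
    # Unknown sender: only let them in if the subject carries the secret word,
    # and remember them so they do not need it next time.
    if handshake and handshake.lower() in subject.lower():
        whitelist.add(sender)
        return True
    return False


trusted = {"mom@example.com"}
print(accept("Mom@Example.com", "dinner?", trusted, "walrus"))            # True
print(accept("stranger@example.net", "cheap pills", trusted, "walrus"))   # False
print(accept("new@example.org", "walrus: nice to meet", trusted, "walrus"))  # True
```

Rotating the word is then just a matter of passing a new `handshake` value; senders already added to the set are unaffected.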
Re: (Score:2)
E-mail authentication seems like a better solution than whitelisting in the long term. Whitelisting can kill off spam, but that's sort of like saying you can fix a broken arm by amputation. It's technically true, but removes a lot of useful functionality.
The big problem with e-mail spam is that the e-mail sender can be trivially forged. If we employed ubiquitous authentication systems that proved a specific domain was used, and blocked non-authenticated users (or at the very least, flagged them with a big w
Re: (Score:2)
They already happily sign up. Gmail is the largest email provider in the world.
BTW the Gmail spam filter, like any good one, does have per-user whitelists. If you reply to mail or mark mail from a sender as not spam, the filter will leave mail from those senders alone (modulo caveats like the sender properly authenticating). Thus the filter spends almost all of its effort on email from senders you haven't interacte
Re: (Score:2)
A whitelist would break site sign-up and password reset emails. You could never whitelist every legit site as hundreds are launched every day. Users will never figure out how to add sites to their whitelists before signing up, and can barely cope with such emails ending up in their spam folders.
Having said that, Gmail filters 99.9% of spam for me, and I can tolerate hitting delete for the 1 in 1000 that gets through.
Why are we still blocking spam ? (Score:2)
Google, if you set up a white listed email system, my friends and family will happily sign up.
They did, it's called Google+. Nobody seems to like it.
Don't use Unicode for network stuff (Score:1)
If you use Unicode for domains, addresses, certificates and whatnot you are begging for an endless cascade of support problems and glitches, not to mention security vulnerabilities. Let others exercise all these broken code paths for you while you avoid the fail. Eventually, after most of the broken code gets cycled out of use, many years from now, you may then safely allow this stuff into real systems.
Unicode breaks all sorts of stuff in subtle and unfixed ways. A fine example from a widely used Micros
Re: (Score:2)
But why would anyone waste resources properly fixing a bug that doesn't affect anyone? The only way these things will get fixed properly, is if they start causing a lot of problems. And the only way they'll cause problems is if people start using them.
Meanwhile, why should most of the world's population have to deal with an internet incapable of handling addresses in their language? How would you like it if you woke up tomorrow to discover that all web addresses could only be written in Arabic? The Web m
Slippery slope (Score:2)
As much as I can appreciate the intent and the fact that this will solve 99.999% of people's problems for this type of spamming and create 00.0000000001% of problems for legitimate users, it still feels a little like Google is trying to be the thought police on this one; you know free speech and all.
More generally (Score:2)
IME, Gmail is rejecting a lot of legitimate mail nowadays.
Their filters used to be good, but they completely fucked it up lately.
Al (Score:2)
As an interesting background fact, I heard that Google has an advanced Al doing all this stuff completely autonomously.
His real name is Albert, by the way.
GMail doesn't take everything... (Score:1)
GMail doesn't accept all comers. Get too many complaints and they'll reject you... this is just new ideas to add to that filter. There's a list of words you can't say on GMail without it getting read, they don't publish those lists because that'll never be said to them.
Unicode the standard .. (Score:1)
Re: (Score:1)
They're called "code points" actually. A particular code point can be encoded in different ways (for example, the encoding of 'ß' in UTF-8 is different from the encoding in UTF-16, but they both represent the same code point.) Yeah, something like that ought to be used for network addresses...
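A quick Python illustration of the distinction, using the same 'ß' example. The variable names are just for this sketch.

```python
ch = "\u00df"  # 'ß', LATIN SMALL LETTER SHARP S

code_point = ord(ch)             # the abstract number Unicode assigns
utf8 = ch.encode("utf-8")        # one way to serialize it as bytes
utf16 = ch.encode("utf-16-be")   # another way (big-endian, no byte order mark)

print(hex(code_point))  # 0xdf
print(utf8)             # b'\xc3\x9f'
print(utf16)            # b'\x00\xdf'

# Different bytes on the wire, same code point once decoded.
print(utf8.decode("utf-8") == utf16.decode("utf-16-be"))  # True
```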
Re: (Score:1)
Have one set of 'code points' for every language on the planet and remove the duplicates. That way they wouldn't have needed to hack unicode in order to allow for the following:
'the code point U+006E (the
They are right - Uses of unicode ambiguous letters (Score:2)
They are right to do so. There are letters in different alphabets whose shapes are very similar, or in fact are drawn exactly the same, depending on the font used.
This can be exploited for interesting uses. For example, "E" and "Ε"** are respectively the Latin "e" and the Greek "epsilon" vowels, but they are indistinguishable in caps, at least in the Arial font. The second one is code point U+0395. My name has an "E" in it, and for my email signature I spell my name using the traditional latin
Sounds rather ethnocentric (Score:3)
It allows combinations of Latin + Han + Hiragana + Katakana; Latin + Han + Bopomofo; or Latin + Han + Hangul.
There are a lot of equally safe combinations - what about Latin + Devanagari + Tamil? There would be no look-alike characters and it would allow a lot of people to put their name in multiple scripts that are likely to be meaningful to certain audiences (e.g. someone from Tamil Nadu sending an email to people throughout India and internationally). I'm sure that there are many other combinations that wouldn't have "look alike" issues but which would be useful
Insufficient (Score:2)