Gmail Recognizes Addresses Containing Non-Latin Characters
An anonymous reader writes: In response to the creation in 2012 by the Internet Engineering Task Force (IETF) of "a new email standard that supports addresses incorporating non-Latin and accented Latin characters", Google has now made it possible for its Gmail users to "send emails to, and receive emails from, people who have these characters in their email addresses." Its goal is to eventually allow users to create Gmail addresses utilizing these characters.
Next wave of phishing? (Score:5, Funny)
So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?
Great.
Metal umlaut! (Score:5, Funny)
Finally I can get motörhead@gmail.com!
Re: (Score:2)
Posts like these are just random pot shots looking for a response. Chances are he doesn't even believe what he says; rather, he just wants to cause somebody to come out and speak in a righteous manner. Mission accomplished, I think?
Re: (Score:2)
I don't want to sound racist, but I've never heard of Jewish suicide bombers, Jewish plane hijackings, etc.
Re: (Score:3)
Finally I can get motörhead@gmail.com!
This is exactly what is going to happen, and I don't mean that in a good way. I already see it in other chat environments, like Second Life, where the full power of Unicode allows any and all characters in usernames. It's bad enough that they substitute Latin letters with superficially similar characters from other languages so we end up with names like ££¥ and , but miles of decorative symbols drawn from Braille and mathematics... and don't even get me started about the entire upside-down
Re: (Score:3)
So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?
It's not about bìllgàtes@outlook.com, but billgates@óutlook.com. It's the domain that is going to cause problems, not the user!
Re:Next wave of phishing? (Score:5, Insightful)
Re: (Score:3)
Worse; they will come from root@gmail.com, administrator@gmail.com or BillGates@gmail.com, only those o's and a's will be Cyrillic or something like that (can't do it here; Slashdot doesn't display them).
When you mix Latin htmail with a Cyrillic o to get hotmail, Google and all email programs should refuse that address immediately, mark it as spam, make the address red with a warning sign etc. Mixing character sets should not be allowed in a domain or in a username. So the username may be all Cyrillic or Greek, the domain name may be all Chinese or Latin, and these may mix, but no mixes in the domain name or username itself.
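The "no mixed character sets within a username or a domain" rule described above can be sketched in a few lines of Python. This is only a rough illustration, not how Gmail or any mail server is known to do it: the standard library does not expose the Unicode Script property, so the hypothetical helper `script_of` guesses the script from the first word of the character's Unicode name, and `is_single_script` is a made-up name for the check itself.

```python
import unicodedata


def script_of(ch):
    """Guess a letter's script from the first word of its Unicode name.

    Assumption: this stands in for the real Unicode Script property,
    which the Python standard library does not provide.
    """
    if not ch.isalpha():
        return None  # digits, dots, hyphens etc. are treated as script-neutral
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def scripts_in(label):
    """Return the set of guessed scripts used by the letters in a string."""
    return {s for s in map(script_of, label) if s is not None}


def is_single_script(address):
    """True if the local part and the domain each stay within one script.

    The two parts may differ from each other (Cyrillic user, Latin domain),
    but neither part may mix scripts internally.
    """
    local, _, domain = address.rpartition("@")
    return len(scripts_in(local)) <= 1 and len(scripts_in(domain)) <= 1


# "\u043e" is CYRILLIC SMALL LETTER O, which looks like a Latin "o".
print(is_single_script("h\u043etmail@example.com"))
print(is_single_script("hotmail@example.com"))
```

A name-prefix guess is crude: it would, for example, flag Japanese text that legitimately mixes hiragana, katakana and kanji, so a real filter would need proper script data and a list of allowed combinations.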
Re: (Score:3)
I think that's the way to go - only allow characters from a single Unicode script [unicode.org] in the username and in the domain name. The domain name part is currently handled by registrars so that may not need any additional rules.
However this really should be part of the RFC, or else anyone banning mixed names would be "non compliant". If the RFC does not specify this then the best that Gmail (or any other system) could do would be to prevent people registering mixed names themselves and giving a warning (and maybe c
Re: (Score:2)
However this really should be part of the RFC, or else anyone banning mixed names would be "non compliant". If the RFC does not specify this then the best that Gmail (or any other system) could do would be to prevent people registering mixed names themselves and giving a warning (and maybe colour characters) if email is received from an address with mixed scripts.
Gmail, Microsoft and Yahoo and others like gmx, universities, big companies should simply refuse these mails. Microsoft should make Exchange so that this is the default way for handling these mails. The same goes for qmail, postfix etc. But that won't be enough.
As another commenter said, you can make up Latin-looking names using Cyrillic characters, and we won't notice. How do you catch that? I guess this will be the time that PGP will prove its value.
Re: (Score:1)
I agree. The real solution is hardened authentication getting baked right into email. I'm all for UTF8 domain names and email user names, however if the email protocol suite is going to be expanded to allow for more features, then I think security should be top of that list.
Sure, for a while, domains that span multiple character sets such as hotmail.com with a Cyrillic o could be spam flagged, however what happens when (not if, but when) legitimate domains with multiple character sets start appearing? What
Re: (Score:2)
How does confirming the domain's identity automatically solve this problem?
If someone from the gxail.com domain sends me email (let's assume here the 'x' is some weird Cyrillic character that looks just like an 'm'), any automated confirmation of the domain's validity would not do some sort of eyeball check "Oh, that looks like gmail.com, let's confirm if it is, oops it isn't..." but rather an automated "did that email come from gxail.com? Yup, sure did."
Even if you popped up a notice that said "hey, I don'
Re: Next wave of phishing? (Score:2)
Re: (Score:2)
There are languages, such as the Scandinavian languages, which are "mostly Latin". This means we have the full A-Z as used in English (although C,Q,W,X,Z are never used) PLUS some extra letters "Æ/Ø/Å" (dunno if this displays correctly here). There are also domains which use these letters like "lånekassen.no", which is the state agency handling student loans. (They are also available at the alternative address "laanekassen.no".)
Thus a "hard and fast" rule disallowing domain names with m
Re: (Score:2, Insightful)
"Rich company problems". XD
Bluntly, this won't affect most Americans for the same reason spam from
Re: (Score:2)
It doesn't take your spam filter long to figure out "if the address contains character-X, 100% chance of spam"...
Yes, yes it does. It took Google years to stop sending me spam in foreign languages that I couldn't read anyway.
Re: (Score:2)
Bluntly, this won't affect most Americans for the same reason spam from .il, .ru, or .cn doesn't matter - Because we simply don't get any legitimate email from those domains. It doesn't take your spam filter long to figure out "if the address contains character-X, 100% chance of spam"... And that assumes your mail server doesn't outright block those as a hardcoded rule (in a former life I had to babysit the Exchange server for a small business; if you came from anywhere not in one of the big-six TLDs, auto-junk).
It must be wonderful to run a mom and pop operation where none of your customers, suppliers or anyone else has an international mail address. And it certainly won't work for any other country but the US. A Canadian business that doesn't accept .ca mail? Don't think so. And if you're operating an ISP, university or whatever, some of your users will be foreigners in real contact with the rest of the world. Neat that you can wave the WORKS4ME flag, but it's still a problem for a lot of other people.
Re: (Score:2)
Yellow flag: Failure to extrapolate.
Canadians can't block
Re: (Score:2)
Is it deeply wrong that I originally read "WORKS4ME" as 'worksame' due to excessive IRC leetspeak exposure in my youth... and yet it still kind of made sense?
Re: (Score:2)
Most of the Fren
Re: (Score:2)
No, worse: they will come from
updates@?tfosorcim?.com which will be displayed like:
updates@microsoft.com
Just imagine the ? marks being the left-right reverse character.
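The trick described above relies on invisible bidirectional control characters such as U+202E (RIGHT-TO-LEFT OVERRIDE). A minimal Python sketch of how an address could be scanned for them, assuming the `?` marks stand for U+202E and U+202C; the function name `bidi_controls` and the sample address are made up for illustration:

```python
import unicodedata

# Bidirectional classes that Unicode assigns to explicit embedding,
# override and isolate controls (and the characters that close them).
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF",
                        "LRI", "RLI", "FSI", "PDI"}


def bidi_controls(text):
    """Return (index, character name) for every bidi control in text."""
    return [
        (i, unicodedata.name(ch, "?"))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


# Stored as "tfosorcim" wrapped in override/pop characters; a bidi-aware
# display may render the domain as "microsoft".
spoof = "updates@\u202etfosorcim\u202c.com"
for index, name in bidi_controls(spoof):
    print(index, name)
```

An address that returns a non-empty list here could be rejected or flagged before it is ever shown to a user. Some right-to-left scripts use directional marks legitimately, so an outright ban on every such character is a policy choice rather than a given.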
Who cares? (Score:1)
Re: (Score:2)
I'm glad everyone pointed this out because it's the first concern that came to mind.
This makes me happy because:
1) Great minds think alike.
2) Google will instantly realize the error of their ways with weird characters and change this policy.
3) I'm not alone. This could very well be my posse.
4) Slashdot has it handled, the world is safer.
5) I don't have enough things I'm happy about on a regular basis to make a top ten list -- and I've learned to be OK with that.
Re: (Score:2)
That kind of phishing already exists, even more sophisticated: a bug that a lot of software contains is not distinguishing between same looking characters in different alphabets. E.g. you can sign up on many forum/bbs platforms as Administrator if your leading A is Cyrillic A instead of Latin A. Both look the same but have different HTML entity codes and are different Unicode characters, which is true for most vowels and many consonants (e.g. Cyrillic B and Latin B, C and C, E and E...).
What software (or library) is programmed to recognize that two chars look the same and therefore allows them based on the appearance rather than their encoding?
Re: (Score:2)
What software (or library) is programmed to recognize that two chars look the same and therefore allows them based on the appearance rather than their encoding?
I am not aware of any. My "solution" to this problem is to allow only unambiguous characters to be used. I really mostly have to deal with only about 60 characters in total which I allow people to use for unique fields, so it's manageable.
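For what it's worth, an "allow only unambiguous characters" rule like the one described above might look something like this in Python. The exact character set is an assumption (the comment does not list its roughly 60 characters), and `canonical_username` is a hypothetical name:

```python
import string

# Assumed whitelist: ASCII letters, digits and a few separators.
ALLOWED = set(string.ascii_letters + string.digits + "._-")


def canonical_username(name):
    """Reject any character outside the whitelist, then lowercase.

    The lowercased result can serve as the uniqueness key, so that
    "Administrator" and "administrator" collide on purpose.
    """
    bad = sorted(set(name) - ALLOWED)
    if bad:
        raise ValueError(f"disallowed characters: {bad!r}")
    return name.lower()


print(canonical_username("Administrator"))
try:
    # "\u0410" is CYRILLIC CAPITAL LETTER A, visually identical to Latin "A".
    canonical_username("\u0410dministrator")
except ValueError as err:
    print("rejected:", err)
```

The obvious cost is that anyone whose name needs characters outside the list is simply locked out, which is the trade-off the rest of this thread is arguing about.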
Re:Next wave of phishing? (Score:4, Funny)
Re: (Score:2)
If you're careful you can get pretty much all the major mailbox providers accepting email from any address, including many major ISPs, and at least Gmail (and Apps) and Hotmail (and Office 365). Again if you are careful, they will go into the inbox. SMTP was great back in the days when there were 20 networked computers. The continued lack of strict adherence to the common authentication standards makes me want to both laugh and cry simultaneously.
I send email "from" bill.gates@microsoft.com from the command
Re: (Score:2)
So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?
Great.
I've heard this argument before.
Spoofing a return/sending mail address is incredibly simple. In fact, I do it every day as part of my legitimate non-hacker job. I could send you an email from Bill.Gates@microsoft.com if I wanted to (though your mail server might have some security issues with that). Do you think that when you email Support@somecompany.com there is a team of people that all log into that same mailbox? Or when they reply to you, they're really using that mailbox?
So there's no reason to use
Re: (Score:2)
Lôvè yôûr ïdèàl Éxample.
Well, I'm impressed. (Score:3, Insightful)
Google updated their regular expression. Good for them.
Re:Well, I'm impressed. (Score:5, Informative)
Re: (Score:2)
I would imagine that they implemented RFC6532 [ietf.org], which involves a lot more than changing a regular expression
So we get sometimes unreadable mails because the encoding of the content is unknown. Then some mails will be rejected because of an encoding problem in the address itself. At least in the first case the mail was received and we had enough time to fix the problem.
Sigh (Score:5, Insightful)
From what I can tell, a mail server has two options when receiving this mail:
Accept it.
Reject it.
The default, with software that doesn't understand this RFC yet (which seems to be... just about everything), is to reject. So trying to use this as an email is not only going to mess up every form you try to fill in online (because they won't see it as an email address either), but quite likely just gets you bouncebacks from everyone you email.
What was needed was surely a system similar to the IDN system for internationalisation, which would allow those with ASCII-only DNS servers etc. to STILL WORK, by converting the Unicode characters to ASCII subsets and then sending the email as normal, through the entire PLANET-worth of working email servers out there that could accept it.
Having a content negotiation option at the SMTP level, that mail servers have to implement and handle specifically, is just ridiculous, and even with GMail's kickstart it could be decades before you can guarantee that your UTF-8 email address will work across the Internet and even then there'll be some old legacy server that will just bounce all your email BECAUSE of that character set in your address. And it will be perfectly legitimate to do so.
However, as others have pointed out, if this goes through, it will be nigh-on impossible to spot phished/faked email addresses, just like it is with IDN links unless you know how to find the original ASCII-encoding of them.
Re:Sigh (Score:4, Funny)
Accept it.
Reject it.
Temporary failure, try again later.
User not local, will forward to <somewhere>.
Syntax error, command unrecognised.
Wait, I'll come in again...
Good luck (Score:5, Interesting)
Now imagine this with non-latin characters (or just non-ASCII characters)... If you only write to people also using GMail, it might work.
Re: (Score:2)
(Fortunately I also had an alias address which didn't have the apostrophe and was about two dozen characters shorter.)
Re: (Score:2)
"+" or plenty of other special characters. Stuff like quotes can even be valid if used properly, while we still have some website that won't even accept a dash/underscore.
Re: (Score:2)
"+" or plenty of other special characters. Stuff like quotes can even be valid if used properly, while we still have some website that won't even accept a dash/underscore.
I had to wait nearly 10 years for my ".name" domain to be accepted by most websites (say, 99.5%).
For "+" or other funny characters, my estimate is that you will need at least 10 years starting from now.
I would not hold my breath.
Re: (Score:2)
"+"-characters keep out an amazing amount of spam. Please do not teach the world to recognize them.
Re: (Score:3)
It's not stupid website. It's just stupid 4 char tld that shouldn't exist according to the standards.
Please show me in RFC 1035 [ietf.org] where you see this 3 letter limitation.
By the way, the ".arpa" pseudo-domain has always existed.
There are myriad of validators out there that will reject it
No, most validators correctly implement the standard, only a handful are incorrect.
they worked well for decades.
Something on the web that worked well for decades has necessarily been enhanced at some point...
Anti-phishing measures? (Score:2)
Homoglyph attack generator (Score:2)
http://www.irongeek.com/homogl... [irongeek.com]
Ah, great! (Score:2)
Maybe now my e-mails to Tutankhamun [google.com] will quit bouncing.
Internet Engineering Task Force Realised... (Score:1)
That "signed char" was a bad coding choice back in the day.
They better filter out non-printing characters (Score:2)
They better be filtering out the non-printing characters that do fun stuff like reverse the text direction, overstrike, etc. How long until people start registering gmail addresses with Zalgo [google.com] text?
And how long until someone registers pile of poo [google.com] @gmail.com?
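As a rough idea of what such filtering could look like, here is a small Python sketch that flags invisible format/control characters and long runs of stacked combining marks (the building block of Zalgo text). The function name `suspicious_marks` and the threshold of two marks per base character are assumptions chosen for illustration, not anything Gmail is known to use:

```python
import unicodedata


def suspicious_marks(text, max_marks_per_base=2):
    """Return (index, reason) pairs for characters worth rejecting.

    Flags Unicode categories Cf/Cc (format and control characters such as
    zero width space or direction overrides) and any run of combining
    marks longer than max_marks_per_base.
    """
    problems = []
    run = 0  # length of the current run of combining marks
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category in ("Cf", "Cc"):
            problems.append((i, "invisible or control character"))
            run = 0
        elif category.startswith("M"):
            run += 1
            if run == max_marks_per_base + 1:
                problems.append((i, "stacked combining marks"))
        else:
            run = 0
    return problems


print(suspicious_marks("a\u200bb"))            # contains a zero width space
print(suspicious_marks("a\u0301\u0302\u0303"))  # three marks on one letter
```

This would not stop an emoji address, which uses ordinary visible symbols, and some scripts rely on characters like the zero width non-joiner for correct spelling, so a blanket rule like this is a blunt instrument.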
What should the rest of us do? (Score:3)
For the domain, I'd hope that the MTA (Postfix in my case) would allow UTF-8 and convert to punycode as required, but I'm not sure it does. So currently I don't allow for that. I _could_ convert to punycode myself, but I don't.
And as for the local-part, I'm fairly certain Postfix doesn't allow for UTF-8 at present.... at least, not the Postfix version supported on Debian 7.
So I'm just wondering what everyone else is doing? Should I improve my support, or should I just wait for support to be added to my MTA before I bother?
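For the domain half of the question, converting to punycode yourself is not much code. A minimal sketch in Python, assuming the standard library's built-in "idna" codec (which implements the older IDNA 2003 rules) is good enough for the domains in question; the helper names are made up, and the local part is deliberately left untouched because punycode only applies to domain names:

```python
def domain_to_ascii(domain):
    """Encode a possibly non-ASCII domain into its ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(ascii_domain):
    """Decode an ASCII (xn--) domain back into its Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


def address_with_ascii_domain(address):
    """Convert only the domain of an address; leave the local part alone."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    return f"{local}@{domain_to_ascii(domain)}"


print(address_with_ascii_domain("post@l\u00e5nekassen.no"))
```

A non-ASCII local part has no equivalent ASCII fallback, so whether that part is deliverable still depends on the MTA and on the receiving server.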
Re: (Score:3)
I don't see much point getting anal about email validation, especially since it's a fairly hard problem. It's been a while since I've written one but something along the lines of something@something.something is usually enough and let the mail servers sort out the rest.
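A check "along the lines of something@something.something" could be as small as the Python sketch below. The pattern and the name `looks_like_email` are illustrative only; the point is that it makes no assumption about which characters are allowed, so non-Latin addresses pass, and everything else is left to the mail servers:

```python
import re

# One "@", no whitespace, and at least one dot somewhere after the "@".
LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """Very loose sanity check; it does not try to implement the RFCs."""
    return LOOSE_EMAIL.fullmatch(value) is not None


print(looks_like_email("mot\u00f6rhead@gmail.com"))
print(looks_like_email("not an address"))
```

The trade-off is that a technically valid dotless address such as a@com is refused, while plenty of undeliverable strings are accepted; a confirmation email remains the only real test.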
Re: (Score:1)
So as a result, I thought I should
Re: (Score:1)
youre a moron for having a 'validation function' in the first place. if the user screws their address up its their fault, not yours. and so long as you don't pass it to your shell like an idiot, you shouldn't care either. let the damn thing bounce.
They should all be dual email addresses (Score:2)
Probably set up so that if the Russian gets bounced, it tries again with the Latin alphabet.
Also, the signature of all emails sent from this should have a copy of the Latin email address, so that people that don't have the Russian capability can reply.
Re: (Score:2)
And when there is no "one-to-one", "official" transliteration from one script to the other.
My wife's Russian passport transliterated her name into Latin characters in two different ways. This caused non-trivial issues when trying to get her citizenship sorted out.
So, you add TWO Latin 2equivalents to the one Russian one? Doesn't work, does it?
Re: (Score:2)
Sorry, got a sticky shift key ; "equivalents"
Slashdot's time? (Score:2)
Now that Google has implemented 2012 i18n technology, maybe vaunted technology site Slashdot can catch up to 1998 [ietf.org] and implement UTF-8 properly?
Nah.
address standards are a nightmare (Score:3)
This is a valid email address:
dude"".dude@[192.168.1.1]
so is this:
a@com
also valid:
test+test=gmail.com@test.com
None of those will work in MS Outlook or Exchange, none of them will work with the jQuery validation plug-in, and some close to that will work with the JavaMail API. Most funky but standards compliant email addresses will pass Apache Commons validation.
In the end, I went with a 2 part validation: 1) Apache Commons Validation (mostly RFC correct), then 2) a second pass on javax.mail, because if I can't send email to it, then what is the point of having it? We still get addresses that pass both validations, and bounce at some SMTP relay due to "invalid address format."
I am sure internationalization will make all this better.
Re: (Score:2)
+a."b(c)d,e:f;gh>i[j\k]l".a@gmail.com
not only is yourname+a."b(c)d,e:f;gh>i[j\k]l".a@gmail.com a valid email address... but you will actually get email addressed that way. It will fail most email address validation that I have found.
Re: (Score:3)
just send a mail. if it fails, discard the pending registration or whatever, possibly via "not confirmed" timeout some days later.
WTF? (Score:2)
Isn't this something, which was introduced years ago?
If only they recognized the difference... (Score:1)
firstname.lastname@gmail.com
firstnamelastname@gmail.com
I keep getting email at the former addressed to the latter. Anyone else encounter this oddity with Gmail?
Re: (Score:3)
In case you are asking this for real, this is a documented Gmail feature.
https://support.google.com/mail/answer/10313?hl=en
You can actually log in with any variant of your username that includes 1 or more periods added in arbitrary locations.
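If you need to treat those variants as one mailbox on your own side (say, to spot duplicate sign-ups), the dot rule described above is easy to mirror. A minimal Python sketch, assuming only what the comment states, namely that periods in the local part are ignored for gmail.com; the function name `gmail_canonical` is made up, and addresses at other domains are left alone because dots are significant there:

```python
def gmail_canonical(address):
    """Collapse dot variants of a gmail.com address into one form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    domain = domain.lower()  # domain names are case-insensitive
    if domain == "gmail.com":
        local = local.replace(".", "")  # Gmail ignores periods here
    return f"{local}@{domain}"


print(gmail_canonical("firstname.lastname@gmail.com"))
print(gmail_canonical("firstname.lastname@example.com"))
```

Both spellings from the question above map to the same canonical string, which is why mail for one shows up in the other's inbox.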
Re:Dammit this is a terrible idea (Score:5, Insightful)
Re: (Score:1)
Also, thin space, zero width space, zero width non joiner, combiners that combine in such a way that they essentially do nothing. There are a lot of possibilities and if any of them are missed it will be a disaster.
I foresee a lot of pain coming from this.
Re: (Score:2)
Usually with such things it's better to whitelist than blacklist. As you add characters to the whitelist you determine what character they should be equivalent to for conflict-management purposes.
Out of interest, does anyone know if people actually use internationalised domain names as their main domains, or if they stick to conventional names that work with all software and which everyone can type?
Re: (Score:2)
Most email names could be spoofed using Cyrillic characters which look exactly the same as latin ones [wikipedia.org]. How could you tell if the "c" in chrisq@gmal.com really was a latin 'c' or a cyrillic Es [wikipedia.org]?
gmail.ru (or its equivalent) will find a way to support cyrillic
gmail.qc.ca and gmail.fr will find ways to support French accents (otherwise, Google will get sued or blocked by Quebec or France)
These details will get worked out at the local level. It will take time, but they'll get there eventually.
Re: (Score:2)
Most email names could be spoofed using Cyrillic characters which look exactly the same as latin ones [wikipedia.org]. How could you tell if the "c" in chrisq@gmal.com really was a latin 'c' or a cyrillic Es [wikipedia.org]?
gmail.ru (or its equivalent) will find a way to support cyrillic gmail.qc.ca and gmail.fr will find ways to support French accents (otherwise, Google will get sued or blocked by Quebec or France) These details will get worked out at the local level. It will take time, but they'll get there eventually.
I don't think that would work in protecting users against attacks unless you said that only users of gmail.ru could receive emails from users with Cyrillic characters in the name, etc.
Re: (Score:2)
I wouldn't be surprised to see l'Office québécois de la langue française do something like that. I speak french and I still think they're assholes who are over-reaching their boundaries.
Re: (Score:3)
They might mark conspicuous characters, like when multiple character sets are combined in a single domain name.
Re: (Score:1)
Yes, warning users works really well. Especially after decades of Windows training users to click accept on alerts without reading them.
Re: (Score:2)
Because no language ever makes use of characters from other languages, I mean surely Latin capital letter R is only used by latin speakers. Seriously you should get a better understanding of what you are saying before you make bold claims about how 'easy' something is going to be, could it be done, maybe, will there be oversights, bugs and glitches for people to exploit, almost definitely.
Actually Unicode does make a good effort of classifying characters into scripts [wikipedia.org], with some "common" characters that can appear in any script and some "inherited" characters (like diacritics) that belong to the character that they are applied to. Thus the Cyrillic "Es [wikipedia.org]" looks like a Latin "C" but is a different Unicode character, one belonging to the Cyrillic script and the other to the Latin script. The different languages using the same script is a red herring, it doesn't matter that both French and Englis
Re: (Score:2)
Take "mathematical letter kappa" and "latin k" for example, do you think your mom will be able to tell the difference?
To be fair they do have different script values so would be identified by the proposal
Re: (Score:2)
I don't know about the c, but that "gmal" domain looks mighty suspicious.
Re: Terrible idea (Score:2)
Re: (Score:1)
The Latin alphabet is not American.
Re: (Score:2)
If a chinaman and a Russian swap business cards and both have used their own scripts for email addresses, are their thoughts going to be "great" or "how the fuck do I type this?"
My guess is nationalists who don't care about the world beyond their country's borders may adopt this; those who care about being part of the global community (or simply about interoperating with older software) will avoid it like the plague it is.
Re: (Score:2)
And don't forget that maybe some Chinese dude has a problem with typing English (although I think most keyboards all around the world do keep ASCII letters and base ASCII punctuation, so there's that at least today...)
Phonetic entry using pinyin is still the most common method, which has been greatly sped up with predictive text like on cell phones, so the most common characters can be entered with a few keystrokes. Google Pinyin [wikipedia.org] in this regard is, as the kids say, the shiznit.
Re: (Score:2)
How on earth am I supposed to email someone when I don't even have a key that corresponds to a letter in their email address? And no, I'm not keeping a huge chart of Alt+number combinations handy.
Of course there is probably someone in China or Korea thinking "why do I have to use this special keyboard mode with characters I don't understand to write emails".
Re: (Score:2)
Cause while his countrymen were running around killing sparrows with sticks at the behest of an insane, leftist ruler, the capitalist west had already been working on the transistor for 10 years, and was continuing working on improving the integrated circuit it had become part of and thus had a huge head start on defining the standards that would be used in a global communications network of billions of computers.
Re: (Score:2)
We can just use the corollary of the standard westerner's mode of improving communications with furriners of raising your voice, ALL CAPS!!!
Re: (Score:2)
Of course there is probably someone in China or Korea thinking "why do I have to use this special keyboard mode with characters I don't understand to write emails".
Any educated person there, and in countries that use Cyrillic in case you wondered, will learn the Latin alphabet in school. By the way, their keyboards always have the Latin alphabet on them along with symbols for certain characters in their own writing systems.
Re: (Score:3)
Hit the 'Reply-To' button, naturally.
- After adding the user to your Address book.
Re: (Score:2)
That would probably work reasonably well for Greek and Cyrillic scripts.
For other scripts, have fun dealing with weird rules for mixing LTR and RTL characters, characters that join together into something that looks more like squiggly handwriting than what we would recognise as printed text, or a sea of thousands of characters that all look very similar to the western eye.
Try installing international fonts (Score:1)
By default Ubuntu doesn't, unless your codepage requires it. Most of the 'complete' Unicode fonts aren't included by default.
Re: (Score:2)
Afaict there are no fonts covering all of Unicode, partly because it's a moving target, partly because the Unicode Consortium doesn't release a free reference font, leaving it up to third parties to look at the standard and come up with their own versions of the characters (the fonts used in the spec PDFs are proprietary).
There are fonts that come reasonably close to full coverage but they tend to be large and have poor quality coverage of some scripts. Also the only free one I'm aware of is a bitmap font.
Re: (Score:2)
Shouldn't there be? Serious question.
Re: (Score:2)
Google is working on a font family which supports every Unicode character, it's a big job though.
http://www.google.com/get/noto... [google.com]
Re: (Score:2)
Yeah, definitely they should have one. It wouldn't need to be fancy and it wouldn't matter if it was so huge, it blew stuff up (that stuff needs fixing) but I'm sure it would have helped adoption.
Looks like Google is making one according to amRadioHed below. Which is good news but I'm not quite sure how they're going to make joining Google+ a requirement to use it.
Re: (Score:2)
you cannot use solely international characters, the first one need to be simple ascii
What? Where do you get that from? TFA gives examples where the whole email address is in international characters (katakana)
Re: (Score:1)
Implementing proper domain and user authentication by baking PGP or some other PKI right into the email protocols will both solve the spam problem comprehensively AND allow UTF8 domains with minimum risk of phishing /spoofing.