
Gmail Recognizes Addresses Containing Non-Latin Characters

Posted by timothy
from the germanic-frankish-anything dept.
An anonymous reader writes: In response to the creation in 2012 by the Internet Engineering Task Force (IETF) of "a new email standard that supports addresses incorporating non-Latin and accented Latin characters", Google has now made it possible for its Gmail users to "send emails to, and receive emails from, people who have these characters in their email addresses." Google's goal is to eventually allow its users to create Gmail addresses using these characters.

  • by CRC'99 (96526) on Wednesday August 06, 2014 @04:02AM (#47612207) Homepage

    So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?

    Great.

    • by Anonymous Coward on Wednesday August 06, 2014 @04:07AM (#47612223)

      Finally I can get motörhead@gmail.com!

      • I will represent myself as a shady unofficial sales representative for an Australian microphone brand.
      • Finally I can get motörhead@gmail.com!

        This is exactly what is going to happen, and I don't mean that in a good way. I already see it in other chat environments, like Second Life, where the full power of Unicode allows any and all characters in usernames. It's bad enough that they substitute Latin letters with superficially similar characters from other languages so we end up with names like ££¥ and , but miles of decorative symbols drawn from Braille and mathematics... and don't even get me started about the entire upside-down

    • by rvw (755107)

      So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?

      It's not about bìllgàtes@outlook.com, but billgates@óutlook.com. It's the domain that is going to cause problems, not the user!

      • ... and they'll use a greek lower case omicron (), rather than an accented o. The looks exactly the same as an o (except on Slashdot, of course. Slashdot hates Unicode...)
    • by Captain_Chaos (103843) on Wednesday August 06, 2014 @04:22AM (#47612269)
      Worse; they will come from root@gmail.com, administrator@gmail.com or BillGates@gmail.com, only those o's and a's will be Cyrillic or something like that (can't do it here; Slashdot doesn't display them).
      • by rvw (755107)

        Worse; they will come from root@gmail.com, administrator@gmail.com or BillGates@gmail.com, only those o's and a's will be Cyrillic or something like that (can't do it here; Slashdot doesn't display them).

        When you mix Latin htmail with a Cyrillic o to get hotmail, Google and all email programs should refuse that address immediately, mark it as spam, make the address red with a warning sign etc. Mixing character sets should not be allowed in a domain or in a username. So the username may be all Cyrillic or Greek, the domain name may be all Chinese or Latin, and these may mix, but no mixes in the domain name or username itself.
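
The rule proposed above (one script per username, one per domain label) can be sketched roughly as follows. This is only a heuristic under stated assumptions: Python's standard library has no script property, so the sketch guesses the script from the first word of each letter's Unicode name, and the function names `scripts_used` and `has_mixed_script_part` are made up for illustration. It checks each domain label separately, and it would wrongly flag languages such as Japanese that legitimately mix scripts.

```python
import unicodedata


def scripts_used(text):
    """Approximate the set of scripts in text (heuristic, not the real Script property)."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # "CYRILLIC SMALL LETTER O" -> "CYRILLIC", "LATIN SMALL LETTER O" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def has_mixed_script_part(address):
    """True if the local part or any single domain label mixes scripts."""
    local, _, domain = address.rpartition("@")
    if len(scripts_used(local)) > 1:
        return True
    return any(len(scripts_used(label)) > 1 for label in domain.split("."))


# Latin "h", Cyrillic "о" (U+043E), Latin "tmail": flagged as mixed.
print(has_mixed_script_part("user@h\u043etmail.com"))
```

An all-Cyrillic username at an all-Latin domain passes, which matches the "these may mix, but no mixes in the domain name or username itself" idea.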

        • by Chrisq (894406)

          I think that's the way to go - only allow characters from a single Unicode script [unicode.org] in the username and in the domain name. The domain name part is currently handled by registrars so that may not need any additional rules.

          However this really should be part of the RFC, or else anyone banning mixed names would be "non compliant". If the RFC does not specify this then the best that gmail (or any other system) could do would be to prevent people registering mixed names themselves and give a warning (and maybe c

          • by rvw (755107)

            However this really should be part of the RFC, or else anyone banning mixed names would be "non compliant". If the RFC does not specify this then the best that gmail (or any other system) could do would be to prevent people registering mixed names themselves and give a warning (and maybe colour characters) if email is received from an address with mixed scripts.

            Gmail, Microsoft and Yahoo and others like gmx, universities, big companies should simply refuse these mails. Microsoft should make Exchange so that this is the default way for handling these mails. The same goes for qmail, postfix etc. But that won't be enough.

            As another commenter said, you can make up Latin-looking names using Cyrillic characters, and we won't notice. How do you catch that? I guess this will be the time that PGP will prove its value.

            • by MrNaz (730548)

              I agree. The real solution is hardened authentication getting baked right into email. I'm all for UTF8 domain names and email user names, however if the email protocol suite is going to be expanded to allow for more features, then I think security should be top of that list.

              Sure, for a while, domains that span multiple character sets such as hotmail.com with a Cyrillic o could be spam flagged, however what happens when (not if, but when) legitimate domains with multiple character sets start appearing? What

              • How does confirming the domain's identity automatically solve this problem?

                If someone from the gxail.com domain sends me email (let's assume here the 'x' is some weird Cyrillic character that looks just like an 'm'), any automated confirmation of the domain's validity would not do some sort of eyeball check "Oh, that looks like gmail.com, let's confirm if it is, oops it isn't..." but rather an automated "did that email come from gxail.com? Yup, sure did."

                Even if you popped up a notice that said "hey, I don'

        • Belarusian uses a Latin-style "i" in place of the typical Cyrillic short i... So you can still phish admirably with "paypaI" and never leave Cyrillic. e, x, c, y, i, o, p, a: how many words can you make?
        • by kyrsjo (2420192)

          There are languages, such as the Scandinavian languages, which are "mostly Latin". This means we have the full A-Z as used in English (although C,Q,W,X,Z are never used) PLUS some extra letters "Æ/Ø/Å" (dunno if this displays correctly here). There are also domains which use these letters like "lånekassen.no", which is the state agency handling student loans. (They are also available at the alternative address "laanekassen.no".)

          Thus a "hard and fast" rule disallowing domain names with m

      • Re: (Score:2, Insightful)

        by pla (258480)
        Worse; they will come from root@gmail.com, administrator@gmail.com or BillGates@gmail.com, only those o's and a's will be Cyrillic or something like that (can't do it here; Slashdot doesn't display them).

        "Rich company problems". XD

        Bluntly, this won't affect most Americans for the same reason spam from .il, .ru, or .cn doesn't matter - Because we simply don't get any legitimate email from those domains. It doesn't take your spam filter long to figure out "if the address contains character-X, 100% chanc
        • by drinkypoo (153816)

          It doesn't take your spam filter long to figure out "if the address contains character-X, 100% chance of spam"...

          Yes, yes it does. It took Google years to stop sending me spam in foreign languages that I couldn't read anyway.

        • by Kjella (173770)

          Bluntly, this won't affect most Americans for the same reason spam from .il, .ru, or .cn doesn't matter - Because we simply don't get any legitimate email from those domains. It doesn't take your spam filter long to figure out "if the address contains character-X, 100% chance of spam"... And that assumes your mail server doesn't outright block those as a hardcoded rule (in a former life I had to babysit the Exchange server for a small business; if you came from anywhere not in one of the big-six TLDs, auto-junk).

          It must be wonderful to run a mom and pop operation where none of your customers, suppliers or anyone else has an international mail address. And it certainly won't work for any other country but the US: a Canadian business that doesn't accept .ca mail? Don't think so. And if you're operating an ISP, university or whatever some of your users will be foreigners in real contact with the rest of the world. Neat that you can wave the WORKS4ME flag, it's still a problem for a lot of other people.

          • by pla (258480)
            It must be wonderful to run a mom and pop operation where none of your customers, suppliers or anyone else has an international mail address. And it certainly won't work for any other country but the US: a Canadian business that doesn't accept .ca mail? Don't think so.

            Yellow flag: Failure to extrapolate.

            Canadians can't block .ca, of course. They probably feel pretty much the same as I do about .il, .ru, and .cn, however. Canadians can't block the few diacriticals used in French (although in my experi
            • by Behrooz (302401)

              Is it deeply wrong that I originally read "WORKS4ME" as 'worksame' due to excessive IRC leetspeak exposure in my youth... and yet it still kind of made sense?

          • by tlhIngan (30335)

            It must be wonderful to run a mom and pop operation where none of your customers, suppliers or anyone else has an international mail address. And it certainly won't work for any other country but the US: a Canadian business that doesn't accept .ca mail? Don't think so. And if you're operating an ISP, university or whatever some of your users will be foreigners in real contact with the rest of the world. Neat that you can wave the WORKS4ME flag, it's still a problem for a lot of other people.

            Most of the Fren

      • by DarkOx (621550)

        No, worse: they will come from

        updates@?tfosorcim?.com which will be displayed like:
        updates@microsoft.com

        Just imagine the ? marks being the left-right reverse character.
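
For reference, the "left-right reverse character" is presumably U+202E RIGHT-TO-LEFT OVERRIDE, closed by U+202C POP DIRECTIONAL FORMATTING. A minimal sketch of spotting such controls before an address is displayed, assuming the receiving software simply wants to flag them (the name `find_bidi_controls` is invented for this example):

```python
import unicodedata

# Bidirectional classes of the explicit embedding, override and isolate controls.
BIDI_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(text):
    """Return (index, character name) for every explicit bidi control in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROLS
    ]


# Stored as "tfosorcim" between the controls; a bidi-aware renderer may show "microsoft".
spoofed = "updates@\u202etfosorcim\u202c.com"
print(find_bidi_controls(spoofed))
print(find_bidi_controls("updates@microsoft.com"))
```

An address that yields a non-empty list could be rejected or shown with the controls made visible; which of the two is right is a policy choice, not something this sketch decides.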

      • The "From:" header has been spoofable in ASCII since the beginning of e-mail. Given its unreliability, you are foolish if you put much stock into it.
      • I'm glad everyone pointed this out because it's the first concern that came to mind.

        This makes me happy because;
        1) Great minds think alike.
        2) Google will instantly realize the error of their ways with weird characters and change this policy.
        3) I'm not alone. This could very well be my posse.
        4) Slashdot has it handled, the world is safer.
        5) I don't have enough things I'm happy about on a regular basis to make a top ten list -- and I've learned to be OK with that.

    • by dejanc (1528235)
      That kind of phishing already exists, even more sophisticated: a bug that a lot of software contains is not distinguishing between same looking characters in different alphabets. E.g. you can sign up on many forum/bbs platforms as Administrator if your leading A is Cyrillic [fileformat.info] A instead of Latin [fileformat.info] A. Both look the same but have different html entity codes and are different Unicode characters, which is true for most vowels and many consonants (e.g. Cyrillic B and Latin B, C and C, E and E...). Or, for more fun, l
      • by seyyah (986027)

        That kind of phishing already exists, even more sophisticated: a bug that a lot of software contains is not distinguishing between same looking characters in different alphabets. E.g. you can sign up on many forum/bbs platforms as Administrator if your leading A is Cyrillic A instead of Latin A. Both look the same but have different html entity codes and are different Unicode characters, which is true for most vowels and many consonants (e.g. Cyrillic B and Latin B, C and C, E and E...).

        What software (or library) is programmed to recognize that two chars look the same and therefore allows them based on the appearance rather than their encoding?

        • by dejanc (1528235)

          What software (or library) is programmed to recognize that two chars look the same and therefore allows them based on the appearance rather than their encoding?

          I am not aware of any. My "solution" to this problem is to allow only unambiguous characters to be used. I really mostly have to deal with only about 60 characters in total which I allow people use for unique fields, so it's manageable.
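
A rough sketch of that allow-list approach, assuming the roughly 60 permitted characters are ASCII letters, digits and three punctuation marks (the exact set is a guess, and the names `ALLOWED`, `describe` and `is_allowed_username` are hypothetical). It also shows why the two look-alike capital A's are different to software:

```python
import string
import unicodedata

# Hypothetical allow-list: 52 letters + 10 digits + 3 punctuation marks.
ALLOWED = set(string.ascii_letters + string.digits + "._-")


def describe(ch):
    """Show the code point and Unicode name behind a glyph."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def is_allowed_username(name):
    """Accept only non-empty names built entirely from the allow-list."""
    return bool(name) and all(ch in ALLOWED for ch in name)


print(describe("A"))        # Latin capital A
print(describe("\u0410"))   # Cyrillic capital A, visually identical in most fonts
print(is_allowed_username("Administrator"))
print(is_allowed_username("\u0410dministrator"))
```

The trade-off is the one raised elsewhere in the thread: an allow-list this strict also shuts out legitimate names such as those containing Æ, Ø or Å.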

    • by Megane (129182) on Wednesday August 06, 2014 @09:40AM (#47613469) Homepage
      I think ròót@gmail.com is a better choice because it looks angry.
    • If you're careful you can get pretty much all the major mailbox providers accepting email from any address, including many major ISPs, and at least Gmail (and Apps) and Hotmail (and Office 365). Again if you are careful, they will go into the inbox. SMTP was great back in the days when there were 20 networked computers. The continued lack of strict adherence to the common authentication standards makes me want to both laugh and cry simultaneously.

      I send email "from" bill.gates@microsoft.com from the command

    • by lamber45 (658956)
      No, they're not allowing gmail accounts to use non-ASCII local parts yet. However, mail to/from other domains can have non-ASCII local part and domain name. If that other domain allows a random user to create an account "róót", that's about the extent of the possible phishing.
    • So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?

      Great.

      I've heard this argument before.
      Spoofing a return/sending mail address is incredibly simple. In fact, I do it every day as part of my legitimate non-hacker job. I could send you an email from Bill.Gates@microsoft.com if I wanted to. (though your mail server might have some security issues with that) Do you think that when you email Support@somecompany.com there is a team of people that all log into that same mailbox? Or when they reply to you, they're really using that mailbox?

      So there's no reason to use

    • Lôvè yôûr ïdèàl Éxample.

  • by Anonymous Coward on Wednesday August 06, 2014 @04:05AM (#47612217)

    Google updated their regular expression. Good for them.

    • by Chrisq (894406) on Wednesday August 06, 2014 @04:10AM (#47612229)
      I would imagine that they implemented RFC 6532 [ietf.org], which involves a lot more than changing a regular expression
      • I would imagine that they implemented RFC 6532 [ietf.org], which involves a lot more than changing a regular expression

        So we get sometimes unreadable mails because the encoding of the content is unknown. Then some mails will be rejected because of an encoding problem in the address itself. At least in the first case the mail was received and we had enough time to fix the problem.

  • Sigh (Score:5, Insightful)

    by ledow (319597) on Wednesday August 06, 2014 @04:10AM (#47612231) Homepage

    From what I can tell, a mail server has two options when receiving this mail:

    Accept it.
    Reject it.

    The default, with software that doesn't understand this RFC yet (which seems to be... just about everything), is to reject. So trying to use this as an email is not only going to mess up every form you try to fill in online (because they won't see it as an email address either), but quite likely just gets you bouncebacks from everyone you email.

    What was needed was surely a system similar to the IDN system for internationalisation, which would allow those with ASCII-only DNS servers etc. to STILL WORK, by converting the Unicode characters to ASCII subsets and then sending the email as normal, through the entire PLANET-worth of working email servers out there that could accept it.

    Having a content negotiation option at the SMTP level, that mail servers have to implement and handle specifically, is just ridiculous, and even with GMail's kickstart it could be decades before you can guarantee that your UTF-8 email address will work across the Internet and even then there'll be some old legacy server that will just bounce all your email BECAUSE of that character set in your address. And it will be perfectly legitimate to do so.

    However, as others have pointed out, if this goes through, it will be nigh-on impossible to spot phished/faked email addresses, just like it is with IDN links unless you know how to find the original ASCII-encoding of them.

    • by gurps_npc (621217)
      Many email accounts have the option of setting up a temporary clone email with different letters. That is, you could be something_in_Mandarin@gmail.com and also AlexanderTheGreat@gmail.com All in one single account. So you use the Mandarin email address for your mandarin business cards, and the English one for all web sites and even on your English business cards.
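
The SMTP-level negotiation and the ASCII alias idea can be combined. Under RFC 6531 a server that understands internationalized addresses advertises the SMTPUTF8 keyword in its EHLO reply. The sketch below assumes the sender has both a UTF-8 address and an ASCII alias and simply picks one based on that reply; the function names and sample replies are invented for illustration, and real code would more likely ask its SMTP library (for example `smtplib.SMTP.has_extn`) than parse the text itself.

```python
def advertises_smtputf8(ehlo_reply):
    """True if a raw multi-line EHLO reply lists the SMTPUTF8 extension."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD params" or, for the last one, "250 KEYWORD".
        if len(line) < 5 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "SMTPUTF8":
            return True
    return False


def pick_envelope_address(ehlo_reply, utf8_address, ascii_alias):
    """Use the UTF-8 address only when the next hop says it can carry it."""
    if utf8_address.isascii() or advertises_smtputf8(ehlo_reply):
        return utf8_address
    return ascii_alias


modern = "250-mx.example.org Hello\r\n250-8BITMIME\r\n250 SMTPUTF8"
legacy = "250-mx.example.org Hello\r\n250 8BITMIME"
print(pick_envelope_address(modern, "\u0438\u0432\u0430\u043d@example.com", "ivan@example.com"))
print(pick_envelope_address(legacy, "\u0438\u0432\u0430\u043d@example.com", "ivan@example.com"))
```

This only covers the first hop; a later relay without SMTPUTF8 can still bounce the message, which is the decades-long tail described above.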
  • Good luck (Score:5, Interesting)

    by Pascal Sartoretti (454385) on Wednesday August 06, 2014 @04:16AM (#47612255)
    My e-mail address ends with the suffix ".name". It is perfectly correct (even if not common), but I still sometimes have issues today because some stupid website has an outdated regular expression which says that ".name" is not correct.

    Now imagine this with non-latin characters (or just non-ASCII characters)... If you only write to people also using GMail, it might work.
    • by RevWaldo (1186281)
      I was once given a corporate e-mail address with an apostrophe for my last name. Perfectly legal, but many web sites choked on it. And they left the apostrophe off my first batch of business cards.

      (Fortunately I also had an alias address which didn't have the apostrophe and was about two dozen characters shorter.)

      .
  • I hope they implement the same kind of anti-phishing measures that browsers are taking for displaying domain names with non-Latin scripts. http://en.wikipedia.org/wiki/I... [wikipedia.org]
  • Maybe now my e-mails to Tutankhamun [google.com] will quit bouncing.

    • by Megane (129182)
      Last I checked, you can't use a JPEG image in the To: field of an e-mail header. Maybe you could give a link to a UTF-8 text example?
  • That "signed char" was a bad coding choice back in the day.

  • They better be filtering out the non-printing characters that do fun stuff like reverse the text direction, overstrike, etc. How long until people start registering gmail addresses with Zalgo [google.com] text?

    And how long until someone registers pile of poo [google.com] @gmail.com?
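
A minimal sketch of the kind of filtering meant here, assuming two simple rules: drop characters in the Unicode "format" category (Cf, which includes the direction overrides and zero-width characters), and measure the longest run of stacked combining marks so Zalgo-style names can be refused. Both function names are made up, no normalization is applied, and the Cf rule is too blunt for scripts that need zero-width joiners.

```python
import unicodedata


def strip_format_chars(text):
    """Remove invisible format characters (general category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def max_combining_run(text):
    """Length of the longest run of consecutive combining marks (categories Mn, Mc, Me)."""
    longest = run = 0
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


print(strip_format_chars("a\u202eb\u200bc"))       # override and zero-width space removed
print(max_combining_run("z\u0301\u0302\u0303"))    # three marks stacked on one letter
print(max_combining_run("mot\u00f6rhead"))         # precomposed ö has no separate marks
```

What run length counts as abuse is a judgment call; ordinary text in most languages rarely stacks more than two marks on one base letter.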

  • by Zaiff Urgulbunger (591514) on Wednesday August 06, 2014 @09:40AM (#47613473)
    As a webdev who gets irritated at websites that fail badly with their email validation (e.g. not allowing + in the local part, or only allowing 2 or 3 char TLDs), I do try very hard to get this right. So I've got a solid(ish) email validation function. But, I'm a bit sketchy on what to do with UTF-8.

    For the domain, I'd hope that the MTA (Postfix in my case) would allow UTF-8 and convert to punycode as required, but I'm not sure it does. So currently I don't allow for that. I _could_ convert to punycode myself, but I don't.

    And as for the local-part, I'm fairly certain Postfix doesn't allow for UTF-8 at present.... at least, not the Postfix version supported on Debian 7.

    So I'm just wondering what everyone else is doing? Should I improve my support, or should I just wait for support to be added to my MTA before I bother?
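
For the domain half of the question, converting to punycode yourself can be sketched with Python's built-in `idna` codec. Assumptions: the codec implements the older IDNA 2003 rules (the separate third-party `idna` package covers IDNA 2008), the helper name `to_ascii_domain` is invented, and the local part is deliberately left untouched, so a non-ASCII local part still needs an MTA that supports it.

```python
def to_ascii_domain(address):
    """Return the address with its domain converted to ASCII (punycode) form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no @")
    # Each non-ASCII label becomes an "xn--" label; ASCII labels pass through.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_domain("info@b\u00fccher.example"))
print(to_ascii_domain("user@example.com"))
```

The conversion is reversible with `.decode("idna")`, so the Unicode form can still be shown to the user while the ASCII form is handed to the MTA.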
    • by Richy_T (111409)

      I don't see much point getting anal about email validation, especially since it's a fairly hard problem. It's been a while since I've written one but something along the lines of something@something.something is usually enough and let the mail servers sort out the rest.
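
One way to read "something@something.something" as code is the loose check below. It is a sketch, not an RFC validator: the pattern and the name `looks_like_email` are made up, excluding whitespace and commas is an extra assumption, and it will reject some technically valid forms (quoted local parts containing "@", or dotless domains such as a@com).

```python
import re

# One "@", something on each side, and at least one dot in the domain part.
LOOSE_EMAIL = re.compile(r"[^@\s,]+@[^@\s,]+\.[^@\s,]+")


def looks_like_email(text):
    """Loose shape check; deliverability is left to the mail servers."""
    return LOOSE_EMAIL.fullmatch(text) is not None


print(looks_like_email("user+tag@example.name"))
print(looks_like_email("mot\u00f6rhead@gmail.com"))
print(looks_like_email("a@b.com,c@d.com"))
```

Because the character classes only exclude a few characters, the check does not care about TLD length or non-ASCII letters, which avoids the ".name" and "+" failures complained about above.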

      • Yeah, for the most part, I prefer validation that does as little as possible. I previously had a simple check for a single @ symbol. That worked, except someone used a comma which caused the mailer to think there was more than one email address. I should have anticipated that one. Not a biggie, but there was crap in the mail.log where it was trying to use an invalid address... I don't recall the exact issue, but I believe it thought it might be a local address or something.

        So as a result, I thought I should
    • by Anonymous Coward

      youre a moron for having a 'validation function' in the first place. if the user screws their address up its their fault, not yours. and so long as you don't pass it to your shell like an idiot, you shouldn't care either. let the damn thing bounce.

  • As in, one email account connected to two email addresses, one in say Russian, and the other using the latin alphabet.

    Probably set up so that if the Russian gets bounced, it tries again with the latin alphabet.

    Also, the signature of all emails sent from this should have a copy of the latin email address, so that people that don't have the Russian capability can reply.

    • by RockDoctor (15477)

      As in, one email account connected to two email addresses, one in say Russian, and the other using the latin alphabet.

      And when there is no "one-to-one", "official" transliteration from one script to the other.

      My wife's Russian passport transliterated her name into Latin characters in two different ways. This caused non-trivial issues when trying to get her citizenship sorted out.

      So, you add TWO Latin equivalents to the one Russian one? Doesn't work, does it?

  • Now that Google has implemented 2012 i18n technology, maybe vaunted technology site Slashdot can catch up to 1998 [ietf.org] and implement UTF-8 properly?

    Nah.

  • by netsavior (627338) on Wednesday August 06, 2014 @10:00AM (#47613645)
    first off, I went down the slippery death defying slope of email address validation recently... Our software had simple regex rules... so I thought I would just implement RFC rules, or find a library that did... wow. RFC is a mess... APIs are worse.
    This is a valid email address:
    dude"".dude@[192.168.1.1]
    so is this:
    a@com
    also valid:
    test+test=gmail.com@test.com
    none of those will work in MS Outlook or exchange, none of them will work with jquery validation plug-in, some close to that will work with java mail API. Most funky but standards compliant email addresses will pass Apache commons validation.

    In the end, I went with a 2 part validation: 1) Apache Commons Validation (mostly RFC correct), then a second pass on Javax.mail because if I can't send email to it, then what is the point of having it? We still get addresses that pass both validations, and bounce at some SMTP relay due to "invalid address format."

    I am sure internationalization will make all this better.
    • by netsavior (627338)
      take the name part of your gmail address... add this string to it:
      +a."b(c)d,e:f;gh>i[j\k]l".a@gmail.com

      not only is yourname+a."b(c)d,e:f;gh>i[j\k]l".a@gmail.com a valid email address... but you will actually get email addressed that way. It will fail most email address validation that I have found.
    • by allo (1728082)

      just send a mail. if it fails, discard the pending registration or whatever, possibly via "not confirmed" timeout some days later.
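
A minimal sketch of that confirm-by-mail flow, assuming an in-memory store and a three-day window (both are placeholders, as are the class and method names). Sending the actual mail is out of scope; the point is that an address is only accepted once its owner returns the token, and unconfirmed entries are dropped after the timeout.

```python
import secrets
import time

CONFIRM_WINDOW_SECONDS = 3 * 24 * 3600  # assumed meaning of "some days later"


class PendingRegistrations:
    """Addresses waiting for their owner to return a mailed token."""

    def __init__(self):
        self._pending = {}  # token -> (address, deadline)

    def start(self, address, now=None):
        """Record the address and return the token to mail to it."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._pending[token] = (address, now + CONFIRM_WINDOW_SECONDS)
        return token

    def confirm(self, token, now=None):
        """Return the confirmed address, or None if unknown or expired."""
        now = time.time() if now is None else now
        address, deadline = self._pending.pop(token, (None, 0))
        return address if address is not None and now <= deadline else None

    def purge_expired(self, now=None):
        """Discard registrations that were never confirmed; return how many."""
        now = time.time() if now is None else now
        expired = [t for t, (_, deadline) in self._pending.items() if now > deadline]
        for token in expired:
            del self._pending[token]
        return len(expired)
```

With this in place no format validation is strictly needed: a mistyped or undeliverable address never produces a confirmation and simply ages out.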

      • by netsavior (627338)
        I would love to do that. I beg our management every week. Our customers are Luddites. Email is optional.
  • by allo (1728082)

    Isn't this something, which was introduced years ago?

  • ...between these two addresses:

    firstname.lastname@gmail.com
    firstnamelastname@gmail.com

    I keep getting email at the former addressed to the latter. Anyone else encounter this oddity with Gmail?
    • by devman (1163205)

      In case you are asking this for real, this is a documented gmail feature.

      https://support.google.com/mail/answer/10313?hl=en

      You can actually log in with any variant of your username that includes 1 or more periods added in arbitrary locations.
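
Software that needs to treat such variants as one account can fold them before comparing. The sketch below assumes only the behaviour described above (periods in the local part are ignored) and applies it to gmail.com alone; the function name `gmail_canonical` is invented, and other providers generally do treat dots as significant.

```python
def gmail_canonical(address):
    """Drop periods from the local part of a gmail.com address; leave others alone."""
    local, sep, domain = address.rpartition("@")
    if sep and domain.lower() == "gmail.com":
        local = local.replace(".", "")
        return f"{local}@{domain}"
    return address


print(gmail_canonical("firstname.lastname@gmail.com"))
print(gmail_canonical("first.last@example.com"))
```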
