ICANN Approves Non-Latin ccTLDs
Several readers including alphadogg tipped the news that ICANN has approved non-Latin ccTLDs at its meeting in Seoul. "Starting in mid-November, countries and territories will be able to apply to show domain names in their native language, a major technical tweak to the Internet designed to increase language accessibility. On Friday, the Internet's addressing authority approved a Fast-Track Process for applying for an IDN (Internationalized Domain Name) and will begin accepting applications on Nov. 16. The move comes after years of technical testing and policy development... Currently, domain names can only be displayed using the Latin alphabet letters A-Z, the digits 0-9 and the hyphen, but in future countries will be able to display country-code Top Level Domains (cc TLDs) in their native language. ... 'The usability of IDNs may be limited, as not all application software is capable of working with IDNs,' ICANN said in a 59-page proposal (PDF) dated Sept. 30 that describes the [application] process." Reader dhermann adds, "Great, now even less chance I can identify NSFW links before they are blocked by my work's big brother app and my boss is notified... again."
terrorist level domain (Score:4, Funny)
Re: (Score:2)
it's only a matter of time before someone registers bánkófámérícá.com or llóydstsb.co.uk for their phishing schemes
Re: (Score:3, Informative)
That has been possible for years.
This is about registering bankofamerica.cõm or lloydstsb.cø.ûk
The part AFTER the dot.
I took Latin in high school (Score:2, Funny)
I'm glad we're going with Non-Latin TLDs now, I never understood going to the website "e.pluribus.unm"
Perdire (Score:3, Funny)
Re: (Score:2)
What about fahrvergnugen.vw?
first urls, then slashdot (Score:5, Funny)
ï höpé thãt slâshðõt wìll dö thís töø wìth ÜRLs!
www.íçáñn.örg
ìt wörkéð!
Re:first urls, then slashdot (Score:5, Funny)
Here's a demonstration of how non-Latin characters show on /., starting with Arabic:
Hindi:
Russian:
Japanese:
Korean:
Chinese:
Re: (Score:2)
Just because the characters don't show up in the edited text doesn't mean that they won't be handled in anchor tags or Slashdot's URL tag.
Re: (Score:3, Insightful)
Just because the characters don't show up in the edited text doesn't mean that they won't be handled in anchor tags or Slashdot's URL tag.
Well, Slashdot mangles them anyway [russian-]. The URL should end in .com.
Slashdot's web interface is quite embarrassing in this respect. Having a non-Unicode-capable page in 2009 is like having one that is optimized for Netscape 0.9, no matter what amount of JavaScript and Web 2.0 bling they put in there.
If international URLs will finally force Slashdot to implement a triviality such as string parsing, so much the better.
ICANN has lost it! (Score:4, Insightful)
Far too much software makes the assumption that TLDs only contain [a-z0-9-], so if you want to go changing that there needs to be a damn good reason, and there is not. There are ~1369 2-letter TLDs to be shared between ~200 sovereign states and 49284 3-letter generic ones to be split between uses (.xxx .nws .org .edu, etc); there doesn't seem to be any good reason to expand that and make lots of software more complex.
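A minimal Python sketch of the kind of check this comment has in mind. The validator name and pattern are hypothetical, not taken from any real program; the 63-character cap is the DNS label limit. Note that the ASCII-compatible form xn--rg-eka, quoted elsewhere in this thread for .örg, still passes such a check.

```python
import re

# Hypothetical validator: assumes a label holds only ASCII letters,
# digits and hyphens, and is at most 63 characters long.
LDH_LABEL = re.compile(r"[a-z0-9-]{1,63}")

def looks_like_valid_tld(label: str) -> bool:
    """Return True if the label passes the ASCII-only assumption."""
    return LDH_LABEL.fullmatch(label.lower()) is not None

print(looks_like_valid_tld("org"))         # True
print(looks_like_valid_tld("\u00f6rg"))    # False: 'ö' is outside [a-z0-9-]
print(looks_like_valid_tld("xn--rg-eka"))  # True: ASCII form of 'örg'
```

Software written this way would reject the raw Unicode label but accept the encoded one, which is presumably why the encoded form goes on the wire.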
And the answer to that... (Score:5, Interesting)
... of course, is Punycode.
A comment [slashdot.org] before yours has www.íçáñn.örg, which, when entered into Firefox, turns into
www.xn--n-tfarxw.xn--rg-eka
. Looks like the software will still live :)
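The conversion described above can be reproduced with Python's built-in "punycode" codec. This is only a sketch: the helper name to_ascii is made up for illustration, and it skips the Nameprep step (case folding, prohibited-character checks) that real IDNA processing performs before encoding.

```python
import unicodedata

def to_ascii(hostname: str) -> str:
    """Punycode-encode each non-ASCII label and add the 'xn--' prefix."""
    hostname = unicodedata.normalize("NFC", hostname)  # use precomposed letters
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            labels.append(label)  # plain ASCII labels pass through unchanged
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

# The host name from the comment above, written with escapes.
print(to_ascii("www.\u00ed\u00e7\u00e1\u00f1n.\u00f6rg"))
# www.xn--n-tfarxw.xn--rg-eka
```

Python also ships an "idna" codec (IDNA 2003) that performs the full conversion, Nameprep included.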
Re: (Score:2)
Probably because up until about five minutes ago, it wasn't a valid address.
Re:And the answer to that... (Score:5, Informative)
You don't understand. Punycode is how second-level domains are already implemented, even on top of relatively old browsers. This is an extension of Punycode to be usable in the TLD as well.
In other words, your current version of Firefox will be able to visit pages in IDN TLDs when they're implemented, and so if someone does create a .örg TLD today, you can go to www.anysite.örg to your heart's content already.
Note that this doesn't mean you can go to www.anysite.örg in NCSA Mosaic or anything, because these old browsers were around when Punycode wasn't even a standard. You can go to www.anysite.xn--rg-eka and NCSA Mosaic will recognise that, though. The seamless IDN TLD usage is just going to be present in the more modern browsers. I expect that Opera 8+, IE 6+, Firefox 2+ and recent Safari/Konqueror/Epiphany are going to be able to visit www.anysite.örg and 'hide' the xn--etc- access details from you, the user.
Happy surfing!
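The "hiding" a modern browser does amounts to decoding the xn-- labels for display. A rough Python sketch, with a hypothetical helper name and none of the safety checks a real browser applies before showing Unicode:

```python
def to_display(hostname: str) -> str:
    """Turn 'xn--' labels back into Unicode; leave other labels untouched."""
    labels = []
    for label in hostname.split("."):
        if label.lower().startswith("xn--"):
            # Strip the 4-character prefix, then Punycode-decode the rest.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)

print(to_display("www.anysite.xn--rg-eka"))  # www.anysite.örg
```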
Re: (Score:2)
Though TBH I'm not certain WHY we need TLDs anyways. It isn't like there is some commercial slashdot.com it just redirects. I imagine that any big name will
Re: (Score:2, Insightful)
This is about letting people use characters from their frickin' own language instead of just english.
Just like so many other things in programming.. if the software doesn't do international, it doesn't do international.
This has nothing to do with making more TLDs.
Re: (Score:3, Insightful)
Yeah right. Because everybody in the whole world only uses ASCII right?
Sorry for sounding flippant, but such US-myopia is far too prevalent for my liking.... Come on guys: Wake up and smell the coffee! There is more to the world than the US! There is no reason to make most of South East Asia and China 2nd-rate citizens on the internet.
I agree that there is a lot of software that needs changing as a result though. But that just means more work, right? You could probably sell this as an anti-recession measure
Re: (Score:2)
You look in the table of characters, it's not like there's thousands of French words that have that. Hell, the cedilla is common enough to be added to the characters you can hit with a modifier on at least Apple's version of US qwerty. Also, US ASCII isn't even considered good enough for British English because of loan words - it can do three languages right: Latin, US English and Hawai'ian.
Re: (Score:2)
I can't speak for Windows or Linux, but on a Mac:
Option-C.
System Preferences > Language & Text > Input Sources, and select 'Arabic'. There's a keystroke that will let you switch instantly between layouts.
Re: (Score:2)
It's not really an assumption is it if until now the "standard" only called for [a-z0-9-].
Re: (Score:3, Insightful)
You know, except for ease of use for those who don't use Latin characters in their daily lives. But who cares about them? They should just go back to their own country and create their own internet.
Re: (Score:3, Informative)
if you want to go changing that there needs to be a damn good reason
I don't have any first-hand experience, but according to the BBC story when one enters a native-script domain name into one's browser, the domain name is entered normally (for the locale) and then to enter, e.g., ".in", one needs to press a key combination to shift the keyboard into latin-mode, then, enter the two letters, then shift the keyboard back into native mode.
It's a usability problem. I sure would be annoyed if .com had to be rend
Re: (Score:3, Funny)
And now, with today's progress, that'd be CØBÖL.
Encoding? (Score:2, Interesting)
The encoding seems weird to me:
Any DNS gurus care to explain why they wouldn't simply use UTF8?
Comment removed (Score:4, Informative)
Re: (Score:2)
But it's not compatible with URLs that contain xn--, intended to show as xn--.
But how many of those were there? and afaict it's not xn-- anywhere just xn-- at the start of a part of a DNS name.
Re:Encoding? (Score:5, Informative)
To avoid breaking all the DNS-related code out there that assumes (ie correctly, based on the current spec) only alphanumerics and '-' in each component.
If you wish to rewrite every single bit of DNS-dependent code, in every laptop, server, embedded network device, etc, etc, ... well assume that it can't be done, and with this mechanism it doesn't need to be. Though I bet a few bits of code will barf at the '--' anyhow...
Rgds
Damon
Re: (Score:2)
(ie correctly, based on the current spec)
Only hostnames are restricted. Other than that, DNS is almost 8-bit-clean (it case folds A-Z to a-z and dot is special) so UTF-8 is fine.
Punycode only exists because some people have puny ...
Re: (Score:2)
Can you be sure that the DNS code in the WinME that runs your building's lifts is 8-bit clean, just for example? Or your old-but-good HP laser printer with embedded networking?
This is pragmatically addressing the probability of code still in use but written long ago when UTF-8 and 8-bit-clean were woolly notions and twinkles in academic eyes, or just badly slapped together by some junior lowest-bid developer who thought "oh, just (ASCII) letters and numbers" and it seemed to work...
I'd bet you a whole doll
Re: (Score:2)
Actually, all labels are restricted to the characters allowed for ARPANET hosts. The spec does state that implementations should store labels as a length octet followed by a sequence of octets, thus implying that any compliant software _should_ handle UTF8, but no one wants to take that chance.
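To make the "length octet followed by a sequence of octets" layout concrete, here is a small Python sketch of DNS name encoding. The function name and the encoding parameter are illustrative assumptions; the 63-octet label limit comes from the DNS spec.

```python
def encode_name(hostname: str, encoding: str = "ascii") -> bytes:
    """Encode a name as length octet + label octets, ending with a zero octet."""
    out = bytearray()
    for label in hostname.rstrip(".").split("."):
        raw = label.encode(encoding)
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label must be 1-63 octets, got {len(raw)}")
        out.append(len(raw))  # length octet
        out += raw            # label octets
    out.append(0)             # zero-length root label terminates the name
    return bytes(out)

print(encode_name("www.example.com"))
# b'\x03www\x07example\x03com\x00'
print(encode_name("anysite.xn--rg-eka"))
# b'\x07anysite\nxn--rg-eka\x00'  (ASCII octets only; '\n' is the length 10)
print(encode_name("anysite.\u00f6rg", encoding="utf-8"))
# b'\x07anysite\x04\xc3\xb6rg\x00'  (raw UTF-8 puts octets >= 0x80 on the wire)
```

The framing itself carries arbitrary octets without trouble; the worry voiced in this thread is about the software on either end of it.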
Re: (Score:2, Informative)
I am not a DNS guru, but I'm guessing it comes down to the dates: RFC 882 (DNS) is from November 1983, while RFC 2044 (UTF-8) is from October 1996.
Re: (Score:2)
Backwards compatibility with existing systems that don't support UTF-8 but still need to make DNS queries. Ranges from basic tools like dig, to un-updated browsers, to embedded devices like routers.
Are there any public DNS servers that support this to see what happens with my existing software??
Re: (Score:2)
Since software makes the assumption that TLDs only contain [a-z0-9-] [slashdot.org], UTF-8 can't be used in the DNS. Internationalised domain names, even before these new ccTLDs, used that xn-- system, called Punycode [wikipedia.org]. For instance, the site tinyarro.ws, which provides short URLs via a Unicode domain name, already used .ws for that purpose. It turns into xn--hgi.ws when the DNS request is issued.
ccTLDs using Punycode is just an extension of that mechanism for second-level domains.
Erratum (Score:2)
Yeah, Slashdot apparently needs to be internationalised too. That ".ws" should be "[U+27A1].ws" (BLACK RIGHTWARDS ARROW).
Re: (Score:2)
Slashdot apparently needs to be internationalised too.
Slashdot uses a Unicode character whitelist due to past abuse [slashdot.org], and U+27A1 isn't on that whitelist. The euro sign € is though.
Re: (Score:3, Insightful)
Actually, UTF-8 can and is being used in DNS - as long as you stick to basic Latin characters, that is. Also it is Unicode - as I posted earlier, Unicode is a blanket for UTF-8, UTF-16 and UTF-32 which makes it ambiguous.
UTF-8 bits 0-7 is ASCII as long as bit 8 isn't set, so to fully support it you'd need to still exclude bits below 7 that are not valid html characters and include support for multiple bytes and bit 8. The reason existing DNS servers won't work with it is because bit 8 indicates multibyte
Re: (Score:2)
Good question. The field size for DNS requests is in double words (16bits) increments, so I don't see why it couldn't have been.
Re: (Score:2)
Because they know full well that the vast majority of web developers don't really know what Unicode is [joelonsoftware.com] or how it works. Moreover the Unicode spec is forever in flux and complete overkill for the international URL problem. Latin-only URLs are a fly; we don't need a bazooka.
Frankly, the current Punycode based system is truly inspired, giving the best of both worlds. Newer browsers can display and use international urls seamlessly, but older syst
Phishing aid (Score:5, Insightful)
don't forget who we're talking about here... (Score:5, Insightful)
There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet. I'm sure there are other collisions as well. I'm sure they accounted for this in the proposal, but the problem always lies in the implementation
This is a decision made by ICANN. We've known for some time that they will willingly approve really tremendously bad ideas, if enough money is presented to them. They recently moved on a motion to start selling gTLDs, after all.
From a security standpoint, this is a VERY bad idea without proper regulation of domain name registrations, and so far it has been demonstrated that we cannot manage them properly even with only the Latin alphabet
Security is not of any concern for ICANN. Never has been, never will be. As long as they keep making money they're happy; security, spam, phishing, etc, be damned.
Re:Phishing aid (Score:4, Informative)
I think the limitation that nationalized character sets will be restricted to the country TLDs where that language is native is a good first step. Additionally, I believe you're not allowed to use the latin alternative form characters from unicode (like 0xFF20-0xFF5F).
If you're really paranoid, you could just be extra suspicious of domains that end in two letters (and yes, I am including .us), particularly when the 2nd level name is something you recognize, like paypal, ebay, etc. If you're in China, there may indeed be a legitimate paypal.cn, but I suspect it would set off my spidey sense to see a URL like that show up in my e-mail.
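On the fullwidth forms mentioned above: as far as I understand it, the Nameprep step of IDNA 2003 applies NFKC normalization, which folds those characters to plain ASCII rather than keeping them as distinct look-alikes. A quick Python sketch (the helper name is made up):

```python
import unicodedata

def fold_compat(label: str) -> str:
    """Apply NFKC, which maps fullwidth Latin forms to their ASCII equivalents."""
    return unicodedata.normalize("NFKC", label)

# Build fullwidth 'paypal': each fullwidth letter is its ASCII code + 0xFEE0.
fullwidth = "".join(chr(ord(c) + 0xFEE0) for c in "paypal")

print(fullwidth == "paypal")               # False: different code points
print(fold_compat(fullwidth) == "paypal")  # True: identical after folding
```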
Re: (Score:3, Insightful)
If you're really paranoid, you could just be extra suspicious of domains that end in two letters (and yes, I am including .us), particularly when the 2nd level name is something you recognize, like paypal, ebay, etc. If you're in China, there may indeed be a legitimate paypal.cn, but I suspect it would set off my spidey sense to see a URL like that show up in my e-mail.
That won't work. There really are a lot of big companies that have country-specific sites that use the two-letter global domains. For example, if you're after books in German then you might be very interested in visiting amazon.de, which is totally legit.
Re: (Score:3, Insightful)
Yeah, but if you know that you want that, then you'll be expecting it. We're talking about being on the lookout for 2 letter TLDs in places you don't expect them.
Re: (Score:2)
I don't think it's a big deal for TLDs since afaict those are created manually anyway.
For lower level domains (which are already using IDN) it's a bigger issue; Firefox resorted to using a whitelist to get around irresponsible registrars.
Re: (Score:2, Informative)
There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet.
Remember we are talking about ccTLDs. There are no more than 200 countries that would like to use non ASCII ccTLD, and they can be inspected manually. Russia wasn't awarded Cyrillic .ru because it looks like Latin .py (Paraguay). They will get .fr (Russian Federation) that looks like 0p (0 with vertical bar).
Re: (Score:2)
Russia wasn't awarded Cyrillic .ru because it looks like Latin .py (Paraguay). They will get .fr (Russian Federation) that looks like 0p (0 with vertical bar).
Are you sure it's not .rf - which doesn't clash with anything either, and makes much more sense to Russians themselves (since that is the standard abbreviation for Russian Federation in Russian).
Re:Phishing aid (Score:5, Insightful)
This risk can be greatly reduced if they limit domain names to only one alphabet, i.e. Russian domain with Cyrillic ccTLD should have only Cyrillic letters in it.
In many of these countries, they often have two domain names for a website: one that is easy to remember by foreigners, one that is easy to remember by locals (i.e. cyrillic name transliterated to Latin alphabet). The transliterated domain name is usually horrible, sounds weird, and often people transliterate stuff in different ways, so it's often not easy to remember anyway.
I think non-latin ccTLDs is a good thing.
matt
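The one-alphabet rule suggested in this comment could be approximated like this in Python. It is a crude heuristic, assuming that the first word of a character's Unicode name identifies its script; the function names are hypothetical and this is not how any registry is known to do it.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Rough script guess: first word of each letter's Unicode character name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def is_single_script(label: str) -> bool:
    """True if all letters in the label appear to come from one script."""
    return len(scripts_in(label)) <= 1

latin = "paypal"
mixed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(is_single_script(latin))  # True
print(is_single_script(mixed))  # False: LATIN and CYRILLIC are mixed
```

A check like this does nothing against a name written entirely in look-alike letters from one script, and it would wrongly flag languages such as Japanese that legitimately mix scripts.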
Here comes the Phishers! (Score:2, Insightful)
I'm all for internationalization, but perhaps limit it to internationalized domain extensions (.jp or
Re: (Score:2)
You not only didn't read TFA, but you didn't even read the summary very well, did you?
Re: (Score:3, Informative)
If ICANN does not standardise this, then nations will just implement their own systems, which will be different and incompatible with each other, much like China and Thailand have already done.
Are we going to have to update the URL RFC? (Score:2)
The current RFC 1738 http://www.faqs.org/rfcs/rfc1738.html [faqs.org] only allows URLs to be composed of
" Within those parts, an octet may be represented by the chararacter which has that octet as its code within the US-ASCII [20] coded character set. In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from "0123456789ABCDEF") which forming the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encod
Re: (Score:2)
RTFA. Internationalized characters in domains are encoded. See also RFC 3492.
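The two encodings in this exchange are different mechanisms. A short Python comparison, using the .örg example from elsewhere in the thread: percent-triplets as in the RFC 1738 quote, versus the Punycode of RFC 3492 that is used for host names.

```python
from urllib.parse import quote

label = "\u00f6rg"  # 'örg'

# Percent-encoding: each UTF-8 octet of a non-ASCII character becomes '%XX'.
print(quote(label))  # %C3%B6rg

# Punycode plus the 'xn--' prefix: the form used for internationalized labels.
print("xn--" + label.encode("punycode").decode("ascii"))  # xn--rg-eka
```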
TLDs only? (Score:2)
Is it my imagination, or does this proposal only apply for TLDs, like .uk and .jp? I don't see any mention of supporting it for the rest of the domain name. That seems a logical extension, but it's not been announced.
Re: (Score:3, Informative)
It's already been in use for the rest of the domain name under certain TLDs for some time.
Not trying to be harsh but... (Score:2)
cmon how could you think "but in future countries" sounds okay.
it should be "but in the future countries"
great info though. I mean it's nice to see that the internet is starting to become more international, especially as the US cuts mandatory ties to ICANN.
Excellent idea (Score:4, Insightful)
Now those countries, organizations and businesses that wish to become inaccessible to most of the world (except the native speakers of their own language) can finally do so as easily as possible. Create their own little Internet reservations and stay there :)
As long as my software (such as Firefox) obligingly converts these IDN urls into the dash-hex notation making them obviously unreadable, I am ok with that.
Disclaimer: I am a native of a non-English-speaking country. I am sure a few of my countrymen will use this feature based on misplaced patriotism. I am also sure that the vast majority will ignore it just like they ignore the potential to use non-Latin domain names that exists right now.
Re: (Score:2)
How is this any different today?
If the content pointed to by a domain written in Latin characters points to a site written in Chinese, non-speakers of Chinese still can't actually do anything useful with the site.
Re: (Score:2)
Why? There are translators. The problem is that you are cutting people off as they can't type in Chinese characters (I know it's possible but you have to install extra bits to do so, and know what you're typing and how to type it on a Latin keyboard).
I have friends in a few different non-latin alphabet countries and family in one. If their email addresses were in their alphabet I likely wouldn't be able to email them easily even though we, mostly anyway, correspond in English.
Re: (Score:2)
For example in China this will be used a lot. Other countries using non-Latin scripts will do so as well.
Actually it was possible already for a few years to register domain names in Chinese characters in Hong Kong, but still ending in .hk. Now that part can also become Chinese characters as replacement for .hk, .cn or .tw.
The catch was that a Chinese URL would work only within HK/China. Now this will also start to work worldwide.
One big issue for many lower-educated Chinese is that the Latin script is as
Re: (Score:2)
They make sense as aliases to sites that also have standard domain names.
Re: (Score:2)
Your point may be valid for some non-latin languages but as soon as you put China into the equation the figures change radically.
China IS the world. They are zillions! If they start using Chinese names it'll be us, "latin speakers", who'll be confined to our "own language and make our sites inaccessible to the rest of the world".
Um, can they be more specific than "Unicode"? (Score:2)
Unicode can mean many things - UTF-8, UTF-16, UTF-32 - so specifying Unicode is not detailed enough to implement and by not specifying, it is opening a can of worms IMO. UTF-8 tends to be slower and larger for non-ASCII but has wide acceptance. It would also be the favorite for Linux/UNIX because it is very common there (my Linux box has LANG=en_US.UTF-8) and also for communication with databases (in my experience, UTF-8 is what most enterprise companies use for their database settings if they need multi-
Re: (Score:2)
TFA is badly written and factually inaccurate.
All that is actually going on here is that ICANN is allowing use of IDN (which is already in use at lower levels of the hierarchy) in the root. The standards for IDN already specify exactly how the names are encoded.
http://tools.ietf.org/html/rfc3490 [ietf.org]
Re: (Score:3, Informative)
Several mistakes there.
First of all any domain name is going to have to be encoded as a stream of bytes somehow because far too much stuff is already implemented to handle the string that way. As others pointed out punycode is used.
Second, UTF-8 is smaller than UTF-16 for all languages, even Chinese. This is because all the ASCII 0x00-0x7F characters are smaller, and therefore the encoding will be smaller if there are more of these than there are Unicode 0x800-0xFFFF characters. This seems incorrect for Chinese
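The size argument is easy to check in Python. The sample strings below are made up for illustration; the result depends entirely on the mix of ASCII and non-ASCII characters.

```python
def sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count without a BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

print(sizes("example.com"))      # (11, 22): pure ASCII, UTF-8 is half the size
print(sizes("\u4f8b\u3048"))     # (6, 4): two CJK characters alone, UTF-16 wins
print(sizes("\u4f8b\u3048.jp"))  # (9, 10): with an ASCII suffix, UTF-8 wins
```

So UTF-8 comes out smaller only when enough ASCII is mixed in, which matches the condition stated in the comment.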
IDNs are dangerous. (Score:2)
See these slides about exploiting UTF-aware software.
http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf [casabasecurity.com]
Latin =/= Support for English only. (Score:3, Insightful)
Re: (Score:2)
There's only a handful of languages that use strict ASCII: one is dead, and the rest are a small family of closely related languages spoken by about 2 million people in the Pacific, which is what TFS and TFA describe TLDs as being able to do.
Re: (Score:3, Insightful)
Actually we are talking about the English alphabet, with j, u and w, which Latin didn't have.
it just got easier to phish (Score:4, Informative)
Yay. Now you can register yourbankname.com with some funky characters that render in exactly the same way as the letters you are used to.
What's new here? (Score:2)
Re: (Score:2)
You are. This story is about applying Punycode to TLDs (like .cn, .jp or .ru)
so chase.com (Score:2)
could be chàse.com or cháse.com
every website i go to from now on, i need to study the url with a magnifying glass to make sure i am getting the actual site i wanted. not even as a security precaution, but just to avoid phony sites that might be spoofing a real one for all sorts of purposes, even if just humor, not all of them nefarious, but all of them certainly annoying
a with accent mark may be easy to see, but there are some subtle unicode characters that are so completely like the lowercase "L"
How far will they allow this.... (Score:2)
One word: Klingon.
Re: (Score:2, Insightful)
I wonder what impact this will have on the ever decreasing amount of IPv4 addresses available.
This will have absolutely no effect on IPv4/IPv6. This is a DNS change to allow additional characters in domain names.
The domain names get translated to ip addresses by DNS servers.
I doubt that individuals & companies said, "No! We refuse to go on the internet until we can have TLDs with non-Latin characters."
Re: (Score:2)
This will have absolutely no effect on IPv4/IPv6.
It's not as clear as you think. The post you respond to probably thinks that having non-Latin TLDs will increase domain registrations, which might require more IP addresses. Not all new registrations will be redundant.
Re: (Score:2)
No fortunately they're not going yet for a full unicode thing, only a few select character sets like Chinese or Arabic. So for the moment that shouldn't be a problem.
Re: (Score:3, Funny)
micrösöft.cöm?
That's Microsoft with the volume turned up to 11?
Re: (Score:2)
IDK, but IMHO it could be a real FUBAR if they do it ASAP. YMMV.
Its to do with people with the wrong keyboard ... (Score:2)
... not being able to enter the URL!
How exactly do you think you'll be able to type in a URL in mandarin or russian on west european keyboard?
Re:Its to do with people with the wrong keyboard . (Score:4, Informative)
How exactly do you think you'll be able to type in a URL in mandarin or russian on west european keyboard?
You enable Chinese keyboard layout (dunno what's it called), and type it. The letters printed on the keys of your keyboard aren't some sort of magic that lets your computer input languages written in them, you know.
I don't have any keyboards with Russian characters on them, but I happily type in Russian regardless (in fact, I only first realized that I do actually truly touch type when I first ran into this problem, which turned out to not be a problem in the end).
Re: (Score:3, Interesting)
I'm happy you'll do this. I won't, and the majority of the internet users won't either. It'll just further separate nations, because I won't go through the hassle of typing in a foreign character domain name - it'll just be a site I won't visit.
Presumably, if a site is designed to be visited by someone who only understands English, it will use an English TLD. If it uses TLD with national characters, then most likely the content is in the language other than English as well, and you'd need to have means to input that language to fully interact with the site anyway.
Re: (Score:3, Insightful)
"the majority of the internet users won't either."
Sorry, but that sounds like typical American ethnocentricity. The MAJORITY of internet users actually are people who don't natively speak English. Chinese speakers, Russian speakers, European people, many of whom use Cyrillic alphabets, Arabs, South Americans, Indians, and others that I'm surely missing.
How can you possibly speak for "the majority of internet users", when people who speak English as their native language constitute a pretty small percentage
Re: (Score:2)
There is no Mandarin keyboard; you have to use an input method to input Chinese characters. Computers sold in China are in qwerty. You type the romanisation and you can choose from a dictionary of characters which one you want. Of course you should not have to do it to type a URL to visit a page in English, but I expect all Mandarin speakers to have a way to type Chinese on their computers so it should not be a problem.
You may have more problems with European languages, for example French and its accents.
Re:Its to do with people with the wrong keyboard . (Score:4, Insightful)
Uh, yeah, because the keyboard you're using is a clear indicator of which language(s) you understand.
Re: (Score:2)
how is that a problem? if you can't get to the website because it's in a funny language, what makes you think you can read the contents?
Ever go on holiday? Ever need to use an internet cafe in your holiday country?
Re: (Score:2)
With blackjack and hookers! In fact, forget about the internet.
Seriously though, it is nice to have a lowest common denominator in characters, so that everyone can type every address on the Internet.
The Internet has to evolve (Score:5, Funny)
....although obviously not ... in Kansas.
Re: (Score:3, Insightful)
I don't normally browse websites written in a language I can't understand.
1. The link text in the example I provided was in English.
2. I am not aware of any requirement that only one language may be used on a given website. If there is such a requirement, please inform my contacts on Facebook of this, because they post messages there in about 15 different languages using at least 4 different writing systems. (And I've posted there myself in 4 languages, including English.)
I still see an ignorant american that thinks the whole world should read and write english for people like dhermann.
1. See above.
2. So you are saying that you can read my mind? Perhaps this ability of yours needs some fine-tu