How Are You Accomplishing Your i18n?
cobrabyte asks: "My team has recently been given the task of implementing internationalization (i18n) in our MySQL databases (PHP-interfaced). Essentially, for every article X, we need it presented in any number of languages (once translated). As we were working on gathering the necessary procedures, we were very surprised to find that there's not much organized information regarding i18n using MySQL and PHP. Is the topic of i18n too new to garner any usable info?"
Indian companies are very qualified for this stuff (Score:2, Offtopic)
Re:Indian companies are very qualified for this st (Score:2, Offtopic)
I know a number of projects which have been outsourced to India, and they have all been done wrong, and quite a few of them ended in disaster. I don't know of a single outsourcing project that has been finished correctly.
Re:Indian companies are very qualified for this st (Score:2, Offtopic)
I don't know of a single outsourcing project that has been finished correctly.
Not all projects are intended to be, as you say, "finished correctly".
Re:Indian companies are very qualified for this st (Score:1, Offtopic)
Re:Indian companies are very qualified for this st (Score:2)
Not really. The Indian scripts are very poorly supported by most operating systems and software. It is only recently that Indian programmers have started to work on this and improve the software situation for their own domestic market. Most Indian programmers have barely more awareness of internationalisation is
Re:Indian companies are very qualified for this st (Score:2)
Re:Indian companies are very qualified for this st (Score:2)
But otherwise, the broader point is well-taken; despite India's obvious linguistic diversity, Indian programmers don't necessarily have an advantage over other nationalities in i18n efforts.
Re:Indian companies are very qualified for this st (Score:1, Offtopic)
Re:Indian companies are very qualified for this st (Score:1)
What's the question? (Score:2)
I'm not sure what the question(s) is.
Re:What's the question? (Score:1)
We are familiar with UTF and have done extensive research on the subject. However, outside the realm of standards, there is not a clear path for bringing all of the various pieces (MySQL, PHP, Apache, etc.) together to form a cohesive, multi-language-compatible unit.
There are articles here and there about various aspects of internationalization. However, I get a sense, after reading these articles, that the authors are just experimenting. I d
Re:What's the question? (Score:2)
Re:What's the question? (Score:2)
How do you translate it into other languages?
-without using some crappy 'BabelFish' layer
-without having to write a complete localized version for each language.
Re:What's the question? (Score:1)
Re:What's the question? (Score:5, Informative)
Ask any government that supports multiple official languages (Canada, Switzerland,
-without having to write a complete localized version for each language.
You need to make the content management system (CMS) language-aware, and you need to localize all your templates. Then you need to add a language key to your article database, so the user can retrieve article 101 in either English or French. (Think along the lines of http://localhost/cms/display.php?article=101&lang
I know nothing about PHP programming, so I cannot comment on that, or MySQL (main gotcha I expect is datatype, UTF-8, iso8859-1, vs. windowspage1574). Two articles I found useful in general about internationalization are
UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn
How do I have to modify my software?
http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod [cam.ac.uk]
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.ht
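The "add a language key to your article database" idea above can be sketched roughly like this. It is only an illustration in Python with an in-memory SQLite database, not PHP/MySQL; the table name, column names, sample rows, and the fall-back-to-English behaviour are all assumptions of mine, not something the poster specified.

```python
import sqlite3

# Hypothetical schema: one row per (article, language) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article ("
    " article_id INTEGER NOT NULL,"
    " lang TEXT NOT NULL,"
    " title TEXT NOT NULL,"
    " body TEXT NOT NULL,"
    " PRIMARY KEY (article_id, lang))"
)
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?, ?)",
    [
        (101, "en", "Welcome", "Hello, world."),
        (101, "fr", "Bienvenue", "Bonjour, le monde."),
    ],
)

def fetch_article(article_id, lang, default_lang="en"):
    """Return (title, body) in the requested language, else the default, else None."""
    for candidate in (lang, default_lang):
        row = conn.execute(
            "SELECT title, body FROM article WHERE article_id = ? AND lang = ?",
            (article_id, candidate),
        ).fetchone()
        if row is not None:
            return row
    return None
```

A request like display.php?article=101 with a language parameter would then map to something like fetch_article(101, "fr"). Whether an untranslated article should fall back to another language or return "not found" is a product decision.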
Re:What's the question? (Score:2)
In many languages, you have strings like this:
"You, %s, owe us %s dollars."
that are used with formatted print statements such as printf() that make assumptions based on ordering: printf(get_locale_string(you_owe_us_string), name, local_money_string(amount)).
In another language, the orderi
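One common way around the ordering assumption is to let each translated string name its own placeholders, so the translator can move them freely. A minimal sketch in Python; the catalog, the made-up "xx" locale, and the function name are hypothetical, and the second string is just reordered English standing in for a language with a different word order.

```python
# Hypothetical message catalog: each translation decides its own word order.
MESSAGES = {
    "en": "You, {name}, owe us {amount} dollars.",
    # Stand-in for a language that puts the amount before the name.
    "xx": "{amount} dollars are owed to us by you, {name}.",
}

def you_owe_us(locale, name, amount):
    """Fill the named placeholders, wherever the translation put them."""
    return MESSAGES[locale].format(name=name, amount=amount)
```

PHP's sprintf() offers a similar escape hatch with numbered arguments such as %1$s and %2$s.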
Re:What's the question? (Score:2)
Re:What's the question? (Score:2)
Create table LocalizableFields (PK FieldID int)
Create table LocalizedValues (FieldID, LocaleID, LocalizedValue)
Then, instead of creating table products with:
ProductID int, ProductName nvarchar, ProductDescription nvarchar
You use:
ProductID int, ProductName int, ProductDescription int
When you want to fetch stuff back out, you do a join like this:
SELECT lpn.LocalizedValue AS ProductName,
lpd.LocalizedValue AS ProductDescription
FROM
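The query above is cut off, so here is one guess at how the whole scheme could fit together, run against SQLite from Python purely for illustration. The Products table name, the text LocaleID values, the sample rows, and everything after FROM are my assumptions, not the poster's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LocalizableFields (FieldID INTEGER PRIMARY KEY);
CREATE TABLE LocalizedValues (
    FieldID INTEGER NOT NULL REFERENCES LocalizableFields(FieldID),
    LocaleID TEXT NOT NULL,
    LocalizedValue TEXT NOT NULL,
    PRIMARY KEY (FieldID, LocaleID)
);
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ProductName INTEGER NOT NULL REFERENCES LocalizableFields(FieldID),
    ProductDescription INTEGER NOT NULL REFERENCES LocalizableFields(FieldID)
);
INSERT INTO LocalizableFields VALUES (1), (2);
INSERT INTO LocalizedValues VALUES
    (1, 'en', 'Chair'), (1, 'fr', 'Chaise'),
    (2, 'en', 'A wooden chair'), (2, 'fr', 'Une chaise en bois');
INSERT INTO Products VALUES (10, 1, 2);
""")

def product_text(product_id, locale_id):
    """Join each localizable column to its value for one locale."""
    return conn.execute(
        """
        SELECT lpn.LocalizedValue AS ProductName,
               lpd.LocalizedValue AS ProductDescription
        FROM Products p
        JOIN LocalizedValues lpn
          ON lpn.FieldID = p.ProductName AND lpn.LocaleID = :loc
        JOIN LocalizedValues lpd
          ON lpd.FieldID = p.ProductDescription AND lpd.LocaleID = :loc
        WHERE p.ProductID = :pid
        """,
        {"loc": locale_id, "pid": product_id},
    ).fetchone()
```

Note that every localizable column costs one extra join, which is the main trade-off of this layout.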
Some answers (Score:5, Informative)
I sorted out the i18n design for a project recently, so I can share some insights on the process. My project used Java/JSP, but the problems are mostly the same. One of the most important points to be made is that you *need* to sit down and design it all the way through -- this is not a "feature" that can be easily added in when you need it later (and extreme programming teams can get hosed on this one pretty easily).
Things to consider (in the sequence of a request for simplicity's sake):
1) How will you know what language a user wants (first time, and on subsequent pages)? The user should be able to select/change their preference (though you could use their browser-reported locale as a guess), and they should be able to *bookmark* the homepage in their language. You could use a cookie, and redirect from the basic homepage based on that. Personally I avoid depending on cookies where possible, I didn't want to have duplicated directory structures, and I didn't want an added param on every request, so I used multiple *subdomains*, one per supported locale. They all mapped to the same IP, same application -- but in the web application I could check the requested URL and set the locale (and build the page) correctly using that. There were links on the top of the homepage to switch languages -- which would just flip to the proper subdomain. (Important note -- this complicates getting a cert for SSL, since that's tied to the domain... keep that in mind).
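The subdomain trick boils down to reading the Host header. A rough sketch in Python; the subdomain-to-locale map, the default locale, and the function name are made up for illustration, and the poster's application was Java/JSP, not this.

```python
# Hypothetical map from the leftmost host label to a locale.
SUPPORTED = {"en": "en_US", "fr": "fr_CA", "de": "de_DE"}
DEFAULT_LOCALE = "en_US"

def locale_from_host(host):
    """Pick a locale from the leftmost label of the Host header."""
    hostname = host.split(":", 1)[0].lower()   # drop an optional port
    first_label = hostname.split(".", 1)[0]
    return SUPPORTED.get(first_label, DEFAULT_LOCALE)
```

So a request to fr.example.com would be rendered with fr_CA, and anything unrecognized (www, a bare domain) gets the default.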
Once you know what language you're using, build the page... this will probably involve getting data out of the database and displaying some of it.
First, make sure your tables support whatever character set the languages will need. Then design your data carefully. You need to make sure that any data in the database that will show up onscreen can be stored per locale: product descriptions, category names, and ALSO prices (you probably have to give prices in various currencies, right?).
Building the page -- you'll need more PHP-specific advice here, but the idea is that you need to get text and possibly images that are language-specific for each page. The general choices are:
* Use a single PHP file for the content (e.g., a form for registration info), and get the text displayed from locale-specific files (so for the "name" label over that field, you'd grab the proper translation).
* Maintain a separate PHP file for the content in each language, plug the proper one into the template.
The first option is better if your content is mostly short bits of text -- but if there are larger chunks of text it gets hard to read (and if the whole page is text -- like a privacy policy page, etc. -- the second option may make more sense). Personally, I supported both options.
What else? Don't forget that formatting of currency, numbers, dates, and times will vary by locale. Don't forget to review any Flash animations, dropdown menus, popup calendars, etc.; these will need to support changes based on locale. Organize your resources carefully, so that a simple substitution in the path will get you the right image, content file, etc. (e.g., images/fr_CA/whatever.gif).
HTH.
One more thing (Score:3, Interesting)
Something that helps one does NOT always help the other -- for example, building the site in English, then making complete copies and translating all text into other languages is easy to develop, but quickly becomes a nightmare in maintenance... the customer wants a minor change and you have to update 10 files.
Just walk through quick scenarios for each option: I would do X to create and integrate this page,
Re:One more thing (Score:1)
and
These are very good points -- When you have a minor change in text, you've got to worry about getting it translated into every language you support. Who's going to do the translations? Maybe I'm missing something obvious by not working at a huge corporation (like "we just send it to our Beijing and Madrid offices and they do the translations into their local languages"), bu
Re:Some answers (Score:1)
Pet peeve: the locale reported by a browser isn't just a guess; it's the standard way for a user to say which languages she prefers to read. I realize that web developers mostly ignore this, but IMHO not using it (in the absence of other information, of course) is a bug.
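For what it's worth, honouring the browser's preference is not much code. Browsers send it in the Accept-Language header as a comma-separated list with optional q= weights. Below is a simplified sketch in Python; the function names are mine, and it skips corners of the header grammar (wildcards, extra parameters), so treat it as a starting point only.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0                      # no q= means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0              # unparsable weight: treat as unwanted
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]

def pick_language(header, supported, default="en"):
    """Choose the first preferred tag (or its primary language) that the site supports."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-", 1)[0]
        if primary in supported:
            return primary
    return default
```

An explicit choice made by the user on the site should still override whatever this returns.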
Re:Some answers (Score:2)
Of course, travellers are not the major
Re:Some answers (Score:2)
For example,
$1,999.00
might be formatted as
USD 1.999,00 in a different locale.
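To make the two renderings above concrete, here is a small Python sketch that formats the same amount under two sets of conventions. The conventions table and the "xx_XX" locale name are invented to match the example; a real application would take these rules from a locale library, not a hand-written dict.

```python
from decimal import Decimal

# Hypothetical per-locale conventions: (prefix, thousands separator, decimal mark).
CONVENTIONS = {
    "en_US": ("$", ",", "."),
    "xx_XX": ("USD ", ".", ","),   # made-up locale matching the second rendering
}

def format_money(amount, locale):
    """Format a decimal amount with the locale's separators and currency prefix."""
    prefix, thousands, decimal_mark = CONVENTIONS[locale]
    # Format with US-style separators first, then swap them per locale.
    whole, cents = f"{Decimal(amount):,.2f}".split(".")
    return prefix + whole.replace(",", thousands) + decimal_mark + cents
```

Note that this only changes the presentation; converting between currencies is a separate problem.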
It is just me? (Score:2, Insightful)
Re:It is just me? (Score:1)
"I can't stand that a10n, i18n. I mean who thought that would be a good a10n? It bears no r9e to the o6l word. I think we can do b4r."
Re:It is just me? (Score:1, Offtopic)
It is just you (Score:4, Insightful)
The problem is you speak English. There is a good chance that you speak no other language. Since nearly everything is written in English first these days, you don't care about these issues.
Many of those who care about i18n do not speak English at all! To these people even spelling the word out gives no help. In fact it is less helpful because they have to learn this large symbol. (There is no reason to assume they even know the Latin alphabet, so they will not think to learn each letter separately.)
Of those who speak English, many do not speak it fluently. Often they speak English as a first year student ("hello, my name is"), and they know how to look words up in their English-whatever dictionary.
Of course English is the dominant second language in the world. There are plenty of people who speak English fluently as a second language. They often have trouble with the creative spelling English came up with. Words with 20 letters are hard for anyone to spell, so it would be no surprise if they have trouble spelling it.
The goal is one symbol that is easy for everyone to recognize. No matter what language the page is written in, if you see "i18n", you know you are in a location where people are interested in translation. This is often enough for some educated clicking to find the same information in your language.
i18n may not be a good abbreviation. However, can you come up with a way to represent the concept to all 6+ billion people on earth?
Re:It is just you (Score:2)
I do speak a bit of French (not fluent by any standards). But you are right, I had never thought of the fact that others simply didn't understand what "internationalization" is. Seems I've been pretty humbled.
Re:It is just you (Score:2)
In English, that would be internationalisation.
Re:It is just you (Score:3, Funny)
I propose we use the locale-neutral word "internationali1ation".
Re:It is just you (Score:1)
Re:It is just you (Score:1)
Hardly an authority on the native language of England then, is it?
Re:It is just you (Score:1)
I thought "internationalisation" was a term more commonly used in the USA than in the UK. But The Cambridge Dictionary [cambridge.org] states the term is "internationalization", with "internationalisation" being a UK localism. The Oxford Engl [oed.com]
Re:It is just you (Score:3, Funny)
Unfortunately, "internaciigo" isn't necessarily an improvement.
Admittedly, it's shorter. However, the word is pronounced something like in-tehr-na-tsee-EE-go. Many people find the "ts" followed by two separately pronounced "i"'s, with overall word emphasis on the second, a bit hard to pronounce. And it sounds a bit like there's the word "nazi" in the middle, which means the thread is over.
Re:It is just you (Score:1, Insightful)
The grandparent poster complains about 'i18n' being a lousy abbreviation, and you give the world a six paragraph rant about cultural imperialism. This is rather like going off on communism because someone commented on the color of an object.
Seriously, it has numbers in it. Numbers!
(at this point, I start wanting to scream 'Those aren't even WORDS!!!! ED! ED! ED IS THE ST
Re:It is just you (Score:1)
Please mod this parent up to at least the level of the rant to which it applies.
Re:It is just you (Score:2)
The L33t spelling is just a bonus.....
Re:It is just you (Score:1)
That unpronounceable symbol Prince changed his name to was easily recognizable too, but that doesn't mean it served well as a name.
Re:It is just you (Score:2)
There is a very sound reason to put Arabic numerals in the word, it's easy to pick out no matter what language(s) you read. This isn't anything like Prince's name where he just made up a totally new symbol to get out of
Re:It is just you (Score:2)
Re:It is just you (Score:1)
Except for the whole unpronounceable symbol thing.
Re:It is just you (Score:1)
Re:It is just you (Score:1)
I don't consider "I-eighteen-n" to be much more pronounceable than "The artist formerly known as Prince". Hell, "internationalization" isn't very pronounceable either. Eight syllables in one word is too many.
Re:It is just you (Score:4, Interesting)
In IT, English holds the majority by far. And Spanish doesn't even come in second - You have Japanese and German as distant seconds, with Hebrew and French as dark-horse thirds.
Attempts at internationalization simply hinder the adoption of English as the next ubiquitous academic language. Much like Greek and Latin during the Roman empire - The rabble may all speak Spanish, but those who want to appear educated speak English. Of course, Latin later went on to hold the same place, so perhaps some day Spanish will function as the language of the academic elite.
Personally, I don't have great hope of us not blowing up the planet before then. So I code with English as my target language. Speak it, or don't use my programs, doesn't much matter to me
Many of those who care about i18n do not speak English at all!
I don't think that needs an exclamation mark - It doesn't come as a particular surprise to anyone. If you speak English, you don't have the least interest in "internationalization", which basically means "Make it accessible to people who don't speak English".
And I don't write this as a xenophobic rant... I regularly use programs written by Japanese coders, and a few in German. And do I sit around complaining about how those coders, who already have given me something I find useful, should do extra work unrelated to the purpose of the program to make those programs more friendly to me? No. I recognized my inability to read the menus and such as a shortcoming in myself, and made the effort to learn enough Japanese and German (albeit very little) to navigate those programs.
Or to put that another way - If Bill Gates only spoke Italian, a LOT more people would have learned at least a basic proficiency in it by now.
Re:It is just you (Score:2)
I agree that a system needs to have a 'base' language. The business case, requirements, design docs, and code/comments are usually better off being written in one language.
However, if you are dealing with clients in multiple regions footing the bill for your project, it's also a good idea to think in terms of having the application support multiple languages in some type of modular way, so that th
Re:It is just you (Score:1, Interesting)
ciao
Re:It is just me? (Score:2)
Not just you. I actually followed the link to wiki to figure out where that damned thing started.
I've always assumed it was somehow l337 and supposed to match phonetically --- the fact that 18 is the number of omitted letters (according to wiki) makes me hate it as an abbreviation even more.
Not just you - but mostly (Score:2)
It's a pun! (Score:2)
I thought this was common knowledge, but no-one seems to have posted it yet while many people seem to be asking, so: it's a pun.
The word is written either "internationalisation" or "internationalization", depending on which English-speaking country you're in at the time, but both versions have 18 letters between the 'i' and the 'n'. As well as being shorter, "i18n" therefore works without adjustment in all En
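The rule is simple enough to write down: first letter, number of letters omitted, last letter. A throwaway Python version (the function name is mine):

```python
def numeronym(word):
    """Abbreviate as first letter + count of omitted letters + last letter."""
    if len(word) < 4:
        return word          # nothing worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"
```

Both spellings of the 20-letter word come out the same, and "localization" gives l10n by the same rule.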
How Are You Accomplishing Your i18n? (Score:5, Funny)
By p09g.
Have you looked at PEAR? (Score:5, Informative)
Easy way, using SQL (Score:3, Informative)
Adding a new language then just becomes a case of adding a new language ID to the system, and adding a new string becomes adding a string ID.
Any place that you want to generate an output string, simply insert a token which represents the string ID. Your translation code scans for the tokens, gets the current language from the environment, and then searches your strings table for the substitution string.
(For those who remember the Commodore PET computer, this is very similar to how it worked. The Print command, for example, was stored internally as a "?" token. It substituted when displaying.)
You do not need a table for the string IDs; an enumerated type would be sufficient to track what IDs are in use and what for. You WOULD want a table for the language, with the language ID as the key field (preferably as an enumerated type) and the font ID as the attribute. If you are not using fonts (e.g., plain-text output) then again you can just use the enumerated type.
Because you would NOT be encoding font data into the string (NEVER, EVER do that, by the way, as you're just padding the data with redundant information, and introducing extra complexity), you can replace the font at will, provided it conforms to the mapping standards for international character sets.
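The token scheme described in this comment might look something like the sketch below, again in Python with SQLite standing in for the real database. The {{STRING_ID}} token syntax, the strings table layout, and the sample rows are assumptions for illustration; the comment does not say what the tokens look like.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE strings ("
    " string_id TEXT NOT NULL, lang_id TEXT NOT NULL, text TEXT NOT NULL,"
    " PRIMARY KEY (string_id, lang_id))"
)
conn.executemany(
    "INSERT INTO strings VALUES (?, ?, ?)",
    [
        ("GREETING", "en", "Welcome"),
        ("GREETING", "nl", "Welkom"),
        ("LOGOUT", "en", "Log out"),
        ("LOGOUT", "nl", "Uitloggen"),
    ],
)

TOKEN = re.compile(r"\{\{([A-Z_]+)\}\}")  # hypothetical token syntax: {{STRING_ID}}

def translate(template, lang_id):
    """Replace every token in the template with its string for the given language."""
    def lookup(match):
        row = conn.execute(
            "SELECT text FROM strings WHERE string_id = ? AND lang_id = ?",
            (match.group(1), lang_id),
        ).fetchone()
        # Leave unknown tokens visible so missing translations are easy to spot.
        return row[0] if row else match.group(0)
    return TOKEN.sub(lookup, template)
```

Adding a language is then a matter of inserting rows with a new lang_id. In practice you would probably cache the lookups instead of hitting the database once per token.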
This method fails for many things (Score:1)
Languages like English are SVO while other languages are SOV. Throw in a few extra grammar rules and a simple string substitution scheme becomes impossible because printf("%s %s %s", S, V, O); will simply not create correct strings for any language that uses a different ordering.
Re:This method fails for many things (Score:2)
Character direction is another problem, if you're going truly international, as some languages alternate between left-to-right and right-to-left on different lines. This is a problem, because you can't now just store a direction somewhere and use that t
Re:This method fails for many things (Score:1)
Too new? (Score:1)
Uh, it's been around for a decade at least. Maybe a Google search would help you.
i18nHTML (Score:3, Informative)
That's not just i18n. (Score:3, Insightful)
Simple (Score:3, Funny)
(I keed, I keeed...)
Surprise! (Score:3, Interesting)
You seem confused... (Score:2)
Internationalization is the process of adapting your program so that it can easily be made to work in any locale. Not hardcoding strings in English, not assuming 1 byte == 1 char, that kind of thing. A good i18n architecture makes localization much easier.
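The "1 byte == 1 char" assumption is easy to demonstrate. A small Python illustration; the sample text and the helper name are mine:

```python
text = "na\u00efve \u65e5\u672c\u8a9e"   # "naïve" plus three Japanese characters
encoded = text.encode("utf-8")

char_count = len(text)       # counts Unicode code points
byte_count = len(encoded)    # counts bytes in the UTF-8 encoding

def truncate_utf8(data, max_bytes):
    """Cut UTF-8 bytes at max_bytes, dropping a character that would be split."""
    return data[:max_bytes].decode("utf-8", errors="ignore")
```

Here the same string is 9 characters but 16 bytes, so anything that sizes buffers, columns, or truncation points in bytes will eventually cut a character in half.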
Re:You seem confused... (Score:2)
Re:You seem confused... (Score:2)
Re:You seem confused... (Score:2)
You are wrong about what they mean; Senjutsu is actually accurate there.
Re: (Score:1)
UTF! (Score:3, Insightful)
Make sure your preferred editors really are saving UTF-8.
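One cheap way to check what an editor actually wrote is to try decoding the raw bytes as UTF-8. A minimal Python sketch (the function name is made up):

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Read the file in binary mode and pass the bytes in. Pure-ASCII files always pass, so test with a file that contains at least one accented or non-Latin character; a file saved as ISO-8859-1 with such a character will typically fail the check.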
profanity, morality? (Score:1)
Re:profanity, morality? (Score:3, Insightful)
Because these issues will trip you up.
Particularly when using automatic translation (which is a bad idea anyway), something that is acceptable in your language may come out as something unacceptable in a different one. No matter how cheaply you are trying to get by, you still need someone to check for profanity in your output. This is less of a problem with human translators, who will avoid the issue, but even so you should check, because some translators will slip profanity in thinking you won't know.
Morality i
Here's a link for Rails (Score:1, Informative)
http://manuals.rubyonrails.com/read/chapter/82 [rubyonrails.com]
In particular it links to the following:
http://www.quepublishing.com/articles/printerfriendly.asp?p=328641&rl=1 [quepublishing.com]
Which is a very good discussion of character sets in MySQL. I didn't realize it was so thorough. For instance you can have different character sets on tables, connections, and the server itself. Finally, it seems MySQL got something right.
Multilingual user interface (Score:2)
I created a multilingual user interface for a moderately complicated web application with a small number of users like this:
Create an include directory 'lang' with language files for every language needed. In my case, two: 'en.inc.php' for English and 'nl.inc.php' for Dutch. These files contain the strings for the interface in an associative array. Example:
'nl.inc.php' contains:
'en.inc.php' contains:
I use a session to store the desired language:
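The code samples from this comment did not survive, so here is a rough stand-in for the idea, written in Python with dicts in place of the PHP associative arrays. The string keys, the translations, and the function names are hypothetical, not the poster's.

```python
# Hypothetical stand-ins for the per-language include files.
LANG = {
    "en": {"save": "Save", "cancel": "Cancel"},
    "nl": {"save": "Opslaan", "cancel": "Annuleren"},
}
DEFAULT_LANG = "en"

def set_language(session, requested):
    """Remember the user's choice in the session if it is a supported language."""
    if requested in LANG:
        session["lang"] = requested
    return session.get("lang", DEFAULT_LANG)

def t(session, key):
    """Look up an interface string, falling back to the default language, then the key."""
    strings = LANG.get(session.get("lang", DEFAULT_LANG), LANG[DEFAULT_LANG])
    return strings.get(key, LANG[DEFAULT_LANG].get(key, key))
```

Templates then call t(session, "save") instead of hardcoding the label, and a language switch is just set_language(session, "nl").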
Re:Multilingual user interface (Score:3, Insightful)
Re:Multilingual user interface (Score:2)
Of course this only makes sense for horizontal text and maybe even only left-to-right at that. It's also bound to be wildly off in some cases. The reasoning I h
Re:Multilingual user interface (Score:2)
Some southeast Asian languages (e.g., Burmese and Khmer) stack letters vertically in certain cases, so they end up taking a lot more space. What I mean is
Re:Multilingual user interface (Score:2)
I found that translating some concepts gave strings of very different lengths. For example, some technical stuff became much longer strings in Spanish (maybe it was my translators). What do you do about the problem of the web forms getting messed up in different languages? My site is small enough to just test and adjust where necessary, but for a bigger site, this could be a problem.
In my case the number of different page templates (about 50) and languages (two, English and Dutch) was also small enough t
application-level (Score:2)
i18n is generally a token-to-language-set mapping in a 1:n relationship, which maps nicely to table layouts, so I don't see any need to create i18n support in the DB itself.
If you want some degree of abstraction, java p
Smarty + preparse plugin (Score:4, Informative)
Damien
The database tier (Score:2)
1. Define Widget and WidgetText, with all the I18N material moved to WidgetText. WidgetText is keyed on the Id from Widget and a Culture identifier. Every time you need a Widget, you JOIN to WidgetText based on the Id from Widget and the Culture identifier of the requesting user.
2. Add a Culture identifier column to your Widget table, and use that in your WHERE clause
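Option 1 could be sketched as follows, using SQLite from Python only as a convenient stand-in. The column names beyond Id and Culture (WidgetId, Price, Name) and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Widget (Id INTEGER PRIMARY KEY, Price REAL NOT NULL);
CREATE TABLE WidgetText (
    WidgetId INTEGER NOT NULL REFERENCES Widget(Id),
    Culture TEXT NOT NULL,
    Name TEXT NOT NULL,
    PRIMARY KEY (WidgetId, Culture)
);
INSERT INTO Widget VALUES (1, 9.5);
INSERT INTO WidgetText VALUES (1, 'en-US', 'Sprocket'), (1, 'nl-NL', 'Tandwiel');
""")

def get_widget(widget_id, culture):
    """Fetch the language-neutral columns plus the text for one culture."""
    return conn.execute(
        "SELECT w.Id, w.Price, t.Name FROM Widget w"
        " JOIN WidgetText t ON t.WidgetId = w.Id AND t.Culture = ?"
        " WHERE w.Id = ?",
        (culture, widget_id),
    ).fetchone()
```

The language-neutral data (here the price) lives in one place, and only the text is duplicated per culture. With an inner join, a widget with no text for the requested culture simply disappears from the result, so you may want a LEFT JOIN or a fallback culture.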
Easy (Score:1)
PHP? MySQL?? (Score:2)
MySQL? The less said, the better.
gettext (Score:2)
Re:gettext (Score:1)
One of the many things you may need to do is... (Score:1)
Why? (Score:1, Troll)
If you can't speak/read English, then screw you.
Hell, if you aren't an American, screw you. Even better.
Ya, mod me down. I don't care. I'll be the one laughing when your job is outsourced. You people can't hide from the truth forever.
Re:Just curious (Score:2)
phpBB (Score:1)
www.phpbb.com
Gettext and separate version. (Score:3, Insightful)
For large or potentially dynamic text l10n (e.g., entire content of pages, descriptions of products in a database, etc.) you need to have one version for each language you are supporting (you COULD do it through gettext but it would be rather tedious). How you do that is of course 100% dependent on your application.
This is a pretty common task for OSS projects... (Score:2)
PHP + MySQL for I18N (Score:1)
As a number of people have mentioned, Internationalization and localization can be an incredibly complex process.
Since you are working with an existing system, you don't have the option of designing in I18N support from the very beginning.
Get a good book.
I recommend "XML Internationalization and Localization" by Yves Savourel, and "Beyond Borders: Web Globalization Strategies" by John Yunker. Both authors have been in the I18N business a long time. They know what they are talking about.
Choose
Re: (Score:2)
Re:Too new? (Score:2)
And, for that matter, neither does C, or C++, or assembler. We can conclude from this that Unicode support is not possible, except perhaps in Java or Python.
The grandparent post was entirely correct to point out that this is not a new problem. People have been doing multibyte characters in all sorts of languages for a long time. I was even doing i18n in PHP in 1999.
Not having 'native support' for Unicode doesn't mean that you can't use Unico
Re:Too new? (Score:2)
Umm, I never said "Unicode is bytes". Unicode is a standard, Unicode is a consortium, Unicode is a registered trademark of Unicode Inc. Unicode is not bytes.
What I said was that Unicode strings are composed of bytes. A sequence of Unicode characters, under a particular encoding, is generally representable as a sequence of bytes.
I guess you were happy with FORTRAN character strings, t