Google's Computing Power Refines Translation
gollum123 sends an excerpt from the NY Times on how Google has taken a lead in language translation, in one of the company's few unqualified successes as it attempts to broaden its offerings beyond search. "...Google's quick rise to the top echelons of the translation business is a reminder of what can happen when Google unleashes its brute-force computing power on complex problems. The network of data centers that it built for Web searches may now be, when lashed together, the world's largest computer. Google is using that machine to push the limits on translation technology. Last month, for example, it said it was working to combine its translation tool with image analysis, allowing a person to, say, take a cellphone photo of a menu in German and get an instant English translation. ...in the mid-1990s, researchers began favoring a so-called statistical approach. They found that if they fed the computer thousands or millions of passages and their human-generated translations, it could learn to make accurate guesses about how to translate new texts. It turns out that this technique, which requires huge amounts of data and lots of computing horsepower, is right up Google's alley. ...Google's service is good enough to convey the essence of a news article, and it has become a quick source for translations for millions of people."
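To make the "statistical approach" in the excerpt concrete, here is a minimal sketch, assuming an IBM Model 1 style word-translation model trained with EM. This is a textbook technique chosen for illustration, not a description of Google's actual system. The three-sentence German/English corpus and the function names `train_ibm_model1` and `best_translation` are invented for the example.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical data, invented for illustration).
CORPUS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM, IBM Model 1 style."""
    pairs = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start uniform: every source/target pairing is equally likely.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) pairing
        total = defaultdict(float)  # expected count of each source word f
        for src, tgt in pairs:
            for e in tgt:
                # Share the credit for e among the source words in the sentence.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Re-estimate the probabilities from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, word, tgt_vocab):
    """Return the target word with the highest learned probability."""
    return max(tgt_vocab, key=lambda e: t.get((e, word), 0.0))


if __name__ == "__main__":
    model = train_ibm_model1(CORPUS)
    english = {"the", "house", "book", "a"}
    for german in ["das", "haus", "buch", "ein"]:
        print(german, "->", best_translation(model, german, english))
```

No dictionary or grammar rule is supplied anywhere: the pairing of "haus" with "house" falls out of which words keep appearing together across sentence pairs. That is also why the approach rewards large amounts of parallel text and computing power.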
Not from NY Times (Score:3, Informative)
Last week's The Economist addressed this issue (http://www.economist.com/specialreports/displaystory.cfm?story_id=15557431). NY Times recycled it.
Similar languages (Score:3, Informative)
Sure, you might get something decent if you try to translate from English to German, but what about languages with entirely different thought models behind them, like Chinese or Hungarian? Last time I tried using it, it confused "has been" with "Latvian".
Re:Why is machine translation so difficult? (Score:3, Informative)
Is it really that difficult to come up with a set of rules so things are worded correctly?
Yes.
Longer answer - computers are very bad at context and meaning. Take French to English - it would be one thing if words had the same exact connotations and grammar, and you could just do a find-replace. But, unfortunately, that's not the case. There are many words in French that - depending on context - have many different meanings. In mathematical terms, the mapping of French words to English words is not bijective, nor vice-versa. Take the French word "bête" - it most literally means "beast", but is often used to mean "stupid". How is a computer supposed to figure out which one to use?
I just checked and Google Translate actually gets the connotation right, but it's a relatively simple example. Consider the French word "baise" - either kiss or fuck - and a more complicated example. Now... Google gets this right too (creepy!)
In any case, the only way to get perfect translation is to make the computer understand the relevant meanings and connotations of words and stylistic choices... How would you convey a Cockney accent, or Cockney phrasing, in Chinese? In short, you'd need an AI.
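The "beast" versus "stupid" problem can be sketched in a few lines. This is a toy illustration, assuming a hand-written sense table with cue words; the table `SENSES` and both function names are hypothetical, and real systems learn such associations from data rather than from a list like this.

```python
# Hypothetical sense inventory: one French word, several English renderings,
# each with a few cue words that tend to appear near that sense.
SENSES = {
    "bête": {
        "beast": {"sauvage", "féroce", "forêt"},
        "stupid": {"question", "idée", "erreur"},
    },
}


def naive_translate(word):
    """Find-replace style: always return the first listed sense."""
    return next(iter(SENSES[word]))


def contextual_translate(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & context))


if __name__ == "__main__":
    print(naive_translate("bête"))
    print(contextual_translate("bête", "une question bête"))
    print(contextual_translate("bête", "une bête sauvage"))
```

The find-replace version gives the same answer every time because the mapping is one-to-many. The contextual version only works as far as the cue list goes, which is the point: somebody, or something, has to supply the context.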
Re:Asian languages and vastly different grammar (Score:3, Informative)
Russian, Polish and Ukrainian translations are laughable as well.
Even Ukrainian-Russian translation is mediocre, even though it's pretty trivial (other translators produce almost perfect translations).
So, good job but still lots to do.
Re:Converting that article from English to Chinese (Score:3, Informative)
Exhibit A: http://winterson.com/2005/06/episode-iii-backstroke-of-west.html [winterson.com]
Translation is hard for people. (Score:3, Informative)
But translation isn't easy for humans, so there's no reason to expect it should be easy for computers.
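Translating from one language to another, for a human translator, basically comes down to this: understanding a text in the source language, and then writing an equivalent text in the target language.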
But the problem is that there is never unique "equivalent" text in the target language, but rather, a lot of alternatives that make different tradeoffs. This is because a foreign language is part of a foreign culture that has many concepts that are foreign to the source language, and likewise, the source language is part of a source culture that is foreign to the target language. So translators repeatedly find themselves in situations where either they must leave out something that the source text says or implies, or else say something unnatural in the target language to convey that information.
Comparing the grammar of dramatically different languages makes this really clear. For example, many languages have grammatical evidentiality [wikipedia.org], where statements are subject to grammatical rules that depend on the source of the speaker's information for the statement. Imagine a language where the equivalent of the sentence "Joe kicked Tom" required the verb to be conjugated differently depending on whether the speaker saw Joe kick Tom or only heard about it. If you had to translate an English text into a language like that, you'd have to decide, for each clause in the English text, who the speaker of the sentence is and whether they know the event first-hand or second-hand, and either of those may often be unclear from the English text.
In the converse case, imagine if we're translating from a language like that into English. Then every sentence in the source language encodes some claim about how the speaker knows the information conveyed in that sentence. A completely literal translation, in which every English sentence had that information, would be extremely unnatural English writing. Leaving it out of every single sentence, on the other hand, might leave out something important to understand the text in some cases. So the translator has to decide in which cases the evidential conjugations of the source language must be translated into a longer English sentence than otherwise necessary.
This is one extreme example, but this sort of problem occurs at every level in translation. Translators often find themselves adding in information that the source text doesn't say, having to use circumlocutions in the target language to express really simple things from the source language, leaving out information the source text has because it would be too cumbersome to phrase it in the target language, adopting strange conventions in the target language, or having to write supplementary materials to help the readers understand the translation (footnotes, introductions).
Or in a few cases, the translators write for people who don't know the source language but are familiar with some of the customs and concepts, or willing to learn them to understand the translation, and then they just leave untranslated words in. (Examples: lots of philosophy translations from German or French; anime fansubs that leave Japanese honorifics like -san or -sempai in, because the people who use them are anime fans, are at least a bit familiar with them, and actually understand more nuances that way.)
So, translation is not a mechanical task, and thus, there can't be a simple set of rules to do it. It's, as I said at the top, understanding a text in the source language, and writing another in the target language, tailored toward a different audience. And it requires understanding the audiences of the original text and the translation, and making many informal decisions based on that.