Coming Soon, The Google Translator

compuglot writes "Google gave journalists a glimpse of its next-generation machine translation system at a May 19th Google Factory Tour. "Google Blogoscoped" offers an excellent overview of the presentation. The system has been trained using the United Nations Documents as a corpus. This corpus is some 20 billion words' worth of content. It uses existing source and target language translations (done by human translators at the U.N.) to find patterns, which it then uses to build rules for translating between those languages. Apparently it succeeded where the current version had failed in translating certain phrases. If anyone were capable of making a serious go of MT, that would have to be Google."
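The summary only says the system learns patterns from human-translated parallel text; Google's actual method isn't described. As a rough illustration of the general idea, here is a minimal sketch of a simplified IBM Model 1 (a classic statistical MT technique, swapped in here, not Google's system), which uses expectation-maximization to estimate word-translation probabilities from sentence-aligned pairs. The tiny Spanish-English corpus is invented for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(f | e) from sentence pairs.

    Simplified IBM Model 1 (no NULL word): each EM pass spreads every source
    word's count over the target words in the same sentence pair, weighted
    by the current probabilities, then renormalizes per target word.
    """
    src_vocab = {f for src, _ in pairs for f in src}
    uniform = 1.0 / len(src_vocab)
    t = defaultdict(lambda: uniform)  # start with t(f | e) uniform

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links to e
        # E-step: fractional alignment counts under the current t
        for src, tgt in pairs:
            for f in src:
                z = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f | e) from the expected counts
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})

    return dict(t)


# Toy sentence pairs, invented for illustration (not from the UN corpus).
corpus = [
    (["la", "casa"], ["the", "house"]),
    (["la", "flor"], ["the", "flower"]),
    (["una", "flor"], ["a", "flower"]),
]
t = train_ibm_model1(corpus)
vocab = {"la", "casa", "flor", "una"}
for e in ["the", "house", "flower", "a"]:
    best = max(vocab, key=lambda f: t.get((f, e), 0.0))
    print(e, "->", best)
```

No dictionary is supplied: "casa" ends up paired with "house" only because "la" is better explained by "the" in the other sentences. Real systems add phrase tables, reordering and language models on top of alignments like these.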

  • by gowen ( 141411 ) <gwowen@gmail.com> on Tuesday May 31, 2005 @10:18AM (#12683710) Homepage Journal
    If anyone were capable of making a serious go of MT, that would have to be Google.
    Erm... why is that? Is it because machine translation is in some sense search technology? Because they've hired renowned experts in natural language processing? Because they've got a lot of money sloshing around and employ a lot of generally smart people?

    Oh, no. It's because geeks like Google. Therefore, Google are capable of superhuman feats that mere scientists -- those with years of experience in relevant fields -- are incapable of doing.
  • by Shotgun ( 30919 ) on Tuesday May 31, 2005 @10:23AM (#12683753)
    If your blog sounds like a politician giving a speech at the UN, this service will do a wonderful job. Doubtful that it will do any better than Babelfish otherwise.

    The biggest problem in artificial intelligence is that the system learns the material it is trained on, and only that material. Computers don't generalize or extrapolate the known into the unknown worth a damn.
  • by metlin ( 258108 ) on Tuesday May 31, 2005 @10:23AM (#12683757) Journal
    While Google's existing translator and AltaVista's Babelfish are good, they don't cover several other languages.

    That would be a real benefit. For instance, I wanted something translated to and from Svenska (Swedish), but I couldn't find any translation service that did it.

    Good translation of the more common languages would be nice, but even simple translations of a wider variety of languages would be really useful.
  • by stevejsmith ( 614145 ) on Tuesday May 31, 2005 @10:26AM (#12683778) Homepage
    No, it's because Google has tons of talent, money, already-archived text to work with, computers, respect in the industry, and a consumer base. I can't think of a company that possesses these characteristics more so than Google.
  • T.Q. (Score:5, Insightful)

    by moviepig.com ( 745183 ) on Tuesday May 31, 2005 @10:29AM (#12683807)
    The system has been trained using the United Nations Documents as a corpus.

    Seems one could devise a TQ (translation quotient) measuring the effectiveness of machine (or human) translators. Take any standard reading-comprehension test, send its text material through the translator and back, and then compare the scores of subjects taking the resulting test vs. those taking the original.

    (Before such translators make their way into, say, diplomatic circles, I'd sure hope there's some objective demonstration of near-infallibility...)
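One way to turn the proposed TQ into a number is sketched below. The comment doesn't say how the two sets of scores should be compared, so taking the ratio of mean scores is an assumption, and the translator functions passed to the hypothetical `round_trip` helper stand in for whatever MT system is being tested.

```python
from statistics import mean
from typing import Callable, Sequence


def round_trip(text: str,
               forward: Callable[[str], str],
               backward: Callable[[str], str]) -> str:
    """Send the test material through the translator and back again."""
    return backward(forward(text))


def translation_quotient(original_scores: Sequence[float],
                         round_trip_scores: Sequence[float]) -> float:
    """Mean comprehension score on the round-tripped test divided by the
    mean score on the original; 1.0 means nothing measurable was lost."""
    return mean(round_trip_scores) / mean(original_scores)


# Hypothetical scores (percent correct) from two groups of test subjects.
tq = translation_quotient([82, 90, 76], [70, 81, 63])
print(f"TQ = {tq:.2f}")
```

A ratio keeps the measure comparable across tests of different difficulty; a real study would also need a significance test, since the two groups are different people.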

  • by gowen ( 141411 ) <gwowen@gmail.com> on Tuesday May 31, 2005 @10:30AM (#12683821) Homepage Journal
    Well (oh dear, here comes the Flamebait mod again), I'd argue that Microsoft has more of all of those, with the possible exception of "respect in the industry." So do IBM, Dell, Cisco ... and any number of other well-established, blue-chip IT companies.

    Furthermore, Google's ideas are not new. People have been doing things like this for years. But here on Slashdot, a Google press release about their latest software, which doesn't even exist yet, gets treated like the announcement of an earth-shattering invention.
  • by benjcurry ( 754899 ) on Tuesday May 31, 2005 @10:40AM (#12683915) Homepage
    Oh, come on! It's because in the past, most of what Google has undertaken has been enormously successful and useful. Yeah, they hire a lot of smart people and have lots of money. Gmail (IMO) is the gold standard of free webmail. Google Maps (IMO) is the best map system out there. They are also responsible for AdSense and AdWords, and I think they even have a search engine that gets a good number of hits per diem. Maybe there is a reason to think this translation thingamabob will be good!
  • Re:fascinating (Score:5, Insightful)

    by elrous0 ( 869638 ) on Tuesday May 31, 2005 @10:42AM (#12683935)
    or at the least pick out pertinent words like "bomb."

    Why do I have a funny feeling that this research isn't being funded by philanthropic foundations?

    -Eric

  • by gowen ( 141411 ) <gwowen@gmail.com> on Tuesday May 31, 2005 @10:47AM (#12683973) Homepage Journal
    Really? Google search is great, and Gmail's an adequate front end attached to a webmail system whose sole selling point is the massive amount of storage space.

    But have you seen the monstrosity that resulted when that front end got belted onto the Deja Usenet archive? Google Maps is usable, but it's hardly groundbreaking.

    And other than those things, exactly what hype has Google delivered on?
  • by imroy ( 755 ) <imroykun@gmail.com> on Tuesday May 31, 2005 @10:56AM (#12684069) Homepage Journal
    Erm... why is that?

    Because Google has shown that it knows how to handle large amounts of human-created content and create useful information from it. The search engine was just the start. Just look at the spell checker they added. It doesn't use a dictionary, just the mass of web pages they spider monthly. It's not always perfect, but it's more adaptive than other methods. This translator looks like something along the same lines.
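How Google's spell checker works internally isn't public, but the dictionary-free idea can be sketched in the style of Peter Norvig's well-known toy corrector: count word frequencies in a corpus, then replace an unknown word with the most frequent corpus word one edit away. The corpus string below is made up; a real system would use a crawl of web pages.

```python
import re
from collections import Counter


def train(text):
    """Word frequencies taken straight from raw text; no dictionary."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Keep known words; otherwise pick the most frequent word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.__getitem__) if candidates else word


counts = train("the translation of the text was a good translation")
print(correct("tranlsation", counts))  # -> translation
print(correct("teh", counts))          # -> the
```

Because the "dictionary" is just whatever the corpus contains, new names and slang get corrected toward automatically once they are common enough on the web, which is the adaptivity the comment describes.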

  • Re:fascinating (Score:4, Insightful)

    by MindStalker ( 22827 ) <mindstalker@[ ]il.com ['gma' in gap]> on Tuesday May 31, 2005 @10:59AM (#12684089) Journal
    Well, the Bible was originally written in Hebrew, Aramaic and Greek, so there are no outdated English phrases in the Bible itself. Now, if you're referring to the King James translation, obviously that would be good for teaching Google Early Modern English, but not modern English. You would need a much newer translation that doesn't use old phrases. Such translations do exist, by the way.
  • Re:if anyone... (Score:2, Insightful)

    by rca66 ( 818002 ) on Tuesday May 31, 2005 @11:01AM (#12684108)
    Actually, my bet for most likely to make a real go of machine translation would be... IBM

    They already did it. Several years ago. You can get it with WebSphere, and offshoots are sold under different labels.

    Look how far they ran with chess programs, because they felt like it...

    Chess is trivial compared to the task of translation. You cannot compare these two problems.

  • by rca66 ( 818002 ) on Tuesday May 31, 2005 @11:13AM (#12684213)
    If you'll read the article, you'll find that the translator properly translated a fairly complicated phrase from Arabic to English.

    For each existing MT system you can find fairly complicated sentences which translate ok.

    I'd guess that this service is, from a technical standpoint, at least 95% done; it's just the packaging and touching-up that needs to be done.

    "Technical standpoint": do you mean the system is able to translate arbitrary text? Maybe. Or do you mean the system is able to translate arbitrary text into semantically correct text in the target language? Highly unlikely. People have been trying this for decades now. And other companies and institutes have smart people too.

  • by mOdQuArK! ( 87332 ) on Tuesday May 31, 2005 @11:16AM (#12684239)
    I am a translator,

    Well, if their service is free and works well (not necessarily perfectly), you now have a tool which should let you translate that entire book in about a week (assuming most of the week will be spent checking the translation & preserving the "flavor" of the source).

  • except, no. (Score:4, Insightful)

    by mattdm ( 1931 ) on Tuesday May 31, 2005 @11:20AM (#12684272) Homepage
    "It's all just observation," Yarowsky adds. "Children do the same thing, but they also do it through visual stimulation and feedback. They see a book and hear the word 'book,' and eventually they learn that it's a book. They see a bird with its wings flapping around and learn that is called a bird. It's the same with machines, only they have much better memories. Computers could remember exactly when and where they saw the words bird and book."

    Except, no. Humans are basically generalization machines. Babies are able to grasp very quickly that words apply to categories of things -- not just that a *specific* item is a bird or a book, but to learn "I know a bird when I see it", even without necessarily being able to provide a scientific definition. Computers can be built to emulate this ability, but learning word-to-word mappings isn't *nearly* the same as learning abstract concepts and which words apply to them.
  • Re:Two thoughts (Score:2, Insightful)

    by Secret Agent 99 ( 855215 ) on Tuesday May 31, 2005 @11:23AM (#12684292)
    If they use UN documents as a guide, the Google MT engine will be excellent at translating bureaucratese between languages. I'm not sure if that's a good thing!

    Exactly. And the UN surely has fairly rigorous QA processes for its translations. Now try expanding the corpus with more translated copy.

    In addition to feeding the system with translations that haven't been through formal QA (in many but not all cases), you also are now feeding it copy that has not had all the style deliberately squeezed out of it for easy translatability. (Which is the way they write in bi- and multilingual bureaucracies.)

    If and when MT can handle that situation, I'll be impressed. But a "bureaucratese" translator seems like a much smaller challenge to me, relatively speaking.
  • Re:fascinating (Score:4, Insightful)

    by Bigman ( 12384 ) on Tuesday May 31, 2005 @11:35AM (#12684407) Homepage Journal
    Don't forget that many works of fiction are translated into several languages. The only problem with that is persuading the copyright holders to permit their use in training computer translation systems. I'm not sure where you would stand with this legally (after all, IANAL!), so I suspect this is why Google has been using the UN documents. I would imagine these are effectively public domain; and if not, I would imagine the UN would see a reliable machine translation project as worth supporting. The only downside I can see is that the UN texts are unlikely to have many idioms or colloquialisms, which would limit the resulting translator's usefulness in a more general context.
  • by BullfrogJones ( 572383 ) on Tuesday May 31, 2005 @11:46AM (#12684509)
    One serious problem I see with the 'matching source' method is that it's rare to find two sources that truly match. Movies are a great example: as a native English speaker who lived for 5 years in Spain, I can attest that the translations provided by the movie studios (used for subtitles in the theater and also for DVDs) are problematic on many levels.

    It's not enough to recognize a given word in language A is such and such word in language B, and not even enough to do the same with idiomatic phrases such as 'His bark is worse than his bite' (Mucho ruido, pocas nueces in Spanish, literally 'Lots of noise but few pecans').

    The problem is that the content itself is sometimes changed in translation. Cultural differences, pop culture references, names and places are all changed liberally when creating movie subtitles. A bilingual human can easily notice and disregard this, but how is a computer to know what to keep and what to disregard when comparing the supposedly matching sources?

    Choice of source material is extremely important here, and probably explains why they are starting with UN documents, a formal, business-like body of text with presumably less room for content differences. Unfortunately, the fact that movie translations cannot easily be used means that much of what we humans find amusing about bad babelfish translation (literal translation of slang, etc...) will continue to plague us for some time to come.
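The idiom point above can be illustrated with a toy comparison of word-by-word lookup against phrase-level lookup. This is a minimal sketch with a hand-written, hypothetical phrase table (a real system would learn entries from aligned text); it shows why idioms need multi-word units, but it does nothing about the mismatched-content problem the comment raises.

```python
def translate(tokens, table, max_len=4):
    """Greedy left-to-right translation preferring the longest phrase match."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown word: pass it through untranslated
            i += 1
    return " ".join(out)


# Hypothetical hand-written entries, for illustration only.
words = {("mucho",): "lots of", ("ruido",): "noise",
         ("pocas",): "few", ("nueces",): "nuts"}
phrases = dict(words)
phrases[("mucho", "ruido", "pocas", "nueces")] = "his bark is worse than his bite"

tokens = "mucho ruido pocas nueces".split()
print(translate(tokens, words))    # -> lots of noise few nuts
print(translate(tokens, phrases))  # -> his bark is worse than his bite
```

Preferring the longest match lets the whole idiom win over its individual words, while single-word entries still cover ordinary text.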

  • Re:Piffle (Score:3, Insightful)

    by gordo3000 ( 785698 ) on Tuesday May 31, 2005 @11:48AM (#12684542)
    Neither their computing power nor their cash is anything to be in awe over. Neither is truly a top contender in the computing industry, unless you take the time to wonder why this is impressive.

    Remember, almost all of those servers are needed for what they are currently working on, so they don't really have anywhere near that kind of computing power to spare. I would be willing to bet that if they threw every free cycle at this, they would have closer to 20%. Further, from what I have read, most of these servers are for moving data around; almost none of them are high-end number crunchers (which is what MT needs).

    They have nowhere near the cash of a lot of the big fish in the computer industry, so don't think they can outmuscle companies like Intel or IBM, much less the true heavyweight, Microsoft.

    http://www.cbsnews.com/stories/2004/12/22/national/main662452.shtml [cbsnews.com]
    (just to give you an idea of how much cash MS can use to crush competitors; it's not an issue of can't, it's an issue of not wanting to)

    What makes Google's situation unique is that, of the companies that actually care to spend time on this project, they are in the best position to do this stuff. This is the impressive part: a company that thrives on free material doing something so complex. Things like Google Maps are incredibly simple and only involve indexing available information. That isn't what this is by any means; this is a company attempting to break out of a niche and enter the more revolutionary side of computing (I have yet to see a Google service that actually is this).

    I'm interested to see how much of it is hype. Hopefully I can give it some tough Japanese and see how well it holds up when it goes beta (the eternal state of any Google project).
  • Re:Piffle (Score:2, Insightful)

    by gowen ( 141411 ) <gwowen@gmail.com> on Tuesday May 31, 2005 @11:51AM (#12684569) Homepage Journal
    What other application development group would you say has a better chance of creating a better MT system?
    IBM? Remember them? They probably spend more money on Blue Sky thinking than Google's entire research budget. Ever heard of Deep Blue, the computer that beat Garry Kasparov? Wasn't made by Google.

    Now, which do you imagine is closer to the sort of non-linear processing needed to do machine translation... playing chess, or cross referencing an enormous lookup table of ZIP codes?
  • by Chyeburashka ( 122715 ) on Tuesday May 31, 2005 @12:25PM (#12684904) Homepage
    I smiled when I read this recent headline:

    Clinton tours devastated Bandeh Aceh.

    Of course, I knew what the writer really meant. But the Babel Fish translation into French produces exactly the meaning I first parsed when reading that headline.

    Les excursions de Clinton ont dévasté Bandeh Aceh. ("Clinton's excursions devastated Bandeh Aceh.")

    If machine translation becomes more common, perhaps English writers will have to be a little more careful.

  • Re:fascinating (Score:2, Insightful)

    by cicho ( 45472 ) on Tuesday May 31, 2005 @12:40PM (#12685029) Homepage
    Project Gutenberg has plenty of translations into English, but not other target languages, it seems. And given the nature of copyright, do you want modern machine translation to read like 19th century prose?
  • by grahamsz ( 150076 ) on Tuesday May 31, 2005 @01:08PM (#12685323) Homepage Journal
    I was wrong about the French. However, the Spanish NVI appears to parallel the NIV, and I'd imagine the pair would be a pretty good candidate for this sort of analysis.

    http://www.booksofthebible.com/p2390.html [booksofthebible.com]

    I believe the key point is that in the situation of

    Ancient Lang A -> Modern Lang B -> Modern Lang C

    B and C will be far closer to each other than they would be in

    Ancient Lang A -> Modern Lang B
    Ancient Lang A -> Modern Lang C
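A practical reason Bible editions suit this idea is that they share chapter-and-verse numbering, so pairing up sentences between two modern versions needs no automatic alignment step. The sketch below assumes each version has already been loaded as a dict keyed by (book, chapter, verse); that format and the helper name are hypothetical, and the verse strings are placeholders rather than text from any real edition.

```python
def align_by_reference(version_b, version_c):
    """Pair up verses that share a (book, chapter, verse) reference.

    Both versions translate the same source and use the same numbering,
    so matching keys yields sentence-aligned B <-> C pairs directly.
    Verses present in only one version are skipped.
    """
    shared = sorted(version_b.keys() & version_c.keys())
    return [(version_b[ref], version_c[ref]) for ref in shared]


# Placeholder text, not quotations from any real edition.
version_b = {("John", 1, 1): "[B] John 1:1",
             ("John", 1, 2): "[B] John 1:2",
             ("John", 1, 3): "[B] John 1:3"}
version_c = {("John", 1, 1): "[C] John 1:1",
             ("John", 1, 2): "[C] John 1:2"}

for b, c in align_by_reference(version_b, version_c):
    print(b, "|", c)
```

The resulting pairs are the kind of sentence-aligned parallel text a statistical system would train on, with verse boundaries doing the work that sentence alignment does for UN documents.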

  • by rca66 ( 818002 ) on Tuesday May 31, 2005 @01:45PM (#12685666)
    While they may have the "smart people" you speak of, none of them have applied those people appropriately.

    Oh, come on! Google may be a great company, but to say it is the first company in the history of mankind able to motivate its employees or make them productive is a very strange remark. I don't think I am exaggerating when I say that everything Google has achieved up to now is trivial compared to the problem of translating human language. If one looks at their ranking, their indexing, Gmail and so on, the complexity of those tasks is orders of magnitude below the problem of handling human language.

  • Re:fascinating (Score:1, Insightful)

    by Anonymous Coward on Tuesday May 31, 2005 @01:49PM (#12685704)
    Any organization that can prevent said "bomb" from being used is the best philanthropic foundation I can imagine.
