
Automatic Translation Without Dictionaries

New submitter physicsphairy writes "Tomas Mikolov and others at Google have developed a simple means of translating between languages using a large corpus of sample texts. Rather than being defined by humans, words are characterized based on their relation to other words. For example, in any language, a word like 'cat' will have a particular relationship to words like 'small,' 'furry,' 'pet,' etc. The set of relationships of words in a language can be described as a vector space, and words from one language can be translated into words in another language by identifying the mapping between their two vector spaces. The technique works even for very dissimilar languages, and is presently being used to refine and identify mistakes in existing translation dictionaries."
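The mapping idea in the summary can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the 2-D "embeddings" and the word lists are invented for the example, and the real method learns each language's vector space from a large monolingual corpus. Given a small seed lexicon of known translation pairs, a linear map W is fit by least squares so that source vectors land near their translations' vectors, and a new word is then translated by nearest neighbor in the mapped space.

```python
import numpy as np

# Made-up 2-D embeddings for a handful of English and Spanish words.
# In the real technique these come from corpus statistics, not by hand.
en = {"cat": [0.9, 0.1], "dog": [0.8, 0.3], "small": [0.1, 0.9]}
es = {"gato": [0.1, 0.9], "perro": [0.3, 0.8], "pequeno": [0.9, 0.1]}

# Seed lexicon of known pairs, used to learn the linear map W with X @ W ~ Z.
seed = [("cat", "gato"), ("dog", "perro")]
X = np.array([en[s] for s, _ in seed])      # source-language vectors
Z = np.array([es[t] for _, t in seed])      # target-language vectors
W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # least-squares fit of the mapping

def translate(word):
    """Map a source vector into the target space; return the nearest target word."""
    v = np.array(en[word]) @ W
    return min(es, key=lambda t: np.linalg.norm(np.array(es[t]) - v))

print(translate("small"))  # -> pequeno
```

Because the two toy spaces here are exact mirror images of each other, the fitted W recovers the rotation perfectly; with real embeddings the fit is approximate and the nearest neighbor is only usually the right translation.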

  • by Vanders ( 110092 ) on Saturday September 28, 2013 @06:03PM (#44981675) Homepage

    Finally, the team point out that since the technique makes few assumptions about the languages themselves, it can be used on argots that are entirely unrelated.

    Once again, Star Trek is ahead of the curve.

  • by Etcetera ( 14711 ) on Saturday September 28, 2013 @06:11PM (#44981733) Homepage

    Reminds me a lot of the Fluid Concepts and Creative Analogies [amazon.com] work that Hofstadter led back in the day.

    I don't see this working directly for translation into non-lexicographically swappable languages (e.g., English -> Japanese) very well, because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.

    That being said.... Holy cow, you have the idea space mapped out! That's a big chunk of Natural Language Processing and an important step in AI development. ... Understanding a sentence emergently in terms of fuzzy concepts that are an internal and internally created symbol of what's "going on", not just using a dictionary and CYC [wikipedia.org]-like rules to figure it out, seems like a useful building block, but maybe I'm wrong.

    Very cool stuff. Makes me want to go back and finish that CS degree after all.

  • by Theovon ( 109752 ) on Saturday September 28, 2013 @07:00PM (#44982003)

    When I was in grad school studying linguistics, computational linguistics, and automatic speech recognition, the idea of using latent semantic analysis and similar techniques for this kind of translation came up more than once. So am I correct in assuming that this hadn't been done well in the past, and Google finally made it work well because they have larger corpora of translated texts?

  • by icebike ( 68054 ) on Saturday September 28, 2013 @07:17PM (#44982083)

    Yes, the pretty vectors (nothing but lists of words) still have to be assembled by humans for the most part. Maybe not EVERY association, but enough of them that you can build relationships and associations indirectly and achieve a roundabout translation, even if you end up having to go through 2 or 3 related languages to get there.

    After a few words of context are translated you can, perhaps, deduce the rest. But the idea that you can do so without a dictionary is ridiculous. And putting your dictionary into digital form and calling it a vector doesn't change the fact that you still have a dictionary associating an English word with a French word and a Mandarin word.

  • by phantomfive ( 622387 ) on Saturday September 28, 2013 @07:18PM (#44982089) Journal

    I don't see this working directly for translation into non-lexicographically swappable languages (e.g., English -> Japanese) very well, because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.

    According to the paper, this translation technique is only for translating words and short phrases. But it seems to work well for languages as far apart as English and Vietnamese.

  • by holophrastic ( 221104 ) on Saturday September 28, 2013 @08:21PM (#44982427)

    They do a great job of improving the precision of what used to be mediocre. And then, as a direct result, they not only make the errors worse, they make the errors undetectable.

    CAT: small, furry, pet.
    BIG CAT: big, furry, pet.

    Um. Both are orange. One's a tabby. One's a tiger.

    It's not good enough that your translation system has 99% accuracy whereas the old one has 90% accuracy. What matters is that the old one's 10% error rate sounded like an error (e.g. tiger becomes monster), whereas your new one's 1% passes the Turing test and can't be discerned by an intelligent listener (e.g. tiger becomes tabby).

    "My friend owns a monster." -- Your friend owns what? I don't think you meant a monster. -- "eh, you know, a very big dangerous jungle cat" -- oh, like a lion -- "not a lion, it has stripes" -- oh, a tiger.

    "My friend owns a tabby." -- Ok.
