
Automatic Translation Without Dictionaries

New submitter physicsphairy writes "Tomas Mikolov and others at Google have developed a simple means of translating between languages using a large corpus of sample texts. Rather than being defined by humans, words are characterized based on their relation to other words. For example, in any language, a word like 'cat' will have a particular relationship to words like 'small,' 'furry,' 'pet,' etc. The set of relationships of words in a language can be described as a vector space, and words from one language can be translated into words in another language by identifying the mapping between their two vector spaces. The technique works even for very dissimilar languages, and is presently being used to refine and identify mistakes in existing translation dictionaries."
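The mapping idea in the summary can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the 2-D "embeddings" and the word lists are invented for the example, and the real method learns each language's vector space from a large monolingual corpus. Given a small seed lexicon of known translation pairs, a linear map W is fit by least squares so that source vectors land near their translations' vectors, and a new word is then translated by nearest neighbor in the mapped space.

```python
import numpy as np

# Made-up 2-D embeddings for a handful of English and Spanish words.
# In the real technique these come from corpus statistics, not by hand.
en = {"cat": [0.9, 0.1], "dog": [0.8, 0.3], "small": [0.1, 0.9]}
es = {"gato": [0.1, 0.9], "perro": [0.3, 0.8], "pequeno": [0.9, 0.1]}

# Seed lexicon of known pairs, used to learn the linear map W with X @ W ~ Z.
seed = [("cat", "gato"), ("dog", "perro")]
X = np.array([en[s] for s, _ in seed])      # source-language vectors
Z = np.array([es[t] for _, t in seed])      # target-language vectors
W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # least-squares fit of the mapping

def translate(word):
    """Map a source vector into the target space; return the nearest target word."""
    v = np.array(en[word]) @ W
    return min(es, key=lambda t: np.linalg.norm(np.array(es[t]) - v))

print(translate("small"))  # -> pequeno
```

Because the two toy spaces here are exact mirror images of each other, the fitted W recovers the rotation perfectly; with real embeddings the fit is approximate and the nearest neighbor is only usually the right translation.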

  • by Vanders ( 110092 ) on Saturday September 28, 2013 @06:03PM (#44981675) Homepage

    Finally, the team point out that since the technique makes few assumptions about the languages themselves, it can be used on argots that are entirely unrelated.

    Once again, Star Trek is ahead of the curve.

  • by Etcetera ( 14711 ) on Saturday September 28, 2013 @06:11PM (#44981733) Homepage

    Reminds me a lot of the Fluid Concepts and Creative Analogies [amazon.com] work that Hofstadter led back in the day.

    I don't see this working directly for translation into non-lexicographically swappable languages (e.g., English -> Japanese) very well, because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.

    That being said.... Holy cow, you have the idea space mapped out! That's a big chunk of Natural Language Processing and an important step in AI development. ... Understanding a sentence emergently in terms of fuzzy concepts that are an internal and internally created symbol of what's "going on", not just using a dictionary and CYC [wikipedia.org]-like rules to figure it out, seems like a useful building block, but maybe I'm wrong.

    Very cool stuff. Makes me want to go back and finish that CS degree after all.

  • by Theovon ( 109752 ) on Saturday September 28, 2013 @07:00PM (#44982003)

    When I was in grad school studying linguistics, computational linguistics, and automatic speech recognition, the idea of using latent semantic analysis and similar techniques for this kind of translation came up more than once. So am I correct in assuming that this hadn't been done well in the past, and Google finally made it work well because they have larger corpora of translated texts?

  • by icebike ( 68054 ) on Saturday September 28, 2013 @07:17PM (#44982083)

    Yes, the pretty vectors (nothing but lists of words) still have to be assembled by humans for the most part. Maybe not EVERY association, but enough of them that you can build relationships and associations indirectly and achieve a roundabout translation, even if you end up having to go through 2 or 3 related languages to get there.

    After a few words of context are translated you can, perhaps, deduce the rest. But the idea that you can do so without a dictionary is ridiculous. And putting your dictionary into digital form and calling it a vector doesn't change the fact that you still have a dictionary associating an English word with a French word and a Mandarin word.

  • by phantomfive ( 622387 ) on Saturday September 28, 2013 @07:18PM (#44982089) Journal

    I don't see this working directly for translation into non-lexicographically swappable languages (e.g., English -> Japanese) very well, because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.

    According to the paper, this translation technique is only for translating words and short phrases. But it seems to work well for languages as far apart as English and Vietnamese.

  • by holophrastic ( 221104 ) on Saturday September 28, 2013 @08:21PM (#44982427)

    They do a great job of improving the precision of what used to be mediocre. And then, as a direct result, they not only make the errors worse, they make the errors undetectable.

    CAT: small, furry, pet.
    BIG CAT: big, furry, pet.

    Um. Both are orange. One's a tabby. One's a tiger.

    It's not good enough that your translation system has 99% accuracy whereas the old one has 90% accuracy. What matters is that the old one's 10% error rate sounded like an error (e.g. tiger becomes monster), whereas your new one's 1% passes the Turing test and can't be discerned by an intelligent listener (e.g. tiger becomes tabby).

    "My friend owns a monster." -- Your friend owns what? I don't think you meant a monster. -- "eh, you know, a very big dangerous jungle cat" -- oh, like a lion -- "not a lion, it has stripes" -- oh, a tiger.

    "My friend owns a tabby." -- Ok.
