
Automatic Translation Without Dictionaries

New submitter physicsphairy writes "Tomas Mikolov and others at Google have developed a simple means of translating between languages using a large corpus of sample texts. Rather than being defined by humans, words are characterized based on their relation to other words. For example, in any language, a word like 'cat' will have a particular relationship to words like 'small,' 'furry,' 'pet,' etc. The set of relationships of words in a language can be described as a vector space, and words from one language can be translated into words in another language by identifying the mapping between their two vector spaces. The technique works even for very dissimilar languages, and is presently being used to refine and identify mistakes in existing translation dictionaries."
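The core idea in the summary — learn word vectors in each language, then learn a linear map between the two vector spaces from a small seed dictionary — can be sketched in a few lines. This is a toy illustration with invented 2-D embeddings and a hypothetical seed dictionary, not the paper's actual data or training procedure:

```python
import numpy as np

# Hypothetical 2-D source-language embeddings (e.g. English).
src = {"cat":   np.array([0.9, 0.1]),
       "dog":   np.array([0.8, 0.3]),
       "house": np.array([0.1, 0.9])}

# Hypothetical target-language embeddings (e.g. Spanish).
tgt = {"gato":  np.array([0.1, 0.9]),
       "perro": np.array([0.3, 0.8]),
       "casa":  np.array([0.9, 0.1])}

# Small seed dictionary of known translation pairs; fit the linear map W
# by least squares, i.e. minimize ||X W - Z|| over the seed pairs.
pairs = [("cat", "gato"), ("house", "casa")]
X = np.array([src[s] for s, _ in pairs])
Z = np.array([tgt[t] for _, t in pairs])
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def translate(word):
    """Map a source word's vector into target space via W, then return the
    target word whose vector is nearest by cosine similarity."""
    v = src[word] @ W
    return max(tgt, key=lambda t: (v @ tgt[t]) /
               (np.linalg.norm(v) * np.linalg.norm(tgt[t])))

print(translate("dog"))  # "dog" was not in the seed dictionary
```

Note that "dog" is deliberately absent from the seed pairs: the learned map carries its vector into the target space, where the nearest neighbour supplies the translation. That is the sense in which the method generates dictionary entries rather than looking them up.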
Comments Filter:
  • by icebike ( 68054 ) on Saturday September 28, 2013 @06:23PM (#44981799)

    Firefox had nothing to do with it.
    It was PEBCAK, pure and simple.

  • Re:Cat (Score:4, Insightful)

    by blue trane ( 110704 ) on Saturday September 28, 2013 @07:21PM (#44982107) Homepage Journal

    jazz musician

  • by hey! ( 33014 ) on Saturday September 28, 2013 @07:26PM (#44982131) Homepage Journal

    Simply because you embed your dictionary in something you choose to call a vector doesn't make it any less of a dictionary.

    True, but calling a dictionary a vector space doesn't make it so. For example, how "close" are the definitions of "happiness" and "joy"? In a dictionary, the only concept of "closeness" is the lexical ordering of the words themselves, and in that sense "happiness" and "joy" are quite far apart (as far apart as words beginning with h-a are from words beginning with j-o in the dictionary). But in some kind of adjacency matrix which shows how often these words appear in some relation to other words, they might be quite close in vector space; "guilt" and "shame" might likewise be closer to each other than either is to "happiness", and each of the four words ("happiness", "joy", "guilt", "shame") would be closer to any other of those words than it would be to "crankshaft"; and probably closer to "crankshaft" (a noun) than to "chewy" (an adjective).

    Anyhow, if you'd read the paper, at least as far as the abstract, you'd see that this is about *generating* likely dictionary entries for unknown words using analysis of some corpus of texts.
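The kind of "closeness" the comment describes is typically measured with cosine similarity over co-occurrence vectors. A minimal sketch, using invented co-occurrence counts (the context words and numbers are purely illustrative):

```python
import math

# Hypothetical co-occurrence counts: how often each word appears near the
# context words ("feel", "emotion", "engine"). The numbers are made up.
vectors = {
    "happiness":  [9, 8, 0],
    "joy":        [8, 9, 1],
    "crankshaft": [0, 1, 9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

sim_joy   = cosine(vectors["happiness"], vectors["joy"])
sim_crank = cosine(vectors["happiness"], vectors["crankshaft"])
print(f"happiness~joy: {sim_joy:.2f}, happiness~crankshaft: {sim_crank:.2f}")
```

With counts like these, "happiness" and "joy" come out nearly parallel while "happiness" and "crankshaft" are nearly orthogonal, even though a dictionary's alphabetical ordering says nothing of the kind.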
