
Automatic Translation Without Dictionaries

New submitter physicsphairy writes "Tomas Mikolov and others at Google have developed a simple means of translating between languages using a large corpus of sample texts. Rather than being defined by humans, words are characterized based on their relation to other words. For example, in any language, a word like 'cat' will have a particular relationship to words like 'small,' 'furry,' 'pet,' etc. The set of relationships of words in a language can be described as a vector space, and words from one language can be translated into words in another language by identifying the mapping between their two vector spaces. The technique works even for very dissimilar languages, and is presently being used to refine and identify mistakes in existing translation dictionaries."
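The core idea in the summary — learn word vectors in each language, then learn a linear map between the two vector spaces from a small seed dictionary — can be sketched in a few lines. This is a toy illustration with invented 2-D embeddings and a hypothetical seed dictionary, not the paper's actual data or training procedure:

```python
import numpy as np

# Hypothetical 2-D source-language embeddings (e.g. English).
src = {"cat":   np.array([0.9, 0.1]),
       "dog":   np.array([0.8, 0.3]),
       "house": np.array([0.1, 0.9])}

# Hypothetical target-language embeddings (e.g. Spanish).
tgt = {"gato":  np.array([0.1, 0.9]),
       "perro": np.array([0.3, 0.8]),
       "casa":  np.array([0.9, 0.1])}

# Small seed dictionary of known translation pairs; fit the linear map W
# by least squares, i.e. minimize ||X W - Z|| over the seed pairs.
pairs = [("cat", "gato"), ("house", "casa")]
X = np.array([src[s] for s, _ in pairs])
Z = np.array([tgt[t] for _, t in pairs])
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def translate(word):
    """Map a source word's vector into target space via W, then return the
    target word whose vector is nearest by cosine similarity."""
    v = src[word] @ W
    return max(tgt, key=lambda t: (v @ tgt[t]) /
               (np.linalg.norm(v) * np.linalg.norm(tgt[t])))

print(translate("dog"))  # "dog" was not in the seed dictionary
```

Note that "dog" is deliberately absent from the seed pairs: the learned map carries its vector into the target space, where the nearest neighbour supplies the translation. That is the sense in which the method generates dictionary entries rather than looking them up.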
Comments Filter:
  • by icebike ( 68054 ) on Saturday September 28, 2013 @06:23PM (#44981799)

    Firefox had nothing to do with it.
    It was PEBCAK, pure and simple.

  • Re:Cat (Score:4, Insightful)

    by blue trane ( 110704 ) on Saturday September 28, 2013 @07:21PM (#44982107) Homepage Journal

    jazz musician

  • by hey! ( 33014 ) on Saturday September 28, 2013 @07:26PM (#44982131) Homepage Journal

    Simply because you embed your dictionary in something you choose to call a vector doesn't make it any less of a dictionary.

    True, but calling a dictionary a vector space doesn't make it so. For example, how "close" are the definitions of "happiness" and "joy"? In a dictionary, the only concept of "closeness" is the lexical ordering of the words themselves, and in that sense "happiness" and "joy" are quite far apart (as far apart as words beginning with h-a are from words beginning with j-o in the dictionary). But in some kind of adjacency matrix which shows how often these words appear in some relation to other words, they might be quite close in vector space; "guilt" and "shame" might likewise be closer to each other than either is to "happiness", and each of the four words ("happiness", "joy", "guilt", "shame") would be closer to any other of those words than it would be to "crankshaft"; and probably closer to "crankshaft" (a noun) than to "chewy" (an adjective).

    Anyhow, if you'd read the paper, at least as far as the abstract, you'd see that this is about *generating* likely dictionary entries for unknown words using analysis of some corpus of texts.
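The kind of "closeness" the comment describes is typically measured with cosine similarity over co-occurrence vectors. A minimal sketch, using invented co-occurrence counts (the context words and numbers are purely illustrative):

```python
import math

# Hypothetical co-occurrence counts: how often each word appears near the
# context words ("feel", "emotion", "engine"). The numbers are made up.
vectors = {
    "happiness":  [9, 8, 0],
    "joy":        [8, 9, 1],
    "crankshaft": [0, 1, 9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

sim_joy   = cosine(vectors["happiness"], vectors["joy"])
sim_crank = cosine(vectors["happiness"], vectors["crankshaft"])
print(f"happiness~joy: {sim_joy:.2f}, happiness~crankshaft: {sim_crank:.2f}")
```

With counts like these, "happiness" and "joy" come out nearly parallel while "happiness" and "crankshaft" are nearly orthogonal, even though a dictionary's alphabetical ordering says nothing of the kind.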
