Baidu Has a New Trick For Teaching AI the Meaning of Language (technologyreview.com)
Baidu, China's closest equivalent to Google, has achieved the highest score at the General Language Understanding Evaluation (GLUE) AI competition. What's notable about Baidu's achievement is that it illustrates how AI research benefits from a diversity of contributors. MIT Technology Review explains: GLUE is a widely accepted benchmark for how well an AI system understands human language. It consists of nine different tests for things like picking out the names of people and organizations in a sentence and figuring out what a pronoun like "it" refers to when there are multiple potential antecedents. A language model that scores highly on GLUE, therefore, can handle diverse reading comprehension tasks. Out of a full score of 100, the average person scores around 87 points. Baidu is now the first team to surpass 90 with its model, ERNIE.
Baidu's researchers had to develop a technique specifically for the Chinese language to build ERNIE (which stands for "Enhanced Representation through kNowledge IntEgration"). It just so happens, however, that the same technique makes it better at understanding English as well. [...] [T]he researchers trained ERNIE on a new version of masking that hides strings of characters rather than single ones. They also trained it to distinguish between meaningful and random strings so it could mask the right character combinations accordingly. As a result, ERNIE has a greater grasp of how words encode information in Chinese and is much more accurate at predicting the missing pieces. This proves useful for applications like translation and information retrieval from a text document. The researchers very quickly discovered that this approach actually works better for English, too. Though not as often as in Chinese, English similarly has strings of words that express a meaning different from the sum of their parts. Proper nouns like "Harry Potter" and expressions like "chip off the old block" cannot be meaningfully parsed by separating them into individual words.
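The span-masking idea above can be sketched in a few lines of pure Python. This is a toy illustration only: ERNIE learns which character combinations form meaningful units from data, whereas this sketch uses a hand-written lexicon, and the 15% masking rate is borrowed from BERT-style pretraining rather than taken from the article.

```python
import random

# Toy lexicon of multi-word units that should be masked as a whole.
# (Hand-written for illustration; ERNIE learns such units from data.)
LEXICON = {("harry", "potter"), ("chip", "off", "the", "old", "block")}
MAX_SPAN = 5  # longest unit in the lexicon

def mask_spans(tokens, mask="[MASK]", rng=random):
    """Greedily mask whole known units; fall back to single-token masking."""
    out, i = [], 0
    while i < len(tokens):
        matched = False
        # Try the longest match first so a full expression wins over its prefix.
        for n in range(min(MAX_SPAN, len(tokens) - i), 1, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in LEXICON:
                out.extend([mask] * n)  # hide the entire unit, not one word of it
                i += n
                matched = True
                break
        if not matched:
            # Plain single-token masking: hide ~15% of remaining tokens.
            out.append(mask if rng.random() < 0.15 else tokens[i])
            i += 1
    return out

masked = mask_spans("I read Harry Potter yesterday".split(), rng=random.Random(0))
# "Harry Potter" is always masked together, so the model must predict the
# whole name from context rather than guessing "Potter" from "Harry".
```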
The latest version of ERNIE uses several other training techniques as well. It considers the ordering of sentences and the distances between them, for example, to understand the logical progression of a paragraph. Most important, however, it uses a method called continuous training that allows it to train on new data and new tasks without it forgetting those it learned before. This allows it to get better and better at performing a broad range of tasks over time with minimal human interference. Baidu actively uses ERNIE to give users more applicable search results, remove duplicate stories in its news feed, and improve its AI assistant Xiao Du's ability to accurately respond to requests. The researchers have described ERNIE's latest architecture in a paper that will be presented at the Association for the Advancement of Artificial Intelligence conference next year.
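The sentence-ordering objective mentioned above amounts to generating labeled sentence pairs from a paragraph and asking the model to judge their relative positions. The article gives no details of the labeling scheme, so the two-bucket distance labels below are an assumption, shown only to make the shape of the training data concrete.

```python
def sentence_distance_pairs(paragraph):
    """Build (sentence_a, sentence_b, distance_label) examples from one
    paragraph. Labels are an assumed bucketing: 0 = adjacent sentences,
    1 = same paragraph but not adjacent. A model trained to predict this
    label must pick up the paragraph's logical progression."""
    sents = [s.strip() for s in paragraph.split(".") if s.strip()]
    pairs = []
    for i, a in enumerate(sents):
        for j, b in enumerate(sents):
            if i < j:
                label = 0 if j == i + 1 else 1
                pairs.append((a, b, label))
    return pairs

pairs = sentence_distance_pairs("It rained. The game was cancelled. Fans went home.")
```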
Spamatron 9000 (Score:1)
You know the first use will be to get past spam detectors by saying the same thing gajillion different ways.
Re: (Score:3)
Re: (Score:2)
Don't spam filters already deal with that, like the deals for V14GR4? You don't get a better definition of an adversarial attacker than that.
Re: (Score:2)
There are a lot of "betters" that come out of this - everything that requires language understanding benefits. That would also include artificial languages, including programming languages.
Chinese is highly contextual (Score:2)
Re: (Score:2)
This is a much more impressive concept in Chinese, where verbs and nouns can change meaning by placement or topic.
Well, bullshit doesn't generally mean shit from a bull. Not to disparage what they've done but dealing with ambiguous words, euphemisms and idiomatic usage is pretty standard if you want to get into natural language processing. There's a reason why computer code, legalese etc. that's trying for high precision are practically unreadable for the layperson.
classic example (Score:5, Interesting)
One classic example for this kind of test is to feed in these two sentences:
"I tried to put my coat in the suitcase, but it was too small."
"I tried to put my coat in the suitcase, but it was too big."
and ask the computer what 'it' refers to in each sentence.
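These pairs (Winograd schemas) are typically turned into a ranking problem: substitute each candidate antecedent for the pronoun and ask a language model which reading is more plausible. Below is a small sketch of just the substitution step; the scoring model itself is omitted, and the function name is my own.

```python
import re

def winograd_readings(sentence, pronoun, candidates):
    """Substitute each candidate antecedent for the pronoun, producing the
    alternative readings a language model would be asked to rank.
    Uses a word-boundary regex so "it" inside "suitcase" is not replaced."""
    pat = re.compile(rf"\b{re.escape(pronoun)}\b")
    return {c: pat.sub(c, sentence, count=1) for c in candidates}

readings = winograd_readings(
    "I tried to put my coat in the suitcase, but it was too small.",
    "it", ["the coat", "the suitcase"])
# A model that understands the sentence should score the "the suitcase"
# reading as more plausible for "too small", and "the coat" for "too big".
```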