Google Plans Giant AI Language Model Supporting World's 1,000 Most Spoken Languages (theverge.com)
Google has announced an ambitious new project to develop a single AI language model that supports the world's "1,000 most spoken languages." The Verge reports: As a first step towards this goal, the company is unveiling an AI model trained on over 400 languages, which it describes as "the largest language coverage seen in a speech model today." [...] Google's "1,000 Languages Initiative" is not focusing on any particular functionality, but instead on creating a single system with huge breadth of knowledge across the world's languages.
Speaking to The Verge, Zoubin Ghahramani, vice president of research at Google AI, said the company believes that creating a model of this size will make it easier to bring various AI functionalities to languages that are poorly represented in online spaces and AI training datasets (also known as "low-resource languages"). "By having a single model that is exposed to and trained on many different languages, we get much better performance on our low resource languages," says Ghahramani. "The way we get to 1,000 languages is not by building 1,000 different models. Languages are like organisms, they've evolved from one another and they have certain similarities. And we can find some pretty spectacular advances in what we call zero-shot learning when we incorporate data from a new language into our 1,000 language model and get the ability to translate [what it's learned] from a high-resource language to a low-resource language."
Access to data is a problem when training across so many languages, though, and Google says that in order to support work on the 1,000-language model it will be funding the collection of data for low-resource languages, including audio recordings and written texts. The company says it has no direct plans on where to apply the functionality of this model -- only that it expects it will have a range of uses across Google's products, from Google Translate to YouTube captions and more. "One of the really interesting things about large language models and language research in general is that they can do lots and lots of different tasks," says Ghahramani. "The same language model can turn commands for a robot into code; it can solve maths problems; it can do translation. The really interesting things about language models is they're becoming repositories of a lot of knowledge, and by probing them in different ways you can get to different bits of useful functionality."
It's an AI? Then we already know what happens (Score:4, Interesting)
Give it a month or two and all it will do in 1000 languages is swear and tell antisemitic and racist jokes.
Re: (Score:2)
So you're saying this is going to be for the automated taxis?
Re: (Score:3)
Swearing at you in a language you don't speak? Yeah, it's pretty much the taxi driver experience.
Re: (Score:2)
The AIs are so much like us.
Re: (Score:2)
This is not about speaking languages. It is about listening in on ideally all conversations on the planet. Completely in line with Google's actual motto "Be Evil".
Re: (Score:2)
If a Sloshdat Editor writes a post nobody can read, is it still full of factual, spelling and grammatical mistakes?
The language that I would really like (Score:2)
to be truthfully translated is the one that politicians use: they say something, but what it means is very different from what it appears to mean at first listening.
Re: The language that I would really like (Score:2)
That's actually an easy model because everything they say is approximately false if not completely false.
To understand career politicians you have to understand their objectives, but that's kind of like the press asking a general in the military what their battle plan is for an active war. This is where we get into the illusion of choice in our two-party system. The objective for the two parties is often the same, with small differences in reasoning or strategy.
Re: (Score:2)
Oh, that already exists:
"I understand and I'd love to support this proposal, but..." means "I have no fucking clue what you said, the only thing I do understand is that it's mighty popular with the masses but I have to be against it because you said it's a good idea"
"Zealot" is the term used for someone who believes strongly in an idea I don't believe in.
If I do believe in it, too, "Visionary" would be the term used.
"We cannot rule anything out" means that whatever you just said or proposed has zero chance.
Re: (Score:2)
That one is simple and can be done even in BASIC:
10 print "I am lying"
20 goto 10
Re: (Score:2)
Add a semicolon after the PRINT statement and your spam will spread across the screen instead of printing one line at a time. I guess this is a feature of some BASICs. It works on the ZX Spectrum and the BBC B (and, I assume, the A, the Electron and the newer hardware).
Feelings (Score:1)
How does interpreting this text make you feel? Does it make you feel good? Or would you consider this textual assault?
I've always felt like languages... (Score:4, Interesting)
Just my 2 cents as someone that speaks more than 1 language.
And please, for the love of god, make sure you run what you're about to tattoo on yourself by a person who actually speaks the language. I'm really tired of people having "anesthetic, gender/surname, shaking/rolling" instead of "drugs, sex and rock'n'roll". Or the other one I've run across recently: "Tippy-top hunter" instead of "Apex predator".
Re: (Score:2, Interesting)
"Origami, origami" is merely redundant in Japanese, but not useful for those English speakers who don't know what it means. So while there is redundancy for speakers of both languages, "Origami, the art of folding paper" does convey something that "Origami" alone does not. Write a colon instead of a comma if that bothers you less.
As it happens, I was just watching Ghost in the Shell, the part where the Laughing Man says he wanted to give all the phonies in the world a kick in the behind. One of the better tricks of
Re: (Score:2)
"It does clearly convey the message 'I don't know what I'm on about'.
Google Translate is often wrong, so you can't really just run a text through it and assume it'll be correct."
But if your tattoo was intended to say "I'm a night owl" and in reality it says "I'm a lady of the night" you'll get more new friends.
Re: (Score:2)
I'm mostly asking for other people's sake.
Re: (Score:2)
My pet peeve with auto-translations is that they often miss the context, so homonyms get translated incorrectly. But something worse is what Google Translate is obviously doing: when translating from language X to language Y and it doesn't have a direct translation for a word, it goes via English, hoping for the best. Where I encounter that most of the time is when YouTube translates Italian cooking shows to English/Danish/German, and the results can be hilarious. E.g.:
- Caprino, literally "small goat"
Re: (Score:1)
It's not a problem of "mindset" (though their mindset is pretty horrible, too).
"AI" translation is basically a scam, an intellectual imposture.
A year or so ago, I typed the equivalent of "your mother-in-law was run over by a car" into Google Translate, and the result was something like "your mother-in-law has stomped and trampled me under her feet". The curious thing was that only "YOUR mother-in-law" resulted in such nonsense; if changed to "MY mother-in-law ...", the translation was pretty accurate.
But the really cur
Re: (Score:2)
""AI" translation is basically a scam, an intellectual imposture." No. Machine Translation (MT) is used every day by lots of people. Is it perfect? No. Is MT usually worse than human translations done by competent translators? Yes, but it's more or less free, whereas human translators are expensive (last I looked, around 25 US cents per word). Is MT sometimes better than human translators? Surprisingly, yes; I've seen some pretty bad human translations.
Re: (Score:2)
"A "the English language and alphabet can be representative to all sounds in all languages" approach..." Not sure what you mean here. Are you referring to transliteration of *names* from one language into another? Yes, that will always be a problem, no matter which direction you're going.
"But often in languages two words that supposedly mean the same thing, doesn't." Again, unclear what you mean here. Two words in the same language (like "home" and "domicile", to take a not-so-good example), or one wor
Re: (Score:2)
"A "the English language and alphabet can be representative to all sounds in all languages" approach..." Not sure what you mean here. Are you referring to transliteration of *names* from one language into another?
This was said to me (almost at me) by a person who only spoke English, took some French in high school and had some rudimentary understanding of simplified Chinese. So I'll explain it as best I can, as it was pitched to me (especially since I can't type non-English alphabets).
What he said was this:
The pronunciation of all languages can be represented by the English alphabet. Let's take Chinese for example, since that's what we were talking about. His argument is that the system like Hiragana,Kata
"Languages of Lesser Diffusion" (Score:2)
"Low Resource" languages sounds like a misnomer, both in the languages that it covers and the actual A.I. resources used to process them.
But this is Google, so does anyone want to take book on how long it'll be before they're just flat-out dropped?
hammer and screws (Score:2)
Re: (Score:2)
I am interested in how the AI will fare at translating "north" for the language family that has no cardinal directions.
"How do I get to the village?"
(Reorients body) "It is on my left-foot side."
But sure "languages [...] have certain similarities" or whatever.
Is the initiative a good idea? Yes, undoubtedly.
Is it going to make some hilarious mistakes? Likewise, yes.
Re: (Score:2)
I don't think you understand how this works: "munging these together" is an inaccurate way of describing what MT is.
Research Experiment (Score:2)
from TFA:
"The company says it has no direct plans on where to apply the functionality of this model"
This is a research experiment for its AI, and nothing more.
1000 languages (Score:2)
For the record, there are around 330 (last I looked) languages with at least a million speakers. So the tail of 1000 languages are languages that are spoken by a few hundred thousand speakers, like maybe my favorite language: Tzeltal (Mayan language of southern Mexico). The vast majority of these will be written languages; signed languages are exceptions in that they are essentially unwritten, at least by people who 'speak' them.
That said, some of these languages are probably not written in daily life--th
AI role in linguistics (Score:2)