AI Google

Google Plans Giant AI Language Model Supporting World's 1,000 Most Spoken Languages (theverge.com)

Google has announced an ambitious new project to develop a single AI language model that supports the world's "1,000 most spoken languages." The Verge reports: As a first step towards this goal, the company is unveiling an AI model trained on over 400 languages, which it describes as "the largest language coverage seen in a speech model today." [...] Google's "1,000 Languages Initiative" is not focusing on any particular functionality, but instead on creating a single system with huge breadth of knowledge across the world's languages.

Speaking to The Verge, Zoubin Ghahramani, vice president of research at Google AI, said the company believes that creating a model of this size will make it easier to bring various AI functionalities to languages that are poorly represented in online spaces and AI training datasets (also known as "low-resource languages"). "By having a single model that is exposed to and trained on many different languages, we get much better performance on our low resource languages," says Ghahramani. "The way we get to 1,000 languages is not by building 1,000 different models. Languages are like organisms, they've evolved from one another and they have certain similarities. And we can find some pretty spectacular advances in what we call zero-shot learning when we incorporate data from a new language into our 1,000 language model and get the ability to translate [what it's learned] from a high-resource language to a low-resource language."

Access to data is a problem when training across so many languages, though, and Google says that in order to support work on the 1,000-language model it will be funding the collection of data for low-resource languages, including audio recordings and written texts. The company says it has no direct plans on where to apply the functionality of this model -- only that it expects it will have a range of uses across Google's products, from Google Translate to YouTube captions and more. "One of the really interesting things about large language models and language research in general is that they can do lots and lots of different tasks," says Ghahramani. "The same language model can turn commands for a robot into code; it can solve maths problems; it can do translation. The really interesting things about language models is they're becoming repositories of a lot of knowledge, and by probing them in different ways you can get to different bits of useful functionality."



  • by Opportunist ( 166417 ) on Saturday November 05, 2022 @05:05AM (#63026193)

    Give it a month or two and all it will do in 1000 languages is swear and tell antisemitic and racist jokes.

  • to be truthfully translated is the one that politicians use - where they say something but what it means is very different from what it appears at first listening.

    • That's actually an easy model because everything they say is approximately false if not completely false.

      To understand career politicians you have to understand their objectives, but that's kind of like the press asking a general in the military what their battle plan is for an active war. This is where we get into the illusion of choice in our two-party system. The objective for the two parties is often the same, with small differences in reasoning or strategy.

    • Oh, that already exists:

      "I understand and I'd love to support this proposal, but..." means "I have no fucking clue what you said, the only thing I do understand is that it's mighty popular with the masses but I have to be against it because you said it's a good idea"

      "Zealot" is the term used for someone who believes strongly in an idea I don't believe in.

      If I do believe in it, too, "Visionary" would be the term used.

      "We cannot rule anything out" means that whatever you just said or proposed has zero chance

    • by gweihir ( 88907 )

      That one is simple and can be done even in BASIC:

      10 print "I am lying"
      20 goto 10

      • by bvimo ( 780026 )

        Add a semicolon at the end of the print line and make your spam spread across the screen. I guess this is a feature of some BASICs. It works on the ZX Spectrum and BBC B (and, I assume, the A, Electron and the newer hardware).

  • ... by probing them in different ways you can get to different bits of useful functionality.

    How does interpreting this text make you feel? Does it make you feel good? Or would you consider this textual assault?

  • by Sleeping Kirby ( 919817 ) on Saturday November 05, 2022 @06:56AM (#63026279)
    ...are one of the few things that shouldn't be AI'ed (though it would be greatly desirable if they could be). I've read a lot of AI-translated stuff and it's always been really bad, from names translated to their literal meaning to slang and colloquial stuff being mistranslated. Not to mention, it always feels like a lot of the people designing translation stuff have a very Western-language mindset. A "the English language and alphabet can be representative to all sounds in all languages" approach. But often in languages two words that supposedly mean the same thing, doesn't. Or, sometimes, there's a word that doesn't translate at all. Like, I cringe every time someone says "Origami, the art of folding paper," because you've literally said "Folding paper, the art of folding paper." Lastly, I feel like there's a lot of work to translate one language to another and it quickly becomes an n^2 problem, i.e. if you're translating language 1 to all languages 2...n, you'll need to do the same for 2 and 3, etc. I think people who tackle these problems are too focused on words and which words to translate rather than ideas and which word accurately represents the idea.
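For a rough sense of scale on that n^2 point, here is a minimal sketch of the arithmetic. It only counts translation directions; the function names are illustrative, and the pivot-language comparison is an assumption added for contrast, not a description of how any particular system works.

```python
def directed_pairs(n: int) -> int:
    """Ordered (source, target) pairs among n languages, one direction each."""
    return n * (n - 1)


def pivot_directions(n: int) -> int:
    """Directions needed if every language only translates to and from one pivot."""
    return 2 * (n - 1)


if __name__ == "__main__":
    # Compare the two counts at a few language-set sizes.
    for n in (10, 100, 1000):
        print(f"{n} languages: {directed_pairs(n)} direct pairs, "
              f"{pivot_directions(n)} directions via a pivot")
```

Counting pairs this way says nothing about translation quality; it only shows why building a separate system per language pair stops being practical as n grows.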

    Just my 2 cents as someone that speaks more than 1 language.

    And please, for the love of god, make sure you run what you're about to tattoo on yourself by a person who actually speaks the language. I'm really tired of people having "anesthetic, gender/surname, shaking/rolling" instead of "drugs, sex and rock'n'roll". Or the other one I've run across recently: "Tippy-top hunter" instead of "Apex predator".
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      "Origami, origami" is merely redundant in Japanese, but not useful for those English speakers who don't know what it means. So while there is redundancy for speakers of both languages, "Origami, the art of folding paper" does convey something that "Origami" alone does not. Write a colon instead of a comma if that bothers you less.

      As it happens, was just watching Ghost in the Shell, the part where the laughing man says he wanted to give all phoneys in the world a kick in the behind. One of the better tricks of

      • " It does clearly convey the message "I don't know what I'm on about".

        google translate is often wrong, so you can't really just run a text through it and assume it'll be correct."

        But if your tattoo was intended to say "I'm a night owl" and in reality it says "I'm a lady of the night" you'll get more new friends.

        • This exactly. Everything I've written is just my opinion and my request. But, in college, I had a teammate who was excited to be teamed up with someone who not only watches anime but knows Japanese. He proudly said the only phrase he knew: "Ore chin chin ga suki." I couldn't believe my ears so I asked "What?" And then he repeated it. So I asked him one more time, and he goes "It means you like p*nis." I reply with, "No. It means *I* like p*nis. 'Ore' is I."

          I'm mostly asking for other people's sake.
      • by isj ( 453011 )

        My pet peeve with auto-translations is that they often miss the context, so homonyms get translated incorrectly. But something worse is what google-translate is obviously doing: when translating from language X to language Y and it doesn't have a direct translation for a word, it goes via English hoping for the best. Where I encounter that most of the time is when youtube translates Italian cooking shows to English/Danish/German, and the results can be hilarious. E.g.:
        - Caprino, literally "small goat"

        • I'm totally with you. So, in Chinese, human names are often unique. You know how in English there's John and Sam and Heather, but names are usually not like "Celestial river" or "Stalwart Loyalty"? In Chinese, names are almost always completely unique. It's very rare that you'll find 2 people with the same name unless the parent named their child after like a movie star or something. Well, translators often go nuts on those. Sometimes it'll translate them to the literal definition of the words, sometimes
    • by Anonymous Coward

      It's not a problem of "mindset" (though their mindset is pretty horrible, too).

      "AI" translation is basically a scam, an intellectual imposture.

      A year or so ago, I typed the equivalent of "your mother-in-law was run over by a car" in google translate, and the result was like "your mother-in-law has stomped and trampled me under her feet". The curious thing was that only "YOUR mother-in-law" resulted in such nonsense; If changed to "MY mother-in-law ...", the translation was pretty accurate.

      But the really cur

      • ""AI" translation is basically a scam, an intellectual imposture." No. Machine Translation (MT) is used every day by lots of people. Is it perfect? No. Is MT usually worse than human translations, done by competent translators? Yes, but it's more or less free, whereas human translators are expensive (last I looked, around 25 cents US/ word). Is MT sometimes better than human translators? Surprisingly, yes; I've seen some pretty bad human translations.

    • "A "the English language and alphabet can be representative to all sounds in all languages" approach..." Not sure what you mean here. Are you referring to transliteration of *names* from one language into another? Yes, that will always be a problem, no matter which direction you're going.

      "But often in languages two words that supposedly mean the same thing, doesn't." Again, unclear what you mean here. Two words in the same language (like "home" and "domicile", to take a not-so-good example), or one wor

      • "A "the English language and alphabet can be representative to all sounds in all languages" approach..." Not sure what you mean here. Are you referring to transliteration of *names* from one language into another?

        This was said to me (almost at me) by a person who only spoke English, took some French in high school and had some rudimentary understanding of simplified Chinese. So I'll explain it as best as I can as it was pitched to me (especially without being able to type a non-English alphabet).

        What he said was this:
        All the pronunciation of all languages can be represented by the English alphabet. Let's take Chinese for example since that's what we were talking about. His argument is that the system like Hiragana,Kata

  • I work in the field and that's what we call lesser known/used languages. It literally refers to how wide-spread, or diffused, the language is on the planet.

    "Low Resource" languages sounds like a misnomer, both in the languages that it covers and the actual A.I. resources used to process them.

    But this is Google, so anyone want to take book on how long it'll be before they're just flat out dropped?
  • That is an incredibly stupid idea. Inherent in different languages are different ways of seeing the world. Munging these together in one training model will damage culture difference information and could result in more nonsense sentences from a system that does not understand deep meaning. What this project is is an attempt to pound screws with a hammer when all you have is an ANN hammer.
    • by q4Fry ( 1322209 )

      I am interested in how the AI will fare at translating "north" for the language family that has no cardinal directions.

      "How do I get to the village?"
      (Reorients body) "It is on my left-foot side."

      But sure "languages [...] have certain similarities" or whatever.

      Is the initiative a good idea? Yes, undoubtedly.
      Is it going to make some hilarious mistakes? Likewise, yes.

    • I don't think you understand how this works: "munging these together" is an inaccurate way of describing what MT is.

  • from TFA:

    "The company says it has no direct plans on where to apply the functionality of this model"

    This is a research experiment for its AI, and nothing more.

  • For the record, there are around 330 (last I looked) languages with at least a million speakers. So the tail of 1000 languages are languages that are spoken by a few hundred thousand speakers, like maybe my favorite language: Tzeltal (Mayan language of southern Mexico). The vast majority of these will be written languages; signed languages are exceptions in that they are essentially unwritten, at least by people who 'speak' them.

    That said, some of these languages are probably not written in daily life--th

  • I wonder if AI could help us to either prove or disprove theorized connections between language groups not currently known for certain to be related. Or between what are currently believed to be language isolates, such as Basque or Mapuche, and other attested languages of the past or present. Human effort has not definitively answered some of these kinds of questions.
