Google Trained a Trillion-Parameter AI Language Model (venturebeat.com) 110

An anonymous reader quotes a report from VentureBeat: Google researchers developed and benchmarked techniques they claim enabled them to train a language model containing more than a trillion parameters. They say their 1.6-trillion-parameter model, which appears to be the largest of its kind to date, achieved an up to 4 times speedup over the previously largest Google-developed language model (T5-XXL). As the researchers note in a paper detailing their work, large-scale training is an effective path toward powerful models. Simple architectures, backed by large datasets and parameter counts, surpass far more complicated algorithms. But effective, large-scale training is extremely computationally intensive. That's why the researchers pursued what they call the Switch Transformer, a "sparsely activated" technique that uses only a subset of a model's weights, or the parameters that transform input data within the model.
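The "sparsely activated" idea is, in essence, a mixture-of-experts layer: a learned router sends each token through only one of many expert feed-forward blocks, so only a small slice of the total parameters is exercised per token even though all of them exist. Below is a minimal numpy sketch of top-1 routing; the dimensions, expert count, and names are illustrative toys, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 64, 256, 8        # toy sizes, not the paper's

# Router plus one feed-forward block ("expert") per slot.  All of these
# weights exist, but each token only exercises a single expert.
W_router = rng.standard_normal((d_model, num_experts)) * 0.02
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)
           for _ in range(num_experts)]

def switch_layer(tokens):
    """Top-1 routing: each token is processed by exactly one expert."""
    logits = tokens @ W_router                          # (n_tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)                      # chosen expert per token
    out = np.zeros_like(tokens)
    for e, (W_in, W_out) in enumerate(experts):
        sel = choice == e
        if sel.any():
            h = np.maximum(tokens[sel] @ W_in, 0.0)     # ReLU feed-forward
            # Scale by the router probability so the router stays trainable.
            out[sel] = (h @ W_out) * probs[sel, e:e+1]
    return out

tokens = rng.standard_normal((16, d_model))
print(switch_layer(tokens).shape)                       # (16, 64)
```

Per token, only one of the eight expert weight matrices is multiplied, which is why the parameter count can grow much faster than the per-token compute.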

In an experiment, the researchers pretrained several different Switch Transformer models using 32 TPU cores on the Colossal Clean Crawled Corpus, a 750GB dataset of text scraped from Reddit, Wikipedia, and other web sources. They tasked the models with predicting missing words in passages where 15% of the words had been masked out, as well as other challenges, like retrieving text to answer a list of increasingly difficult questions. The researchers claim their 1.6-trillion-parameter model with 2,048 experts (Switch-C) exhibited "no training instability at all," in contrast to a smaller model (Switch-XXL) containing 395 billion parameters and 64 experts. However, on one benchmark -- the Stanford Question Answering Dataset (SQuAD) -- Switch-C scored lower (87.7) versus Switch-XXL (89.6), which the researchers attribute to the opaque relationship between fine-tuning quality, computational requirements, and the number of parameters.
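The masking task above is the standard masked-language-modelling setup: hide about 15% of the tokens and train the model to recover them. Here is a rough sketch of how inputs and targets could be prepared, assuming a simple whole-word "[MASK]" placeholder for illustration; the real corpus uses subword tokenization, so this is not the paper's exact pipeline.

```python
import random

MASK = "[MASK]"   # hypothetical placeholder token, purely for illustration

def mask_tokens(tokens, rate=0.15, seed=0):
    """Hide roughly `rate` of the tokens; the model must predict the originals."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < rate:
            inputs.append(MASK)
            targets.append(tok)      # only masked positions are scored
        else:
            inputs.append(tok)
            targets.append(None)     # ignored by the loss
    return inputs, targets

words = "the researchers pretrained several different switch transformer models".split()
inputs, targets = mask_tokens(words)
print(inputs)
print(targets)
```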

Despite this, the Switch Transformer led to gains in a number of downstream tasks. For example, it enabled an over 7 times pretraining speedup while using the same amount of computational resources, according to the researchers, who demonstrated that the large sparse models could be used to create smaller, dense models fine-tuned on tasks with 30% of the quality gains of the larger model. In one test where a Switch Transformer model was trained to translate between over 100 different languages, the researchers observed "a universal improvement" across 101 languages, with 91% of the languages benefitting from an over 4 times speedup compared with a baseline model. "Though this work has focused on extremely large models, we also find that models with as few as two experts improve performance while easily fitting within memory constraints of commonly available GPUs or TPUs," the researchers wrote in the paper. "We cannot fully preserve the model quality, but compression rates of 10 to 100 times are achievable by distilling our sparse models into dense models while achieving ~30% of the quality gain of the expert model."
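The distillation described in the closing quote means training a small dense "student" to match the big sparse "teacher." A schematic sketch of the usual distillation loss is below; the temperature and mixing weight are assumed values for illustration, not figures from the paper.

```python
import numpy as np

def softmax(x, T=1.0):
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / T)
    return z / z.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with matching of the teacher's
    softened output distribution (standard knowledge distillation)."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]
                   + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 10))   # stands in for the big sparse model
student = rng.standard_normal((4, 10))   # stands in for the small dense model
labels = rng.integers(0, 10, size=4)
print(distillation_loss(student, teacher, labels))
```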

This discussion has been archived. No new comments can be posted.

  • It was much better 4 or 5 years ago, and now it is nearly unusable, even for the widely used languages in places where Google does business.

    • Yeah. There was a noticeable decline in quality when they switched from conventional to NN-based translation.

      • Yeah, it appears more of a brute-force attack than anything else. Which is nice - it means the "self-teaching AI that can do anything" has hit the wall in a hard, hard way.

        • Current NN technology is really really good at interpolating, but shockingly bad at extrapolating.

          • That's always the case when your models depend on parameter fits.

            You should see how particle physics theories' predictions diverge when you extrapolate outside the ranges where data is available.

          • Current NN technology is really really good at interpolating, but shockingly bad at extrapolating.

            The solution is more training data in the areas needed.

            According to TFA, the NN was trained on text from Wikipedia. So it may be bad at text prediction in, say, a medical research paper. The solution is to add medical journal articles to the training set.

            • No. Humans don't need to read the entire corpus of Wikipedia and JSTOR in order to form a coherent sentence. The solution is to improve our algorithms.

              • Interesting point.

                On the other hand, humans can read thousands of Slashdot posts, yet still be unable to form a coherent sentence of their own. So indeed "reading a lot" isn't the key for humans. You'd think it would help, but it's not the key.

              • We go through an entire youth though.
                And use many, *many* more neurons.

                You are still right, of course... and to specify: We need algorithms that are not based on simplifying down to perfectly spherical horses on sinusoidal trajectories. :)

                And here, with this sparse method, it seems they went even deeper down that very hole instead.
                I wonder when they get to the point that their model will need so much processing power that they notice that simplification is not the right deity to suck off, and they might as well ...

              • And yet, when people enter a new field, they often have to devote years to learning before they can extrapolate, and weeks before they understand the common specialized vocabulary.

              • by AmiMoJo ( 196126 )

                This is something that the head of AI ethics that Google recently forced out was saying. Massive corpuses are not the way forward, developing something that mimics the way the human brain is pre-wired to understand language is.

                Aside from anything else training on these massive corpuses is expensive and produces massive amounts of CO2 emissions, all for something that will never really understand the meaning of the words, just be able to transcribe them with decent accuracy.

                • by shmlco ( 594907 )

                  "... developing something that mimics the way the human brain is pre-wired to understand language is..."

                  Ummm. That assumes the brain is "pre-wired" to understand language. Certain centers of the brain tend to be associated with speech, for example, but in a certain number of people those locations can be swapped from hemisphere to hemisphere, and they don't always live in exactly the same place. In fact, "language" can involve many parts of the brain.

            • by skids ( 119237 )

              According to TFA, the NN was trained on text from Wikipedia. So it may be bad at text prediction in, say, a medical research paper. The solution is to add medical journal articles to the training set.

              They already include other sources:

              a 750GB dataset of text scraped from Reddit, Wikipedia, and other web sources

              Though including Reddit was probably counterproductive. The last thing we need is a language AI model that knows every flash-in-the-pan meme and has no sense of what constitutes proper grammar.

            • There's a grassroots effort to build a better corpus, called The Pile.

              > The Pile: An 800GB Dataset of Diverse Text for Language Modeling
              https://arxiv.org/abs/2101.000... [arxiv.org]

              This dataset includes all of arXiv, PubMed and GitHub (open source projects).
              • by Entrope ( 68843 )

                Shouldn't computer scientists know better than to rely on using more and more brute force? Humans don't consume anywhere near 800 gigabytes of text in a lifetime. That's equal to more than 170,000 readings through Victor Hugo's unabridged Les Miserables, or 200,000 readings through Ayn Rand's Atlas Shrugged.

                • by Zak3056 ( 69287 )

                  Thanks for the fantastic visual... I am now imagining a large gang of men all reading Les Miserables repeatedly for their entire lives, while singing,

                  Look down, look down
                  Eight hundred gigabytes
                  Look down, look down
                  You'll read until you die

        • You have no idea what you're talking about. AI is starting to deliver but it's still too expensive to deploy. Brute forcing is what evolution does, as well.
          • Evolution, like this kind of "AI", sucks balls - there's a reason why the people who developed the COVID vaccines, for example, used their knowledge instead of throwing the kitchen sink at a population of victims and trying to get lucky through evolution.

            It is always better to solve the problem directly, and "AI" sucks at problem solving, and is not going to get much better anytime soon.

      • Maybe if you translated only in some small, very specific domain.

        On the whole, NN-based translation smokes the old latent Dirichlet allocation thing they used. I can probably train better models than that on my home computer now.

        • On the whole, NN-based translation smokes the old latent Dirichlet allocation thing they used.

          Based on what metric? My experience is the new NN translation does a better job of producing grammatically correct sentences, but a worse job of matching the original text.

          • Based on any metric you can come up with, from BLEU to costly manual rating of translations by professional translators. Just look up the papers, though you have to go many years back, because it's a long time since they dropped pre-NMT models from the comparisons.

            BLEU itself is biased toward the old style of machine translation because, understandably enough, the best way they could come up with to automatically rate translations at the time looked a lot like the best way they could come up with to do machine translation.

      • What algorithms were they using before NNs?
    • No, it's unrelated. This model is too large to deploy to the public. The public model is cheap and efficient, but not as good.
    • by AmiMoJo ( 196126 )

      I've noticed Google Translate improving over the last few years. In particular it seems to better understand context now.

      It used to translate headlines as if it was a person describing something that happened to them, e.g. "I found the bank vault door open and the money was gone", rather than "bank vault found open, money missing".

      Now the most common error I see is getting gender wrong. It tends to assume that everyone is male and doesn't notice when someone's name suggests otherwise.

      • I've noticed Google Translate improving over the last few years.

        You've missed the big drop in quality that happened 4-5 years ago then.

        It could have been "improving" after that, but it has much ground to cover to just get back to where it was before.

        Currently, it "excels" at literal translations word for word, totally ignoring idioms or rare words, except that it often mixes up the type of the word (so you'd get a noun in the original become a verb and so on), and the sentence role (so the object becomes something else), which makes the "translation" nearly incomprehens

  • Machine learning means that a computer compiles the least coherent responses from the raving herd, and then repeats them in random splice order. It reflects the insanity of the human mind at the end of civilization, but it does it with grammatical perfection and digital precision!

  • Compute magic (Score:5, Informative)

    by sg_oneill ( 159032 ) on Wednesday January 13, 2021 @11:56PM (#60941504)

    A trillion parameters is kind of astonishing.

    GPT-3 has about 170 billion if I remember right, and that thing is quite mind blowing. GPT-4 is projected to have about 20 trillion, but there's a *huge* catch: training the damn thing at the current cost of GPU compute would come in at 8-9 billion dollars (GPT-3 cost about 4.6 million).

    Google is clearly not spending half a quarter of a billion on GPUs; that would be *extremely* hard to justify to investors. So they've figured out some pretty dark magic to get this to work without spending Google's entire R&D budget on GPU compute for a single project.

    • Google is clearly not spending half a quarter of a billion on GPUs; that would be *extremely* hard to justify to investors. So they've figured out some pretty dark magic to get this to work without spending Google's entire R&D budget on GPU compute for a single project.

      In the first place, they build their own TPU hardware. In the second place, Google is really good at shaping their neural network trees to be more efficient. They also have tricks like "use 8-bit integers instead of floats." So they have some efficiency experts on their team.
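The "8 bit integers instead of floats" trick mentioned above is weight quantization: store each weight tensor as int8 plus a floating-point scale and dequantize on the fly. A rough numpy sketch of symmetric per-tensor quantization, purely illustrative and not Google's TPU implementation:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: int8 weights plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())   # small reconstruction error
```

The payoff is a 4x reduction in memory and bandwidth per weight, at the cost of a small, usually tolerable rounding error.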

    • It's a word trick.

      An image-processing function that processes 32Kx32K pictures at 24 bits per pixel could be said to process "a trillion parameters".
      Or, if counting single bits feels like cheating, how about one that inserts an effect into equally high-resolution video with one second (24 frames for film) of spread, e.g. a 1-second-long temporal blur.

      So "one second of retina display VR goggles movie"... That's my attempt of today, at the news media system of units. :)

      • Are you counting input data size as parameter count? Parameters are shared across all input examples; they don't change with each example.
        • He is confusing the inputs with the weights, because he is an ignorant doofus who wants to pretend that he knows something, but proves once again that he doesn't.
    • TFA reports that the model was completing cloze deletion tests with just under every 7th word deleted (roughly the 15% masking rate), which is comparable to humans. In theory at least, that means that it can not only process texts from the 'bottom-up' (i.e. processing syntax & lexis probabilistically = Bayesian inference, which is unremarkable), but also from the 'top-down,' i.e. infer missing words from pragmatic meaning. In other words, it's at least simulating strong AI. Impressive!
      • Either that or current language proficiency testing theory needs to be revised in the light of this. Specifically for cloze deletion tests, this may mean revisiting the 'reduced redundancy principle.'
        • Nah, cloze deletions work well. The fact that this model can be good at this task yet inferior to humans in other tasks means it lacks something else. What is missing from it is the ability to do anything of its own will, because it doesn't have a body and all it can do is read the text we feed into it. Like a life-long paralyzed person, blind, with no smell, taste and touch. Just a pair of eyes reading plain text fed page by page.
    • by AmiMoJo ( 196126 )

      I'm sure they do have some special sauce, but even if they didn't, $8bn might not be unreasonable for the potential gains. Language processing is extremely valuable now.

      Of course it's English only; presumably if they had a similar-size model for, say, Hungarian, it wouldn't be worth putting in nearly as much effort.

    • Google is clearly not spending half a quarter of a billion on GPUs; that would be *extremely* hard to justify to investors. So they've figured out some pretty dark magic to get this to work without spending Google's entire R&D budget on GPU compute for a single project.

      They are probably not using GPUs.

      Google has absolute arseloads of servers, somewhere around a literal million of them. The unused capacity on these servers has made possible all of Google's side projects. They just give them the lowest priority so they don't step on indexing and correlating web data.

    • Read the announcement again - it says they invented a trick to make it easier to train. The trick is to split the model into 2000 "experts" that are only called upon in a sparse way (like just 1% active at a time). Reminds me of the old trope that the brain is only using 10% of its power. By this trick they can push the scaling, but I don't think this model surpasses GPT-3 in precision, just in speed.
    • A trillion parameters is kind of astonishing.

      It's classic over-fitting.

    • Give me 4 parameters and I can fit an elephant. Give me 5 and it can wiggle its trunk.
      Give me a trillion and uh ...an elephant that wiggles all its appendages?

  • Looking forward to trolls getting overfed by AIs that can identify misinformation/disinformation and their posts automatically hidden for being factually inaccurate. The internet of idiots spewing bullshit vastly outnumbers the number of experts on the internet.

    • by Anonymous Coward

      There goes CNN.

      • Not just the CNN. Convolutional neural networks (CNNs) are also being replaced by this kind of neural network - transformers. It's the hottest fad in computer vision.
    • But actually what you will see is high quality translations of trolls in multiple languages.

      They're doing it for the CLIKZ.

    • Looking forward to trolls getting overfed by AIs that can identify misinformation/disinformation ...

      Fat chance. They'd be trained by datasets where humans had made such ratings - with all their biases and the currently observed conflation between ideology and "correctness".

      They might become as good as humans at identifying truth-or-agreement-with-my-in-group's-ideology. But that just turns them into automatic ideologues, not oracles of truth but of "truthiness".

    • Except that the AI will use what is trending on social media to determine Truth

      • Except that the AI will use what is trending on social media to determine Truth

        According to who? You literally just made that shit up and created a prime example of a comment that should be shut down by said AI because even the summary explained that they used reputable sources of information, not what was trending on social media.

    • Looking forward to trolls getting overfed by AIs that can identify misinformation/disinformation and their posts automatically hidden

      That won't work. The trolls can easily defeat the filter by writing insightful, interesting, and factually accurate posts.

      https://xkcd.com/810/ [xkcd.com]

    • Fun fact: Your post counts as one of those. :)

      Also fun fact: If you want to see what an "AI" fed by an internet forum looks like, look up "Bucket (chatbot)" on Encyclopedia Dramatica from when it still existed.

      My favorite quote after it had a fun day with 4chan:

      Bucket: "Bucket is cancer is cancer is Bucket."

      Says it all. :)

  • Comment removed (Score:5, Interesting)

    by account_deleted ( 4530225 ) on Thursday January 14, 2021 @12:00AM (#60941516)
    Comment removed based on user account deletion
    • Is it odd that they train a 1.6 trillion parameter model with only 0.75 trillion bytes of data?

      Perhaps that is all the data they could find.

      The text of the English edition of Wikipedia is only about 20 GB.

      But using so many parameters has a very big risk of overfitting.
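A back-of-envelope check on those numbers, assuming roughly 4-5 bytes of raw text per token (an assumption, not a figure from the paper):

```python
params = 1.6e12          # reported parameter count
data_bytes = 0.75e12     # ~750GB corpus
bytes_per_token = 4.5    # assumed average, not a figure from the paper

tokens = data_bytes / bytes_per_token
print(f"~{tokens:.2e} training tokens")                 # ~1.67e+11
print(f"~{params / tokens:.1f} parameters per token")   # ~9.6
```

On those assumptions the model has roughly ten parameters per training token, which is the ratio the parent is worrying about.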

    • I *knew* they were counting single bits as "parameters"!

    • Not if the dataset is complex, which language is - highly complex, e.g. see 'dual structure.' A linguist can write a whole book describing the characteristics of a short paragraph of text without repeating him/herself. Wikipedia is more than enough to keep an AI busy for a few lifetimes. Its limitation is that it's only one particular mode & genre of text, i.e. written expository, & therefore not particularly generalisable to other genres, which have different lexicogrammatical configurations, e.g. ...
    • by AmiMoJo ( 196126 )

      Compression maybe? Being text it probably compresses extremely well.

      It would make sense to tokenize it for faster processing, aside from anything else. No point doing trillions of string matches over and over again.
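A minimal sketch of what "tokenize it" means here: assign each distinct word (or word piece) a small integer once, so later passes compare integers rather than strings. The vocabulary below is made up for illustration.

```python
def build_vocab(words):
    """Assign each distinct word a small integer id, first-come first-served."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab

words = "the model reads the text and the model predicts".split()
vocab = build_vocab(words)
ids = [vocab[w] for w in words]
print(vocab)   # {'the': 0, 'model': 1, 'reads': 2, 'text': 3, 'and': 4, 'predicts': 5}
print(ids)     # [0, 1, 2, 0, 3, 4, 0, 1, 5]
```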

  • by SuperKendall ( 25149 ) on Thursday January 14, 2021 @12:21AM (#60941582)

    ..enabled them to train a language model containing more than a trillion parameters.

    In other words, Google finally has what is effectively an infinite number of monkeys devoted to language.

    • There are around 100 to 1000 trillion synapses in the human brain, so perhaps something close to "effectively infinite" really is needed. Google doing this certainly doesn't prove in itself that it's necessary, but you never know until you try. 15 years ago people thought the idea of "Just throw a bunch of neurons at it" was a stupid approach, but it ended up working better than anybody thought it would.
      • It's interesting that the more they copy the processing characteristics of human brains, the more efficient & effective the models become. Therefore, we could say that natural selection is an incredibly powerful form of AI.
        • Therefore, we could say that natural selection is an incredibly powerful form of AI.

          Well, we could say that natural selection is an effective way to generate AI, at least. And it's efficient in the sense that it can do a lot with a little, by finding a structure that works well. It's not very efficient in terms of the amount of time and energy it takes, though.

    • In other words, Google finally has what is effectively an infinite number of monkeys devoted to language.

      And it still can't accurately translate the works of Shakespeare.

  • Did someone just pick buzzwords at random?

  • I hope this does not imply unnatural distortion by Despicable Catholiban Linguistic Cleansing (DC/LC).

  • by Arethan ( 223197 )

    ...but what's the question?

    • ...but what's the question?

      How do I post a very redundant and not funny at all old joke?

      • I think he was making a valid comment.
        The headline tells us nothing about what this system DOES.
        A useful headline would do that.

        Next week, from the Slashdot "editors":

        "Poptopdrop develops a new AI system with a Trillion and one parameterization factor assemblies."

  • Even after they are properly trained, their model can't possibly use all these input data every time it is run. So where are they cheating? Are most of these parameters set to zero, so that it is a *sparse* matrix with 1.6 trillion entries but just a handful of nonzeros?
  • This is still the same kind of rote training on an existing corpus that we've been doing for decades. As you increase the size of the corpus and the number of parameters, you get ever-diminishing returns. Twice as big is not nearly twice as good.

    There is still no *understanding* involved. It's just rote training. If you talk to a chatbot trained like this (as most are), it's just like the old Eliza: it reflects your thoughts back to you, maybe mixing in some stuff from the training corpus. For answering questions ...

    • This thread is full of n-zi spammers, and then there are posts like yours, which are almost as useless.

      The big deal with huge language models is precisely that the returns don't diminish as much as people thought they would. Bigger keeps doing better.

      There is still no *understanding* involved.

      Sure. Just like there's no understanding involved in your post. Your notion of understanding is vague and doesn't stand up to scrutiny.

      If you talk to a chatbot trained like this (as most are), it's just like the old Eliza ...

      • by dargaud ( 518470 )
        I don't know what model the various chatbots on customer support webpages use, but they all uniformly suck. Ask a pointed question and get a boilerplate answer, always completely unrelated to the question. So yes, maybe you can use those models to generate amusing short stories from a prompt. But they always seem 'off' at the very best. And trying to do anything serious like answering precise customer questions is a complete failure. But like I said, maybe those bots are 4 generations behind the current state of the art ...
        • by nagora ( 177841 )

          I don't know what model the various chatbots on customer support webpages use, but they all uniformly suck. Ask a pointed question and get a boilerplate answer, always completely unrelated to the question.

          To be fair, that's the same with a human call centre too.

          "AI" allows companies to be unhelpful for less money.

  • Regarding the energy requirements of training the model, has anyone compared them to the energy requirements of a group of humans (growing up and) learning to do similar tasks?
  • I keep wondering: does this neural network even know what it's saying or doing? There aren't even a trillion words in the English language and the number of distinct phrases is probably lower as well.

    So how does this matter?
  • by Fly Swatter ( 30498 ) on Thursday January 14, 2021 @12:07PM (#60943454) Homepage
    Gets worse every month. Progress!
  • "out of sight, out of mind" -> blind and insane

    "the spirit is willing but the flesh is weak" -> the wine is good but the meat is spoiled
