Meta's Next Llama AI Models Are Training on a GPU Cluster 'Bigger Than Anything' Else (wired.com) 28

Posted by msmash on Thursday October 31, 2024 @01:20PM from the size-contest dept.

Meta CEO Mark Zuckerberg laid down the newest marker in generative AI training on Wednesday, saying that the next major release of the company's Llama model is being trained on a cluster of GPUs that's "bigger than anything" else that's been reported. From a report: Llama 4 development is well underway, Zuckerberg told investors and analysts on an earnings call, with an initial launch expected early next year. "We're training the Llama 4 models on a cluster that is bigger than 100,000 H100s, or bigger than anything that I've seen reported for what others are doing," Zuckerberg said, referring to the Nvidia chips popular for training AI systems. "I expect that the smaller Llama 4 models will be ready first."

Increasing the scale of AI training with more computing power and data is widely believed to be key to developing significantly more capable AI models. While Meta appears to have the lead now, most of the big players in the field are likely working toward using compute clusters with more than 100,000 advanced chips. In March, Meta and Nvidia shared details about clusters of about 25,000 H100s that were used to develop Llama 3.

This discussion has been archived. No new comments can be posted.

Get off my lawn with this AI shit (Score:1)

by Anonymous Coward writes:

I might be growing too old and crotchety, but I just don't see a reason to ever touch AI. As a human being I am perfectly capable of generating verbal bullshit at the speed of typing. Sure, AI is faster than that, but then it doesn't get context and is vastly inferior in delivering underhanded insults. Likewise, if I wanted shitty code fast, I would just copy-paste substack examples.

Someone please explain to me why all these tech companies are setting all this perfectly good cash on fire via AI?
- Re: (Score:2)
  
  by Bumbul ( 7920730 ) writes:
  
  "AI is faster than that, but then it doesn't get context and is vastly inferior in delivering underhanded insults."
  
  Your use case requires decensored models - I'm sure they will match your capabilities in delivering insults.
- Re: (Score:2)
  
  by ceoyoyo ( 59147 ) writes:
  
  Chatbots are AI. AI is not chatbots.
Garbage in, garbage out (Score:2)

by gkelley ( 9990154 ) writes:

Doesn't make a difference how many GPUs you use.
- Re: (Score:3)
  
  by NettiWelho ( 1147351 ) writes:
  
  But a supercluster does process a lot more garbage than a regular cluster.
  - Comment removed (Score:4, Funny)
    
    by account_deleted ( 4530225 ) writes: on Thursday October 31, 2024 @02:41PM (#64909695)
    
    Comment removed based on user account deletion
    
  - Re: (Score:2)
    
    by Growlley ( 6732614 ) writes:
    
    It's supersized so it can ask you if you'd like fries with that.
  - Re: (Score:2)
    
    by arglebargle_xiv ( 2212710 ) writes:
    
    It's the Lemmy approach to "AI", "everything bigger than everyone else".
- Re: (Score:2)
  
  by larryjoe ( 135075 ) writes:
  
  "Doesn't make a difference how many GPUs you use."
  This is incorrect. These models are all in the research stage, which requires lots of trial and error. Training times for these huge models run to days or weeks, so the size of your cluster largely determines the turnaround time for each round of trial and error. That's why the hyperscalers are willing to spend tens of billions.
  OpenAI just announced their first search product. Google cannot afford to not be first in developing generative AI-based search. It's an existential problem for them.
training models versus running them (Score:3)

by ZipNada ( 10152669 ) writes: on Thursday October 31, 2024 @01:40PM (#64909497)

Apparently it can require huge compute resources to train models, but not nearly so much to run the model. Apparently the Meta Llama models are available for free. I don't see how this is economically feasible for Meta but I'm not complaining.
https://huggingface.co/blog/ll... [huggingface.co]
"Llama 3.2 Vision comes in two sizes: 11B for efficient deployment and development on consumer-size GPU, and 90B for large-scale applications."
"These models are designed for on-device use cases, such as prompt rewriting, multilingual knowledge retrieval, summarization tasks, tool usage, and locally running assistants. They outperform many of the available open-access models at these sizes and compete with models that are many times larger."
I'm tempted to experiment with it. If you are in the EU you are out of luck.
"any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2"

- Re: (Score:2)
  
  by toxonix ( 1793960 ) writes:
  
  NVIDIA's not complaining either. H100s are still ~$40k each. If there are 100k+ H100s, or "bigger than 100k", that will cost "bigger than" $4,000,000,000.
  - Re: (Score:2)
    
    by toxonix ( 1793960 ) writes:
    
    Or they're just upping the already 25k node cluster to 100k+ which is more likely.
  - Re: (Score:2)
    
    by ZipNada ( 10152669 ) writes:
    
    The Blackwell devices are said to be considerably more cost-effective.
    "build and run real-time generative AI on trillion-parameter large language models at up to 25x less cost and energy consumption than its predecessor"
    https://nvidianews.nvidia.com/... [nvidia.com]
    If that's the case anyone who spent $billions on H100 will feel a little miffed.
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  "Apparently it can require huge compute resources to train models, but not nearly so much to run the model. Apparently the Meta Llama models are available for free. I don't see how this is economically feasible for Meta but I'm not complaining."
  Llama3 8B runs on a mid-range smartphone.
  "I'm tempted to experiment with it."
  Do it, you will learn a few things.
  "If you are in the EU you are out of luck."
  Nope. As a normal user you want quantized versions anyway and they are neither country-walled nor do they require
- Re:training models versus running them (Score:4, Insightful)
  
  by ink ( 4325 ) writes: on Thursday October 31, 2024 @02:40PM (#64909689) Homepage
  
  "I don't see how this is economically feasible"
  We're in the pre-enshittification phase of AI. Once one or two companies dominate the technology, they will start "monetizing" it. Also, the more AI shit you can spout on earnings calls, the more horny Wall Street gets for your sweet sweet stock.
  
How Big Is Your ... (Score:5, Insightful)

by Spinlock_1977 ( 777598 ) writes: <Spinlock_1977NO@SPAMyahoo.com> on Thursday October 31, 2024 @02:24PM (#64909633) Journal

GPU Farm... today's tech bro bragging rights.

So? (Score:4, Funny)

by gweihir ( 88907 ) writes: on Thursday October 31, 2024 @02:39PM (#64909685)

If trained on crap data, it will still be a crap LLM. Like all of them, because only crap data is available for training and LLMs are pretty crappy even on good data.

It's not the size of your cluster... (Score:2)

by RedMage ( 136286 ) writes:

... It's how you use it.
On the plus side, there will be a whole lot of nice lightly used GPU servers for sale in a few years. Mostly these machines have moved away from the PCIe card format that consumer GPUs use, so it will be harder to sell them off individually.
- Re: (Score:2)
  
  by Fons_de_spons ( 1311177 ) writes:
  
  Yup, and we'll have a few spare nuclear power plants. These tech bubbles are getting pretty predictable. Let's hope I am wrong.
Bigger than everything else but xAI (Score:2)

by Yo,dog! ( 1819436 ) writes:

Almost two months ago, xAI announced its supercomputer was online sporting 100K H100s, with another 50K H100s and H200s scheduled to be added.
Why setting good cash on fire via AI? (Score:2)

by Mirnotoriety ( 10462951 ) writes:

Anonymous [slashdot.org]: “Someone please explain to me why all these tech companies are setting all this perfectly good cash on fire via AI?”

It's about embracing, extending and extinguishing free discourse on the Web: in the future, all access to information will be funneled through ClippyAI.
--

Already I've been reported to the mothership for upsetting ChatGPT.
Am I the only one thinking of beowulf clusters? (Score:1)

by blue trane ( 110704 ) writes:

No one else is imagining welcoming a beowulf cluster of AI overlords?
Child sized robots are the key (Score:3)

by LostMyBeaver ( 1226054 ) writes: on Friday November 01, 2024 @01:33AM (#64910989)

Massive ingest of trillions of pieces of data is greatly flawed. It will never perform well since inference will be based purely on observation of non-interactive datasets. The training system could never verify its dataset, at least not at scale.
Now, take a swarm of robots in three sizes: child, pre-teen and young adult. Place them in schools, malls, airports, etc. Link them all to a single common training system. Make them interact with their environments. When a new piece of data that has a large impact on the model is encountered, make a group of the robots go fact-finding and even post on forums asking for explanations.
I expect any AI trained in a way that allows it to confirm its studies will perform far better than any AI simply blasted by data.

Alpacas better than Llamas ..... (Score:2)

by bsdetector101 ( 6345122 ) writes:

Alpacas are very gentle and shy. Llama is a family of autoregressive large language models that spit on you! Getting cheeky!
