Meta Sets Up War Rooms To Analyze DeepSeek's Tech (businessinsider.com)
Meta has set up four war rooms to analyze DeepSeek's technology, including two focused on how High-Flyer reduced training costs and one on what data High-Flyer may have used, The Information's Kalley Huang and Stephanie Palazzolo report. China's DeepSeek is an open-source large language model that claims to rival offerings like OpenAI's ChatGPT and Meta's models, while using a much smaller budget.
"Analyze DeepSeek's technology" (Score:5, Informative)
They literally released an open paper about it [github.com], so, I mean, wow, much analysis.
Also, everyone copies everyone else's advancements while incorporating their own new ones. That's how the field works. There's also an open project [github.com] from HuggingFace to replicate it open-source, incl. training code.
Re: (Score:2)
'Copying' vs. theft is a serious distinction. And their 'paper' doesn't necessarily mean fact.
Re: (Score:2)
That may or may not be the point of this particular white paper (I haven't read it), but in general no, that's not the definition of, nor a requirement for, something to be deemed a white paper.
Re: (Score:2)
Also, it's hosted in Meta's very own ollama repo: https://ollama.com/library/dee... [ollama.com] ahahaha
Re: (Score:3)
Good info, thanks.
According to the paper they "pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens" which certainly is a goodly amount. The main innovation appears to be optimization of the training; "requires only 2.788M H800 GPU hours for its full training". And then they helpfully describe the optimization techniques. I assume that Meta will wade through their code so as to thoroughly understand it all and incorporate key features into their own products.
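As a rough sanity check on those headline numbers (my own back-of-the-envelope, not from the paper; the $2/GPU-hour rental rate is an assumption):

```python
# Back-of-the-envelope from the paper's headline numbers.
tokens = 14.8e12          # pre-training tokens, per the paper
gpu_hours = 2.788e6       # H800 GPU-hours for full training, per the paper
rental_rate = 2.0         # USD per H800 GPU-hour -- assumed, not from the paper

tokens_per_gpu_hour = tokens / gpu_hours
est_cost = gpu_hours * rental_rate

print(f"{tokens_per_gpu_hour:,.0f} tokens per GPU-hour")
print(f"~${est_cost / 1e6:.1f}M at ${rental_rate}/GPU-hour")
```

At that assumed rate you land around the widely quoted ~$5.6M training figure, which is why the cost claim got so much attention.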
This could save all the AI play
Re: (Score:2)
Even accounting for the Chinese tokens, which may be of higher entropy, that is really a lot. Llama 3 was trained on 5T tokens.
Re: (Score:3)
My guess is that the parts that really matter, aren't in the paper.
Re: (Score:2)
Yes, "analysis" will be reading the paper, discussing it, and experimenting with the methods described. I suppose describing it as a "war room" isn't helpful but what did you think it meant?
Hopefully Facebook has taken the lesson that just copying stuff verbatim and making it bigger might not be the most efficient method.
Re: (Score:2)
Meta has been doing plenty of their own research. One of the big things they've been pushing on is moving away from tokens to patches.
Re: (Score:2)
Also, everyone copies everyone else's advancements while incorporating their own new ones. That's how the field works.
That is definitely how Meta works. Well, not so much the "incorporating their own new ones" part.
DeepSeek (Score:3)
Really getting pumped this morning.
DDOS war (Score:2)
Looks like they have decided to go to war with a DDOS on DeepSeek.
Re: (Score:3)
Saw this pop up a bit ago: https://www.cnbc.com/2025/01/2... [cnbc.com]
Ran A Copy Locally (Score:2)
Nobody should be touching this.
Re: (Score:2)
How about an explanation rather than a blanket statement? Why shouldn't people touch this? What reason(s)?
Re: (Score:2)
Nobody should be touching this.
Really, you ran the 685B-parameter R1 model? You just happen to have about 700GB of VRAM available? Or did you run one of the smaller distills? Because those are not the same as running the full R1 model.
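For anyone wondering where that ~700GB figure comes from, here's a rough sketch (my own estimate; assumes FP8 weights and ignores KV cache and activation overhead, which push the real requirement higher):

```python
# Rough VRAM estimate for serving the full R1 model.
params = 685e9            # R1's reported parameter count
bytes_per_param = 1       # FP8 weights -- BF16 would double this
weight_gib = params * bytes_per_param / 2**30

print(f"~{weight_gib:.0f} GiB just for the weights")
```

That's already ~640 GiB before any KV cache or runtime overhead, which is why only multi-GPU server rigs can run the full model, and why the distills exist.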
Copyrights aren't applicable to AI.. Not like that (Score:5, Insightful)
US AI companies in 2024: "Copyrights aren't applicable to training data."
US AI companies in 2025: "Chinese AI companies are stealing our user data!"
Re: (Score:2)
The IP in question here is a patent, not copyright. https://www.reuters.com/techno... [reuters.com]
Re: (Score:2)
Right. Tooooootttttaaaallllllyyyyy different.
Re: (Score:2)
Yes, actually they are...totally different.
A copyright protects your *specific work* from duplication without your permission.
A patent protects your *methods* from duplication without your permission.
Copyright is automatic: your work is copyrighted unless you specify otherwise.
Patents must be applied for and approved, and even then, you must protect them yourself by suing anyone who infringes.
Copy Pasted other's code (Score:2)
Re: (Score:2)
Is the code open source? The source I ran across earlier (on Slashdot) said the weights were open source, but not the code. Do you have a source that claims otherwise? (Preferably a link.)
War Rooms? Why? (Score:2)
That 'war room' should already be up and running. Competitive intelligence is a crucial part of any significant business. Automakers take each other's cars apart. Sandy Munro takes everybody's apart and sells the info. High-tech companies are constantly studying each other's advancements to learn. If Meta really had to set up new war rooms for this, I'd say it's asleep at the wheel.
This is how competition is supposed to work (Score:2)
It causes stress for the companies producing the technology, but we all benefit from the competition.
Hype and Data. (Score:2)
Not really. You're thinking in terms of old-school product selling.
Tech companies in the US now are in the business of selling two things:
1) Data from their users.
2) Hype to feed speculation bubbles
Products are just a means to achieve those goals.
Re: (Score:2)
Everybody, from nonprofits to big tech, has some kind of ulterior motivation. Thankfully, competition doesn't require "pure" motives to be effective. Any kind of (non-criminal) motivation will do.
Mainframe Decentralization Metaphor (Score:2)
First, I never met a phor I didn't like.
Isn't this the old story of mainframes with dumb terminals being decentralized to distribute the workload to PCs?
Western AI relies on ridiculous centralized data centers while DeepSeek R1 pushes the load to the desktop. Look at the specs to run a local copy of it.
Let's suppose this is true (Score:3)
For the sake of discussion, let's presume DeepSeek has actually found a far more efficient (~20x) way to train and run these models. That is, they can keep up with the likes of OpenAI, Google, xAI, Meta, whoever, with 1/20th the hardware and electricity. Let's just say that's true.
What happens when OpenAI/Google/xAI/Meta reverse-engineer and implement their own version of this and then run it on their massive compute platforms? Does that mean ChatGPT/Gemini/etc. are now 20x more powerful?
I'm sure the scaling curve isn't quite that straight, but if it's even close, I'm not sure I see how this makes DeepSeek so valuable, or conversely the other players less valuable. The standard of product per unit of input just gets higher.
That's assuming this is real, and that it can scale. There's a lot of "assume" in that.