Meta Sets Up War Rooms To Analyze DeepSeek's Tech (businessinsider.com)

Meta has set up four war rooms to analyze DeepSeek's technology, including two focused on how High-Flyer reduced training costs and one on what data High-Flyer may have used, The Information's Kalley Huang and Stephanie Palazzolo report. China's DeepSeek is an open-source large-language model that claims to rival offerings from OpenAI and Meta Platforms while using a much smaller budget.

  • by Rei ( 128717 ) on Monday January 27, 2025 @11:56AM (#65122197) Homepage

    They literally released an open paper about it [github.com], so, I mean, wow, much analysis.

    Also, everyone copies everyone else's advancements while incorporating their own new ones. That's how the field works. There's also an open project [github.com] from HuggingFace to replicate it open-source, incl. training code.

    • 'Copies' vs. theft is a serious concern, and their 'paper' doesn't necessarily mean fact.

      • A white paper is meant to contain enough info to replicate the solution. That is the point.
        • by Hodr ( 219920 )

          That may or may not be the point of this particular white paper (I haven't read it), but in general, no: that's not the definition of, nor a requirement for, something to be deemed a white paper.

    • by hjf ( 703092 )

      also it's hosted in Meta's very own ollama repo: https://ollama.com/library/dee... [ollama.com] ahahaha
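
      For anyone wanting to poke at it locally, a minimal sketch using the ollama Python client (the model tag and prompt here are illustrative; assumes a running ollama server and the ollama Python package installed):

          # First pull a distilled variant: ollama pull deepseek-r1:7b
          import ollama

          # Ask the local model a question via the ollama server's chat API.
          reply = ollama.chat(
              model="deepseek-r1:7b",  # a small distill, not the full 685B R1
              messages=[{"role": "user", "content": "Why is the sky blue?"}],
          )
          print(reply["message"]["content"])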

    • Good info, thanks.

      According to the paper, they "pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens," which is certainly a goodly amount. The main innovation appears to be optimization of the training: it "requires only 2.788M H800 GPU hours for its full training." And then they helpfully describe the optimization techniques. I assume that Meta will wade through their code so as to thoroughly understand it all and incorporate key features into their own products.

      This could save all the AI players a lot of money (rough cost math below).
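
      Back-of-the-envelope on what those GPU hours imply (the $2-per-H800-GPU-hour rental price is an assumption, the same illustrative rate the V3 paper itself uses; real costs vary):

          # Rough training-cost estimate from the figures quoted above.
          gpu_hours = 2.788e6      # H800 GPU hours for full training (from the paper)
          usd_per_hour = 2.0       # assumed H800 rental rate, USD per GPU-hour
          cost_musd = gpu_hours * usd_per_hour / 1e6
          print(f"~${cost_musd:.2f}M total")  # ~$5.58M

      Set that against the far larger sums reportedly spent on frontier-scale training runs and the fuss makes sense.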

      • by allo ( 1728082 )

        Even including the Chinese tokens, which may be higher-entropy, that is really a lot. Llama 3 was trained on 5T tokens.

    • My guess is that the parts that really matter aren't in the paper.

    • by ceoyoyo ( 59147 )

      Yes, "analysis" will be reading the paper, discussing it, and experimenting with the methods described. I suppose describing it as a "war room" isn't helpful but what did you think it meant?

      Hopefully Facebook has taken the lesson that just copying stuff verbatim and making it bigger might not be the most efficient method.

      • by Rei ( 128717 )

        Meta has been doing plenty of their own research. One of the big things they've been pushing on is moving away from tokens to patches.

    • Also, everyone copies everyone else's advancements while incorporating their own new ones. That's how the field works.

      That is definitely how Meta works. Well, not so much the "incorporating their own new ones" part.

  • by ArchieBunker ( 132337 ) on Monday January 27, 2025 @12:08PM (#65122235)

    Really getting pumped this morning.

  • Looks like they have decided to go to war with a DDoS on DeepSeek.

  • Nobody should be touching this.

    • How about an explanation rather than a blanket statement? Why shouldn't people touch this? What reason(s)?

    • by EvilSS ( 557649 )

      Nobody should be touching this.

      Really, you ran the R1 685B-parameter model? You just happen to have about 700GB of VRAM available? Or did you run one of the smaller distills? Because those are not the same as running the full R1 model.
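
      For scale, here's why ~700GB is the right ballpark (assuming the weights are stored in FP8, one byte per parameter; KV cache and activations need memory on top of this):

          # Weight storage alone for a 685B-parameter model.
          params = 685e9           # parameter count of the full R1 release
          bytes_per_param = 1      # FP8 quantization -- an assumption
          weights_gb = params * bytes_per_param / 1e9
          print(f"~{weights_gb:.0f} GB of weights alone")  # ~685 GB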

  • by dmay34 ( 6770232 ) on Monday January 27, 2025 @12:42PM (#65122385)

    US AI companies in 2024: "Copyrights aren't applicable to training data."

    US AI companies in 2025: "Chinese AI companies are stealing our user data!"

    • The IP in question here is a patent, not copyright. https://www.reuters.com/techno... [reuters.com]

      • by dmay34 ( 6770232 )

        Right. Tooooootttttaaaallllllyyyyy different.

        • Yes, actually they are...totally different.

          A copyright protects your *specific work* from duplication without your permission.
          A patent protects your *methods* from duplication without your permission.

          Copyright is automatic: your work is copyrighted unless you specify otherwise.
          Patents must be applied for and approved, and even then, you must protect them yourself by suing anyone who infringes.

  • I would imagine DeepSeek is a literal copy-paste of some other publicly available code and data, with minor changes if any.

  • Meta/Facebook has decided Linux is malware; see the DistroWatch Weekly news.

  • That 'war room' should already be up and running. Competitive intelligence is a crucial part of any significant business. Automakers take each other's cars apart. Sandy Munro takes everybody's apart and sells the info. High-tech companies are constantly studying each other's advancements to learn. If Meta really had to set up four new war rooms, I'd say it's asleep at the wheel.

  • It causes stress for the companies producing the technology, but we all benefit from the competition.

    • Not really. You're thinking in old-school terms of selling products.
      Tech companies in the US are now in the business of selling two things:
      1) Data from their users.
      2) Hype to feed speculation bubbles.

      Products are just a means to achieve those goals.

      • Everybody, from nonprofits to big tech, has some kind of ulterior motivation. Thankfully, competition doesn't require "pure" motives to be effective. Any kind of (non-criminal) motivation will do.

  • First, I never met a phor I didn't like.

    Isn't this the plot line from the story about mainframes with dumb terminals being decentralized to distribute the workload to PCs?

    Western AI relies on ridiculously centralized data centers, while DeepSeek R1 pushes the load to the desktop. Look at the specs needed to run a local copy of it.

  • by kwelch007 ( 197081 ) on Monday January 27, 2025 @05:45PM (#65123417) Homepage

    For the sake of discussion, let's presume DeepSeek has actually found a far more efficient (~20x) way to train and run these models. That is, they can keep up with the likes of OpenAI, Google, xAI, and Meta with 1/20th the hardware and electricity. Let's just say that's true.

    What happens when OpenAI/Google/xAI/Meta reverse-engineer and implement their own version of this and then run it on their massive compute platforms? Does that mean ChatGPT/Gemini/etc. are now 20x more powerful? (See the toy scaling sketch below.)

    I'm sure the curve isn't quite that straight, but if it's even close, I'm not sure I see how this makes DeepSeek so valuable, or conversely the other players less valuable. The standard of product per unit of input just gets higher.

    That's assuming this is real, and that it can scale. There's a lot of "assume" in that.
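
    A toy illustration of why a 20x efficiency gain doesn't mean 20x-better models (the power law and exponent here are assumptions for illustration, not anyone's published numbers):

        # Assume loss follows a power law in effective compute: L ~ C**(-alpha).
        alpha = 0.05        # hypothetical scaling exponent
        speedup = 20        # the efficiency gain under discussion
        loss_ratio = speedup ** (-alpha)
        print(f"loss falls to ~{loss_ratio:.0%} of baseline")  # ~86%

    So the same hardware buys a better model, but the gain is heavily sublinear; mostly it raises the baseline for everyone.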

    • The training datasets will simply get larger. Data is growing exponentially and faster than computing power. Even with a 20x speedup, that's a single blip in the computing power curve.
