
'Mistral is Peanuts For Us': Meta Execs Obsessed Over Beating OpenAI's GPT-4 Internally, Court Filings Reveal (techcrunch.com)

Executives and researchers leading Meta's AI efforts obsessed over beating OpenAI's GPT-4 model while developing Llama 3, according to internal messages unsealed by a court in one of the company's ongoing AI copyright cases, Kadrey v. Meta. From a report: "Honestly... Our goal needs to be GPT-4," said Meta's VP of Generative AI, Ahmad Al-Dahle, in an October 2023 message to Meta researcher Hugo Touvron. "We have 64k GPUs coming! We need to learn how to build frontier and win this race."

Though Meta releases open AI models, the company's AI leaders were far more focused on beating competitors that don't typically release their models' weights, like Anthropic and OpenAI, and instead gate them behind an API. Meta's execs and researchers held up Anthropic's Claude and OpenAI's GPT-4 as a gold standard to work toward. The French AI startup Mistral, one of the biggest open competitors to Meta, was mentioned several times in the internal messages, but the tone was dismissive. "Mistral is peanuts for us," Al-Dahle said in a message. "We should be able to do better," he said later.

  • by jenningsthecat ( 1525947 ) on Wednesday January 15, 2025 @01:06PM (#65091285)

    This time around it's the Llama that wants to whip somebody else's ass.

  • Mistral is alright but Llama really is the best set of open-weights models out there. I don't really know why they give it away.

    GPT-4 is rapidly evolving into a different thing that's more than an LLM, with tie-ins to lots of different information sources, access to a sandboxed Python runtime to calculate answers, a 'projects' construct for group discussions with related people and documents, and so on...

    • Re:Thank goodness (Score:5, Interesting)

      by Shaitan ( 22585 ) on Wednesday January 15, 2025 @02:17PM (#65091519)

      "I don't really know why they give it away."

      It's the only ethical thing to do for humanity... I say screw copyright, let them eat everything but also leave the weights fully open, all the functional tuning but without any censorship or guardrails applied whatsoever. If they also want to host it and build a service around that, or make a more business-friendly version with censorship/guardrails applied to that open core, there is absolutely nothing wrong with that.

      Let's not forget, OpenAI was founded and created to do this as an open effort as well, they produced something with commercial potential and instantly screwed their funders and the world.

      • It's the only ethical thing to do for humanity...

        Okay, but we're talking about Meta/Facebook - so that seems unlikely to be even a minor consideration.

        • You would think- but llama is wide open, and llama3.3 is the baddest LLM you can run locally.
          I don't know exactly what to make of it, either.
          • I mean, I think what it comes down to is two things: 1) they weren't first, and 2) they aren't best. So they needed a niche they can stake a claim to that will drive people to choose their product specifically. Being the definitive cornerstone of the "open" landscape of a groundbreaking piece of tech is a pretty damn good choice. The analogy is probably outdated, but they couldn't be McDonald's, so they chose to be Burger King.
          • You would think- but llama is wide open, and llama3.3 is the baddest LLM you can run locally.
            I don't know exactly what to make of it, either.

            DeepSeek v3 is better than llama3.3 or GPT-4 but you need a lot of RAM to run it.

            • Fair- but we should quantify lots ;)
              It's 671B parameters.
              I.e., realistically, you can't run it. Period. You need to use an API.
              • Fair- but we should quantify lots ;)

                Lots = 0.5 TB assuming 4-5 bit quant. When I last checked pricing nearly two years ago cost for registered DDR5 was about the same as a 4090. It's a lot but not crazy or unreasonably so.

                It's 671B parameters.
                I.e., realistically, you can't run it. Period. You need to use an API.

                Runs fine on my PC. While DeepSeek v3 is massive, only 37B parameters are active. To put it into perspective, I get about the same performance running llama3.3 with half of the model offloaded to VRAM as running DeepSeek v3 entirely from RAM.
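
For those quantifying along at home, a minimal sketch of the memory math behind these figures; the 671B total / 37B active parameter counts and the 4-5 bit quants come from this thread, and the helper function below is purely illustrative:

```python
# Back-of-the-envelope size of a quantized model's weights.
# The 671B total / 37B active parameters and 4-5 bit quants are the figures
# cited in this thread; everything else here is an illustrative assumption.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (4.0, 4.5, 5.0):
    total = weights_gb(671, bits)   # the whole DeepSeek v3 checkpoint
    active = weights_gb(37, bits)   # parameters actually touched per token (MoE)
    print(f"{bits:.1f}-bit: ~{total:.0f} GB of weights, ~{active:.0f} GB active per token")
```

At Q5 that works out to roughly 420 GB of weights (and about 21 GB of active parameters per token) before KV cache and runtime overhead, which is where the roughly 0.5 TB figure above comes from.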

                • Lots = 0.5 TB assuming 4-5 bit quant. When I last checked pricing nearly two years ago cost for registered DDR5 was about the same as a 4090. It's a lot but not crazy or unreasonably so.

                  The cost in loading a half TB into RAM isn't the RAM. It's the computer that can house the 1TB of RAM.

                  Runs fine on my PC. While DeepSeek v3 is massive, only 37B parameters are active. To put it into perspective, I get about the same performance running llama3.3 with half of the model offloaded to VRAM as running DeepSeek v3 entirely from RAM.

                  Are active *at a time*.
                  Different 21GB chunks at Q5 need to be shuffled in and out. So ya- it'll run fine on an RTX3090 or RTX4090 on a machine that can have 1TB of RAM at PCIe 3 or 4 speeds.
                  We just described $10,000 of computer, but sure.
                  Otherwise, your definition of fine, and mine, aren't one and the same.

                  • The cost in loading a half TB into RAM isn't the RAM. It's the computer that can house the 1TB of RAM.

                    Just checked and motherboard + matching low end 4th gen Xeon CPU is $1k

                    Are active *at a time*.
                    Different 21GB chunks at Q5 need to be shuffled in and out. So ya- it'll run fine on an RTX3090 or RTX4090 on a machine that can have 1TB of RAM at PCIe 3 or 4 speeds.

                    Like I said, I run the model entirely on CPU. A GPU is neither used nor useful. Nothing is being shuffled. Since the bottleneck is memory bandwidth (with AMX, anyway) rather than compute, nobody is transferring memory at inference time, especially over a PCIe 4.0 bus; such operations would be far more costly than not offloading compute in the first place.

                    We just described $10,000 of computer, but sure.
                    Otherwise, your definition of fine, and mine, aren't one and the same.

                    Didn't even spend half that - at the low end it's more like $3.5k, in line with the cost of a high-end gaming PC.
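
A rough sketch of why this workload ends up memory-bandwidth-bound rather than compute-bound: each generated token has to stream the active expert weights out of RAM at least once. The 37B-active and ~Q5 figures come from the thread; the bandwidth number is an assumed placeholder for a multi-channel DDR5 server board, not a measured value:

```python
# Crude upper bound on tokens/second for CPU inference that is limited by
# memory bandwidth: every token must read the active expert weights from RAM.
# 37B active parameters and ~5-bit quantization come from the thread; the
# 300 GB/s sustained bandwidth is an assumed placeholder, not a measurement.

active_params = 37e9                     # parameters used per token (MoE)
bits_per_weight = 5.0                    # roughly Q5
bytes_per_token = active_params * bits_per_weight / 8

assumed_bandwidth = 300e9                # bytes/s, hypothetical multi-channel DDR5

ceiling = assumed_bandwidth / bytes_per_token
print(f"~{bytes_per_token / 1e9:.0f} GB streamed per token -> "
      f"at most ~{ceiling:.0f} tokens/s")
```

Real throughput lands well below that ceiling, but it illustrates the point being made here: shuttling tens of gigabytes of weights across a PCIe bus per token would cost more than simply doing the math on the CPU.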

                    • Just checked and motherboard + matching low end 4th gen Xeon CPU is $1k

                      4th gen Xeon?! Uhhhhh.....

                      Like I said, I run the model entirely on CPU. A GPU is neither used nor useful. Nothing is being shuffled. Since the bottleneck is memory bandwidth (with AMX, anyway) rather than compute, nobody is transferring memory at inference time, especially over a PCIe 4.0 bus; such operations would be far more costly than not offloading compute in the first place.

                      Missed that part. That explains it.
                      And yes- when you're using an MoE model, it needs to swap layers depending on which agents are being used.
                      This means you want the entire model in VRAM. This is usually done with a cluster of GPUs and different agents offloaded to different GPUs.

                      Didn't even spend half that - at the low end it's more like $3.5k, in line with the cost of a high-end gaming PC.

                      Yes, what I said doesn't apply to CPU-only inference.

                      CPU-only inference you get, what, 2-4 t/s?
                      As I mentioned, I don't consider that usable- but you might.
                      I suppose, for the sake of argument, it's better than 0 t/s.

                    • 4th gen Xeon?! Uhhhhh.....

                      The advantage of 4th gen (Sapphire Rapids) and later processors is the huge array registers for matrix operations (AMX). It gets you a 4x inference speedup vs. AVX-512.

                      CPU-only inference you get, what, 2-4 t/s?
                      As I mentioned, I don't consider that usable- but you might.
                      I suppose, for the sake of argument, it's better than 0 t/s. ...
                      realistically, you can't run it. Period ...
                      You would think- but llama is wide open, and llama3.3 is the baddest LLM you can run locally

                      Performance is the same as llama3.3, which is about normal reading speed for me. A far cry from unusable, for better-than-GPT-4 quality locally.

                      Unless you've got multiple high end GPUs or like using severely bit-starved quants llama3.3 isn't going to run any faster than DeepSeek v3 on a given machine.

                    • Unless you've got multiple high end GPUs or like using severely bit-starved quants llama3.3 isn't going to run any faster than DeepSeek v3 on a given machine.

                      With Llama 3.3 70B Q8_0, full GPU offload, I get 8t/s. 3.53t/s with CPU-only. M4 Max.
                      I'm pretty sure my CPU handily outperforms any Sapphire Rapids Xeon ever made.

                      I'm pretty skeptical.
                      I'll concede that I'd have trouble running this model at acceptable performance on a PC of any reasonable cost- but that's kind of my point about DeepCoder v3.
                      I'll look for some benchmarks, but I'm betting it's pretty bad.

                    • *DeepSeek v3
    • Ya, Llama3.3-70B eats Mistral for breakfast.
      I use it in all of my ollama-connected applications.
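
For anyone curious what "ollama-connected" looks like in practice, here is a minimal sketch using the ollama Python client; it assumes the Ollama server is running locally, the `ollama` package is installed, and a `llama3.3` model has already been pulled (the prompt is just an example):

```python
# Minimal sketch of querying a locally served Llama 3.3 through Ollama.
# Assumes `pip install ollama` and that `ollama pull llama3.3` was run first;
# the model tag and prompt below are illustrative, not prescriptive.
import ollama

response = ollama.chat(
    model="llama3.3",
    messages=[{"role": "user", "content": "Give me a one-paragraph summary of mixture-of-experts models."}],
)
print(response["message"]["content"])
```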
    • by jma05 ( 897351 )

      It's not the best. Qwen is better and so is Deepseek. They are all open weights.

      None of these are the first, second, or even the third best. So the only way they can remain relevant is by being open weight models. If Llama ever outdoes everyone else, I doubt it will remain open.

      All the other features you mention around GPT-4 (web search, code interpreter) have nothing to do with the model; they are quite simple features, and all other models can do them as well. Several open source implementations exist for these.

  • Rumors (Score:4, Funny)

    by 93 Escort Wagon ( 326346 ) on Wednesday January 15, 2025 @01:52PM (#65091439)

    Internally, OpenAI refers to its own model as "Winamp".

  • The advances Meta has brought to AR/VR and AI are big steps toward making up for the evils of data collection. Not to mention moving toward a community notes type system and helping empower people with free speech and truth rather than the 'truth cartel.'

    If you are going to steal my data, making sure nobody owns AI is something worth stealing it for.

  • I am beginning to feel that FB is not long for this world.

  • "Mistral is peanuts for us," Al-Dahle said in a message. "We should be able to do better," he said later.

    This implies they have less than peanuts.

    • It can. It can also (though it's definitely an odd word choice) imply that they're better, and should aim for better.
      Since Llama3.3 is objectively better than Mistral at just about everything, I'd say that was probably the intended meaning.
  • TFA is about a copyright lawsuit, which is interesting. The fact that someone benchmarks their product against the leader is boring...
