'Results Were Fudged': Departing Meta AI Chief Confirms Llama 4 Benchmark Manipulation (ft.com)
Yann LeCun, Meta's outgoing chief AI scientist and one of the pioneers credited with laying the groundwork for modern AI, has acknowledged that the company's Llama 4 language model had its benchmark results manipulated before its April 2025 release. In an interview with the Financial Times, LeCun said the "results were fudged a little bit" and that the team "used different models for different benchmarks to give better results."
Llama 4 was widely criticized as a flop at launch, and the company faced accusations of gaming benchmarks to make the model appear more capable than it was. LeCun said CEO Mark Zuckerberg was "really upset and basically lost confidence in everyone who was involved" in the release.
Zuckerberg subsequently "sidelined the entire GenAI organisation," according to LeCun. "A lot of people have left, a lot of people who haven't yet left will leave." LeCun himself is departing Meta after more than a decade to start a new AI research venture called Advanced Machine Intelligence Labs. He described the new hires brought in for Meta's superintelligence efforts as "completely LLM-pilled" -- a technology LeCun has repeatedly called "a dead end when it comes to superintelligence."
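To make concrete what "used different models for different benchmarks" means in practice, here is a purely hypothetical sketch (all checkpoint names and scores are invented, not Meta's actual setup): an honest submission evaluates one checkpoint on every suite, while a fudged one cherry-picks the best-scoring checkpoint per suite.

```python
# Hypothetical illustration of per-benchmark checkpoint cherry-picking.
# Checkpoint names and scores are made-up placeholders.

def pick_checkpoint(benchmark, scores):
    """Return whichever checkpoint scores highest on this one benchmark."""
    return max(scores[benchmark], key=scores[benchmark].get)

scores = {
    "benchmark_1": {"ckpt_a": 0.81, "ckpt_b": 0.78},
    "benchmark_2": {"ckpt_a": 0.74, "ckpt_b": 0.83},
}

# An honest report uses ONE checkpoint across all suites; a "fudged"
# report quietly swaps in the best checkpoint for each suite:
fudged = {b: pick_checkpoint(b, scores) for b in scores}
```

The aggregate numbers from the fudged table describe no single model a user could actually download, which is why reviewers called it benchmark gaming.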
Okay, this is Meta (Score:5, Insightful)
Frankly, is there anyone who didn't already assume they weren't being honest? Lying is "in their DNA", as the saying goes.
Re: (Score:3)
LeCun said CEO Mark Zuckerberg was "really upset and basically lost confidence in everyone who was involved" in the release.
Only those that can prevaricate believably will please the Zuk. Confidence at Meta is built on a tissue of lies.
Re:Okay, this is Meta (Score:4, Interesting)
I suspect that the team was given impossible goals to hit. Probably aggressive deadlines too. The same story everywhere.
I am not saying that makes it ok to cheat. I am just saying that problems like this start at the top, so being "really upset and losing confidence" is no evidence that leadership stands blameless for the team's failure.
Re: (Score:2)
Indeed. Essentially gross leadership failure.
Scientists and engineers with their personal integrity intact will leave at that point. The smart ones leave as soon as demands like that are made, because this invariably backfires.
Re: (Score:2)
Almost certainly.
Past a certain point, management doesn't care how you do something, as long as it gets done, because they aren't going to be the one that gets fired for doing it wrong, you are.
The entire corporate responsibility model is really upside down. If numbers were being fudged, then whoever signed off on it should be fired. Period. Likewise, when given impossible directions, just fail: fail early and save the costs from death-spiraling out of control.
Evil company lies. Details at 11. (Score:2, Interesting)
Concurrence, but not so much with the moderators.
Had another AI encounter today. Went as bad as usual. Each time I think I've figured out the worst thing about genAI, it apparently provokes me into thinking of something worse than that...
Today's result? Useless code, and the realization that an AI writing more than I want to read is pretty annoying. Not only did the AI waste the electricity spewing it, but my time is then wasted figuring out which parts are relevant or useful. I'd estimate my current averages are
Re: Evil company lies. Details at 11. (Score:1)
When your code breaks after you thought it was all done, do you acknowledge that you hallucinated or do you find some way to blame something else?
Re: (Score:2)
Worthless drivel. Did an AI write it for you?
NAK
Re: Evil company lies. Details at 11. (Score:2)
Not my experience at all. Your prompting has to suck, and you're using excuses to justify your ignorance. On top of that, if you haven't even heard of Llama then you truly do live under a self-imposed rock.
Re: (Score:2)
NAK
LLM pilled? (Score:3, Insightful)
Re: (Score:1)
Meaning the people were devotees of all things AI. They'd taken the pill.
Which is funny to hear coming from the guy who headed the AI program at Meta. Did he think they should hire people who didn't care about AI?
Re: LLM pilled? (Score:5, Insightful)
Oh dear (Score:2, Informative)
Sounds like Llama is going to be about as successful as the "metaverse". I guess this is what happens to a company whose foundations are built on the sands of IP theft and right-place-at-the-right-time luck.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The sad thing is that Zuck-the-Fuck has the money for these massively expensive, pathetic failures. Something is really wrong with the human race as a group.
Not fudged! (Score:5, Funny)
The results were hallucinated. Get your corporate team-speak terms correct, people. The founders of AI would never intentionally lie. They are not sociopaths in any sense of the word. They are the benevolent holders of the newly born AI's hand. Llama states this explicitly.
Self-deception (Score:5, Insightful)
The people using these systems are not going to be fooled by faked benchmarks. The only people being fooled here is Meta and it's investors.
Re: Self-deception (Score:1)
Gemini:
"there are two errors in the second sentence:
Subject-Verb Agreement: "The only people... is" should be "The only people... are" to match the plural subject.
Possessive Error: "it's" (it is) should be "its" (belonging to it)."
He keeps saying that (Score:3)
He keeps saying that, but AFAICT the block diagram for his JEPA solution [youtu.be] is the same thing, just predicting the next floating-point latent-space token instead of a discrete word token. Which is very powerful and cool, but it's not like he is getting rid of backprop or convolution or even attention, really. He should stop attacking his older work and just be a rock star.
Re:He keeps saying that (Score:4, Interesting)
I really don't know why LeCun is a rock star. It seems his main achievements have been an early invention/application of convnets for reading handwriting, and an early involvement with EBMs (interesting, but didn't really lead to anything). His claim to have invented convnets seems a bit dodgy since these (originally just considered as weight-sharing between kernels applied at different positions) seem to have first been mentioned by Hinton in the PDP handbook.
That said, I do think JEPA is a step in the right direction, since the model is now essentially predicting the external environment (via its own latent-space sensory representations), as opposed to an LLM, which is auto-regressive - predicting its own generative continuations.
JEPA isn't exactly groundbreaking - it's widely understood that animal/human brains are predicting the external world, not just predicting auto-regressive behavioral continuations (although we do that too) - but at least LeCun is a fairly rare voice pointing out that, on the quest for human-level intelligence, LLMs are ultimately a dead end. LLMs are very useful, and will get better, but they are what they are: ultimately more akin to expert systems, packaging canned knowledge, than to animals.
Re: He keeps saying that (Score:1)
Gemini:
The "canned knowledge" vs. "animal intelligence" distinction ignores context-sensitivity. LLMs aren't just static expert systems; via in-context learning, they adapt to novel scenarios at inference time. This isn't a database lookup; it is a dynamic computation where the "latent state" shifts based on the prompt.
The real delta between LLMs and JEPA isn't just about "predicting the world": it's about the objective function. LLMs are sensitive to linguistic context, but JEPA tries to be sensitive
Re: (Score:2)
Well, yes, LLMs and JEPA are both predictors, and it's the objective function that sets them apart, but the significant difference is that LLMs are trying to predict what they themselves are going to "do" next, while JEPA is trying to predict what the external world is going to do next.
It's maybe just as well that LLMs don't learn at runtime, since then they'd learn to self-predict even better and maybe get into some dysfunctional feedback loop.
The whole point of JEPA, with its outward vs inward looking ob
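The outward- vs inward-looking distinction the parent describes can be caricatured in a few lines of toy Python (everything here is an invented stand-in, not LeCun's actual JEPA architecture): an autoregressive loss scores the model's probability of its own next token, while a JEPA-style loss compares a predicted latent vector against an encoder's embedding of the next external observation, in latent space rather than token space.

```python
# Toy contrast between the two training objectives.
# All functions are invented stand-ins for illustration only.
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors (lists of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def encoder(obs):
    """Stand-in 'encoder': maps an observation to a latent vector."""
    return [x * 0.5 for x in obs]

def autoregressive_loss(next_token_dist, actual_next_token):
    # LLM-style objective: cross-entropy on the model's OWN next token.
    return -math.log(next_token_dist[actual_next_token])

def jepa_style_loss(predicted_latent, next_obs):
    # JEPA-style objective: predict the latent embedding of the EXTERNAL
    # next observation, compared in latent space (not pixel/token space).
    return mse(predicted_latent, encoder(next_obs))
```

The caricature makes the parent's point visible: the first loss never looks outside the model's own output distribution, while the second is anchored to an encoding of what the world does next.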
Results were fudged (Score:2)
Re: (Score:2)
They obviously need ... (Score:3)
Ralph the Wonder Llama.