

Meta Got Caught Gaming AI Benchmarks
Meta released two new Llama 4 models over the weekend -- Scout and Maverick -- with claims that Maverick outperforms GPT-4o and Gemini 2.0 Flash on benchmarks. Maverick quickly secured the number-two spot on LMArena, behind only Gemini 2.5 Pro.
Researchers have since discovered that Meta used an "experimental chat version" of Maverick for LMArena testing that was "optimized for conversationality" rather than the publicly available version.
In response, LMArena said "Meta's interpretation of our policy did not match what we expect from model providers" and announced policy updates to prevent similar issues.
Is this a surprise? (Score:5, Insightful)
Re:Is this a surprise? (Score:4, Insightful)
A lot of people say, "That's unethical AND illegal, I'm not going to do it." Facebook says, "That's unethical AND illegal...but when I get caught, can I blame someone else?" They're at a different level.
Re: (Score:2)
I was gonna say Facebook is more like garbage dumps, waste-producing industrial plants, and commercial farmland (both mega livestock operations and corn/soy fields) whose runoff enters our water supply and poisons us all. We wouldn't accuse any of them of a lack of "ethics." That's the kind of pass Facebook gets.
So, mostly, I think you have the receiving body of water incorrect as well as the source of the slop.
Re: (Score:2)
Oh god. Mod parent and OP funny, even if they are ACs. So true.
Trained on slop (Score:2)
Any actual penalties? (Score:4)
Re: (Score:3)
At university, if students are caught cheating (to gain an advantage over their peers), there are penalties... What penalty or fine does Meta get for this, I wonder, given that they use these benchmarks to drive investment? Ah yeah, nothing. Apparently the greater the stakes, the less the accountability in the corporate space.
There was a time, though it now seems long ago, when publicly naming and shaming a company for what should be seen as outrageous gaming schemes would have resulted in a bit of public backlash, and a tendency among investors/purchasers to reconsider further business with the company until it cleaned up its act. Unfortunately, we're now in a time where this type of gamification is actually seen as a positive. "See? They're willing to cheat to win! That means they've got what it takes!" For some reason, we've
Re: (Score:2)
It is generational. We've seen here on /. over the past decade+ a generation of posters who believe the whole point is to lie and cheat.
Something I learned in business is that an organization evolves to reflect the values of its leadership. If a corporation values integrity, it retains people with integrity and loses employees who lack it. Same with greed: a greedy corporation retains and attracts greedy people and rids itself of the opposite.
What we see now is a manifestation of the values of leadership we accepted. This is the end game of Reaganomics, the sad result of decades of unbridled selfishness and dishonesty. The worshipping of Ayn Rand, the "fuck you I got mine" mentality, the pulling up of the ladder at every opportunity. And it's accelerating.
It should also be clear that this has existed all along in US society, it's just that now it victimizes white people too.
The victim pool has to keep growing as wealth consolidation shrinks the pool of those considered "well off" by aggregating more of the value in fewer hands. I just wonder at what point it will finally be enough, or whether there is no "enough." Maybe it'll just come down to all-out war between wealthy individuals until there is only one left.
Re: (Score:2)
Remember when Sony spent decades and multiple generations of physical media trying to establish an incumbent format that everyone else would have to pay them licensing fees for? First with Betamax, then MiniDisc; finally, with Blu-ray, they succeeded.
And... suddenly people found alternatives to the physical media market.
This is a pretty reductive anecdote, but the point is... It's possible to win a battle in a way that completely loses the war.
Re: (Score:2)
Probably none. The usual AI morons will ignore this, and sane people already know most of the current AI hype is a scam.
Re: (Score:2)
What should happen to them? Should they be barred from releasing their models? Can someone prevent them from using a model with unfair benchmarks in their services? Nothing will happen at all.
Re: (Score:2)
The investor class actually admires a good liar. That's how you get CEOs who run a company into the ground and end up getting another CEO gig. The dishonesty is seen as a feature.
Re: (Score:2)
Hmm, Only 22% AI Articles This Morning : ) (Score:3, Insightful)
Re: (Score:2)
Yeah, those tariffs/trade wars are inciting a slew of "this company is raising prices" "this company is accelerating their downstream supply line" articles.
There's actually a bigger mess than AI out there rn.
Such a surprise (Score:3)
Benchmarks used for marketing are an invitation to game them. The only somewhat useful benchmarks are independently run secret ones, and even those are of limited use. The usual morons do not get that, though, because they do not understand the real world at all.
Llama 4 (Score:2)