Meta Got Caught Gaming AI Benchmarks

Posted by msmash on Tuesday April 08, 2025 @10:00AM from the how-about-that dept.

Meta released two new Llama 4 models over the weekend -- Scout and Maverick -- with claims that Maverick outperforms GPT-4o and Gemini 2.0 Flash on benchmarks. Maverick quickly secured the number-two spot on LMArena, behind only Gemini 2.5 Pro.

Researchers have since discovered that, for LMArena testing, Meta used an "experimental chat version" of Maverick that was "optimized for conversationality" rather than the publicly available version.

In response, LMArena said "Meta's interpretation of our policy did not match what we expect from model providers" and announced policy updates to prevent similar issues.

This discussion has been archived. No new comments can be posted.

Is this a surprise? (Score:5, Insightful)

by Snotnose ( 212196 ) writes: on Tuesday April 08, 2025 @10:08AM (#65289587)

Meta isn't exactly a paragon of corporate virtue. More like the swamp the sewers flow into.

- Re:Is this a surprise? (Score:4, Insightful)
  
  by phantomfive ( 622387 ) writes: on Tuesday April 08, 2025 @10:26AM (#65289629) Journal
  
  Facebook ethics are criminal ethics, and with the current hype environment, with open source benchmarks [slashdot.org] there's a lot of incentive to cheat.
  
  A lot of people say, "That's unethical AND illegal, I'm not going to do it." Facebook says, "That's unethical AND illegal...but when I get caught, can I blame someone else?" They're at a different level.
  
- Re: (Score:2)
  
  by Torodung ( 31985 ) writes:
  
  I was gonna say Facebook is more like garbage dumps, waste producing industrial plants, and commercial farmland (both mega livestock and corn/soy fields) where their runoff enters our water supply and poisons us all. We wouldn't accuse any of them of a lack of "ethics." That's the kind of pass Facebook gets.
  So, mostly, I think you have the receiving body of water incorrect as well as the source of the slop.
- - Re: (Score:2)
    
    by Torodung ( 31985 ) writes:
    
    Oh god. Mod parent and OP funny, even if it is ACs. So true.
Trained on slop (Score:2)

by xack ( 5304745 ) writes:

And once again it's humans that save the day by providing custom training. Your new job is cleaning up after AI forever.
Any actual penalties? (Score:4)

by fleeped ( 1945926 ) writes: on Tuesday April 08, 2025 @10:17AM (#65289609)

At the university, if students are caught cheating (to gain advantage over their peers) there are penalties... What penalty/fine does Meta get for this I wonder, since they use these benchmarks to drive investment? Ah yeah, nothing - apparently the greater the stakes the less the accountability in the corporate space.

- Re: (Score:3)
  
  by nightflameauto ( 6607976 ) writes:
  
  At the university, if students are caught cheating (to gain advantage over their peers) there are penalties... What penalty/fine does Meta get for this I wonder, since they use these benchmarks to drive investment? Ah yeah, nothing - apparently the greater the stakes the less the accountability in the corporate space.
  There was a time, though it now seems long ago, when publicly naming and shaming a company for what should be seen as outrageous gaming schemes would have resulted in a bit of public backlash, and a tendency among investors/purchasers to reconsider further business with the company until it cleaned up its act. Unfortunately, we're now in a time where this type of gamification is actually seen as a positive. "See? They're willing to cheat to win! That means they've got what it takes!" For some reason, we've
  - Re: (Score:2)
    
    by dfghjk ( 711126 ) writes:
    
    It is generational. We've seen here on /. over the past decade+ a generation of posters who believe the whole point is to lie and cheat.
    Something I learned in business is that an organization evolves to reflect the values of its leadership. If a corporation values integrity, it retains people with integrity and loses employees who lack it. Same with greed: a greedy corporation retains and attracts greedy people and rids itself of the opposite.
    What we see now is a manifestation of the values of leadership
    - Re: (Score:2)
      
      by nightflameauto ( 6607976 ) writes:
      
      It is generational. We've seen here on /. over the past decade+ a generation of posters who believe the whole point is to lie and cheat.
      Something I learned in business is that an organization evolves to reflect the values of its leadership. If a corporation values integrity, it retains people with integrity and loses employees who lack it. Same with greed: a greedy corporation retains and attracts greedy people and rids itself of the opposite.
      What we see now is a manifestation of the values of leadership we accepted. This is the end game of Reaganomics, the sad result of decades of unbridled selfishness and dishonesty. The worshipping of Ayn Rand, the "fuck you I got mine" mentality, the pulling up the ladder at every opportunity. And it's accelerating.
      It should also be clear that this has existed all along in US society, it's just that now it victimizes white people too.
      The victim pool has to continue growing as wealth consolidation continues to shrink the pool of those considered "well off," by aggregating more of the value in fewer hands. I just wonder at what point will it finally be enough, or if there is no enough. Maybe it'll just come down to all out war between wealthy individuals until there is only one left.
  - Re: (Score:2)
    
    by Gideon Fubar ( 833343 ) writes:
    
    You remember when Sony spent decades and multiple generations of physical media trying to get an incumbent technology that everyone else would have to pay them licensing for? First with Betamax, then MiniDisc, and finally with Blu-ray they succeeded.
    And... suddenly people found alternatives to the physical media market.
    This is a pretty reductive anecdote, but the point is... It's possible to win a battle in a way that completely loses the war.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Probably none. The usual AI morons will ignore this and sane people already know most of the current AI hype is a scam.
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  What should happen to them? Should they be barred from releasing their models? Can someone prevent them from using a model with unfair benchmarks in their services? Nothing will happen at all.
- Re: (Score:2)
  
  by hey! ( 33014 ) writes:
  
  The investor class actually admires a good liar. That's how you get CEOs who run a company into the ground and end up getting another CEO gig. The dishonesty is seen as a feature.
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  This story should be bookmarked and brought up every time Facebook has some research story.
Hmm, Only 22% AI Articles This Morning : ) (Score:3, Insightful)

by BrendaEM ( 871664 ) writes: on Tuesday April 08, 2025 @10:28AM (#65289635) Homepage

Let's see: there is still some other information and technology news in the world. And since "Slashdot" has 8 characters, we would only need to change 1.8 of them to AI. So it would only be Alashdot, for today.

- Re: (Score:2)
  
  by Torodung ( 31985 ) writes:
  
  Yeah, those tariffs/trade wars are inciting a slew of "this company is raising prices" and "this company is accelerating their downstream supply line" articles.
  There's actually a bigger mess than AI out there rn.
Such a surprise (Score:3)

by gweihir ( 88907 ) writes: on Tuesday April 08, 2025 @12:30PM (#65289903)

Benchmarks used for marketing are an invitation to gaming them. The only somewhat useful benchmarks are independently done secret ones, and even those are of limited use. The usual morons do not get that though, because they do not understand the real world at all.

Llama 4 (Score:2)

by hcs_$reboot ( 1536101 ) writes:

https://www.reddit.com/r/Local... [reddit.com]
