Google Unveils Gemini 2.5 Pro, Its Latest AI Reasoning Model With Significant Benchmark Gains (blog.google)

Posted by msmash on Tuesday March 25, 2025 @03:30PM from the moving-forward dept.

Google DeepMind has launched Gemini 2.5, a new family of AI models designed to "think" before responding to queries. The initial release, Gemini 2.5 Pro Experimental, tops the LMArena leaderboard by what Google claims is a "significant margin" and demonstrates enhanced reasoning capabilities across technical tasks. The model achieved 18.8% on Humanity's Last Exam without tools, outperforming most competing flagship models. In mathematics, it scored 86.7% on AIME 2025 and 92.0% on AIME 2024 in single attempts, while reaching 84.0% on the GPQA Diamond benchmark for scientific reasoning.

For developers, Gemini 2.5 Pro demonstrates improved coding abilities with 63.8% on SWE-Bench Verified using a custom agent setup, though this falls short of Anthropic's Claude 3.7 Sonnet score of 70.3%. On Aider Polyglot for code editing, it scores 68.6%, which Google claims surpasses competing models. The reasoning approach builds on Google's previous experiments with reinforcement learning and chain-of-thought prompting. These techniques allow the model to analyze information, incorporate context, and draw conclusions before delivering responses. Gemini 2.5 Pro ships with a 1 million token context window (approximately 750,000 words). The model is available immediately in Google AI Studio and for Gemini Advanced subscribers, with Vertex AI integration planned in the coming weeks.

This discussion has been archived. No new comments can be posted.

Benchmarks are meaningless (Score:1)

by gweihir ( 88907 ) writes:

Whenever a peddler of LLM-crap stresses their artificial moron is doing better on benchmarks, that just means they have given up and are cheating now.
- Re: (Score:2)
  
  by dinfinity ( 2300094 ) writes:
  
  Are you still doing this? Move on, man. The world has.
  Your contentless ranting is just noise that pollutes Slashdot.
- Re: (Score:2)
  
  by smooth wombat ( 796938 ) writes:
  
  You never have benchmarks in your life? When putting together your new system, you don't look at how well the various components perform? When hiring for a position, you don't look at their credentials or what they've done? When judging which car to buy, you don't look at its 0-60 times, its fuel mileage, its reliability?
  Explain how one is to gauge the good or bad of something without a consistent benchmark to compare against.
- Re: Benchmarks are meaningless (Score:2)
  
  by electroniceric ( 468976 ) writes:
  
  I'm not sure they're cheating, but I think the significance of the benchmarks is pretty overstated.
  I suspect that even though the benchmarks are supposed to test something other than information retrieval and interpolation, in practice they end up being amenable to being "solved" by information retrieval and interpolation.
  But a lot of what makes us go is our ability to switch out of that mode and into other modes, like pure logic, or other modes that are typically disparaged like emotional states or interac
there is that stupid shit again (Score:2)

by dfghjk ( 711126 ) writes:

"...designed to "think" before responding to queries..."
Literally every piece of software EVER was "designed to think before responding to queries". It is impossible to do otherwise.
I am so sick of this anthropomorphizing of AI. It is computer software.
"...demonstrates enhanced reasoning capabilities across technical tasks."
Does better than some other things at some tasks.
"For developers, Gemini 2.5 Pro demonstrates improved coding abilities ..."
Not to be confused with "coding abilities" of developers.
"Th
- Re: (Score:1)
  
  by CallMeTim ( 6454842 ) writes:
  
  In this case 'reasoning' describes the technique used to improve the LLMs that is different (https://en.wikipedia.org/wiki/Reasoning_language_model). You may disagree with the name, but it isn't just marketing hype. It is what the technique is called in the industry.
  - Re: (Score:2)
    
    by sound+vision ( 884283 ) writes:
    
    So it sounds like they are separating internal reasoning/logic from the communication functions. Are these also the AI that we hear about becoming "deceptive"?
