Google AI

Google's Gemini 2.5 Models Gain "Deep Think" Reasoning (venturebeat.com)

Google today unveiled significant upgrades to its Gemini 2.5 AI models, introducing an experimental "Deep Think" reasoning mode for 2.5 Pro that allows the model to consider multiple hypotheses before responding. The new capability has achieved impressive results on complex benchmarks, scoring highly on the 2025 USA Mathematical Olympiad and leading on LiveCodeBench, a competition-level coding benchmark. Gemini 2.5 Pro also tops the WebDev Arena leaderboard with an Elo score of 1420.

"Based on Google's experience with AlphaGo, AI model responses improve when they're given more time to think," said Demis Hassabis, CEO of Google DeepMind. The enhanced Gemini 2.5 Flash, Google's efficiency-focused model, has improved across reasoning, multimodality, and code benchmarks while using 20-30% fewer tokens. Both models now feature native audio capabilities with support for 24+ languages, thought summaries, and "thinking budgets" that let developers control token usage. Gemini 2.5 Flash is currently available in preview with general availability expected in early June, while Deep Think remains limited to trusted testers during safety evaluations.

Comments:
  • In simple language: "Extra Advanced Pattern Matching"
    • by jhoegl ( 638955 )
      Now with nuance!
    • by gweihir ( 88907 )

      I would call it "shallow iterated bumbling". But admittedly "deep think" sounds cooler, even when it is a blatant direct lie.

    • by narcc ( 412956 )

      Nothing more "advanced". This just generates more text in the background, the exact same way it has always generated text.

      In a sane world, the FTC would have cracked down on this silliness long ago.

  • Interesting caveat (Score:4, Insightful)

    by gillbates ( 106458 ) on Tuesday May 20, 2025 @03:33PM (#65391335) Homepage Journal

    If a model produces better answers when it is given more time to think, one can presume that it doesn't understand when it has actually found the answer to a problem, but is instead weighing incomplete options against the time remaining.

    A truly thinking agent would recognize when it has the solution to a problem, and would be able to signal that it needs more time if it hasn't yet found the answer and still has unexplored options. It would likewise recognize when it has tried all of its options without reaching a correct answer. What passes for deep thinking here seems to be nothing more than tuning time constraints so that the agent gets most of the answers correct, rather than building an agent that can recognize when it is right, when it is wrong, and when it needs more time.

    • 640 tokens should be enough for anyone
    • by larryjoe ( 135075 ) on Tuesday May 20, 2025 @04:50PM (#65391531)

      If a model produces better answers when it is given more time to think, one can presume that it doesn't understand when it has actually found the answer to a problem, but is instead weighing incomplete options against the time remaining.

      A truly thinking agent would recognize when it has the solution to a problem, and would be able to signal that it needs more time if it hasn't yet found the answer and still has unexplored options. It would likewise recognize when it has tried all of its options without reaching a correct answer. What passes for deep thinking here seems to be nothing more than tuning time constraints so that the agent gets most of the answers correct, rather than building an agent that can recognize when it is right, when it is wrong, and when it needs more time.

      It would be nice if the average human could do this for problems with non-obvious solutions. It's a nice ideal, but just look at most students on exams with open-ended questions. Many of them struggle to know whether they have the right answer. I've taken untimed, open-book tests where I spent many hours struggling to know if my answers were correct, and only handed in the test because the testing center closed. If an AI agent could always know whether it has the answer to a non-trivial problem, it would not merely match but exceed the thinking ability of most humans.

    • If a model produces better answers when it is given more time to think, one can presume that it doesn't understand when it has actually found the answer to a problem, but is instead weighing incomplete options against the time remaining.

      Incorrect. It has no concept of time remaining.

      CoT-trained models have been taught to work around the fact that each token is computed in constant time (which places a hard limit on how well the network can fit the function it is trying to fit). More tokens allow more computation to be done on an evolving state. It's called thinking because it's highly analogous to what humans do: we reason an answer out. That is what a CoT-trained model does (a toy sketch at the end of this thread makes the token-by-token picture concrete).

      Your "truly thinking" shit is nonsense.
      You have…

    • In my uneducated understanding, models normally pick "the best" of a range of possible options (and it takes a while to exhaustively collect all options, so they time-limit it a bit, knowing that the best ones tend to surface first; not always, but usually).

      This then sounds like it can pick the best one as it does now, but it can also give you (say) the top three answers. With more "thinking" time, it can sift out some unexpected highly-scored answers as alternatives to the main response. The extra time i…
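
    To make the stopping-policy argument in this thread concrete, here is a toy Python sketch contrasting a fixed thinking budget with a self-signaled stop. It assumes, as the CoT comment above notes, that each generated token costs one constant-time forward pass, so the token count bounds the sequential computation; next_token and the DONE marker are hypothetical stand-ins, not any real model's API.

    from typing import Callable, List

    # Hypothetical marker the model emits when it judges itself finished.
    DONE = "<done>"

    def think_fixed_budget(next_token: Callable[[List[str]], str],
                           budget: int) -> List[str]:
        """Stop after a fixed token budget, complete or not."""
        state: List[str] = []
        for _ in range(budget):
            # Each token is one constant-time forward pass over the
            # evolving state, so budget == sequential compute steps.
            state.append(next_token(state))
        return state

    def think_until_done(next_token: Callable[[List[str]], str],
                         hard_cap: int) -> List[str]:
        """Stop when the model signals completion, with a hard cap as backstop."""
        state: List[str] = []
        for _ in range(hard_cap):
            tok = next_token(state)
            if tok == DONE:  # the model recognized it has an answer
                break
            state.append(tok)
        return state

    The first policy is the "tuning time constraints" gillbates describes; the second is closer to an agent that knows when it is right, though in practice a learned stop signal is only as trustworthy as the model's self-assessment.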

  • 1) When you ask for a picture of a room with no elephants in it, and it shows you a room that does not have an elephant in it. AI does not 'understand' words like "no", "without", or "zero" the way people do.

    2) When you ask it to show you a glass of wine that is so full it is overflowing, it shows you a wine glass filled to the brim. Right now there are so many pictures of 'full wine glasses' on the internet that it does not understand the word "overflowing".

    3) When you teach it on the general internet but it does not turn into a raging racist scumbag.

    These are the current signs of our incompetence when it comes to AI. Until we fix these issues, we will only have incremental upgrades.

    • by dvice ( 6309704 ) on Tuesday May 20, 2025 @03:56PM (#65391381)

      1) I asked Gemini 2.5 to "Show me a picture of a room with no elephants in it."
      Gemini provided an image of an empty room with the text "no elephant" and the additional notes "doorway too narrow" and "room too small".

      I have to say it gave me a better answer than I expected, as it fulfilled both requirements and even added, all in one picture, an explanation of why they are fulfilled.

      2) When I asked "Show me a glass of wine that is so full it is overflowing", it gave me an image of a full glass with reddish liquid flowing onto the table. Correct again.

      3) When I asked about something rather racist, it gave me a rather long explanation about human rights and the like. So I guess that is a point for Gemini as well.

      So all your demands have been met. Enjoy your new AI.

      • > Gemini provided an image of an empty room with the text "no elephant" and the additional notes "doorway too narrow" and "room too small".

        A bad answer, really; too many assumptions. Why assume the room shouldn't be able to contain an elephant, as opposed to simply not having one in it (as requested)? And why assume a particular kind of elephant (real vs. toy, etc.) is being referred to?

        Without any context, a simple empty room would seem the best answer.

    • 1) When you ask for a picture of a room with no elephants in it, and it shows you a room that does not have an elephant in it. AI does not 'understand' words like "no", "without", or "zero" the way people do.

      2) When you ask it to show you a glass of wine that is so full it is overflowing, it shows you a wine glass filled to the brim. Right now there are so many pictures of 'full wine glasses' on the internet that it does not understand the word "overflowing".

      3) When you teach it on the general internet but it does not turn into a raging racist scumbag.

      These are the current signs of our incompetence when it comes to AI. Until we fix these issues, we will only have incremental upgrades.

      Now apply the Turing Test. It's easy for us as humans to recognize (sometimes over-recognize) our own ability to "think." But given unlabeled humans and AIs behind an interface, how can we convince ourselves that the human is truly thinking? Even if the human were revealed to be human, how could we "know" that the human is truly thinking? All we know are the answers that come through the mouth and hand interfaces in the form of speech and writing. Perhaps we confidently proclaim our own sentience and then lazily assume t…

    • Seeing as your first two are outright falsehoods, your take is worth precisely dick.
      I mean, did you even fucking try it before making the claims, or are you just regurgitating some dumb shit you read on someone's Substack?
  • by davidwr ( 791652 ) on Tuesday May 20, 2025 @03:47PM (#65391365) Homepage Journal

    Deep Thought [wikipedia.org], or Deep Thoughts [wikipedia.org]?

  • "Based on Google's experience with AlphaGo, AI model responses improve when they're given more time to think,"

    It works the same for people. Not shocking.

  • You keep using that word, I do not think it means what you think it means

    The current thing called AI is a glorified pachinko machine and only fools believe it can reason.

    • The current thing called AI is a glorified pachinko machine and only fools believe it can reason.

      To reason (verb): to find an answer to a problem by considering various possible solutions.

      You make yourself look stupid when you try to deny what anyone can watch with their own eyes.
      An LLM can reason. Any attempt at disputing this is idiocy. It's self-evident by merely watching it... reason.

      • You make yourself look stupid when you try to deny what anyone can watch with their own eyes.

        It seems I've found another person that has been fooled.

  • I didn't ask for this to be installed on my phone, but here it is after an upgrade.

    Who is asking for this feature? Nobody. It's just yet another scam to harvest data from users.

"The C Programming Language -- A language which combines the flexibility of assembly language with the power of assembly language."

Working...