Google AI

Google Gemini 1.5 Pro Leaps Ahead In AI Race, Challenging GPT-4o (venturebeat.com) 11

An anonymous reader quotes a report from VentureBeat: Google launched its latest artificial intelligence powerhouse, Gemini 1.5 Pro, today, making the experimental "version 0801" available for early testing and feedback through Google AI Studio and the Gemini API. This release marks a major leap forward in the company's AI capabilities and has already sent shockwaves through the tech community. The new model has quickly claimed the top spot on the prestigious LMSYS Chatbot Arena leaderboard (built with Gradio), boasting an impressive ELO score of 1300.

This achievement puts Gemini 1.5 Pro ahead of formidable competitors like OpenAI's GPT-4o (ELO: 1286) and Anthropic's Claude-3.5 Sonnet (ELO: 1271), potentially signaling a shift in the AI landscape. Simon Tokumine, a key figure in the Gemini team, celebrated the release in a post on X.com, describing it as "the strongest, most intelligent Gemini we've ever made." Early user feedback supports this claim, with one Redditor calling the model "insanely good" and expressing hope that its capabilities won't be scaled back.
"A standout feature of the 1.5 series is its expansive context window of up to two million tokens, far surpassing many competing models," adds VentureBeat. "This allows Gemini 1.5 Pro to process and reason about vast amounts of information, including lengthy documents, extensive code bases, and extended audio or video content."
  • prestigious? (Score:2, Insightful)

    by itamblyn ( 867415 )
    Since when are LLM leaderboards prestigious? And is anyone surprised there are new models coming out with larger context lengths? I wouldn't call that a shockwave through the tech community...
    • by Junta ( 36770 )

      Yeah, qualitative experience hasn't tracked the scale of the quantitative measures being bragged about for a while...

    • Re:prestigious? (Score:5, Insightful)

      by Rei ( 128717 ) on Friday August 02, 2024 @09:05AM (#64675260) Homepage

      LMSYS is different from most leaderboards [lmsys.org]. It's not some fixed open set of questions; rather, it's A/B testing by humans, who manually rate which answer they think is better.
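      As a rough illustration of how those pairwise votes become a leaderboard number, an Elo-style update from a single A/B vote can be sketched as below; the K-factor and starting ratings are illustrative assumptions, not LMSYS's actual parameters.

      # Rough sketch of an Elo-style rating update from one pairwise human vote (A vs. B).
      # K and the starting ratings are illustrative assumptions, not LMSYS's real parameters.
      def elo_update(rating_a, rating_b, a_won, k=32):
          expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
          score_a = 1.0 if a_won else 0.0
          delta = k * (score_a - expected_a)
          return rating_a + delta, rating_b - delta

      # Example: a 1300-rated model beats a 1286-rated one in one human comparison.
      new_a, new_b = elo_update(1300, 1286, a_won=True)
      print(round(new_a, 1), round(new_b, 1))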

      A naive transformer implementation suffers from O(N^2) scaling with respect to context length, whereas tricks like RoPE scaling used to extend the window reduce quality. So achieving large contexts at high quality is an achievement. Some alternative architectures, like Mamba, don't suffer from O(N^2) scaling, but they aren't as mature.
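      To make the scaling point concrete, the toy sketch below builds the naive attention score matrix, which has one entry per (query, key) pair and therefore grows quadratically with token count; the dimensions are arbitrary illustrative values.

      # Toy illustration of why naive self-attention is O(N^2) in context length:
      # the score matrix has one entry for every (query, key) pair of tokens.
      import numpy as np

      def attention_scores(n_tokens, d_model=64, seed=0):
          rng = np.random.default_rng(seed)
          q = rng.standard_normal((n_tokens, d_model))
          k = rng.standard_normal((n_tokens, d_model))
          return q @ k.T / np.sqrt(d_model)  # shape: (n_tokens, n_tokens)

      for n in (512, 4096):
          print(n, attention_scores(n).shape)  # 8x the tokens -> 64x the score entries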

    • Yeah, the tone in this "article" (built with Gradio) is not right.

  • Thought so. That score is really meaningless.

  • They may have something better than ChatGPT 4o, but the free version was horrible compared to GPT-3.5, so I stopped trying to use it.
    I don't feel an incentive to pay to try their 'better' product.

    At this point Google would have to pay me to use their shoddy software. I'm banking on Gemini to end up in the Google graveyard.
    • They may have something better than ChatGPT 4o, but the free version was horrible compared to GPT-3.5, so I stopped trying to use it.

      I don't feel an incentive to pay to try their 'better' product.

      My experience mirrors yours. Gemini seems primarily focused on manipulating Google Docs files, FWIW. Maybe it was trained on 'anonymous' Google Docs user data. Made anonymous the same way Gmail is anonymous, yet still indexed and scored 'for the user's overall experience within Google'.

      Also, for what it's worth, I have great success 'coding by prompt' when writing Drupal Form API code, presumably because claude.ai and ChatGPT 4o were trained on open-source GitHub/GitLab examples.

      And since my code is also open-source...

      • Yes, I have been using ChatGPT for coding simple stuff like PowerShell and Python scripts, and I appreciated its ability to clean up and help me improve code.
        I tried to do the same with Gemini, and when it hit a barrier, I put the code over to ChatGPT, which was able to figure out what eluded Gemini. Not good when one AI has to be used to correct another one.
  • /. is a cesspool of corporate promotion, but reality is a bitch: I use Gemini 1.5 Pro daily and it's about half as useful as those two others. It hallucinates constantly, which the others don't anymore, and the code is nonsense.
  • Ugh, world leaders have made no progress on limiting energy use. LLMs account for a growing and significant share of the world's energy usage, and we still can't coordinate to keep individual organizations from using as much power as they like. The cost of electricity (and other fuels) simply does not reflect the growing harm. The article next to this one is about a 10 degree C rise in Arctic temperature. That's huge! It's not like I don't want summers to be less sweltering, but this will wreak havoc...

"I got everybody to pay up front...then I blew up their planet." "Now why didn't I think of that?" -- Post Bros. Comics

Working...