AI Google

Early Impressions of Google's Gemini Aren't Great (techcrunch.com) 47

Google this week took the wraps off of Gemini, its new flagship generative AI model meant to power a range of products and services including Bard. Google has touted Gemini's superior architecture and capabilities, claiming that the model meets or exceeds the performance of other leading gen AI models like OpenAI's GPT-4. But the anecdotal evidence suggests otherwise. TechCrunch: The model fails to get basic facts right, like 2023 Oscar winners: Note that Gemini Pro claims incorrectly that Brendan Gleeson won Best Actor last year, not Brendan Fraser -- the actual winner. I tried asking the model the same question and, bizarrely, it gave a different wrong answer. "Navalny," not "All the Beauty and the Bloodshed," won Best Documentary Feature last year; "All Quiet on the Western Front" won Best International Film; "Women Talking" won Best Adapted Screenplay; and "Pinocchio" won Best Animated Feature Film. That's a lot of mistakes.

Translation doesn't appear to be Gemini Pro's strong suit, either. What about summarizing news? Surely Gemini Pro, with Google Search and Google News at its disposal, can give a recap of something topical? Not necessarily. It seems Gemini Pro is loath to comment on potentially controversial news topics, instead telling users to... Google it themselves.

This discussion has been archived. No new comments can be posted.

  • Like ChatGPT. But, also like ChatGPT, I imagine it provides the wrong answers convincingly and authoritatively.

    • by linzeal ( 197905 )

      Bard will argue with you if you tell it it is wrong about some things.

      • by EvilSS ( 557649 )

        Bard will argue with you if you tell it it is wrong about some things.

        Sounds like it's closer to human than we might be giving it credit for then...

    • by gweihir ( 88907 )

      I imagine it provides the wrong answers convincingly and authoritatively.

      That seems to be the core LLM "skill": glossing over its lack of insight by being very convincing. Apparently, tons of people fall for that. LLMs are also good at lying by misdirection. Ask an LLM "how do I say something-bad in a positive way". The MS artificial moron balked at "killed a lot of people" for the bad thing, but was happy to work with "sacked a lot of people".

      • Ask an LLM "how do I say something-bad in a positive way".

        First three responses from goliath-120b, an open-source model substantially inferior to GPT-4.

        1st.

        Instead of focusing on the negative aspect, try to reframe your statement to highlight potential opportunities or lessons learned. For example, you can use phrases like "growth opportunity," "area for improvement," or "chance to learn." Here's an example:

        Negative: "Sarah is really bad at public speaking."

        Positive reframe: "Sarah has a growth opportunity in developing her public speaking skills."

    • LLMs aren't about facts, they're about concepts. Think about that for a moment. A machine understands CONCEPTS. That's huge. Now, it sure would be nice if we could figure out how to merge that with a calculator and an expert system. But I guess that's later.
  • by fleeped ( 1945926 ) on Thursday December 07, 2023 @12:30PM (#64064023)
    Reasonable humans, when dealing with software errors, 1) force the software to be deterministic and 2) identify what went wrong by debugging the software's state/choices. Not providing such facilities in software is a recipe for disaster. If the software can't do that, the architecture is faulty by design.
  • Can't they come up with names that are not already well known to be something else? Oh, I guess the marketing types never heard of the something else.

    • Can't they come up with names that are not already well known to be something else? Oh, I guess the marketing types never heard of the something else.

      You're asking the company known as "Google"...with the parent company known as "Alphabet"...that question. Nerds are known for being nerds, not marketing geniuses.

      And besides, Human Job Destroyer holds a bit too much truth for the masses to accept. That will come later, when it's far too late to do anything about it other than watch the poor die off.

  • Can't Tell (Score:5, Interesting)

    by quantaman ( 517394 ) on Thursday December 07, 2023 @01:00PM (#64064143)

    Bard still isn't supported in my country.

    I get the impression that training the actual LLM is only part of the problem. To make it useful there's a ton more work that has to go into conditioning the model, making sure it responds appropriately to certain questions (e.g. nicely formatted programming responses). I wonder if part of the issue with the Oscar winners is that they need to do tricks to make sure it generates original text instead of regurgitating the most appropriate memorized (and copyrighted) response. The problem is how to make it creative with the structure of the answer but not the facts.

    And I do think I understand why ChatGPT scares Google so much. Google is best at figuring out a fact, for instance, who won the Oscar for best Actor in 2023? You generally need one search to get an answer so no point messing with an LLM. However, when trying to figure out a CSS quirk, or figure out a new API, I could end up making dozens of searches trying to find the answer.

    ChatGPT certainly makes a terrible search engine and I wouldn't trust it to tell me the best actor either (probably right, but hallucinations are hard to detect there). But it's great for figuring out the weird CSS issue or how to use a new API. ChatGPT's strength is the very thing that Google search is terrible at. But the thing Google was terrible for is also the thing I used Google the most for (because it was still the best option).

    In other words, since ChatGPT came out my Google usage has plummeted to making only simple one-off queries, and that rightfully terrifies Google.

  • "Google touted Gemini’s superior architecture and capabilities, claiming that the model meets or exceeds the performance of other leading gen AI models like OpenAI’s GPT-4. But the anecdotal evidence suggests otherwise. A “lite” version of Gemini, Gemini Pro, began rolling out to Bard yesterday"

    Google never claimed these were comparable. Gemini Ultra, which is not publicly available yet, is the model that is supposed to be comparable to GPT-4.
    • by Holi ( 250190 )

      Maybe don't call things Pro if they are a lite version.

      • Most people will understand that is just marketing so it shouldn't be that confusing. As long as you're smarter than a TechCrunch reporter
  • CNBC.com reports Gemini is claimed to out-perform GPT-3.5 but that Google has provided no comparison data related to GPT-4.0 Turbo.
  • The demo they cherry-picked looked very impressive. I have no doubt that they have made real and significant progress in developing the tech.
    But, as usual, your mileage will vary.
    It seems like two games are being played at the same time:
    1. Actually developing useful tech
    2. Convincing the general public that they have the "magic" in order to keep the hype train running and boost the stock price

  • I have a few logic tests I throw at LLMs. One is "Given a grid that is 5 x 5 in size, how many cells are there from opposite corners?"

    That seems to throw off a lot of LLMs, because I'm not asking for a measurement, but a count of whole cells. It should be self-obvious to anyone that for a square grid of N x N cells, there are N cells diagonally from corner to corner. Chat GPT4 at first used the Pythagorean theorem, giving a result like 7.07. When told that it was incorrect it then simply counted cells, retu
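The two readings of that question can be put side by side in a short sketch. The function names are hypothetical and the grid is assumed to be made of unit cells: counting whole cells along the diagonal gives N, while measuring the diagonal's length with the Pythagorean theorem gives N times the square root of 2, which is where a figure like 7.07 comes from.

```python
import math


def diagonal_cells(n: int) -> list[tuple[int, int]]:
    """Whole cells on the main diagonal of an n x n grid, corner to corner."""
    return [(i, i) for i in range(n)]


def diagonal_length(n: int) -> float:
    """Euclidean length of the diagonal, assuming each cell is 1 x 1."""
    return n * math.sqrt(2)


print(len(diagonal_cells(5)))        # count of whole cells
print(round(diagonal_length(5), 2))  # measured length, not a cell count
```

The first number is the count the question is after; the second is the measurement a model produces when it treats the grid as a geometry problem.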

    • Re:Basic tests (Score:5, Interesting)

      by ebcdic ( 39948 ) on Thursday December 07, 2023 @02:31PM (#64064467)

      "Given a grid that is 5 x 5 in size, how many cells are there from opposite corners?" I had no idea what you meant by that question until I read the next paragraph. It's just not English.

    • Oh well. At least ChatGPT 4.0 gets it right.

      Yes - it returns "42".

    • The funny part is Claude gets it right (10), by interpreting the vague "opposite corners" as meaning the two opposite corners at the top, counted to their associated diagonals.

    • by gweihir ( 88907 )

      Nice! Much worse than 1st year CS students. They usually at least know they have the wrong solution.

  • Does Google not have any competent PR/marketing people? Who releases a shit model that probably isn't even as good as the prior generation model of your competitors when you (apparently) have something better in the tank you plan to release soon anyway? Who does that?

    People have already been mocking Google...cough...Bard... for total absence of a competitive product. This is just an insane self-own. I wouldn't be at all surprised if Ultra wasn't much better or if anyone even cared to look once it beca

  • Even funnier: ask it who won the "Oscar for best Actor in 2024".

  • After all, we are living in a post-fact, post-truth society now, aren't we? We have to debate the controversy & defend our positions because everything is relative & subjective. Sounds like Bard or Gemini or whatever it's called has got it just right.
  • It's a feature, not a bug. LLM's mimic language based off of statistics. They have no way of evaluating facts. If you constrain them by facts, you will hobble what they are good at, which is prose. They are creative by nature. They are liberal arts. If you want facts, I hear Google does a pretty mean search engine.
    • by gweihir ( 88907 )

      Indeed. LLMs are basically politicians or business-consultants: They can join any discussion and give an entirely fake impression of knowing what they are talking about.

      And yes, this is an inherent property and cannot be fixed. There is already research that says making an LLM a bit better in one area makes it worse in all others. That is in no way a surprise though. LLMs are not reasoning engines or knowledge engines. They are "fake it" engines.

      • Indeed. LLMs are basically politicians or business-consultants: They can join any discussion and give an entirely fake impression of knowing what they are talking about.

        And yes, this is an inherent property and cannot be fixed. There is already research that says making an LLM a bit better in one area makes it worse in all others.

        Quite a lot of error can be fixed by employing different strategies during inference, without even altering or augmenting the model itself.

        One simple approach has been shown to be quite effective. The model is asked the same question repeatedly, and the responses are evaluated (by the same model or a different one) to decide whether they are saying the same thing. The better the model, the better it becomes at detecting error in this way.

        There is a ton being left on the table in terms of unta
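The repeated-question approach described above can be sketched roughly as follows. This is a minimal illustration, not any vendor's method: `agreement_check`, `same_by_text`, and `stub_model` are hypothetical names, the stub replaces a real model call with made-up canned answers, and plain text comparison stands in for the model-based judge the comment mentions.

```python
from itertools import cycle
from typing import Callable


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(answer.lower().split()).rstrip(".")


def same_by_text(a: str, b: str) -> bool:
    """Stand-in judge; a real setup might ask a model whether a and b agree."""
    return normalize(a) == normalize(b)


def agreement_check(
    ask: Callable[[str], str],
    question: str,
    samples: int = 5,
    same_answer: Callable[[str, str], bool] = same_by_text,
) -> tuple[str, float]:
    """Ask the same question several times and group matching answers.

    Returns a representative of the largest group and the fraction of
    samples that fell into it. Low agreement hints at an unreliable answer.
    """
    answers = [ask(question) for _ in range(samples)]
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_answer(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    largest = max(clusters, key=len)
    return largest[0], len(largest) / samples


# Hypothetical stub in place of a real model call; the answers are made up.
_canned = cycle([
    "Brendan Fraser",
    "brendan fraser.",
    "Brendan Gleeson",
    "Brendan Fraser",
    "Brendan Fraser",
])


def stub_model(question: str) -> str:
    return next(_canned)


answer, agreement = agreement_check(stub_model, "Who won Best Actor at the 2023 Oscars?")
print(answer, agreement)
```

Agreement only measures consistency: a model that gives the same wrong answer every time would still score highly, so this catches unstable answers rather than confidently wrong ones.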

    • It's a feature, not a bug. LLM's mimic language based off of statistics.

      LLMs are not SLMs. They use a neural rather than statistical model.

      They have no way of evaluating facts.

      Prompt: Is it true that a mouse is taller than a skyscraper?

      No, it is not true that a mouse is taller than a skyscraper. A mouse is a small rodent that typically grows to be only a few inches tall, while a skyscraper is a very tall building that can reach hundreds of feet or even miles in height. The two are not comparable in terms of size.

      If you want facts, I hear Google does a pretty mean search engine.

      I use LLMs more than Google when looking for facts. While they are far from perfect and have limitati

  • by gweihir ( 88907 ) on Thursday December 07, 2023 @04:05PM (#64064713)

    After Amazon, Google is currently finding out that AI is not all it is cracked up to be and that delivering a somewhat credible fake is actually quite hard. It is interesting to see how brittle and crappy this tech really is, with both Amazon and Google (who both have massive resources) being unable to hack it.

    I think this pretty deranged AI hype is slowly coming to an end, with the evidence mounting that it is not any better than the countless previous AI hypes: Bombastic announcements, promises of "revolutions", statements that the world will never be the same, and then, after the dust settles, basically another small incremental step and the systems are still as dumb as bread and can either not hack it or make so many gross mistakes they are unsuitable for any real work.

    Time for another "AI winter". And maybe we should just fire everybody working on interactive AI, make them unemployable in the field, remove their ill-gotten gains and start over with people that are not basically scam artists that massively over-promise, massively under-deliver and that cannot see their own product clearly.

    • by youn ( 1516637 )

      To be fair, they have been at it for a shorter amount of time and they are catching up really quickly.
