Early Impressions of Google's Gemini Aren't Great (techcrunch.com)
Google this week took the wraps off of Gemini, its new flagship generative AI model meant to power a range of products and services including Bard. Google has touted Gemini's superior architecture and capabilities, claiming that the model meets or exceeds the performance of other leading gen AI models like OpenAI's GPT-4. But the anecdotal evidence suggests otherwise. TechCrunch: The model fails to get basic facts right, like 2023 Oscar winners: Note that Gemini Pro claims incorrectly that Brendan Gleeson won Best Actor last year, not Brendan Fraser -- the actual winner. I tried asking the model the same question and, bizarrely, it gave a different wrong answer. "Navalny," not "All the Beauty and the Bloodshed," won Best Documentary Feature last year; "All Quiet on the Western Front" won Best International Film; "Women Talking" won Best Adapted Screenplay; and "Pinocchio" won Best Animated Feature Film. That's a lot of mistakes.
Translation doesn't appear to be Gemini Pro's strong suit, either. What about summarizing news? Surely Gemini Pro, with Google Search and Google News at its disposal, can give a recap of something topical? Not necessarily. It seems Gemini Pro is loath to comment on potentially controversial news topics, instead telling users to... Google it themselves.
Doesn't get basic facts right (Score:2)
Like ChatGPT. But, also like ChatGPT, I imagine it provides the wrong answers convincingly and authoritatively.
Re: (Score:2)
Bard will argue with you if you tell it it is wrong about some things.
Re: (Score:2)
Bard will argue with you if you tell it it is wrong about some things.
Sounds like it's closer to human than we might be giving it credit for then...
Re: (Score:2)
I imagine it provides the wrong answers convincingly and authoritatively.
That seems to be the LLM core "skill": glossing over its lack of insight by being very convincing. Apparently, tons of people fall for that. They are also good liars by misdirection. Ask an LLM "how do I say something-bad in a positive way". The MS artificial moron balked at "killed a lot of people" for the bad thing, but was happy to work with "sacked a lot of people".
Re: (Score:2)
Ask an LLM "how do I say something-bad in a positive way".
First three responses from goliath-120b, an open-source model substantially inferior to GPT-4.
1st.
Re: Doesn't get basic facts right (Score:2)
Nondeterminism of a black box (Score:4, Interesting)
Re:New Hires Totally Worthless (Score:5, Funny)
The difference, as far as I can tell, is that your recent hires admitted to not knowing. They didn't try to convince you that Nixon's Vice President was Barbra Streisand.
Re: (Score:2)
Wasn't she? She feels like she could have been. After all, in a way, he too was a victim of the Streisand Effect.
Re: (Score:2)
They didn't try to convince you that Nixon's Vice President was Barbra Streisand.
Well, it's understandable. Barbra Streisand was Spiro Agnew's wife, after all.
Re: (Score:2)
Had they heard of Nixon?
Re: (Score:2)
People don't know everything. News at 11.
Re: (Score:2)
"It's unreal how stupid people still are when they hold a planet's worth of knowledge in their pocket." Being able to name the winner of some entertainment industry award doesn't make you less stupid.
Re: (Score:2)
Fractured logic much?
You seem to fail to differentiate between easy access to information and knowing it by heart.
When someone asks me "Who was the vice-president of $PRESIDENT?", according to your logic, the one who asks the question is stupid because they can't find the answer by themselves, so they gotta ask me, right?
Therefore, according to you, the proper answer for damn near everything would be "ha-ha, moron, can't search!".
Re: (Score:2)
Ahh, I'd say you must be new here, but then I looked at your UID and confirmed it myself.
Stick around here for another couple of decades, eventually we will get back to the SECOND of the two questions posted to slashdot as dupes of dupes.
Slightly more seriously though, if your answer to a question is "I don't know" and ends at that, you are either lazy or stupid. Not knowing doesn't make you stupid, but apathy to the point of refusing to learn something after being asked DOES make you either stupid or lazy.
Re: (Score:2)
Slightly more seriously though, if your answer to a question is "I don't know" and ends at that, you are either lazy or stupid.
You forgot 3. I don't give a fuck about that specific topic.
Maybe that "I don't know" could be followed by "and I don't care". That applies to who was Nixon's VP, Grammy awards of last year (or any year, for that matter), etc. I happen to know which year the US joined WW2, as well as a few hundred other WW2-related dates, but that's because I happen to be interested in that particular subject.
Stupid is typing out a long-winded diatribe on a forum to demand someone else spoon-feed you answers to a question that could have been answered by searching for the title of your post in said forum.
You do realize that was for testing purposes, don't you? But then again, I'm not so sure you do.
And believing UID ma
Re: (Score:2)
As a human, it's not always a firm "I don't know"; many times it's "I am not sure". Me, I don't live in the US, so I wasn't supposed to learn those names, but it's possible I read or heard some of them, so I may know, I may not, or I may confuse them. It is possible to think I know but not really know, and provide a bad answer ("hallucination" in AI terms). Depending on my personality, I may even be very confident when providing the wrong answer.
Re:New Hires Totally Worthless at pointless trivia (Score:2)
For me the first two are easy. That said, on any given day, with no meaningful context, I might not even be able to tell you what the Grammies are, let alone who won them. They are just unimportant to me.
Re: (Score:2)
I don't know either who was Nixon's VP; how would that info be useful to me? It happened before I was born and on a different continent than where I live.
namespace collisions (Score:2)
Can't they come up with names that are not already well known to be something else? Oh, I guess the marketing types never heard of the something else.
Re: (Score:2)
Can't they come up with names that are not already well known to be something else? Oh, I guess the marketing types never heard of the something else.
You're asking the company known as "Google"...with the parent company known as "Alphabet"...that question. Nerds are known for being nerds, not marketing geniuses.
And besides, Human Job Destroyer holds a bit too much truth for the masses to accept. That will come later, when it's far too late to do anything about it other than watch the poor die off.
Can't Tell (Score:5, Interesting)
Bard still isn't supported in my country.
I get the impression that training the actual LLM is only part of the problem. To make it useful there's a ton more work that has to go into conditioning the model, making sure it responds appropriately to certain questions (e.g., nicely formatted programming responses). I wonder if part of the issue with the Oscar winners is that they need to do tricks to make sure it generates original text instead of regurgitating the most appropriate memorized (and copyrighted) response; the problem is how to make it creative with the structure of the answer but not the facts.
And I do think I understand why ChatGPT scares Google so much. Google is best at figuring out a fact: for instance, who won the Oscar for Best Actor in 2023? You generally need one search to get an answer, so there's no point messing with an LLM. However, when trying to figure out a CSS quirk, or figure out a new API, I could end up making dozens of searches trying to find the answer.
ChatGPT certainly makes a terrible search engine and I wouldn't trust it to tell me the best actor either (probably right, but hallucinations are hard to detect there). But it's great for figuring out the weird CSS issue or how to use a new API. ChatGPT's strength is the very thing that Google search is terrible at. But the thing Google was terrible for is also the thing I used Google the most for (because it was still the best option).
In other words, since ChatGPT came out my Google usage has plummeted to making only simple one-off queries, and that rightfully terrifies Google.
The article is comparing Gemini Pro to GPT-4 (Score:1)
Re: (Score:2)
Maybe don't call things Pro if they are a lite version.
Re: (Score:1)
Inaccurate reporting (Score:2)
All aboard the hype train (Score:2)
The demo they cherry-picked looked very impressive. I have no doubt that they have made real and significant progress in developing the tech.
But, as usual, your mileage will vary.
It seems like two games are being played at the same time
Actually developing useful tech
Convincing the general public that they have the "magic" in order to keep the hype train running and boost the stock price
Basic tests (Score:2)
I have a few logic tests I throw at LLMs. One is "Given a grid that is 5 x 5 in size, how many cells are there from opposite corners?"
That seems to throw off a lot of LLMs, because I'm not asking for a measurement, but a count of whole cells. It should be self-obvious to anyone that for a square grid of N x N cells, there are N cells diagonally from corner to corner. Chat GPT4 at first used the Pythagorean theorem, giving a result like 7.07. When told that it was incorrect it then simply counted cells, retu
Re:Basic tests (Score:5, Interesting)
"Given a grid that is 5 x 5 in size, how many cells are there from opposite corners?" I had no idea what you meant by that question until I read the next paragraph. It's just not English.
Re: (Score:2)
Oh well. At least ChatGPT 4.0 gets it right.
Yes - it returns "42".
Re: (Score:1)
The funny part is Claude gets it right (10), by interpreting the vague "opposite corners" as meaning the two opposite corners at the top, counted to their associated diagonals.
Re: (Score:2)
Nice! Much worse than 1st year CS students. They usually at least know they have the wrong solution.
PageRank is all you need (Score:2)
Does Google not have any competent PR/marketing people? Who releases a shit model that probably isn't even as good as the prior generation model of your competitors when you (apparently) have something better in the tank you plan to release soon anyway? Who does that?
People have already been mocking Google...cough...Bard... for total absence of a competitive product. This is just an insane self-own. I wouldn't be at all surprised if Ultra wasn't much better or if anyone even cared to look once it beca
Even funnier (Score:1)
Even funnier: ask it who won the Oscar for Best Actor in 2024.
Sounds about right (Score:2)
LLM's Don't Produce Facts (Score:1)
Re: (Score:3)
Indeed. LLMs are basically politicians or business-consultants: They can join any discussion and give an entirely fake impression of knowing what they are talking about.
And yes, this is an inherent property and cannot be fixed. There is already research that says making an LLM a bit better in one area makes it worse in all others. That is in no way a surprise though. LLMs are not reasoning engines or knowledge engines. They are "fake it" engines.
Re: (Score:2)
Indeed. LLMs are basically politicians or business-consultants: They can join any discussion and give an entirely fake impression of knowing what they are talking about.
And yes, this is an inherent property and cannot be fixed. There is already research that says making an LLM a bit better in one area makes it worse in all others.
Quite a lot of error can be fixed by employing different strategies during inference, without even altering or augmenting the model itself.
One simple approach has been shown to be quite effective. The model is asked the same question repeatedly and the responses are evaluated (often by the same or a different model) to decide whether they are saying the same thing. The better the model, coincidentally, the better it becomes at detecting error in this way.
There is a ton being left on the table in terms of unta
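The repeated-sampling idea above can be sketched in a few lines. This is a minimal, hedged version: `sample` stands in for any LLM API call made at nonzero temperature (hypothetical, not a real client), and the exact-match normalization is a simplification of the "are they saying the same thing" check, which in practice is often done by another model judging semantic equivalence:

```python
from collections import Counter

def self_consistent_answer(sample, question, n_samples=5):
    # Sample the model repeatedly, normalize the answers, and take a
    # majority vote. The agreement ratio doubles as a rough confidence
    # signal: low agreement suggests the model may be hallucinating.
    answers = [sample(question).strip().lower() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples
```

A question the model genuinely "knows" tends to produce the same answer across samples; a hallucinated one tends to scatter, which is what makes this a cheap error detector.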
Re: (Score:2)
It's a feature, not a bug. LLMs mimic language based on statistics.
LLMs are not SLMs. They use a neural rather than a statistical model.
They have no way of evaluating facts.
Prompt: Is it true that a mouse is taller than a skyscraper?
No, it is not true that a mouse is taller than a skyscraper. A mouse is a small rodent that typically grows to be only a few inches tall, while a skyscraper is a very tall building that can reach hundreds of feet or even miles in height. The two are not comparable in terms of size.
If you want facts, I hear Google does a pretty mean search engine.
I use LLMs more than Google when looking for facts. While they are far from perfect and have limitati
Another one bites the dust... (Score:3)
After Amazon, Google is currently finding out that AI is not all it is cracked up to be and that delivering a somewhat credible fake is actually quite hard. It is interesting to see how brittle and crappy this tech really is, with both Amazon and Google (who both have massive resources) being unable to hack it.
I think this pretty deranged AI hype is slowly coming to an end, with the evidence mounting that it is not any better than the countless previous AI hypes: Bombastic announcements, promises of "revolutions", statements that the world will never be the same, and then, after the dust settles, basically another small incremental step and the systems are still as dumb as bread and can either not hack it or make so many gross mistakes they are unsuitable for any real work.
Time for another "AI winter". And maybe we should just fire everybody working on interactive AI, make them unemployable in the field, remove their ill-gotten gains and start over with people that are not basically scam artists that massively over-promise, massively under-deliver and that cannot see their own product clearly.
Re: (Score:2)
To be fair, they have been at it a shorter amount of time, and they are catching up really quickly.