AI Mistakes Are Very Different from Human Mistakes (schneier.com) 114

Bruce Schneier and Nathan E. Sanders, writing in a post: Someone who makes calculus mistakes is also likely to respond "I don't know" to calculus-related questions. To the extent that AI systems make these human-like mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models -- particularly LLMs -- make mistakes differently.

AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats. And AI mistakes aren't accompanied by ignorance. A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true. The seemingly random inconsistency of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it's not enough to see that it understands what factors make a product profitable; you need to be sure it won't forget what money is.

[...] Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities -- while keeping the potential ramifications of their mistakes firmly in mind.



  • We also tend not to put people exhibiting these behaviors in decision-making positions.

    Except when we put one of them in charge of our country... Twice....

    • Re: (Score:2, Troll)

      by TWX ( 665546 )

      yeah, this part stood out to me:

      A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true.

      as pretty much empowering ignorance as if it were equivalent to knowledge and experience, and asserting that the ignorant person's views are fully valid even when based on bogus "research."

      Assertion while ignorant or actively wrong is the sort of thing a conman does, because what a conman relies on is confidence; that's where the "con" part comes from. AI may as well be a conman.

      • It's also the sort of thing someone with mental illness does.

        • AI is making mistakes because it is only a few years old (3?). You know how shitty CPUs were at 3 years of development? Memory? Digital camera sensors? You know how bad hard drives were? Or network connections? AI is that now. Wait 10 years, and development will make mistakes unimaginably rare and negligible, like mistakes in memory now. Some form of AI ECC (error-correcting code, for those of us who are not hardware nerds) will probably come out, balancing checks across the LLMs, questioning the answers in portions prio
          • by Sique ( 173459 )
            AI is about 60 years old right now, starting with ELIZA (1966), then MYCIN (1972) and CYC (1984). Deep Blue was beating Garry Kasparov in 1996. Watson was playing Jeopardy! in 2011. AlphaGo beat Fan Hui in 2015, Lee Sedol in 2016, and Ke Jie in 2017. If you want to compare AI to microprocessors, then LLMs are something like the 32-bit architecture: quite mature and capable, and already scaling up to 1 GHz and more.
              AI is about 60 years old right now, starting with ELIZA (1966), then MYCIN (1972) and CYC (1984). Deep Blue was beating Garry Kasparov in 1996. Watson was playing Jeopardy! in 2011. AlphaGo beat Fan Hui in 2015, Lee Sedol in 2016, and Ke Jie in 2017. If you want to compare AI to microprocessors, then LLMs are something like the 32-bit architecture: quite mature and capable, and already scaling up to 1 GHz and more.

              While the field of AI generally is ancient, the transformer craze started only recently, circa 2017.

              • by Sique ( 173459 )
                There is quite a difference between the public reception of things and the state of the art. When I was at university studying CS in the 1990s, neural networks were all the rage. In the early 2000s, I was working as a sysadmin in the CS department, and one research group was training an LLM, called "Semantic Search" at the time.
        • Comment removed based on user account deletion
      • ... empowering ignorance as if it were equivalent to knowledge and experience, and asserting that the ignorant person's views are fully valid even when based on bogus "research" ...

        Often found in the phrase "Don't take my word, do your own research!," this word no longer connotes tirelessly combing over vast troves of available information on the topic, viewing each claim found there with a skeptical eye and weighing it on its merits, etc. These days it's more about seeking confirmation, helped out by the social media algorithms. Directly beneath my flat-earth video where I admonish you to do your own research, you'll find a helpful list of more flat-earth videos which you can use t

      • yeah, this part stood out to me:

        A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true.

        I find striking similarities between LLMs and very confident humans. They both have no problems making statements confidently because they are oblivious to or discount completely any consideration of being incorrect.

        I remember on one occasion my coworkers and I went to visit a renowned expert who was a VP and fellow at Amazon, in addition to having a bunch of national awards. During the conversation, we asked him a question, and he responded with an answer that was incorrect, which we knew was completely in

    • by Tablizer ( 95088 ) on Thursday January 23, 2025 @01:06PM (#65112573) Journal

      Thank You for absorbing mod points for saying what many of us want to but don't want the mod hit. As a reward, we'll name a small Gulf after you.

      Boring leaders are probably underrated. They keep the wheels turning smoothly without drama. I remember one Linux admin who was upset that management claimed they didn't need a full-time admin because problems were rare. This person prided themselves on preventative maintenance.

      So they let them go and replaced them with a split-duty desktop support and server admin person who would let things rot and catch fire (perhaps because of lack of time), then act like a hero for putting the fire out. Management seemed to admire that to the dismay of the fired person. Drama sells; humans are merely yappity chimps who follow the shiny ball.

      • by gweihir ( 88907 )

        Pragmatic leaders are usually boring. But they can actually get stuff done, slowly and persistently. The flashy ones are doing damage and lighting straw-fires. And get people killed. Much better show, much worse outcome.

        • Pragmatic leaders are usually boring. But they can actually get stuff done, slowly and persistently. The flashy ones are doing damage and lighting straw-fires. And get people killed. Much better show, much worse outcome.

          And which is better for generating clickbait headlines and social media outrage streams? Gee, I wonder why the broligarchs are so happy about the new administration?

      • by Tyr07 ( 8900565 )

        You don't get to create a dumpster fire with everyone suffering, then argue that now that someone else is in charge, it'll become a dumpster fire with everyone suffering.

        The lament of those who benefited from the government at the cost of everyone else. The entitled elite who quietly bought expensive mansions on the back of everyone else. Silently, quietly, while telling you you're right, that you're special, and that you should have more rights than others. Being morally superior as long as someone else has to pay f

    • Except when we put one of them in charge of our country... Twice....

      That's democracy for you. Except their side keeps insisting that America isn't a democracy when I bring it up. So I'm wondering if Trump is actually President by their own rules?

      If the military thinks he's commander-in-chief then at the very least we're a military junta. And that's something at least.

      • by gweihir ( 88907 )

        Military junta? No, _that_ requires a general in charge, and Trump does not even remotely have what that takes. For one thing, you need to be able to read for that rank. And little Elon really needs to work on his Nazi salute to be taken seriously.

    • Comment removed based on user account deletion
  • Synergy of Fuckage!

  • False (Score:1, Interesting)

    by Anonymous Coward

    That's simply not true.

    All LLMs by default answer that they don't know when the topic isn't in their model.
    The couple of public LLM services you're referring to are specifically programmed so that after an "I don't know" answer, additional code kicks in to force them to throw out an answer.

    Whatever answer has the highest confidence score is returned, even if that has a 3% chance of being right and a 97% chance of being wrong.

    To compare like things, you can put the human in a Pulp Fiction scenario, with a gun to their
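The "highest confidence score wins" behavior described above can be sketched in a few lines of Python. This is an illustration only, not any vendor's actual decoding code; the logits below are invented numbers:

```python
import math

def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate answers: no option is likely,
# but greedy (argmax) decoding still confidently returns one of them.
logits = [0.2, 0.1, 0.0, -0.1]
probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])
print(best, probs[best])  # the "winner" holds under 30% of the probability mass
```

Nothing in this selection rule distinguishes "29% sure" from "97% sure"; the answer comes out phrased the same way either way, which is the asymmetry the parent comment is pointing at.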

    • Re:False (Score:5, Interesting)

      by ThosLives ( 686517 ) on Thursday January 23, 2025 @01:15PM (#65112595) Journal

      That's not true for the small sample I've used of the free online LLMs. I asked a question and it simply made a statement as if it were true, and it was clearly not. Then I said in the prompt "that's not true; that person is not in that role any more" and the AI said, "oh yeah, you're right, I was limited by my information cutoff at the end of 2023."

      So why didn't it say "as of 2023, the information is..." or "I don't know right now, because I don't have up-to-date information"? Instead it simply answered with unfounded certainty.

      At least in this instance, I knew the information was wrong ahead of time, but I can imagine scenarios where I wouldn't have the knowledge (which is why I'd be asking the LLM for information) and wouldn't be able to detect the error.

      That's the worst problem with information: taking it as fact without being able (or willing) to know if it is correct.

      • That's not true for the small sample I've used of the free online LLMs. I asked a question and it simply made a statement as if it were true, and it was clearly not.

        Most of the free online LLMs are complete trash.

        So why didn't it say "as of 2023, the information is..." or "I don't know right now, because I don't have up-to-date information"? Instead it simply answered with unfounded certainty.

        LLMs have no clue what today's date even is, and they only know they are an AI because of the system prompt and RL bludgeoning.

        At least in this instance, I knew the information was wrong ahead of time, but I can imagine scenarios where I wouldn't have the knowledge (which is why I'd be asking the LLM for information) and wouldn't be able to detect the error.

        Relying on the output of an LLM to be accurate is like relying on the output of Google or the musings of randos in online forums: the wrong tool for the job.

    • Re:False (Score:5, Interesting)

      by ClickOnThis ( 137803 ) on Thursday January 23, 2025 @01:37PM (#65112641) Journal

      All LLMs start by default to answer they don't know if the topic isn't in their model.

      My experience has been that LLMs can be quite conciliatory, when prompted appropriately.

      I remember seeing an example of this. A friend asked an LLM about the similarities between two works of art. Unfortunately, she made a mistake: one of the works she mentioned was the wrong one; she meant a different work. The LLM went ahead anyway, listing and explaining the various similarities it thought were present. It seemed as though the LLM thought it had been asked a valid question and went ahead with an answer, instead of saying "that question makes no sense" or "I don't know."

      • What's really sad about that is that they should be able to use the AI itself to fix that by asking it if your question makes sense. But they don't seem to have figured out this obvious step.

  • by Kelxin ( 3417093 ) on Thursday January 23, 2025 @01:01PM (#65112555)
    This is the true age of FAFO. We have companies putting billions of dollars into these systems, trusting them with high stakes situations, and having these systems forced into our daily lives, cell phones, cars, etc. Isn't this going to be fun.
  • Self-driving cars can escape liability that humans can't.

    And in a really bad crash, some human may be forced to take the criminal liability for an AI mistake.

  • by Smidge204 ( 605297 ) on Thursday January 23, 2025 @01:05PM (#65112567) Journal

    "...makes it hard to trust their reasoning in complex, multi-step problems."

    That's the problem: they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.

    You can't trust their reasoning because they don't use reasoning. There is no conceptual understanding underpinning the output.
    =Smidge=

    • So you proclaim to know how human understanding works?
    • You can't trust their reasoning because they don't use reasoning. There is no conceptual understanding underpinning the output. =Smidge=

      But AI is going to replace every job, except for management, by 4:30p tomorrow. Ironically enough, not using reasoning is what most management types are best at.

    • What do you mean by statistical models?
      • by Smidge204 ( 605297 ) on Thursday January 23, 2025 @02:01PM (#65112725) Journal

        I mean statistical models.

        For every word, LLMs generate a table of values (called "embeddings") that represent statistical weights for when that word appears relative to other words. Literally thousands, sometimes tens of thousands, of data points per word.

        You then feed all that data into a machine learning algorithm, whose job it is to partition all this data and ultimately determine what the best words to output are given what words were input.

        If you ask an LLM for a description of a car, it will take the word "describe" and the word "car," process the huge dataset of words for more words that were associated with "describe" and "car" in the vast amount of data it was trained on, and put those found words together in a way that is also concordant with their relation to each other based on the rules of human language.

        That, in a nutshell, is how they work.

        This also makes it easier to understand how they hallucinate: if you ask an LLM to describe a car and, by some quirk of how the question was asked, it determines the word "bird" is statistically relevant as it processes the data (perhaps part of the training data involved the journal of an ornithologist driving across South America in their Ford Falcon), suddenly it's explaining how cars have distinctive feathers and specialized beaks. It doesn't understand what cars or birds are, only the statistical relation of those words to other words.
        =Smidge=
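The word-vector picture described above can be made concrete with a toy example. The 3-dimensional "embeddings" below are hand-invented for illustration; real models learn thousands of dimensions per token:

```python
import math

# Toy, hand-made 3-d "embeddings" (real models learn these from data,
# with thousands of dimensions per token).
embeddings = {
    "car":   [0.9, 0.1, 0.0],
    "truck": [0.8, 0.2, 0.1],
    "bird":  [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "car" sits much closer to "truck" than to "bird" in this space;
# word choice during generation is driven by this kind of geometric proximity.
print(cosine(embeddings["car"], embeddings["truck"]))
print(cosine(embeddings["car"], embeddings["bird"]))
```

If some quirk of the prompt drags "bird" close enough to "car" in this space, the bird-words start coming out, which is the hallucination mechanism the comment sketches.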

        • by marcle ( 1575627 )

          Thanks for your lucid explanation. You said exactly what I came here to say, only you said it better.

        • You are saying that the neural net is trained on a large set of statistical models of words and their relationship?
        • Which also explains why their errors are random. Unlike human beings who usually make mistakes that are based on poor reasoning or poor knowledge that can be corrected. If your description is truly accurate then it may not be possible to correct the errors and end the hallucinations. Random error may be built into the model.
        • If you ask an LLM for a description of a car, it will take the word "describe" and the word "car," process the huge dataset of words for more words that were associated with "describe" and "car" in the vast amount of data it was trained on, and put those found words together in a way that is also concordant with their relation to each other based and the rules of human language.

          While your explanation is fine, it can be misleading. Emergence is a thing. I can give someone Schrodinger's equation and claim

    • by ljw1004 ( 764174 )

      "...makes it hard to trust their reasoning in complex, multi-step problems." That's the problem. they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.

      By that token, I don't do reasoning either. (I assume that goes for other folks too, but I can only introspect on my own mind, not theirs, so I'm not sure.) The things I string together in my mind aren't words but concepts -- implications, logical connectives, contexts. What I'm writing right now is the output of (I think) two such concepts strung one after the other, and fleshed out. These concepts I recognize within my own thought processes feel very similar to the word-vectors that LLMs use.

      I'm not new to t

      • Stringing together concepts is reasoning. Concepts have semantic meaning by definition. To the LLM, the words are just tokens, which are statistically associated with other tokens. LLMs do not have any concept of the semantics behind those tokens, which we as thinking humans recognize as words.

        • by ljw1004 ( 764174 )

          Stringing together concepts is reasoning. Concepts have semantic meaning by definition. To the LLM, the words are just tokens, which are statistically associated with other tokens. LLMs do not have any concept of the semantics behind those tokens, which we as thinking humans recognize as words.

          That's a misleading way of describing LLMs. A better description is that each word in an LLM is a complex set of associations, 12,288 associations in GPT-3. Those aren't associations to other words though; in the first layers of the LLM they're associated to things like syntactic part of speech and referent; in the middle layers of the LLM they're associations to more complex things like "military bases" or "from Friday 7pm until"; in higher layers of the LLM they're associations to things like "the origina

    • That's the problem. they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.

      I wish more people understood this. ChatGPT et al. are really just glorified ELIZA, except with a bigger vocabulary. If your pool of possible complete sentences and keyword recognition is in the trillions instead of a couple dozen, it's still just a call-and-response rote ritual.

    • That's the problem. they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.

      You might be thinking of the old n-gram models that did something like this. LLMs run a neural model and have the ability to generalize.
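The older n-gram approach mentioned here really is just counting co-occurrences. A minimal bigram model, on a made-up corpus, looks like this; a neural LLM differs precisely in that it can generalize beyond literal counts like these:

```python
from collections import Counter, defaultdict

# Minimal bigram "language model": count which word follows which.
corpus = "the cat sat on the mat the cat ate the fish".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word):
    """Return the most frequent successor of `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # "cat" follows "the" most often in this corpus
```

A model like this can only ever emit word pairs it has literally seen; it has no way to handle an unseen context, which is the limitation the neural approach addresses.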

  • by MpVpRb ( 1423381 ) on Thursday January 23, 2025 @01:12PM (#65112583)

    AI is a great research project, and I believe that it will eventually allow us to solve previously intractable problems.
    Today's AI is a work in progress, with lots of problems.
    Unfortunately, investors want profit now, salesweasels exaggerate the capabilities of their products, clueless managers believe the pitches and spend lots of money on crap generators.
    It will get a lot worse before it gets better.

  • They would never elect an anti-constitutional, narcissistic vengeance-seeker for president. Makes total sense.
    • "don't get mad, get even." - John Kennedy

      Successful politician and "narcissist" are almost synonyms.

      • There is nothing special about the current president as a leader.
      • What is different is that he is a professional celebrity in an age of celebrity and most politicians are just amateurs.
      • He has a hit show going right now with a celebrity cast that is holding everyone's imagination.
      • The show's biggest and best promoters are people opposed to him who keep it in the limelight.

      "I don't care what you say about m

  • AI Explanations (Score:5, Interesting)

    by alvinrod ( 889928 ) on Thursday January 23, 2025 @01:17PM (#65112597)
    I don't think these AI programs will be useful until they can explain the reasoning behind their answers. Current LLMs can't do this beyond further regurgitating what they rank as the most likely good response. But an AI that is wrong yet can show you how it got to that result is much more useful than one that just outputs an answer, even if the black box is right more often than the AI that can explain the steps used to arrive at the result.
    • It seems that most human "reasoning" is rationalization after the fact. Humans are not very good at formal reasoning. Yes, we can be trained to do it, but most never receive that training, and even those who do make most of their everyday decisions in a "quick and dirty" way. The quick and dirty way is "sub-conscious" (though I think that term is going out of favor). Try 'Thinking Fast and Slow' by Kahneman. It is a "popular science" book, but the author is an authority. In any case we don't actually have

      • An interesting question is whether you can gain the useful characteristics of human intelligence and retain the correctness of formal logic.

        I would say yes and no. The AI can use NN techniques to "guess" good solutions to a formal system and then use resolution techniques to verify. This will make them very good at math. The problem is, as you point out, formal systems are so restrictive.

        I don't know, but it seems that the limitations of the formal logic approach are rooted in the practical impossibil

  • Why can AI not be designed such that it rates its own output based on the strength of the pattern match to training data, and put a floor so some responses come back as "not enough data for a meaningful answer"?

    • by gweihir ( 88907 )

      It already uses that "strength" to come up with its answers. Doing it twice does not get you anything. Now, a confidence score would probably be possible, but then most of what LLMs output would not qualify and the illusion of competence would completely go away.

      The whole thing is a scam. It depends on "answers" being given without quality check or the thing collapses.
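The confidence floor being debated here could look something like this sketch. The candidate answers and scores are invented, and calibrating such a score so it actually tracks correctness is the genuinely hard part the thread is arguing about:

```python
REFUSAL = "not enough data for a meaningful answer"

def answer_with_floor(candidates, floor=0.5):
    """candidates: list of (answer, score) pairs from some hypothetical
    scoring model. Refuse to answer when nothing clears the floor."""
    best, score = max(candidates, key=lambda c: c[1])
    if score < floor:
        return REFUSAL
    return best

# Top candidate only 3% likely to be right -> refuse.
print(answer_with_floor([("Paris", 0.03), ("Lyon", 0.02)]))
# Top candidate well above the floor -> answer.
print(answer_with_floor([("Paris", 0.97), ("Lyon", 0.02)]))
```

As the parent notes, with an honest floor most current LLM output might not qualify, which is exactly why the refusal branch rarely ships.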

    • Why can AI not be designed such that it rates its own output based on the strength of the pattern match to training data... And put a floor so some responses come back as "not enough data for a meaningful answer"?

      You can't sell a "better than a human" device if its main response is, "Not enough data for a meaningful answer." You can sell a device that spouts out wrong answers with great confidence, because we've somehow turned into a culture that values confident idiocy over carefully reasoned answers or circumspect recalcitrance in the face of something that can't be answered.

  • by Lavandera ( 7308312 ) on Thursday January 23, 2025 @01:23PM (#65112613)

    MAGA folks are absolutely confident when they make mistakes, and their mistakes are not clustered around specific subjects...

  • by smooth wombat ( 796938 ) on Thursday January 23, 2025 @01:25PM (#65112619) Journal
    A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human

    Now we know we're living in a simulation. All those people sounding so confident when blatantly lying about one thing or another.
    • by gweihir ( 88907 )

      Actually, there is a nice research result: conservatives judge the truthfulness of a statement by the confidence displayed by the person making it, while people with actually working minds do fact-checking.

      • There is a tweet out there from a MAGA who is complaining how difficult it is to go after someone on the other side because they use facts. In essence, admitting their side lies while the other tells the truth.

    • by dfghjk ( 711126 )

      And LLMs do not exhibit "confidence" at all, they are computer programs, nothing more. They do not have egos.

  • Funny anecdote. (Score:4, Interesting)

    by Anonymous Coward on Thursday January 23, 2025 @01:41PM (#65112661)

    I was curious what chatgpt knew about almost-integer results in math. Think like where expressions including pi and e and other irrational numbers result in a number that is close to an integer in under a part per million.

    So one of the results it gave me was that sqrt(2)^10 ~= 1000.0004, which isn't even close to correct (it's 32, a trivial, uninteresting result). But the blather and verbiage in that paragraph sounded very convincing, which is what ChatGPT excels at: generating tokens that appear to be convincing text.
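The arithmetic in the anecdote is easy to check by hand or in a couple of lines: sqrt(2)^10 = 2^5 = 32 exactly, while a genuine near-integer of the kind asked about is e^pi - pi:

```python
import math

# Check the claim: sqrt(2)**10 is exactly 2**5 = 32, not ~1000.0004.
print(math.sqrt(2) ** 10)  # ~32, up to floating-point noise

# A genuine near-integer, for contrast: e**pi - pi is within 0.001 of 20.
print(math.e ** math.pi - math.pi)
```

Running the numbers is exactly the kind of check the convincing-sounding paragraph discourages people from doing.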

  • Make a composite model. The same way MoralityGPT adds a correcting layer to GPT4. A composite AI model, models of models, where multiple layers interact in a timey wimey wibbly wobbly... thing. Just like layers of natural self control, layers of artificial self control could be built around the core model. Will this bring about the Age of Strife? Not in my children's children's life time.
  • A model might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats.

    What about the cabbages serving the goats at their restaurants?

    A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true.

    So there's this guy...

    We also tend not to put people exhibiting these behaviors in decision-making positions.

    Either you're proposing that the newly re-elected President of the U

  • Just because we have never seen it happen ...
  • by laughingskeptic ( 1004414 ) on Thursday January 23, 2025 @02:12PM (#65112773)
    Played with DeepSeek and found:
    1. It was very good at taking a prose description of a fairly complex linear algebra problem and correctly turning it into an equation
    2. Rendered the matrix based equations nicely.
    3. Generated correct matrices and Python code to represent solving the problem.
    4. Generated a completely INCORRECT output vector for a solution that would not have been the result of actually running the Python code.

    All the nicely displayed work could lead one to have confidence in the output, but the reality is, EVERYTHING is generated even the "Answer" that a human would have obtained by running the code.
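The remedy the comment implies is to actually execute the generated code rather than trust the generated "answer." A minimal illustration with an invented 2x2 system (not the poster's actual problem):

```python
# Hypothetical 2x2 linear system the model might have set up: A @ x = b.
A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [5.0, 10.0]

def solve2x2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x0, x1]

# Running the code yourself yields the real solution; a "solution vector"
# printed by an LLM alongside the code is just more generated text.
print(solve2x2(A, b))  # [1.0, 3.0]
```

The nicely rendered derivation and the final vector come from the same token generator, so only the executed code is evidence.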
  • "We also tend not to put people exhibiting these behaviors in decision-making positions"

      I want to be with the We in this statement.

  • by quax ( 19371 ) on Thursday January 23, 2025 @03:52PM (#65113087)

    I've been testing various LLMs to assist with coding.

    I found they make utterly "inhuman" mistakes. For instance, they have expert knowledge of Makefile syntax, yet a Makefile one model generated for me contained an utterly stupid, beginner-level mistake. Astoundingly, I found the models were also unable to correct coding mistakes after I pointed those mistakes out to them.

    This kind of untrustworthiness is unfortunately built into the very fabric of these stochastic machines. Their reasoning is not grounded in formal inference, and that's a huge problem: it makes it difficult for humans to check their work, and it means the models fall short at explaining their reasoning and at catching errors and self-correcting.

  • We would never put someone who acts inconsistently in a high-level position, except maybe for the 45th and 46th POTUS. We would never trust humans who say false stuff with confidence, except if they are a CEO or a consultant.

    AI is built to empower those who can say stuff with confidence without knowing what they're talking about. Anyone else will doubt the AI so much that the productivity increase it provides will be less, even less so if we account that actually competent people produce a quality of work that is harder to repl

  • It is fundamentally wrong to describe undesirable output from AI as a "mistake." AI is deterministic; it is computer software. It produces what it is made to produce.

    Just because programmers do not understand and cannot predict the behavior of the systems they create does not mean that those systems have human qualities. They want you to believe they do, so don't play along by describing the behaviors in human terms.

  • There's some great existing research on this by Cesar Hidalgo, showing that people are far more forgiving when a mistake is made by a human than when it is made by a machine. Search for "How Humans Judge Machines" and "Cesar Hidalgo". There are YouTube videos, and you can download his book for free with all the research details.

  • by WaffleMonster ( 969671 ) on Thursday January 23, 2025 @05:27PM (#65113383)

    Similarities between people and LLMs are interesting. You can be really terse, with terrible spelling and grammar, and it'll still understand what you are saying, or at least qualify its response to avoid ambiguity.

    It misremembers things in a way that gives you the gist of a text but doesn't get the exact wording right. The fidelity is mostly related to the relative frequency of exposure to the material. It can be reminded of, or led into considering, something it didn't consider before.

    LLMs are terrible at answering counterintuitive questions, terrible at attribution, and terrible at recalling random sequences, URLs, and phone numbers -- especially the smaller models.

    LLMs seem rather superhuman in what they are able to recall and _do_ by rote alone but rote seems to be all there is.

  • The press keeps glossing over that LLM "AI" has no understanding; it's just a fancy autocomplete.
    It's never making a mistake at all; it's just producing output from the data provided.
    Any mistake is on the user for not vetting the info provided by the tool.
    The tool has no way of knowing or measuring veracity at all.
    A hammer doesn't understand whether it's hitting a nail or a screw; neither one is a mistake by it.

  • We also tend not to put people exhibiting these behaviors in decision-making positions. ... except in election years.

  • I have the impression LLMs are closer to our inner monologue. They lack metacognition, and their other executive functions simply aren't there yet. Still a lot of work to do.
