
AI Mistakes Are Very Different from Human Mistakes (schneier.com)
Bruce Schneier and Nathan E. Sanders, writing in a post: Someone who makes calculus mistakes is also likely to respond "I don't know" to calculus-related questions. To the extent that AI systems make these human-like mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models -- particularly LLMs -- make mistakes differently.
AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats. And AI mistakes aren't accompanied by ignorance. A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true. The seemingly random inconsistency of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it's not enough to see that it understands what factors make a product profitable; you need to be sure it won't forget what money is.
[...] Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities -- while keeping the potential ramifications of their mistakes firmly in mind.
We also tend not to put people exhibiting these... (Score:2, Insightful)
We also tend not to put people exhibiting these behaviors in decision-making positions.
Except when we put one of them in charge of our country... Twice....
Re: (Score:2, Troll)
yeah, this part stood out to me:
A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true.
as pretty much empowering ignorance as if it were equivalent to knowledge and experience, and asserting that the ignorant person's views are fully valid even when based on bogus "research."
Assertion while ignorant or actively wrong is the sort of thing that a conman does, because the root of what a conman relies on is confidence; that's where the "con" part comes from. AI may as well be a conman.
Re: (Score:3)
It's also the sort of thing someone with mental illness does.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
AI is about 60 years old right now, starting with ELIZA (1966), then we have MYCIN (1972) and CYC (1984). Deep Blue was beating Garry Kasparov in 1996. Watson was playing Jeopardy in 2011. AlphaGo beat Fan Hui in 2015, Lee Sedol in 2016, and Ke Jie in 2017. If you want to compare AI to microprocessors, then LLMs are something like the 32-bit architecture - quite mature and capable, and already scaling up to 1 GHz and more.
While the field of AI generally is ancient, the transformer craze started only recently, circa 2017.
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
... empowering ignorance as if it were equivalent to knowledge and experience, and asserting that the ignorant person's views are fully valid even when based on bogus "research." ...
Often found in the phrase "Don't take my word, do your own research!," this word no longer connotes tirelessly combing over vast troves of available information on the topic, viewing each claim found there with a skeptical eye and weighing it on its merits, etc. These days it's more about seeking confirmation, helped out by the social media algorithms. Directly beneath my flat-earth video where I admonish you to do your own research, you'll find a helpful list of more flat-earth videos which you can use t
Re: (Score:2)
Re: (Score:2)
yeah, this part stood out to me:
A LLM will be just as confident when saying something completely wrong -- and obviously so, to a human -- as it will be when saying something true.
I find striking similarities between LLMs and very confident humans. They both have no problems making statements confidently because they are oblivious to or discount completely any consideration of being incorrect.
I remember on one occasion my coworkers and I went to visit a renowned expert who was a VP and fellow at Amazon, in addition to having a bunch of national awards. During the conversation, we asked him a question, and he responded with an answer that was incorrect, which we knew was completely in
Re: (Score:3)
Re:We also tend not to put people exhibiting these (Score:5, Interesting)
Thank You for absorbing mod points for saying what many of us want to but don't want the mod hit. As a reward, we'll name a small Gulf after you.
Boring leaders are probably underrated. They keep the wheels turning smoothly without drama. I remember one Linux admin was upset that management claimed they didn't need a full-time admin because problems were rare. This person prided themselves on preventive maintenance.
So they let them go and replaced them with a split-duty desktop support and server admin person who would let things rot and catch fire (perhaps because of lack of time), then act like a hero for putting the fire out. Management seemed to admire that to the dismay of the fired person. Drama sells; humans are merely yappity chimps who follow the shiny ball.
Re: (Score:2)
Pragmatic leaders are usually boring. But they can actually get stuff done, slowly and persistently. The flashy ones are doing damage and lighting straw-fires. And get people killed. Much better show, much worse outcome.
Re: (Score:2)
Pragmatic leaders are usually boring. But they can actually get stuff done, slowly and persistently. The flashy ones are doing damage and lighting straw-fires. And get people killed. Much better show, much worse outcome.
And which is better for generating clickbait headlines and social media outrage streams? Gee, I wonder why the broligarchs are so happy about the new administration?
Re: (Score:2)
Indeed. Dark times.
Re: (Score:2)
You don't get to create a dumpster fire with everyone suffering, then argue that now that someone else is in charge it'll become a dumpster fire with everyone suffering.
The lament of those who benefited from the government at the cost of everyone else. The entitled elite who quietly bought expensive mansions on the back of everyone else. Silently, quietly, while telling you you're right, and that you're special and you should have more rights than others. Being morally superior as long as someone else has to pay f
Re: We also tend not to put people exhibiting thes (Score:2)
Trump and Elon will do much more of all the things you say Biden did (some of them are even true) so yes, it will be much worse.
Re: (Score:2)
Except when we put one of them in charge of our country... Twice....
That's democracy for you. Except their side keeps insisting that America isn't a democracy when I bring it up. So I'm wondering if Trump is actually President by their own rules?
If the military thinks he's commander-in-chief then at the very least we're a military junta. And that's something at least.
Re: (Score:1)
Military junta? No, _that_ requires a General in charge and Trump does not even remotely have what that takes. For one thing, you need to be able to read for that rank. And little Elon really needs to work on his Nazi salute to be taken seriously.
Re: (Score:2)
Re: (Score:3, Insightful)
Maybe every 80 years or so we collectively have to see the carnage of a demagogue first hand to truly understand that surviving requires accepting nuance rather than a big gimmicky mallet.
Re: (Score:2, Offtopic)
Sad as it is, it does look like it. Most people have trouble learning from personal experience. That they are incapable of learning from history does not come as much of a surprise. And little Elon proudly displaying his skill at the Nazi salute really says it all.
Re: (Score:2)
I mean you're saying a lot to twist it that way. You're trying to say the rich guy who makes electric cars actually wants people to get put into internment camps, horribly and brutally murdered and tortured for years, with unspeakable horrors done to them. You're comparing him to that. Can you actually see him, and does any evidence of his activities point to that?
Horrible people are the ones who immediately compare someone who hasn't killed anyone to one of the worst killers in the recent history because they didn't
Re: (Score:1)
Re: We also tend not to put people exhibiting thes (Score:1)
The Doctor Strangelove salute that seems to have affected control of his arm is concerning. That and his allegiance to a man Godwin himself says is legitimate to compare to Hitler are a couple of red flags. Along with his, erm, fascination with the AfD and Tommy Robinson. It does strengthen the case.
Although,
Re: (Score:2)
Nope. He put himself in that group. Or he just crapped on all the millions the fascists murdered for a gesture. That is really not much better. And do not even dare to tell me he did not know exactly what he was doing.
Seriously, you sound positively deranged trying to make this about me. It is not. It is about Musk and what he clearly did.
Re: We also tend not to put people exhibiting thes (Score:2)
There is no twisting. He threw the Nazi salute twice, pretended it was something else, and edited it out of the video he posted of the event. Elon may not literally be a Nazi, but he is definitely a white supremacist (which is anything but surprising for a rich white South African) and he definitely is deliberately boosting Nazi speech.
You know what's shit? Making excuses for him.
You're going to have to make up your mind between "Elon is a genius" and "Elon accidentally used a Nazi salute" as they are wholl
Re: (Score:3)
Henry Ford.
Is there something about automakers that makes them love fascists and hate Jews?
Reference: https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:3)
You know who also hated Jews and liked Adolph Hitler? Henry Ford.
Henry Ford liked Hitler, but the feeling was mutual. Hitler idolized Ford.
Is there something about automakers that makes them love fascists and hate Jews?
Building organizations as big as Ford and Tesla requires laser focus and dedication. It is mostly about solving problems about "things" where people are just an annoying necessity.
"Why is it that I always get a whole person when all I want is a pair of hands?" -- Henry Ford
Both Ford and Musk were/are likely Aspies, leading to poor judgment about what is socially acceptable.
Ford had problems with bankers, especially early in his life
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I don't know why people want to appropriate Nazi culture and apply it to other aspects; it shows how twisted this manipulation is.
Nazis butchered Jewish people. Are you saying he wants to hurt Jewish people and purge them? No, you're trying to take one of the worst people in history and make that him to further your own goals, and say it somehow applies to other minorities or something. He's not a regular Nazi! He's a SUPER Nazi.
Even if he is as evil as people want to make him, why the hell would
Re: (Score:2)
Re: (Score:1)
Biden did do a lot of damage in only four years. He outdid Harding's corruption level even considering inflation.
Re: (Score:1)
Bullshit. Inflation was mostly world-caused, Joe can't control factor shutdowns in Asia. Many countries had inflation much higher than ours as evidence. As far as "corruption", I've seen ZERO clear evidence, only Fox pundit rumors. You are gullible, Repent!
Re: (Score:2)
Correction: "Joe can't control factory shutdowns in Asia."
My work lingo habits leaked out.
Addendum: US pumping more oil would only change grocery prices a fraction of a cent. OPEC has far more control over oil prices than a US President.
Re: (Score:2)
Democracy is difficult if you have a lot of idiots involved. Usually the idiots stay home and dismiss the political process as an annoying inconvenience.
When people actively stir up the idiots and get them to vote, and both parties do this, you end up with multiple elections that are best described as a dumpster fire. Nobody knows what's going on. Nobody can predict the results. And people can't even accept the results when it's all over.
We should just let Elon Musk pick the head of state for us. It would not be a fair system. But at least we'd be honest with ourselves about what kind of system we have.
Re: (Score:2)
Democracy is difficult if you have a lot of idiots involved. Usually the idiots stay home and dismiss the political process as an annoying inconvenience. When people actively stir up the idiots and get them to vote, and both parties do this, you end up with multiple elections that are best described as a dumpster fire. Nobody knows what's going on. Nobody can predict the results. And people can't even accept the results when it's all over.
We should just let Elon Musk pick the head of state for us. It would not be a fair system. But at least we'd be honest with ourselves about what kind of system we have.
In a lot of ways, we pretty much let the oligarchs pick for us now, Musk being a part of them. The algorithms of social media have been shaping our society for long enough to have, if not an outright stranglehold, at least a fairly firm grip on the overall direction of the information the voting public consider true. The other media outlets lap up whatever the algorithms are pumping out and report it as news. The vote is simply a reactionary outlet for the public based on whatever media they've managed to s
Re: (Score:2)
I for one welcome our new (Score:1)
Synergy of Fuckage!
False (Score:1, Interesting)
That's simply not true.
All LLMs start out by default answering that they don't know if the topic isn't in their model.
The couple of public LLM services you're referring to are specifically programmed so that after an "I don't know" answer, additional code kicks in to force them to throw out an answer.
Whatever answer has the highest confidence score is returned. Even if that is 3% chance of being right and a 97% chance of being wrong.
To compare like things, you can put the human in a Pulp Fiction scenario, with a gun to their
Re:False (Score:5, Interesting)
That's not true for the small sample I've used of the free online LLMs. I asked a question and it simply made a statement as if it were true, and it was clearly not. Then I said in the prompt "that's not true; that person is not in that role any more" and the AI said, "oh yeah, you're right, I was limited by my information cutoff at the end of 2023."
So why didn't it say "as of 2023, the information is..." or "I don't know right now, because I don't have up-to-date information"? Instead it simply answered with unfounded certainty.
At least in this instance, I knew the information was wrong ahead of time, but I can imagine scenarios where I wouldn't have the knowledge (which is why I'd be asking the LLM for information) and wouldn't be able to detect the error.
That's the worst problem with information - taking it as fact without being able (or willing) to know if it is correct.
Re: (Score:2)
That's not true for the small sample I've used of the free online LLMs. I asked a question and it simply made a statement as if it were true, and it was clearly not.
Most of the free online LLMs are complete trash.
So why didn't it say "as of 2023, the information is..." or "I don't know right now, because I don't have up-to-date information"? Instead it simply answered with unfounded certainty.
LLMs have no clue what today's date even is, and only know they are an AI because of the system prompt and RL bludgeoning.
At least in this instance, I knew the information was wrong ahead of time, but I can imagine scenarios where I wouldn't have the knowledge (which is why I'd be asking the LLM for information) and wouldn't be able to detect the error.
Relying on the output of an LLM to be accurate is like relying on the output of Google or the musings of randos in online forums: it's the wrong tool for the job.
Re:False (Score:5, Interesting)
All LLMs start by default to answer they don't know if the topic isn't in their model.
My experience has been that LLMs can be quite conciliatory, when prompted appropriately.
I remember seeing an example of this. A friend asked an LLM about the similarities between two works of art. Unfortunately, she made a mistake: one of the works she mentioned was the wrong work of art; she meant a different one. The LLM went ahead anyway, listing and explaining the various similarities it thought were present. It seemed as though the LLM thought it had been asked a valid question and went ahead with an answer, instead of saying "that question makes no sense" or "I don't know."
Re: False (Score:2)
What's really sad about that is that they should be able to use the AI itself to fix that, by asking it whether the question makes sense. But they don't seem to have figured out this obvious step.
We've been watching it on the horizon. (Score:5, Insightful)
self driving cars can escape liability that humans (Score:2, Interesting)
self driving cars can escape liability that humans can't.
And in a really bad crash some human may be forced to take the criminal liability for an AI mistake.
Re: (Score:1)
You mean the stockholders of the auto company that made the AI car? I think it's fair that they take that liability - their company didn't make the car safe enough.
Re: (Score:1)
They should, but with the EULA they will push it off onto the renter (rider) or some local franchise (think Amazon DSP) that has little power over the car's software and doesn't have the funds to pay out much.
Re: (Score:2)
There's a definite limit to EULAs. There are basic legal responsibilities that cannot be abrogated by any contract.
Re: self driving cars can escape liability that hu (Score:1)
Yes, the entire point of the stock market is that it separates responsibility from profit. It is inherently harmful as a result.
Well there's your problem (Score:5, Informative)
"...makes it hard to trust their reasoning in complex, multi-step problems."
That's the problem: they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.
You can't trust their reasoning because they don't use reasoning. There is no conceptual understanding underpinning the output.
=Smidge=
Re: Well there's your problem (Score:1)
Re: Well there's your problem (Score:5, Insightful)
Of course not. Fortunately, how human understanding works is completely irrelevant; we know how LLMs work plenty well enough to understand why they shouldn't be trusted.
=Smidge=
Re: (Score:2)
You can't trust their reasoning because they don't use reasoning. There is no conceptual understanding underpinning the output. =Smidge=
But AI is going to replace every job, except for management, by 4:30p tomorrow. Ironically enough, not using reasoning is what most management types are best at.
Re: Well there's your problem (Score:2)
Re: Well there's your problem (Score:5, Informative)
I mean statistical models.
For every word, LLMs generate a table of values (called "embeddings") that represent statistical weights for when that word appears relative to other words. Literally thousands, sometimes tens of thousands, of data points. Per word.
You then feed all that data into a machine learning algorithm, whose job it is to partition all this data and ultimately determine what the best words to output are given what words were input.
If you ask an LLM for a description of a car, it will take the word "describe" and the word "car," process the huge dataset of words for more words that were associated with "describe" and "car" in the vast amount of data it was trained on, and put those found words together in a way that is also concordant with their relation to each other based on the rules of human language.
That, in a nutshell, is how they work.
This also makes it easier to understand how they hallucinate: if you ask an LLM to describe a car, and by some quirk of how the question was asked it determines the word "bird" is statistically relevant as it processes the data (perhaps part of the training data involved a journal of an ornithologist driving across South America in their Ford Falcon), suddenly it's explaining how cars have distinctive feathers and specialized beaks. It doesn't understand what cars or birds are, only the statistical relation of those words to other words.
=Smidge=
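The word-association idea above can be sketched with a toy bigram model. To be clear, this is a deliberate oversimplification for illustration: real LLMs use learned embeddings and transformer layers, not raw co-occurrence counts, and the tiny "corpus" here is made up.

```python
from collections import Counter, defaultdict

# Made-up miniature "training corpus" (an assumption for illustration only).
corpus = (
    "the car has four wheels . "
    "the car has an engine . "
    "the bird has two wings . "
    "the bird has feathers ."
).split()

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None if unseen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

print(predict_next("car"))  # "has" - the only word ever seen after "car"
```

The hallucination failure mode falls out of the same structure: if a stray association links "car" to "bird" in the counts, the model happily continues down the bird-related word path, because nothing in the table knows what a car is.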
Re: (Score:2)
Thanks for your lucid explanation. You said exactly what I came here to say, only you said it better.
Re: Well there's your problem (Score:2)
Re: (Score:2)
Re: (Score:2)
While your explanation is fine, it can be misleading. Emergence is a thing. I can give someone Schrodinger's equation and claim
Re: (Score:2)
"...makes it hard to trust their reasoning in complex, multi-step problems." That's the problem. they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.
By that token, I don't do reasoning either. (I assume that goes for other folks too, but I can only introspect on my own mind, not theirs, so I'm not sure.) The things I string together in my mind aren't words but concepts -- implications, logical connectives, contexts. What I'm writing right now is the output of (I think) two such concepts strung one after the other, and fleshed out. These concepts I recognize within my own thought processes feel very similar to the word-vectors that LLMs use.
I'm not new to t
Re: (Score:3)
Stringing together concepts is reasoning. Concepts have semantic meaning by definition. To the LLM, the words are just tokens, which are statistically associated with other tokens. LLMs do not have any concept of the semantics behind those tokens, which we as thinking humans recognize as words.
Re: (Score:3)
Stringing together concepts is reasoning. Concepts have semantic meaning by definition. To the LLM, the words are just tokens, which are statistically associated with other tokens. LLMs do not have any concept of the semantics behind those tokens, which we as thinking humans recognize as words.
That's a misleading way of describing LLMs. A better description is that each word in an LLM is a complex set of associations, 12,288 associations in GPT-3. Those aren't associations to other words though; in the first layers of the LLM they're associated to things like syntactic part of speech and referent; in the middle layers of the LLM they're associations to more complex things like "military bases" or "from Friday 7pm until"; in higher layers of the LLM they're associations to things like "the origina
Re: (Score:2)
That's the problem. they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.
I wish more people understood this. ChatGPT et al are really just glorified ELIZA except with a bigger vocabulary. If your pool of possible complete sentences and keyword recognition is in the trillions instead of a couple dozen, it's still just a call-and-response rote ritual.
Re: (Score:2)
That's the problem. they do not do "reasoning." They string words together based on statistical models generated by analyzing the association of words with other words in their training data.
You might be thinking of the old n-gram models that did something like this. LLMs run a neural model and have the ability to generalize.
We need more skepticism (Score:3)
AI is a great research project, and I believe that it will eventually allow us to solve previously intractable problems.
Today's AI is a work in progress, with lots of problems.
Unfortunately, investors want profit now, salesweasels exaggerate the capabilities of their products, clueless managers believe the pitches and spend lots of money on crap generators.
It will get a lot worse before it gets better.
people are rational! (Score:1)
Re: (Score:2)
"don't get mad, get even." - John Kennedy
Successful politician and "narcissist" are almost synonyms.
"I don't care what you say about m
AI Explanations (Score:5, Interesting)
Re: (Score:2)
It seems that most human "reasoning" is rationalization after the fact. Humans are not very good at formal reasoning. Yes, we can be trained to do it, but most never receive that training, and even those that do make most of their everyday decisions in a "quick and dirty" way. The quick and dirty way is "sub-conscious" (though I think that term is going out of favor). Try 'Thinking, Fast and Slow' by Kahneman. It is a "popular science" book, but the author is an authority. In any case we don't actually have
Re: (Score:2)
I would say yes and no. The AI can use NN techniques to "guess" good solutions to a formal system and then use resolution techniques to verify. This will make them very good at math. The problem is, as you point out, formal systems are so restrictive.
I don't get it (Score:2)
Why can AI not be designed such that it rates its own output based on the strength of the pattern match to training data... And put a floor so some responses come back as "not enough data for a meaningful answer"?
Re: (Score:2)
It already uses that "strength" to come up with its answers. Doing it twice does not get you anything. Now, a confidence score would probably be possible, but then most of what LLMs output would not qualify and the illusion of competence would completely go away.
The whole thing is a scam. It depends on "answers" being given without quality check or the thing collapses.
Re: (Score:3)
Why can AI not be designed such that it rates its own output based on the strength of the pattern match to training data... And put a floor so some responses come back as "not enough data for a meaningful answer"?
You can't sell a "better than a human" device if its main response is, "Not enough data for a meaningful answer." You can sell a device that spouts out wrong answers with great confidence, because we've somehow turned into a culture that values confident idiocy over carefully reasoned answers or circumspect recalcitrance in the face of something that can't be answered.
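The floor the grandparent asks about is at least easy to sketch over a model's own output probabilities. Everything below is illustrative: the candidate answers, scores, and threshold are made up, and calibrating a real LLM's probabilities so they mean anything is the genuinely hard part.

```python
import math

def softmax(logits):
    """Turn raw model scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_or_refuse(candidates, logits, floor=0.5):
    """Return the top candidate only if its probability clears the floor;
    otherwise refuse instead of guessing."""
    probs = softmax(logits)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    if probs[best] < floor:
        return "not enough data for a meaningful answer"
    return candidates[best]

# One score dominates: the model commits to an answer.
print(answer_or_refuse(["Paris", "Lyon", "Nice"], [9.0, 2.0, 1.0]))  # Paris
# Scores are nearly flat: the floor triggers a refusal instead of a guess.
print(answer_or_refuse(["Paris", "Lyon", "Nice"], [1.1, 1.0, 0.9]))
```

As the parent comments note, shipping this would make most outputs fail the floor, which is exactly why vendors don't.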
Sounds like AI is MAGA... (Score:3, Funny)
MAGA folks are absolutely confident when they make mistakes, and their mistakes are not clustered around specific subjects...
Re: (Score:3, Funny)
Great, now we have artificial Nazis. Technology is really wonderful.
That explains things (Score:4, Funny)
Now we know we're living in a simulation. All those people sounding so confident when blatantly lying about one thing or another.
Re: (Score:1)
Actually, there is a nice research result: Conservatives judge truthfulness of a statement by the confidence displayed by the person making them. While people with actually working minds do fact-checking.
Re: (Score:1)
There is a tweet out there from a MAGA who is complaining how difficult it is to go after someone on the other side because they use facts. In essence, admitting their side lies while the other tells the truth.
Re: (Score:2)
They use facts? How dare they! That is fighting dirty when we just have beliefs!
Re: (Score:2)
You really believe that, do you?
Well, failure at being a rational human with some actual understanding of reality does not get much more pronounced than what you just demonstrated.
Re: (Score:2)
So you're saying liberals are fact-checkers and conservatives aren't?
Well then, you tell me who has a better grasp on the truth.
Re: (Score:2)
And LLMs do not exhibit "confidence" at all, they are computer programs, nothing more. They do not have egos.
Funny anecdote. (Score:4, Interesting)
I was curious what ChatGPT knew about almost-integer results in math -- cases where expressions involving pi, e, and other irrational numbers come within a part per million of an integer.
So one of the results it gave me was that sqrt(2)^10 ~= 1000.0004, which isn't even close to correct (it's 32, a trivial, uninteresting result). But the blather and verbiage in that paragraph sounded very convincing, which is what ChatGPT excels at: generating tokens that appear to be convincing text.
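The claim takes one line to check, which is the line the chatbot never ran. Since sqrt(2)^10 = 2^5 = 32 exactly, any "almost-integer near 1000" story is pure confabulation:

```python
import math

# sqrt(2)^10 = (2^(1/2))^10 = 2^5 = 32; the only deviation is floating-point noise.
value = math.sqrt(2) ** 10
print(value)  # ~32.0, nowhere near 1000.0004
```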
Give ChatGPT Schizophrenia (Score:1)
The author clearly hasn't been paying attention. (Score:2)
What about the cabbages serving the goats at their restaurants?
So there's this guy...
Either you're proposing that the newly re-elected President of the U
but we cant disprove cabbages eat goats (Score:2)
My favorite: The didn't actually do the math error (Score:4, Insightful)
1. It was very good at taking a prose description of a fairly complex linear algebra problem and correctly turning it into an equation
2. Rendered the matrix based equations nicely.
3. Generated correct matrices and Python code to represent solving the problem.
4. Generated a completely INCORRECT output vector for a solution that would not have been the result of actually running the Python code.
All the nicely displayed work could lead one to have confidence in the output, but the reality is, EVERYTHING is generated, even the "Answer" that a human would have obtained by running the code.
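The reliable countermeasure is mechanical: actually execute the generated code and substitute the solution back into the equations, instead of trusting the rendered "answer." A toy version of that check, using a made-up 2x2 system and plain Python standing in for the generated NumPy:

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] x = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular matrix")
    x1 = (b1 * a22 - b2 * a12) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2

# Hypothetical system: 2x + y = 5, x + 3y = 10.
x, y = solve_2x2(2, 1, 1, 3, 5, 10)
print(x, y)  # 1.0 3.0

# The actual point: verify by substitution; never take the displayed vector on faith.
assert abs(2 * x + y - 5) < 1e-9 and abs(x + 3 * y - 10) < 1e-9
```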
We in this statement: (Score:1)
"We also tend not to put people exhibiting these behaviors in decision-making positions"
I want to be with the We in this statement.
We make the worst mistake makers into presidents (Score:2)
Both sides send their worst.
A foundational problem (Score:5, Interesting)
I've been testing various LLMs to assist with coding.
I found they make utterly "inhuman" mistakes. For instance, they have expert knowledge of Makefile syntax, yet a Makefile that one model generated for me contained an utterly stupid, beginner-level mistake. Astoundingly, I found models also unable to correct coding mistakes after you pointed those mistakes out to them.
This kind of untrustworthiness is unfortunately built into the very fabric of these stochastic machines. Their reasoning is not grounded in formal inference, and that's a huge problem: it makes it difficult for humans to check their work, and it means the models fall short in explaining their reasoning and in their ability to catch errors and self-correct.
Have you read the news lately ;p (Score:2)
We wouldn't put someone who acts inconsistently in a high-level position, except maybe for the 45th and 46th POTUS. We wouldn't trust humans that say false stuff with confidence, except if they are a CEO or a consultant.
AI is built to empower those who can say stuff with confidence without knowing what they're talking about. Anyone else will doubt the AI so much that the productivity increase it provides will be less, even less so if we account that actually competent people produce a quality of work that is harder to repl
mistakes? (Score:2)
It is fundamentally wrong to describe undesirable output from AI as a "mistake". AI is deterministic, it is computer software. It produces what it is made to produce.
Just because programmers do not understand and cannot predict the behavior of the systems they create does not mean that those systems have human qualities. They want you to believe they do, so don't play along by describing the behaviors in human terms.
Existing research on how humans judge machines (Score:1)
There's some great existing research on this by Cesar Hidalgo showing that people are far more forgiving when a mistake is made by a human than when it's made by a machine. Search for "How Humans Judge Machines" and "Cesar Hidalgo". There are YouTube videos, and you can download his book for free with all the research details.
Uncanny similarities (Score:3)
Similarities between people and LLMs are interesting. You can be really terse using terrible spelling and grammar and it'll still understand what you are saying or at least qualify response to avoid ambiguity.
It misremembers things in ways where it gives you the gist of the text but doesn't get the exact wording right. The fidelity is mostly related to the relative frequency of exposure to the material. It can be reminded/led into remembering or considering something it didn't consider before.
LLMs are terrible at answering counterintuitive questions, terrible at attribution, and terrible at recalling random sequences, URLs, and phone numbers, especially smaller models.
LLMs seem rather superhuman in what they are able to recall and _do_ by rote alone but rote seems to be all there is.
"Understanding" (Score:2)
Press keeps glossing over that LLM "AI" has no understanding; it's just a fancy autocomplete.
It's never making a mistake at all; it's just producing output from the data provided.
Any mistake is on the user for not vetting the info provided by the tool.
The tool has no way of knowing or measuring veracity at all.
A hammer doesn't understand whether it's hitting a nail or a screw; neither one is a mistake on its part.
Exception to that rule... (Score:2)
We also tend not to put people exhibiting these behaviors in decision-making positions. ...except in election years.
it lacks executive functions. (Score:2)