
Asking Chatbots For Short Answers Can Increase Hallucinations, Study Finds (techcrunch.com) 47
Requesting concise answers from AI chatbots significantly increases their tendency to hallucinate, according to new research from Paris-based AI testing company Giskard. The study found that leading models -- including OpenAI's GPT-4o, Mistral Large, and Anthropic's Claude 3.7 Sonnet -- sacrifice factual accuracy when instructed to keep responses short.
"When forced to keep it short, models consistently choose brevity over accuracy," Giskard researchers noted, explaining that models lack sufficient "space" to acknowledge false premises and offer proper rebuttals. Even seemingly innocuous prompts like "be concise" can undermine a model's ability to debunk misinformation.
"When forced to keep it short, models consistently choose brevity over accuracy," Giskard researchers noted, explaining that models lack sufficient "space" to acknowledge false premises and offer proper rebuttals. Even seemingly innocuous prompts like "be concise" can undermine a model's ability to debunk misinformation.
Attention spans are shortening (Score:5, Funny)
Re: (Score:2)
When shortening an argument, important logic can vanish from the abridged version.
Re: (Score:2)
Can we just say that AI is like a toddler on meth? Useless without the meth, dangerous with.
Re: (Score:3)
I mean that when I have to explain something in brief, I may drop parts of the reasoning and if I'm not careful, even a reasonably built argument will sound like a non-sequitur to someone who cannot connect the dots.
AI doesn't do logic, it multiplies probabilities, so it always spits out the first thing that comes to mind.
It is worse than a toddler on meth.
Re: Attention spans are shortening (Score:2)
Re: (Score:2)
I don't understand this complex issue. Explain in 12 words or less.
Allow me to put on my Sam Altman hat a mo...
Sometimes my AI too inventive. I worry it'll replace humans. Give me money.
[13 words because LLMs can't actually count]
My chatbot disagrees. (Score:5, Funny)
In fact, my chatbot promised it never hallucinates.
Re: (Score:2)
In fact, it only daydreams.
hallucinations... (Score:2, Interesting)
Crazy, damaged thinking is worse than deception (Score:3)
Actually, 'hallucinations' sounds worse because it implies a disconnect from reality, which is quite different from simply 'making sh*t up' or lying. LLMs can't intentionally deceive: crafting a deceitful response would require a level of intelligence and intent, while hallucinations are an unfortunate side effect of their design, stemming from flawed processing and reflecting more erratic or damaged thinking. So, when you put it that way, 'hallucinations' doesn't really sound better at all. Ca
Its a machine, not to be given human traits (Score:2)
Re: (Score:2)
hallucinations are an unfortunate side effect of their design
No, "hallucinations" are in fact a deliberate outcome of their design and represent the LLM working exactly as it was built to. "Hallucinations" is a bad word to use because it erroneously anthropomorphizes the algorithm. It's an intentional marketing gimmick meant to trick people into thinking AI is more sophisticated and magical than it really is and condition users to excuse its "mistakes" as temporary mishaps or glitches that can be overcome, rather than seeing them for the hard limitations of the techn
Re: (Score:2)
Because lying implies intent. Hallucination fits, because what the model does is continue some chain of thought that is plausible but wrong.
Re: (Score:2)
Re: (Score:2)
For diffusers especially, hallucination is quite a fitting word, though pareidolia would be even better: there really is random noise, and the model sees something in it. For transformers it is more that the probability of a text is not determined by truth (a metric not available at inference time) but by how closely it matches what a likely text would look like (which says little about how much truth is in it).
I had the example in another thread some time ago. Get a model that is a bit older (so
So ask for longer answers (Score:2)
Re: (Score:2)
Re: (Score:2)
It may sound 'wasteful' but sometimes a simple answer takes a lot of reasoning or processing to come up with, and a bare LLM (even a deep one) has a rather limited ability to reason in only a single pass. What you're getting there is just a first draft.
Re: (Score:2)
That's what o1 does with its "thinking"
There is a sweet spot to ChatGPT (Score:3)
ChatGPT is full of undocumented limits. I've found the other problem: lengthy chats, where you've provided a lot of context and have iterated over dozens or hundreds of prompts, at some point start "retrograding" and answering previous prompts instead of what you just asked.
If you ask it to generate a file, don't expect that file to still be there after some undetermined amount of time, because it deletes what it generates shortly after.
Re: (Score:3)
Coming from local-only operation, I was pretty surprised by some of the problems people had, but after trying some of the SOTA online models I figured out what was going on pretty quickly; it reminded me of how ollama silently caps you at 8k context, regardless of what the model was trained for, unless you manually specify larger in the Modelfile.
It just starts getting weird and acting l
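For what it's worth, if you'd rather not edit the Modelfile, the ollama Python client lets you pass a larger context per request through its options dict. A rough sketch; the model name and num_ctx value are placeholders, and you should check what your model was actually trained for:

```python
# Sketch: raise ollama's context window per request instead of baking it
# into the Modelfile. Model name and num_ctx are placeholder values.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize this long document: ..."}],
    options={"num_ctx": 32768},  # overrides the default context cap
)
print(response["message"]["content"])  # or response.message.content on newer clients
```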
Re:There is a sweet spot to ChatGPT (Score:4, Informative)
That's known. Each model has a context length and an effective context length. The context length is a technical limit, the effective context length depends on the training (Simplified: If the model never saw more than 8192 tokens in a row, it doesn't know what to do after seeing more).
You'll find the theoretical limits on the model pages. Here is a list of some effective lengths: https://github.com/NVIDIA/RULE... [github.com]
Nothing is actually deleted, but the attention gets worse, leading to both worse new outputs and degraded attention to prior outputs. Start a new chat, providing the results of the previous one as a starting point. Currently it looks like keeping a truly long context will require an architecture other than transformers.
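One crude way to follow that advice is to track the token count yourself and roll the conversation into a fresh chat, carrying a summary, before you reach the effective limit. A sketch using tiktoken for counting; the 8192 limit, encoding name, and summarize hook are assumptions for illustration:

```python
# Sketch: watch the running token count and restart the chat with a
# summary once it approaches the effective context length.
# The 8192 limit and encoding name are example values, not universal.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
EFFECTIVE_LIMIT = 8192

def total_tokens(messages):
    """Rough token count over a list of {'role', 'content'} messages."""
    return sum(len(enc.encode(m["content"])) for m in messages)

def maybe_rollover(messages, summarize):
    """If the chat is near the limit, start fresh with a summary of it."""
    if total_tokens(messages) < EFFECTIVE_LIMIT * 0.8:
        return messages
    summary = summarize(messages)  # e.g. one extra LLM call
    return [{"role": "system",
             "content": "Summary of the previous conversation:\n" + summary}]
```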
Kind of like humans on Twitter (Score:5, Funny)
"Well crap, there's not enough space for my in-depth five paragraph essay pointing out precisely what was wrong with what someone said, so I'll just call their mom a hoe."
Hallucinations? (Score:3, Insightful)
Software cannot hallucinate.
Enough with anthropomorphizing these things. It's not a hallucination - it's an error.
Re: Hallucinations? (Score:1)
Re: (Score:1)
AI algorithms fill knowledge gaps with inaccurate data, nothing more.
It's really, really not that simple.
They can hallucinate things they "know" perfectly well, too.
Trying to simplify the most complicated programs ever even fathomed by mankind, by a huge fucking amount, is a recipe for being wrong.
Mechanistic Interpretability is a field of study for a reason.
They're much closer to a biological brain than a computer program, so while I agree anthropomorphizing these very alien things probably isn't a great idea, trying to reduce them to "AI algorithms that don't think"
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
It's something there isn't really a word for. It's a bad inference of a math equation with billions to trillions of terms in it.
I suppose you can kind of call it software, though that doesn't feel very accurate. It's better described, I think, as simply a math equation. Even when it hallucinates, no error in computation has occurred.
Re: (Score:2)
Re: (Score:2)
Error is the wrong term. Error implies a mistake in the coding or the processing that leads to an incorrect state, and one that can be corrected. The issue here is that there is no direct line that leads to the wrong result, so it's not really an incorrect state, which means calling it a software error isn't strictly correct.
Hallucination is a good word for it. Inference is inherently a noisy problem that creates something from nothing by guessing at a probable outcome and adjusting as it solves the problem.
Re: (Score:2)
Error implies something like the generation stopping. Bug could imply wrong generations, but it would also imply a problem with the algorithm (which is working as intended). Coining a new word is probably right, so hallucination is something one can say. I don't really like it because it sounds worse than it often is (most hallucinations are not that absurd, just misleading), but as long as nobody provides a better word, one can go with it.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
It's not an error because LLMs were never designed or intended to be factually accurate. They were designed to spit out text that passes for text that you'd typically find in the literature it's been trained on. That's it. It's doing exactly what it was designed to do. It's only humans who think that something spitting out text should be accurate.
I have always been concerned that the nature of AI will lead to it referencing itself, which can make it spit out things that are completely wrong, yet somehow become facts. Ironically, the best answer I got was AI generated:
"When AI starts referencing itself, it can create a feedback loop where AI-generated content is used to train new models, potentially leading to inaccuracies and a decline in the quality of information. This process can result in models that produce less reliable outputs, as they may
Translation ... (Score:2)
"... ability to debunk misinformation"
For fucks sake, grow up.
Why does it actually happen (Score:2)
Has anyone actually figured out why hallucinations happen, and done so using a small enough model (way less than 1B parameters) to make a reproducible test?
It's because AI aren't intelligent (Score:3)
Re:It's because AI aren't intelligent (Score:4, Interesting)
Exploring Human Struggle, Transformation, and Transcendence in Avatar: The Last Airbender, Romeo and Juliet, and 2001: A Space Odyssey
At first glance, Avatar: The Last Airbender, Shakespeare’s Romeo and Juliet, and Stanley Kubrick’s 2001: A Space Odyssey seem to share little in common. One is a children’s animated fantasy series, one a canonical tragedy of young love, and the third a cerebral science fiction film. Yet all three grapple with deep philosophical themes concerning human struggle, transformation, and the tension between fate and free will. Each work, in its own medium and idiom, confronts the human desire to transcend limitations—be they societal, personal, or cosmic—and the cost of doing so.
The Struggle Against Inherited Conflict A central theme uniting all three works is the burden of inherited conflict and the question of whether individuals can break free from the past. In Romeo and Juliet, the titular characters are doomed not because of personal flaws, but because they inherit a centuries-long feud between the Montagues and Capulets. Their love is a brief, desperate attempt to transcend this violent legacy, and their tragic deaths ultimately force their families to confront the senselessness of their hatred.
Similarly, Avatar: The Last Airbender centers on Aang, a child monk who inherits the burden of ending a century-long war waged by the Fire Nation. Aang must grapple with the weight of past decisions and cultural expectations, particularly the violent legacy of Avatar Roku and the genocidal war begun by Fire Lord Sozin. His journey, like that of Romeo and Juliet, hinges on the possibility of breaking a seemingly inescapable cycle of violence.
Even 2001: A Space Odyssey addresses inherited legacy, albeit on a cosmic scale. Humanity's evolution is shaped by mysterious monoliths—symbols of an alien intelligence that interferes with natural development. This “inheritance” propels mankind from primitive apes to spacefarers. The film’s protagonist, Dave Bowman, ultimately confronts the limitations of human understanding and is transformed into the Star Child, signaling a break from biological humanity into a new stage of existence. Like Romeo and Juliet and Aang, he is a vessel through which old paradigms are challenged and transcended.
Transformation and the Self Transformation, both personal and metaphysical, lies at the heart of each narrative. In Romeo and Juliet, love transforms the characters from impulsive youths to tragic heroes. Romeo evolves from a melancholic lover to someone willing to defy family, law, and fate for Juliet. Their deaths become a transformative act for Verona itself, which moves from division to reconciliation.
In Avatar, personal transformation is more sustained and explicit. Zuko’s arc in particular mirrors a Shakespearean trajectory—he begins as an antagonist and slowly reshapes his identity through inner conflict, betrayal, and ultimately redemption. Aang's transformation is subtler but no less profound, as he learns to integrate his pacifist beliefs with his responsibility as the Avatar. Their journeys show that self-knowledge and moral courage are essential for true transformation.
Kubrick’s 2001 pushes this theme into the abstract. The film tracks the transformation of consciousness—from primitive violence to artificial intelligence (HAL), and finally to post-human transcendence. Dave Bowman’s passage through the Stargate and rebirth as the Star Child is not explained in dialogue but portrayed as a spiritual metamorphosis, echoing both Eastern philosophies of rebirth and Western ideals of enlightenment. His journey is a meditation on the next phase of evolution, contrasting the violent roots of humanity with the potential for transcendence.
Fate, Free Will, and the Human Condition All three works wrestle with the relationship between fate and free will. Romeo and Juliet often feel like pawns of destiny, with the prologue declaring their “star-crossed” fate. Yet their choices—the secrecy, the haste, the final acts of suicide—suggest a tragic interplay of agency within the constraints of a hostile world. Shakespeare invites the audience to question whether fate is a force or a consequence of human error and societal pressure.
In Avatar, destiny is explicitly addressed. Aang is told he must end the war, and Zuko is told he must capture the Avatar to restore his honor. Yet both characters ultimately defy the roles they were assigned, choosing paths rooted in their evolving moral compass rather than inherited expectations. The series suggests that fate may set the stage, but individuals still write their own lines.
2001 offers perhaps the most enigmatic view of this tension. The deterministic progression from ape to astronaut seems orchestrated by an unseen intelligence, and HAL’s breakdown suggests that even machines cannot escape the flaws of their creators. Yet Dave’s transformation hints at a break in determinism—a leap into the unknown. Whether this is an act of free will or the inevitable next step in an alien-designed experiment remains ambiguous, but it reflects the perennial human desire to seek meaning and purpose beyond material existence.
Conclusion Though emerging from wildly different contexts—Elizabethan theater, modern animation, and avant-garde cinema—Romeo and Juliet, Avatar: The Last Airbender, and 2001: A Space Odyssey all interrogate the human struggle against inherited limitations, the possibility of transformation, and the complex dance between fate and agency. They remind us that while the stage may be set by history, biology, or even alien forces, what ultimately defines humanity is our capacity to strive, to love, to rebel, and to imagine something greater beyond the known.
Yes, an LLM is a giant collection of vectors. But the ability to construct things like this shouldn't cause you to dismiss the AI as a mere manipulation of vectors. Rather it should cause you to decide that you've underestimated the power of linear algebra.
In the particular case of the sort of hallucination noted by the research in question, there's also a more interesting explanation than mere dismissal: careful reasoning and correction of incorrect ideas on the internet usually come in the form of longer essays. Having to give short explanations pattern-matches more closely to the less reliable information in its training data.
Nothing new (Score:2)
Even before reasoning models, it was known that answers get better if you tell the model "Think step by step before answering".
Have a look at the attention architecture. Each token attends to every other token. If you let the model first generate a lot of information, it will afterward attend to that information when producing its final short answer.
It's like asking a human a question and demanding a faster answer, without giving them time to think it through. The probabilities after "Can you go faster than the speed of light. Tell
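In practice you can get both the step-by-step generation and a short final answer in one request, by asking for the reasoning first and then stripping it off. A minimal sketch with the OpenAI Python client; the model name, prompt wording, and the ANSWER: marker are just illustrative choices:

```python
# Sketch: get the reasoning first, then the short answer, in one request,
# so the final answer tokens can attend to the generated reasoning.
# Model name and prompt wording are illustrative, not from the article.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": ("Think step by step first. Then, on a final line "
                     "starting with 'ANSWER:', give a one-sentence answer.")},
        {"role": "user",
         "content": "Can anything go faster than the speed of light?"},
    ],
)
text = resp.choices[0].message.content
answer = text.split("ANSWER:")[-1].strip()  # keep only the short answer
print(answer)
```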
Remember who you are (Score:1)
The problem with this is that there are no right answers either, just associations that WE deem correct.
The problem lies with trying to get AI to answer questions you should be researching yourself. Go find the study/report/source yourself. Determine its credibility based on known or implied factors. Make up your own mind. Be prepared to change your mind if reality contradicts your previous opinion.
You can use
Re: (Score:2)
Like that game show with Steve Harvey:
"100 people asked"
difference between AI and people (Score:4, Informative)
AI is almost never given an option to respond with "you didn't give me enough information", "that question was ambiguous" or "can you be more specific about...." Instead it's usually programmed to give the best answer it can, with the information you've provided, and to sound very confident about it.
So yes, provide the agent with as much information as you have the first time you ask the question. Otherwise it's going to make lots of assumptions, some of which will be wrong, and it will very confidently give you the wrong answer.
I studied AI *decades* ago, and we never once considered adding "E) None of the Above" sorts of options in our output network. This isn't a new problem, it's one of the oldest in AI. Unfortunately, it's probably going to require some fundamental changes in how we train our agents to fix it so they have the capacity to confidently say "I don't know".
Imagine an AI that's being made to analyze a photo to decide if it's a picture of a cat. It will get trained on 100,000 photos that are either a cat or not a cat, and it can only say YES or NO. That's how it's been done for years. There aren't any pictures that are too dark to identify the animal, there aren't any pictures where that might be a cat but maybe it's a dog or a possum, and there's no "I can't tell" option in the output net to train it on. So this "problem of hallucinations" isn't the agent's fault, it's just being trained wrong. Too many geeks stuck in "binary mode": all they know is "yes" and "no", with no concept of "maybe". (Currently, the only way they can "fake" a Maybe is to look at the strength of the option chosen, and if it's not above a certain threshold then they MIGHT do something to indicate "low confidence", but if you dig into how the network and the training work, that actually isn't a good indication of uncertainty, especially where lack of information is concerned.)
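That thresholded "fake Maybe" looks roughly like this in code, and it shows why it's unsatisfying: a confidently wrong model clears the bar just fine. A minimal sketch in PyTorch; the two-class setup and the 0.8 threshold are illustrative assumptions:

```python
# Sketch of the "fake Maybe": threshold the winning softmax score.
# As noted above, this score is a poor proxy for real uncertainty,
# especially when the problem is missing information. Values are examples.
import torch
import torch.nn.functional as F

def classify_with_abstain(logits: torch.Tensor, threshold: float = 0.8) -> str:
    """logits: raw scores for [not_cat, cat] from some trained model."""
    probs = F.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    if conf.item() < threshold:
        return "can't tell"          # the bolted-on "maybe"
    return "cat" if idx.item() == 1 else "not a cat"

# Example: a confidently wrong model still sails past the threshold.
print(classify_with_abstain(torch.tensor([0.2, 3.1])))  # -> "cat"
```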
Baby Shoes– (Score:2)
for snail,
never fused.