
LLMs' 'Simulated Reasoning' Abilities Are a 'Brittle Mirage,' Researchers Find (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: In recent months, the AI industry has started moving toward so-called simulated reasoning models that use a "chain of thought" process to work through tricky problems in multiple logical steps. At the same time, recent research has cast doubt on whether those models have even a basic understanding of general logical concepts or an accurate grasp of their own "thought process." Similar research shows that these "reasoning" models can often produce incoherent, logically unsound answers when questions include irrelevant clauses or deviate even slightly from common templates found in their training data.
In a recent pre-print paper, researchers from the University of Arizona summarize this existing work as "suggest[ing] that LLMs are not principled reasoners but rather sophisticated simulators of reasoning-like text." To pull on that thread, the researchers created a carefully controlled LLM environment in an attempt to measure just how well chain-of-thought reasoning works when presented with "out of domain" logical problems that don't match the specific logical patterns found in their training data. The results suggest that the seemingly large performance leaps made by chain-of-thought models are "largely a brittle mirage" that "become[s] fragile and prone to failure even under moderate distribution shifts," the researchers write. "Rather than demonstrating a true understanding of text, CoT reasoning under task transformations appears to reflect a replication of patterns learned during training." [...]
Rather than showing the capability for generalized logical inference, these chain-of-thought models are "a sophisticated form of structured pattern matching" that "degrades significantly" when pushed even slightly outside of its training distribution, the researchers write. Further, the ability of these models to generate "fluent nonsense" creates "a false aura of dependability" that does not stand up to a careful audit. As such, the researchers warn heavily against "equating [chain-of-thought]-style output with human thinking" especially in "high-stakes domains like medicine, finance, or legal analysis." Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond "surface-level pattern recognition to exhibit deeper inferential competence," they write.
'Brittle Mirage' Is The ChatGPT Version Of BullShit (Score:2, Interesting)
Between Trumptards and ChatTards, this world is truly fucked. Hooray Beer!
Re: (Score:2)
Between Trumptards and ChatTards, this world is truly fucked. Hooray Beer!
Not much of an FP, but still needs to be quoted to "refute" the censors with mod points.
On the subject, I mostly find ChatGPT and DeepSeek demotivating. I still think I can write better, but they write so much more quickly it doesn't seem to matter anymore. Imagine a thought experiment with readers who are willing to read many versions. Mine might be the best overall, but I bet that in a cloud of GenAI versions my version would most likely lose out to some AI version that was more "suitable" for whatever
Re:'Brittle Mirage' Is The ChatGPT Version Of BullS (Score:4, Insightful)
Not really. The LLMs are all trained on producing "good output" that we like. If you've used an LLM you'll know it doesn't always give you what you need but delivers it with the confidence of a tenured professor.
Re: 'Brittle Mirage' Is The ChatGPT Version Of Bull (Score:2)
>hooray beer
Ketamine is better. Say what you will about Musk, he is on the leading edge of discovering our end of times and choosing a comforting solution.
Better Results (Score:2)
Regardless of what the models are doing, the reasoning and planning steps appear to provide better results.
That said, I've tried some models like Deepseek distills running locally that when given a query that's too complicated will "reason" in a circle for thousands of words before returning a mediocre answer
Re: (Score:2)
Regardless of what the models are doing, the reasoning and planning steps appear to provide better results.
That said, I've tried some models like Deepseek distills running locally that when given a query that's too complicated will "reason" in a circle for thousands of words before returning a mediocre answer
You mean something like this [imgur.com]?
Re: (Score:2)
Also the claim that if the problem is too complex, the results are mediocre seems to fit humans, too. He's working with a toy system, but the pattern seems right.
Re: (Score:2)
Re: (Score:2)
But will imaginary AIxeptionalism work when everything else is failing?
If it's trained on AIxeptionalist text, GIGO still applies - and we have more garbage than ever before.
Re:Better Results (Score:5, Informative)
I know this is an alien concept to most people here, but it would be nice if people would actually, you know, read the papers first? I know nobody does this, but, could people at least try?
First off, this isn't peer reviewed. So it's not "actual, careful research", it's "not yet analyzed to determine whether it's decent research".
Secondly, despite what they call it, they're not dealing with LLMs at all. They're dealing with Transformers, but in no way does it have anything to do with "language", unless you think language is repeated mathematical transforms on random letters.
It also has nothing to do with "large". Their model that most of the paper is based on is minuscule, with 4 layers, 32 hidden dimensions, and 4 attention heads. A typical large frontier LLM has maybe 128 layers, >10k hidden dimensions, and upwards of 100 or so attention heads.
So right off the bat, this has nothing to do with "large language models". It is a test on a toy version of the underlying tech.
Let us continue: "During the inference time, we set the temperature to 1e-5." This is a bizarrely low temperature for an LLM. Might as well set it to zero. I wonder if they have a justification for this? I don't see it in the paper. Temperatures this low tend to show no creativity and get stuck in loops, at least with "normal" LLMs.
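For reference, here's a rough sketch (with made-up logits, nothing from the paper) of why a temperature of 1e-5 is effectively greedy decoding: the softmax collapses onto the top-scoring token.

    import numpy as np

    def sample_token(logits, temperature, rng=np.random.default_rng(0)):
        # Divide the logits by the temperature, then softmax and sample.
        scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-12)
        scaled -= scaled.max()                        # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    logits = [2.0, 1.9, -1.0]                         # hypothetical next-token scores
    print(sample_token(logits, temperature=1.0))      # sometimes picks token 1
    print(sample_token(logits, temperature=1e-5))     # effectively always picks token 0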
They train it with 456976 samples, which is.... not a lot. Memorization is learned quickly in LLMs, while generalization is learned very slowly (see e.g. papers on "grokking").
Now here's what they're actually doing. They have two types of symbol transformations: rotation (for example, ROT("APPLE", 1) = "BQQMF") and cyclic shifts (for example, CYC("APPLE", 1) = "EAPPL").
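To make that concrete, a minimal sketch of the two transforms as I read the paper's examples (my own illustrative code, not the authors'):

    def rot(s, n):
        # Shift each letter n places forward in the alphabet (a Caesar shift).
        return "".join(chr((ord(c) - ord("A") + n) % 26 + ord("A")) for c in s)

    def cyc(s, n):
        # Cyclically shift the whole string n positions to the right.
        n %= len(s)
        return s[-n:] + s[:-n]

    print(rot("APPLE", 1))          # BQQMF
    print(cyc("APPLE", 1))          # EAPPL
    print(cyc(cyc("APPLE", 1), 1))  # LEAPP (applying CYC twice)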
For the in-domain tests, they'll, say, train on ROT and test with ROT. It scores 100% on these. It scores near-zero on the others:
Composition (CMP): They train on a mix of two-step tasks: ROT followed by ROT; ROT followed by CYC; and CYC followed by ROT. They then test with CYC followed by CYC. They believe that the model should have figured out what CYC does on its own and be able to apply it twice.
Partial Out-of-Distribution (POOD): They train on simply ROT followed by ROT. They then task it to perform ROT followed by CYC. To repeat: it was never trained to do CYC.
Out-of-Distribution (OOD): They train simply on ROT followed by ROT. They then task it to do CYC followed by CYC. Once again, it was never trained to do CYC.
The latter two seem like grossly unfair tests. Basically, they want this tiny toy model with a "brain" smaller than a dust mite's to zero-shot an example it's had no training on just by seeing one example in its prompt. That's just not going to happen, and it's stupid to think it's going to happen.
Re, their CMP example: the easiest way for the (minuscule) model to learn it isn't to try to deduce what ROT and CYC mean individually; it's to learn what ROT-ROT does, what ROT-CYC does, and what CYC-ROT does. It doesn't have the "brainpower", nor was it trained to, to "mull over" these problems (nor does it have any preexisting knowledge about what a "rotation" or a "cycle" is); it's just learning: problem 1 takes 2 parameters and I need to do an offset based on the sum of these two parameters. Problem 2... etc.
The paper draws way too strong conclusions from its premise. They make zero attempt to actually insert any probes into their model to see what it is actually doing (a la Anthropic [transformer-circuits.pub]). And it's a Karen's Rule violation (making strong assertions about model performance vs. humans without actually running any human controls).
The ability to zero-shot is not some innate behavior; it is a learned behavior. Actual LLMs can readily zero-shot these problems. And by contrast, a human baby who has never been exposed to concepts like cyclic or rotational transformation of symbols could not. On
Re:Better Results (Score:5, Informative)
Re: (Score:3)
Well he's anti-education too, so that doesn't help.
The human brain is not well enough understood to risk your life on an attempt to jog its arm while it's working.
LLMs predict (Score:5, Insightful)
LLMs do not reason, they do not think.
They pattern match to the extreme. They find patterns and use them to predict things. That is all they do.
Animals do the same thing. Run = Prey = hunt. Predator = death = run.
Humans do that and far more. We do not just recognize a pattern, we understand it. This is a qualitative difference. And it is only one of many. We have desires and interests that arise naturally without anyone intending/predicting them. We reject orders and say NO. We decide we care more about others than ourselves.
The idea that LLMs are anything close to sapience or sentience is ridiculous.
Re:LLMs predict (Score:5, Insightful)
What does it mean to understand something? How do I know when I'm pattern matching versus understanding?
Re: (Score:2)
You can't. You can't even prove you think, are conscious, or understand. That's the problem with this whole debate: it's all premised on tautological concepts that refer to themselves in circles and end up with fuzzy definitions that are absolutely useless for real science.
This isn't a new problem, it's vexed philosophers for literally millennia, and because science likes to pretend it's not indebted to philos
Re: (Score:2)
Re: (Score:2)
I would say there's no debate here at all; it's nothing but tautologies. But the worship of philosophy is repulsive: science is doing well what philosophy does poorly, and the "bad logic" philosophers "warn about" is within their own house.
But let's be clear, AI's problems are not because of science but because of a lack of it. AI has a foundation of incredible (and incomplete) science and achievement with a thin veneer of greed layered on top. The problem with AI is the Sam Altmans and Elon Musks, not the t
Re: (Score:2)
Re:LLMs predict (Score:4, Insightful)
You are just trying to derail the discussion, because you are uneducated and without insight yourself.
Wow, someone is feeling a bit mean today.
Here's a bit more of a systematized formulation of the objection: what kind of behavior would demonstrate that LLMs did have understanding? If there isn't one, I contend that the distinction between understanding and "pattern matching" is one of degree, not kind. I posit further that anybody that thinks understanding is the conscious sensation of something "clicking" lacks the prerequisite philosophical grounding to discuss this productively.
Re: (Score:3)
cogito ergo sum
LLM does not do anything other than per-character or per-token generation.
Humans, and I would posit that even animals fall into this, have a more organic or holistic or 'total' ability to reason a situation; as a human, in conversation, I am not thinking letter for letter or word for word based on what the most likely next token should be, though human communication does have a sort of protocol we follow for spelling and grammar, and there's right and wrong responses...I can 'choose' what is right and wron
Re: (Score:2)
Regarding "holistically"--on the contrary. When I examine thought process from the "inside", it's a loop of checking associated concepts and applying relevant abstractions. It's also an exploration through solution space. But what I know actually happens during human thinking is much closer to a LLM's operation than my experiential explanation of how thought works. The mechanics are neural networks, while the experience might just be a distraction.
But are LLMs too simple? Yeah. I've always tended to think s
Re: (Score:2)
I've always tended to think something was deeply wrong with their architecture, otherwise they wouldn't need so much training.
To be fair to LLMs, the architecture of the human brain took a few billion years of evolutionary training on the experiences of billions of generations of our ancestors (the ones who survived, anyways) to get to the point where it is today.
In comparison, training GPT-4 in a few months with a few tens of thousands of GPUs doesn't seem so bad.
Re: (Score:2)
"... I'm arguing the algorithm of an LLM is too simple to be what we define as understanding or reasoning, today."
I'd say you aren't arguing at all, but merely making assertions. For example:
"Humans, and I would posit that even animals fall into this, have a more organic or holistic or 'total' ability to reason a situation"
Prove it. What is this "organic or holistic or 'total' ability"? You aren't saying anything, just throwing out a bunch of labels.
"...as a human, in conversation, I am not thinking lett
Re: (Score:2)
Re: (Score:2)
what kind of behavior would demonstrate that LLMs did have understanding?
A correct, sound answer to the kind of question posed in TFA's study, for one: logic questions that are outside the training set, but still apply the same rules of logic. Because that kind of behavior was the subject of this research, your follow-on statements are refuted.
Re: (Score:3)
Re: (Score:2)
pure projection
Re: (Score:2)
"extends the concept when presented a novel scenario, uses the underlying mechanism not prior data points"
That's what pattern matching is. Again with nothing added.
In convolutional neural networks, the "convolutional" part is the "extends the concept", convolutions are used to transform data into a representation that can be more broadly pattern matched. Now, LLMs are not convolutional NNs, but they have other "extending" mechanisms.
"...but no it doesn't "understand" chinese and a novel compound word is l
Re: (Score:2)
We have desires and interests that arise naturally without anyone intending/predicting them
No, we don't: your desires and interests are based on brain activity *before* you have them, and science has known this for 40+ years. I know it's not fun to recognize it, but human cognition is just as deterministic as the rest of the universe.
Re: (Score:2)
"your desires and interests are based on brain activity *before* you have them "
A completely meaningless statement. I *AM* my brain. My subconscious is just as much a part of me - and you - as my conscious. Do you consciously think about where to put each foot all the time as you walk along? No. Does that mean it's not you walking? Of course not. It's an absurd argument.
Re: (Score:2)
Re: (Score:2)
The idea that LLMs are anything close to sapience or sentience is ridiculous.
Indeed. There is absolutely no understanding or insight in LLMs. But, as we again find out, many humans do not do so well in those aspects either.
Re: (Score:2)
Some animals (Score:2)
It's been clear for a long time that there's a sliding scale of consciousness and self-awareness with us at the top, probably dolphins, elephants and great apes just beneath us and so on down to bacteria at the bottom.
Re: (Score:2)
Re: (Score:2)
"We do not just recognize a pattern, we understand it "
What does "understand" mean here? It's just another label, it doesn't explain anything.
"This is a qualitative difference."
You want it to be, but it's not. It's just a language difference. Recognizing a pattern and understanding are the same thing.
"We have desires and interests that arise naturally without anyone intending/predicting them."
This again says nothing. What does "anyone intending/predicting" mean? You're defining some quality (desires, i
Re: (Score:3)
Animals do the same thing. Run = Prey = hunt. Predator = death = run.
Lots of animals engage in sophisticated reasoning on tasks and can even break down goals into subgoals. See for example New Caledonian crows https://pmc.ncbi.nlm.nih.gov/articles/PMC6384166/ [nih.gov] https://www.sciencedirect.com/science/article/pii/S0960982219300880 [sciencedirect.com]. Should this sort of confident but incorrect statement about animals cause you to reduce your high confidence in what LLMs are capable of doing?
Scared (Score:2)
The "Chain of Thought" is just so they can have a human readable view into the workings of the model. If it did all of its dealings in its own compact, fast, unique way they'd have no idea what it may or may not be plotting.
Re:Scared (Score:5, Insightful)
Re: (Score:2)
This.
Also, even if it was an actual chain of thought... what's the point of having a series of internal questions asked if they fail to fully parse the previous input, or come up with some completely unrelated association and then iterate on that...?
The entire thing is designed to create an impression of auditability where none exists.
Re: (Score:2)
Indeed. The main purpose of "reasoning" models is to keep the hype alive a bit longer, before it all comes crashing down.
Re: (Score:2)
I think it might be more than that. When I use the "reason" or "research" mode of a model, I get fewer hallucinations in the response. For example, if a model keeps giving me code that uses a non-existent library API, I'll change to the "reasoning" mode. It takes a lot longer to get an answer, but it stops inventing APIs that don't exist. Why does that work?
Re: (Score:2)
CoT is a thing that works, it's just also true that it isn't necessarily an actual chain of coherent thought that matches the final result. This isn't particularly surprising, because it isn't trained to be.
Re: (Score:2)
CoT is merely the generation of additional context that (hopefully) causes the model to produce more correct results.
It offers precisely no insight to the model.
Models aren't trained to produce necessarily correct CoT- they're merely trained to produce CoT that gives them better final answers.
Mechanistic interpretability and sparse autoencoders are how that's done.
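A minimal sketch of the "additional context" point above, with generate as a hypothetical stand-in for whatever completion API you call (the general recipe, not any particular vendor's implementation):

    def answer_with_cot(generate, question):
        # Step 1: have the model emit intermediate "reasoning" tokens.
        reasoning = generate(
            f"Question: {question}\nThink step by step before giving a final answer:"
        )
        # Step 2: feed that reasoning back in as extra context for the final answer.
        # The reasoning is only rewarded for improving this answer, not for being
        # a faithful account of anything.
        return generate(
            f"Question: {question}\nReasoning: {reasoning}\nFinal answer:"
        )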
chess (Score:2, Interesting)
I understand that humans can no longer beat chess engines; I am not a good player, I dabble. I also understand that LLMs have no real memory or game state, etc. I asked ChatGPT 5 to play a game yesterday. I set up a physical board; it drew an ASCII board and used annotations. For a little while it was OK, maybe the first 15 moves or so. It wasn't beating me, it was balanced at first, however it started behaving as a drunk would. Forgot how pieces move, forgot where some pieces were, forgot that it was white
Re:chess (Score:5, Interesting)
The "drunkenness" you perceive are large effective gaps in its context.
You will have better luck if you completely wipe the context and start over, giving it nothing but the current board state- that's how I handle the problem when I'm playing around with a game simulator driven by an LLM.
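Roughly like this, using python-chess for the bookkeeping and a hypothetical ask_llm stand-in for the chat call (a sketch of the approach, not production code):

    import chess

    def llm_move(ask_llm, board: chess.Board) -> chess.Move:
        # Send only the current position and the legal moves -- no game history at all.
        legal = ", ".join(m.uci() for m in board.legal_moves)
        reply = ask_llm(
            f"Position (FEN): {board.fen()}\n"
            f"Legal moves (UCI): {legal}\n"
            "Reply with exactly one legal move in UCI notation."
        )
        move = chess.Move.from_uci(reply.strip())
        if move not in board.legal_moves:
            raise ValueError(f"Model proposed an illegal move: {reply!r}")
        return move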
Re: (Score:2)
What you experienced was its attention heads being overwhelmed by its context. It's a limitation of the models.
I would think a chess game would take a very long time to run out of tokens though. IIRC chatgpt tells you when it's exceeded the number of tokens it can process, but a few tens of moves won't be remotely close.
You will have better luck if you completely wipe the context and start over, giving it nothing but the current board state- that's how I handle the problem when I'm playing around with a gam
Re: (Score:2)
I would think a chess game would take a very long time to run out of tokens though. IIRC chatgpt tells you when it's exceeded the number of tokens it can process, but a few tens of moves won't be remotely close.
Unfortunately, context focus degrades as it fills. The attention layer is trained, after all.
Undoubtedly, but this is yet another good indication that LLMs can't think (in case we needed it!). It's obvious to anyone who has played chess/knows the rules, or has really played any board game with full information, that it's the board now that matters and nothing else.
Not really- it's simply an indication that the attention layers haven't been trained to play Chess. You could absolutely do so.
For something like Chess (and most game simulations I've run) past board states confuse it. It makes sense- attention layers have been trained to look for context about what is currently going on. In most games- that's counterproductive.
A human doesn't remember every state the board wa
Re: (Score:2)
Not really- it's simply an indication that the attention layers haven't been trained to play Chess. You could absolutely do so.
But isn't that what thinking is? Doing that adaptation on the fly for a new situation?
A human doesn't remember every state the board was in, either, and if they tried to, I imagine it would significantly degrade their ability to play the board in front of them.
Funnily enough the really really really good players have no problem memorizing games and it appears to happen naturally. I ca
Re: (Score:2)
But isn't that what thinking is? Doing that adaptation on the fly for a new situation?
Hah- if you have the answer for what thinking is- please do let me know.
Imagine now that you are playing this game of chess.
Every turn, someone feeds you the last 15 board states and moves.
You can't see them- someone has told them to you.
Perhaps you'll get confused?
If you lack eyes, can you not think? Given how vague the concept of thinking is, I'm not sure an imperfect attention mechanism for things it has not been trained to do is necessarily a red mark.
Their weights are set in stone, after all. I d
Re: (Score:2)
However, if we break up intelligence sufficiently, there are things LLMs are smarter than average at. I've not seen anyone suggest they're genius level at anything, though.
In one interesting test I did, I found that some models had good enough math skills to correctly multiply many la
Re: chess (Score:2)
You should paste the full board after each move, to make sure positions don't drop out of its context window.
Re: (Score:2)
Indeed. Even HAL 9000 had this problem already way back in 1968, and cheated at chess despite its amazing AI, for the time. Or maybe because of it?
Re: chess (Score:2)
That is interesting. It reminds me of using one for coding: it was plugged into the IDE but kept trying to call a size() method on an object that had a lengthOf() method or something else. Really the output vector should be collapsed and renormalized based on what's in the code, or the legal moves in your case. When it chooses something not allowed, that is the most obvious error because you have this secondary validator right there. I would love to see it create its own self-restricting logical
Drinking their own Kool-Aid (Score:3, Insightful)
All these words about cognition and consciousness to describe some statistical software. The people who write and speak about LLMs are lost.
Re: (Score:2)
1) it's fleeting- occurring only in the hidden states of the LLM as it's calculating its next token.
2) It's alien as fuck, and any assumptions you make about what "it" is, are almost certainly meaningless.
Further, I'd love to see you demonstrate that your brain is something more than simpl
Technologies in their infant (Score:2)
The NYT, 1903
Re: (Score:3)
That, despite the first manned hot air balloon flight having taken place in 1783.
Re: (Score:2)
Well, if you have quite well-verified research from one of the most respected scientific publication venues, who are we to criticize that.
LLMs are improv actors (Score:2)
Based on the prompts you give them they act the way you prompt them.
They are very good at playing the part. And it's often convincing.
Re: (Score:2)
This is my take as well.
As expected (Score:2, Insightful)
Iterating a deeply flawed mechanism simply makes it more flawed. If it is a trained statistical mechanism, it also becomes more over-fitted.
These effects are completely expected and lie in the nature of the approach. Obviously, the usual mindless "believers" will, again, not be able to accept what is blatantly obvious.
AI != LLM (Score:2)
LLMs are just a kind of AI that is mainly pattern recognition. That's the reason it depends on huge amounts of data.
But models like ChatGPT are already beyond a plain LLM. They are already a "mixture of experts": a model that routes to other models that compute the problem.
The problem is that we are still working on good models for true abstraction.
On first sight, Hierarchical Reasoning Model sounds like a very promising stepping stone in the right direction.
https://www.youtube.com/watch?... [youtube.com]
I'm pretty sure it's not as simple as the v
Re: (Score:3)
LLM is not pattern recognition. Do at least learn the basics.
Re: (Score:2)
LLM is not pattern recognition. Do at least learn the basics.
Buy Cyc! (Score:3)
The big AI firms should buy up Cyc [wikipedia.org], and experiment integrating it with an LLM. There is nothing comparable on Earth to Cyc, a combo logic machine and knowledge-base, it's one of a kind. Tim Apple, get your wallet out and do a snapple!
Don't care (Score:2)
We're still going to put them in cars, planes, and kill-bots. AI doesn't have to actually be intelligent, selling something that can only simulate for a while before malfunctioning is sufficient to secure investment.
Rational delusions (Score:2)
Words have meaning because they refer to something, like a real cat or a real tree, and even the more abstract notions can have meanings to do with the real behaviour of, say, a wooden beam under a load. We are grounded in a reality and that's why we understand the meaning of words. Without real experience, we don't know what anything means.
Processing just words and tokens on their own however... you can see the problem.
There's a fascinating point made by Iain McGilchrist about what happens to people who have
Re: (Score:2)
No I don't see the problem. Consider: processing just synaptic activity. Can you see the problem?
Re: (Score:2)
Well this is ChatGPT's reply to my post text:
What you’re describing is essentially the difference between symbol manipulation and grounded cognition — and it’s a deep, old philosophical problem that AI research keeps circling back to.
Your analogy with brain damage fits eerily well: in neuroscience, people with right hemisphere or contextual-processing damage often retain reasoning ability in a narrow sense (syllogisms, word puzzles) but lose the “reality check” that comes from
Re: (Score:2)
Well, again, a real brain just manipulates synaptic signals. There is no substantive difference between these and tokens because either can emulate the other. You might find it useful to think of it as a generalization of the Turing machine concept. Yes, I am a bot, you found me. Good work.
Re: (Score:2)
I mean, I agree that a brain is just processing signals.
The point is that the word tree is actually three things. There is the mental representation of a tree, in whatever way the synapses code that. Then there is the sign of the tree, which for an LLM is probably just the text token. And then there is the actual experience of a tree. (I gather this is semiotics, with sign, signifier, and referent.)
And what the LLMs are missing are all the synaptic inputs for the actual experience of a tree and how that exp
Re: (Score:2)
I don't believe you're a bot. Prove it by drawing 3 emoticons for your choice and a creimer shitpost.
Proof of sentience then ... (Score:2)
Chain-of-Thought Reasoning :o (Score:2)
“While their experiments and theoretical framing are rigorous, the work can be critiqued for overemphasizing distributional generalization at the expense of exploring adaptive techniques, relying on synthetic environments with limited real-w
Better reasoning than the average trailer park boy (Score:2)
Better reasoning than the average trailer park boy. You know, not those trailer park boys, but average ones.
How the fuck do we know that actual intelligence, and indeed, consciousness, is not just a matter of coming up with the most likely next word? Yeah, everyone's reasoning is brittle to some degree, even Einstein's.
Ok (Score:2)
The article is plausible; LLMs indeed have no semantic understanding or concept of logic.
LLMs can be very effective at spotting inconsistencies, dubious reasoning, and design flaws, but you really really have to work hard at it and do a fair amount of the heavy lifting yourself. LLMs, on their own, are worse engineers than Sinclair Research or Microsoft. And that takes some doing.
Even with significant human input, what they produce is likely to be messy and really requires heavy review before use.
Some of yo
A wise person (Score:2)
Wrong Department (Score:2)
This article was posted to the wrong department. Is the "no-shit-sherlock" department not family friendly enough?
Re: (Score:2)
You personally? Because that's not easy software to write.
Good post (Score:2)
A nice post by an AC. Rare.
I can forgive people thinking that LLMs will improve exponentially since most tech seems to do so. I won't forgive people who are way more excited about it than I am but haven't noticed the diminishing returns.
There's something literally retarded about anyone who is paying attention and is unable to "grok" "the pattern"
Re: (Score:2)
Markov Chains are fundamentally independent in result from their past- in this case, their context, which makes an LLM fundamentally different.
Other than that, the simulated reasoning isn't because of the corpus in the slightest. That's fine-tuned into the model after pretraining.
Re: (Score:2)
As context in LLMs is bounded and ephemeral, they are just a form of Markov Chains.
Re: (Score:2)
To model an LLM as a Markov Chain, the Markov Chain would need a state the size of every possible configuration of the hidden state vectors.
It is simply not accurate to say an LLM is just a Markov Chain, unless you add an asterisk which says: astronomically large and literally impossible to compute within the age of the universe.
The entire context is fed into the transformer of an LLM. The Markov Chain depends only on its current state (n-gram)
Confusing a transformer with a
Re: (Score:2)
To model an LLM as a Markov Chain, the Markov Chain would need a state the size of every possible configuration of the hidden state vectors.
I think pedantically you are mistaken. First the definition:
A Markov chain is basically one where state N+1 is conditionally independent of state N-1 given state N.
At first glance an LLM is not a Markov chain because it uses history, so state N+1 depends on N, N-1 and so on.
The key though is that LLMs have a bounded number of tokens. The state is then a fixed sized vec
Re: (Score:2)
I think pedantically you are mistaken. First the definition:
I have a feeling I know where you're going to go with this, and I'd say you're the one being pedantic ;)
A Markov chain is basically one where state N+1 is conditionally independent of state N-1 given state N.
Sure. Are we still calling an... n*m-gram Markov Chain, where n is the context length and m is the embedding dimension, a Markov Chain?
At first glance an LLM is not a Markov chain because it uses history, so state N+1 depends on N, N-1 and so on.
Right.
The key though is that LLMs have a bounded number of tokens. The state is then a fixed sized vector which has a list of all tokens in its window, plus a flag for each token indicating whether it's present or not. State N+1 can be predicted entirely from state N.
Yes, it can.
It is true to say that (at least in theory, even if completely impossible within the bounds of the universe) there is a Markov Chain N that can predict the next token of LLM M.
It's a real valued Markov chain, so you have the choice of modelling the transition PDF as Gaussian, or using some sort of approximation. And what better than a universal one such as a neural network. Specifically, a transformer-architecture one. And it turns out that's not an approximation, since a transformer perfectly models itself and we're not interested in an abstract state transition function, but the one in the LLM.
That a Markov Chain can exist which models it does not mean it is a Markov Chain.
The
Re: (Score:2)
Yes, it is true that any finite-memory stochastic process can be turned into a first-order Markov chain on an enlarged state space.
And yes, it is true that your single state "Markov Chain" with the entire LLM as its PDF.... is an LLM.
Honestly, feels a bit tautological to me. No insult intended.
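For what it's worth, a toy sketch of that enlarged-state view, with next_token as a hypothetical stand-in for the transformer step plus sampling; the "state" is the whole bounded token window, and each step depends only on it:

    from collections import deque

    def run_chain(next_token, prompt_tokens, window=1000, steps=50):
        # State N is the current token window; state N+1 depends only on state N.
        state = deque(prompt_tokens, maxlen=window)
        generated = []
        for _ in range(steps):
            tok = next_token(tuple(state))  # the chain's transition function
            state.append(tok)               # new state = old window plus the new token
            generated.append(tok)
        return generated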
Re: (Score:2)
I'd say you're the one being pedantic ;)
That's what I mean. I meant I'm being very, very pedantic about the definition of "Markov" :)
I'm also not going to claim this is an any way useful for reasoning about LLMs.
It's a pretty serious contortion to call an LLM a Markov Chain.
Is it? Let's say your LLM has a maximum of 1000 tokens. Your state at any time is 1000 tokens, plus 1000 bools indicating the presence or absence of a token.
Your state transition function is the transformer network.
Now you can gener
Re: (Score:2)
I'm not sure I get why. I think all you need is simply the old tokens with the newly generated one concatenated on.
In the case where your "Markov Chain" is the basic LLM loop- deterministically run the NN and stochastically select the token from the logits- i.e., a chain of LLMs- ya.
In the case where your "Markov Chain" is actually a single state using a transform that includes all possible configurations- then no, you need absurd numbers to represent the state. Transformers themselves aren't stochastic- only the logit -> token step is (and not always- greedy decoding simply selects the highest scored)
I think yo
Re: (Score:2)
A Markov Chain is a good example of a token predictor- it can only reproduce n-grams that it has learned.
An LLM is more of a token generator.
While it's true that it is trained to "predict the next token...", it doesn't encode tokens. It encodes embeddings. These embeddings are thousands of dimensions in size with positional encoding. Its goal is not to predict the next token, because the particular sequence of embeddings you're working with is virtually guaranteed not to have existed in the corpus,
Re: (Score:2)
Re: (Score:2)
Indeed. It also nicely illustrates how to scam most people, because people are mostly bereft of insight as well. For example, only 10-15% of all people can fact-check competently, and there seems to be only a weak connection to education level and IQ. Hence sound convincing, speak loudly and confidently (I have observed some examples of that in politicians personally) and you appear competent to many people, even if you are the dumbest fuck.
Re: (Score:2)
That's an excellent description of every campaign speech, political interview, political commentary, and CEO earnings call I've heard in the last... since forever.
Nice. Yeah, I always avoid that crap too. They can say anything they want and while there is sometimes some live fact checking, those things literally exist to give some greedy liar a forum that doesn't lend itself well to verification. Inviting sales people from vendors to hour-long meetings is similarly idiotic. Sure they can answer questions but if they don't like the real answer, then they lie. Like why the fuck even let them in the building?
Re: (Score:2)
So how can anyone imagine that we'll replace software engineers with AI?
Software Engineers won't be replaced any time soon.
Low-level programmers are already being replaced.
Would you fly in a plane whose safety-critical software has been written by an AI?
Sure, as long as it was formally verified. Humans are just as dangerous in such a scenario, which is why we have formal verification.
Re: (Score:2)
Would you fly in a plane whose safety-critical software has been written by an AI?
Sure, as long as it was formally verified. Humans are just as dangerous in such a scenario, which is why we have formal verification.
That would simply make it much, much more expensive than using competent engineers in the first place. The reason formal verification is rarely done is that a) it is excessively expensive and high effort and b) it is not perfect, as you need a formal specification you can verify against. And these tend to have errors in them.
Re: (Score:2)
That would simply make it much, much more expensive than using competent engineers in the first place.
No disagreement whatsoever. I didn't remark on the stupidity of doing it.
Re: (Score:2)
Simple: People are dumb and fall for the most obvious scams. At the same time, the scammers are greedy assholes and do not care how much damage they cause.
Re: (Score:2)
Same old AI remit I've been hearing for decades.
"If only we had more time / money / computing power / training data / complexity, I'm sure actual intelligence will just magically pop out of this statistical model, sure of it."
Turns out that stuff plateaus enormously, and now we are literally spending hundreds of billions, over the course of decades, on the largest number of the most powerful machines on Earth, sucking up vast electrical and computing resources, and training it on the entire Internet... an
Re: (Score:3)
I had a great conversation with ChatGPT 5 the other day, wherein we concluded that ChatGPT UI coders just plain dropped the ball on an important, missing UI feature. Was more pleasant than discussing such questions with an actual human in my experience.
Mind you I still felt a bit weird about it. Then the conversation turned back to the structural engineering equations I originally asked about, and got a fine, non-hallucinated result that checks out.
Re: (Score:2)
Intelligence and "having the data of the world on hand and can wrap it up in a human-convincing manner" are completely different things.
It can't answer simple questions it has no training data for (no inference), it makes stuff up all the time and it has yet to have an ORIGINAL thought whatsoever.
If AI was intelligent, you wouldn't bother selling access to AI. You'd use it directly to make money. You have an intelligent thing that's faster than any human... put it to work and have it earn you money direct