Reasoning LLMs Deliver Value Today, So AGI Hype Doesn't Matter (simonwillison.net)

Simon Willison, commenting on the recent paper from Apple researchers that found state-of-the-art large language models face complete performance collapse beyond certain complexity thresholds: I thought this paper got way more attention than it warranted -- the title "The Illusion of Thinking" captured the attention of the "LLMs are over-hyped junk" crowd. I saw enough well-reasoned rebuttals that I didn't feel it worth digging into.

And now, notable LLM skeptic Gary Marcus has saved me some time by aggregating the best of those rebuttals together in one place!

[...] And therein lies my disagreement. I'm not interested in whether or not LLMs are the "road to AGI". I continue to care only about whether they have useful applications today, once you've understood their limitations.

Reasoning LLMs are a relatively new and interesting twist on the genre. They are demonstrably able to solve a whole bunch of problems that previous LLMs were unable to handle, hence why we've seen a rush of new models from OpenAI and Anthropic and Gemini and DeepSeek and Qwen and Mistral.

They get even more interesting when you combine them with tools.

They're already useful to me today, whether or not they can reliably solve the Tower of Hanoi or River Crossing puzzles.


Comments:
  • "They get even more interesting when you combine them with tools."

    LLMs ARE tools. How long will it take for people to understand that LLMs are not magic, they are computer applications.

    • by Anonymous Coward

      "They get even more interesting when you combine them with tools."

      LLMs ARE tools. How long will it take for people to understand that LLMs are not magic, they are computer applications.

"Tools" has a specific meaning in this context. But if it makes you happy, insert the word "other" in the obvious place.

    • by allo ( 1728082 )

      Look up what "tool calling" in LLMs is. And read some article about MCP.

The sentence basically says "a computer can print more easily if you install printer drivers".
If you want an LLM to control something, you need to give it access to tools for controlling it.

  • I Disagree (Score:5, Insightful)

    by SlashbotAgent ( 6477336 ) on Thursday June 19, 2025 @01:44PM (#65461381)

Your argument seems to be that, since the lies, politely referred to as AI hype, contain partial truths (LLMs do what you think they should), the fact that they are lying does not matter.

I disagree. It is my opinion that big fat whopper lies are being told continuously and few are being taken to task for it, as they should be. They are still lying, and those lies definitely matter.

The Emperor has no clothes. That he's wearing socks does not change the first statement.

    • I continue to care only about whether they have useful applications today, once you've understood their limitations.

      The point being made, I think, is to ignore all the hype, propaganda, marketing and sales pitches and focus on whether the tool is useful to you.

      Make up your own mind and please don't tell me about it :)

      • No, the guy is telling you to ignore the criticisms and just use it. LLMs can be useful at things which are "fuzzy," where details like fact, accuracy, precision, repeatability, etc. aren't strict requirements. That's why they sometimes do well with image and language processing tasks, if given enough manual corrections and overrides.
    • Your argument seems to be that, since the lies

Let me stop you there. The fact that some people use LLMs to generate text without thought and end up with lies / hallucinations / bullshit outputs has little to nothing to do with the concept of AIs and LLMs providing value. LLMs objectively do provide value; the value depends on how they are fed and implemented. We use LLMs at work to process natural language input and respond with only authoritative references, effectively searching documents in an ideal form.

      Just some of the examples of LLM use cases whi

      • Let me stop you there. The fact that some people use LLMs to generate text without thought and end up with lies / hallucinations / bullshit outputs has little to nothing to do with the concept of AIs and LLMs providing value.

        You stopped yourself too soon. You should have read the whole sentence. Then you might have realized that the lies that I and the author are talking about are the marketing and sales lies, not the questionable output of the LLM itself.

      • Erm. How many trillions of dollars does it take to redo "Google" in your organization yourself, inhouse?

        The argument about the value of AI is about if it is superhuman, because only superhuman capabilities justify the fuckload of investor money being thrown into this technology. There is a bubble currently and it will burst, and in the meantime the bubble is causing unnecessary destruction in the economy, by reorganising companies with proven expertise into generic service providers of dubious quality.

    • by hey! ( 33014 )

      Well, yes -- the lies and the exaggerations are a problem. But even if you *discount* the lies and exaggerations, they're not *all of the problem*.

      I have no reason to believe this particular individual is a liar, so I'm inclined to entertain his argument as being offered in good faith. That doesn't mean I necessarily have to buy into it. I'm also allowed to have *degrees* of belief; while the gentleman has *a* point, that doesn't mean there aren't other points to make.

      That's where I am on his point. I th

  • by Alain Williams ( 2972 ) <addw@phcomp.co.uk> on Thursday June 19, 2025 @01:47PM (#65461395) Homepage

    Has anyone got an AI generated summary for me to read ?

  • by AleRunner ( 4556245 ) on Thursday June 19, 2025 @01:50PM (#65461405)

When we call LLMs and related systems "Artificial Intelligence", what we are really doing is false advertising. We need a better name. Maybe "Artificial Skills" and "Artificial Knowledge"? This whole AGI thing, pretending that current "AI" is a step on the way to actual artificial intelligence (except in that it's another failed step by researchers trying to work out what intelligence actually is), is a big con job. There's no clarity about that at all.

This is really needed because the systems break in horrible ways, such as Tesla cars being able to drive, but not understanding that they are driving in dangerous conditions where their cameras aren't enough and they need to slow down. The confusion this is causing is already getting people killed.

Well spoken. AI in all its varieties IS over-hyped. With human reasoning/creativity it's turtles all the way down. With current AI you get the top shell and nothing but word salad beneath it.
      • The strong do as they will. The weak do as they must. 5-th Century BC Athenian commons vs Militians

        The quote is "the strong do what they can and the weak suffer what they must," and it was the Melians not the Militians, and it wasn't Athenian commons, it was an Athenian commander.

    • by ceoyoyo ( 59147 )

      You make a case for not allowing laymen access to scientific research. "Artificial intelligence" is a fairly well defined technical term. Your point is that it is confusing for the uneducated.

I disagree. It's important that the public be allowed access to scientific research, not only because they pay for most of it, but also because it is a collective endeavour of humanity, and everyone should be given as much opportunity as practical to educate themselves.

Why? Artificial Intelligence is exactly what the definition has always been: a system that learns / is trained on input and then produces output. LLMs fit this description, as did the systems described in the earliest AI papers. The term was effectively coined in the 50s to describe exactly the kind of thing LLMs are now. Your desire to redefine it now is your misunderstanding (probably from reading too many scifi books), not a problem with the word itself.

      I agree with the rest of it though. AI in its current form is not a step to A

      • Artificial Intelligence is exactly what the definition has always been, a system that learns / is trained on input and then produces output.

        That isn't the original definition. Here is how it was defined in 1955, in the proposal for the Dartmouth Summer Research Project on Artificial Intelligence [computerhistory.org].

        The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.

        They defined AI in terms of what it does (simulating intelligent behavior), not what approach or mechanism is used to do it (learning from input, programming by an expert, trial and error, etc.)

    • When we call LLMs and related systems "Artificial Intelligence", what we are really doing is false advertising. We need a better name.

      Traditionally the names have been "strong AI" or "weak AI." You can call LLMs "weak AI."

  • by Big Hairy Gorilla ( 9839972 ) on Thursday June 19, 2025 @01:53PM (#65461411)
The LLMs are useful. That part is true. The outputs must be analyzed and vetted; in other words, same as if you talked to a person.
The best value an LLM provides, imho, is that it will likely know more of the subject matter than you do.
I suspect that when the bubble bursts and the dust settles, we'll end up with a kind of interactive encyclopedia as a usable form factor for LLMs.
Interactive encyclopedias that at times are confidently.... Wrong! I'm sure there's a huge market for that.. lol! I see more utility in AI that can process the sheer torrent of information that is sent to our brains every day in terms of emails, instant messages, training videos, etc. and provide personalized summaries of the key messages and any actionable items. If they can get the hallucinations under control, that is...
      • Interactive encyclopedias that at times are confidently.... Wrong!

Just like Wikipedia, which is nonetheless the most useful website ever created. Both Wikipedia and LLMs give you an initial idea, which you can then confirm by digging through other resources.
If Wikipedia stopped accepting new contributions, its static dump would be an important achievement of the 2000s. If LLMs stopped being updated, existing open-weights models (Llama) would be an important achievement of the 2020s.

        • by Anonymous Coward

Don't forget that they also preserve language, being language models. You want to know how people talked in the '20s? Current models will preserve that forever. You want to know whether the sentence you wrote for a character in your book who lives in the '20s uses the right language for the era? A language model tells you. I bet people in science, linguistics and history will be very happy in a few decades that they have the essence of the language of today packed into a small model.

Very interesting. For once, I have to note the contribution of Meta for hosting conversations in so many languages, enabling them to preserve even relatively rare languages.

I agree very much that the value of existing LLMs already goes far beyond the interactive encyclopedia mentioned by the OP. Even just the ability to summarise/expand/rephrase/explain text is mind-boggling.

People here who ridicule LLMs based on their failure at complex logic or pitiful level at chess are missing the point. Which is b

    • Re: (Score:3, Interesting)

      by dvice ( 6309704 )

      Personally I have found several values for the LLM, for example:
      1. Negative searches that were not possible with traditional search engines except in very rare scenarios. Like "Find me a molecule that has iron and oxygen, but not nitrogen."
      2. Searching research papers with fuzzy words. It is very hard to find a research paper unless you know the exact words to search for it, but AI can translate your fuzzy wording into meaningful search results.
      3. Testing out coding ideas. You can describe something you wan

      • 1. Negative searches that were not possible with traditional search engines except in very rare scenarios. Like "Find me a molecule that has iron and oxygen, but not nitrogen."
        2. Searching research papers with fuzzy words. It is very hard to find a research paper unless you know the exact words to search for it, but AI can translate your fuzzy wording into meaningful search results

        uhhhh search engines supported sort of both of these until they all agreed to start sucking. They were fantastic features.
        If you’re too young to know about old search engines how did you get here?

There was a time when Google let you use Booleans to narrow your search results to those that you were most likely to want. Recently, they've removed that facility because it lowers the number of results, and it's very important at Google to deliver as many results as possible, even though most of them are only vaguely related to what you asked for. The fact that Google doesn't care that this is a major regression says quite a bit about the company and its ability to maintain its position when, not if, a
      • by narcc ( 412956 )

        It is very hard to find a research paper unless you know the exact words to search for it

        If you have the citation, finding the paper is likely going to be trivial. My guess is that you're just asking the LLM for a citation for something and not checking to see if the paper even exists. They're very good at generating pretend citations. LLMs are not search engines.

        Testing out coding ideas. You can describe something you want and you get instantly code that creates the UI for you.

        Or you could just draw a picture. Not only will you be able to iterate faster, you'll use significantly fewer resources. If that's not your thing, you could use any one of a zillion interface design tools. They don't need endles

So now we're getting daily or more stories questioning the value of machine learning, discounting the importance of machine learning, declaiming the feasibility of AGI, downplaying the impact of models on employment, criticizing the hype around it all, predicting crashes in investments, etc. Every day, someone, somewhere is furiously writing another think piece along these lines. There are at least two on the Slashdot main page right now.

    Meanwhile, the makers are making, the investors are investing,

    • No
      And it’s “they’re”

    • by ceoyoyo ( 59147 )

The really funny thing is, if you give most people a 20-disk Tower of Hanoi problem and ask them to solve it, you'll get something that you might call "complete performance collapse."

      The people who get to administer such tests usually refer to it by terms like "you've got to be fucking kidding" or similar.

  • I care if they are useful, not if they "really understand". They are tools. Just learn to use them.
    • by narcc ( 412956 )

      They are far too inconsistent to be considered tools. These things are toys, as more people are discovering every day.

  • "researchers that found state-of-the-art large language models face complete performance collapse beyond certain complexity thresholds"

    Humans also face complete performance collapse with cognitive tasks beyond certain complexity thresholds

    • by narcc ( 412956 )

This is the dumbest take. Yes, humans make mistakes and their performance drops off with task complexity, but these things are in no way comparable to the failures of LLMs. The kinds of mistakes humans make are nothing like the kinds of 'mistakes' that LLMs make. You won't find humans accidentally fabricating citations or unintentionally summarizing text that doesn't exist. As for 'complexity', LLMs fail on even simplified versions of Towers of Hanoi, even when they're given explicit instructions. Hum

  • "found state-of-the-art large language models face complete performance collapse beyond certain complexity threshold" the moment they run into a need for the "Intelligence" part they fail!
  • Who is this "Simon Willison", exactly?

Yes, I can easily DDG him... but the point is, if this isn't someone most of us know, TFS should really provide some information regarding who he is.

    And now that I know he's just one of the creators of Django... why should I care what his opinion is regarding LLMs, one way or the other?

new goal to keep the hype-train funded. Resting on the laurels of existing AI is not enough to justify bigly P/E ratios.

  • That's the real problem. 99.5% of the people using them, or being encouraged to use them, do NOT understand their limitations - and companies are doing their best to make sure that does not change. This is the same bullshit you see re:robotics in fast food, and other complex, but low status jobs. Robots are not going to replace people anytime soon, either. They are not adaptable enough, and the problems that arise in the real world are too varied. When systems fail with human workers, you can adapt, and ge
  • We've gotten very wrapped up in the philosophical discussion of whether AI models are "thinking." But most people don't actually care whether we've reached some abstract achievement of creating "thought." Most people just care if the tool can do the job.

    Of course, there are serious limitations with current AI tools, but every tool has limitations. The trick is making sure they are used responsibly and in a way that is cognizant of those limitations.

  • This is the problem a lot of people apply to every piece of technology that comes along.

    "It can't safely fly a plane full of 300 people! It's useless!"

Ok, yeah, sure... I guess. But most things in the world don't need that degree of confidence. I used an LLM this morning to remind me of a plot thread in a TV show I haven't watched in 15 years. It got it right, as far as I remember, and it's infinitely easier than scrolling through 300 episode summaries.
