
How Much of the World Is It Possible to Model?

Dan Rockmore, the director of the Neukom Institute for Computational Sciences at Dartmouth College, writing for The New Yorker: Recently, statistical modelling has taken on a new kind of importance as the engine of artificial intelligence -- specifically in the form of the deep neural networks that power, among other things, large language models, such as OpenAI's G.P.T.s. These systems sift vast corpora of text to create a statistical model of written expression, realized as the likelihood of given words occurring in particular contexts. Rather than trying to encode a principled theory of how we produce writing, they are a vertiginous form of curve fitting; the largest models find the best ways to connect hundreds of thousands of simple mathematical neurons, using trillions of parameters. They create a vast data structure akin to a tangle of Christmas lights whose on-off patterns attempt to capture a chunk of historical word usage. The neurons derive from mathematical models of biological neurons originally formulated by Warren S. McCulloch and Walter Pitts, in a landmark 1943 paper, titled "A Logical Calculus of the Ideas Immanent in Nervous Activity." McCulloch and Pitts argued that brain activity could be reduced to a model of simple, interconnected processing units, receiving and sending zeros and ones among themselves based on relatively simple rules of activation and deactivation.
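
(As an aside, a McCulloch-Pitts-style unit can be sketched in a few lines of Python; the weights and threshold below are illustrative, not taken from the 1943 paper.)

    # A McCulloch-Pitts-style threshold unit: binary inputs, fixed weights,
    # and a hard threshold. It "fires" (returns 1) only when the weighted
    # sum of its inputs reaches the threshold. Values here are illustrative.
    def mcculloch_pitts_unit(inputs, weights, threshold):
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total >= threshold else 0

    # An AND-like unit: fires only when both inputs are active.
    print(mcculloch_pitts_unit([1, 1], [1, 1], threshold=2))  # 1
    print(mcculloch_pitts_unit([1, 0], [1, 1], threshold=2))  # 0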

The McCulloch-Pitts model was intended as a foundational step in a larger project, spearheaded by McCulloch, to uncover a biological foundation of psychiatry. McCulloch and Pitts never imagined that their cartoon neurons could be trained, using data, so that their on-off states linked to certain properties in that data. But others saw this possibility, and early machine-learning researchers experimented with small networks of mathematical neurons, effectively creating mathematical models of the neural architecture of simple brains, not to do psychiatry but to categorize data. The results were a good deal less than astonishing. It wasn't until vast amounts of good data -- like text -- became readily available that computer scientists discovered how powerful their models could be when implemented on vast scales. The predictive and generative abilities of these models in many contexts are beyond remarkable. Unfortunately, this comes at the expense of understanding just how they do what they do. A new field, called interpretability (or X-A.I., for "explainable" A.I.), is effectively the neuroscience of artificial neural networks.

This is an instructive origin story for a field of research. The field begins with a focus on a basic and well-defined underlying mechanism -- the activity of a single neuron. Then, as the technology scales, it grows in opacity; as the scope of the field's success widens, so does the ambition of its claims. The contrast with climate modelling is telling. Climate models have expanded in scale and reach, but at each step the models must hew to a ground truth of historical, measurable fact. Even models of covid or elections need to be measured against external data. The success of deep learning is different. Trillions of parameters are fine-tuned on larger and larger corpora that uncover more and more correlations across a range of phenomena. The success of this data-driven approach isn't without danger. We run the risk of conflating success on well-defined tasks with an understanding of the underlying phenomenon -- thought -- that motivated the models in the first place.

Part of the problem is that, in many cases, we actually want to use models as replacements for thinking. That's the raison d'être of modelling -- substitution. It's useful to recall the story of Icarus. If only he had just done his flying well below the sun. The fact that his wings worked near sea level didn't mean they were a good design for the upper atmosphere. If we don't understand how a model works, then we aren't in a good position to know its limitations until something goes wrong. By then it might be too late. Eugene Wigner, the physicist who noted the "unreasonable effectiveness of mathematics," restricted his awe and wonder to its ability to describe the inanimate world. Mathematics proceeds according to its own internal logic, and so it's striking that its conclusions apply to the physical universe; at the same time, how they play out varies more the further that we stray from physics. Math can help us shine a light on dark worlds, but we should look critically, always asking why the math is so effective, recognizing where it isn't, and pushing on the places in between.
  • By definition... (Score:5, Insightful)

    by iMadeGhostzilla ( 1851560 ) on Friday January 19, 2024 @11:18AM (#64172699)

    ... an infinitely small fraction of it.

    Because, as the French historian Ernest Renan said, logic excludes -- by definition -- nuances, and truth resides exclusively in the nuances.

    That said, sometimes an infinitely small fraction is good enough for what you're trying to do, so it all depends on what you're trying to do.

  • by Rei ( 128717 ) on Friday January 19, 2024 @12:09PM (#64172863) Homepage

    Words don't exist in a vacuum. Words are a reflection of the world that led to their creation. Accurate prediction requires an accurate underlying model of the world behind those words, which involves iterative fuzzy logic performed on an insanely massive superposition of decisionmaking processes.

    Prediction is a fundamental aspect of our own brains. We are constantly making predictions about what our senses will experience. The difference between our predictions and our actual senses forms an error metric which provides the ground for back propagating training through the brain, adjusting not just the final output layers, but the deep conceptual layers that led to said predictions. If you reach out to touch a table and it's a centimetre away from where you expected, it's not going to change much. But if you reach out to touch a table and your hand passes right through it, your entire model of the world may be about to change.
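
    As a toy sketch of that error-driven updating (illustrative numbers only, not a claim about real neurons): a single parameter nudged in proportion to its prediction error, which is the same kind of update gradient descent performs in artificial networks.

        # Toy error-driven learning: a single parameter is nudged in
        # proportion to the gap between prediction and observation.
        # Illustrative numbers only; not a model of real neurons.
        w = 0.0          # the predictor's single weight
        lr = 0.1         # learning rate
        true_w = 2.0     # the relationship being learned

        for step in range(50):
            x = 1.0                      # a fixed stimulus
            error = true_w * x - w * x   # prediction error
            w += lr * error * x          # adjust to shrink the error

        print(round(w, 3))  # approaches 2.0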

    Word prediction in our brain in particular has been very well studied, and indeed, is well modeled by the Transformers architecture.

    How much detail of the universe can be represented in an LLM, LMM, etc. is bounded by the size of the hidden state -- perhaps 192 floating point numbers -- and the quantization resolution of each number (anywhere from 2^3 to 2^32 distinct values), giving roughly (2^3 to 2^32) to the 192nd power representable states. While a person can't just look at it and "read it" (though there's increasingly good work on "probes" for it), it's trivial to graph out where concepts sit relative to other concepts on any number of axes, e.g. by cosine similarity. For example, "cow" is near both "grass" and "milk", but "milk" and "grass" aren't close to each other. You can also perform other mathematical operations on these dense generalized states of reality - for example, "king" + "woman" - "man" ~= "queen".
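
    As a rough sketch of those two operations, with made-up three-dimensional vectors (real models learn embeddings in hundreds or thousands of dimensions, and the king/queen result comes from trained models such as word2vec):

        import numpy as np

        # Hypothetical toy embeddings; real models learn these in hundreds
        # or thousands of dimensions. Only the relative geometry matters.
        emb = {
            "cow":   np.array([0.9, 0.8, 0.1]),
            "grass": np.array([0.8, 0.1, 0.0]),
            "milk":  np.array([0.1, 0.9, 0.2]),
        }

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        print(cosine(emb["cow"], emb["grass"]))   # ~0.8 (close)
        print(cosine(emb["cow"], emb["milk"]))    # ~0.7 (close)
        print(cosine(emb["grass"], emb["milk"]))  # ~0.2 (not close)

        # "king" + "woman" - "man" ~= "queen" is just element-wise vector
        # arithmetic followed by a nearest-neighbour search (by cosine
        # similarity) over the vocabulary of a trained embedding model.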

    It should also be noted that outputs are planned as a whole, not simply the subsequent word. For example:

    "John wanted something sour, so he went to his lemon tree and picked..."
    "John wanted something sweet, so he went to his apple tree and picked..."

    Will the next word be "a" or "an"? Well, it depends on the word that comes after that. Words later in the output impact earlier predictions. A lemon vs. AN apple; the object determines the article. Look at graphs of diagrammed sentences; words consistently reach back into the past. You simply cannot have long-term coherent output - only rambling - if you just work via "recent word statistics". Markov-chain-style text prediction is inherently going to rapidly trend off to nonsense.
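
    A toy illustration of that failure mode: a bigram model trained on the two example sentences above chooses each word only from what followed the previous word, so it has no global plan and drifts (toy corpus, illustrative only).

        import random
        from collections import defaultdict

        # Bigram ("recent word statistics") generation: each word is drawn
        # only from what followed the previous word in the training text,
        # so the output has no global plan and drifts. Toy corpus only.
        corpus = ("john wanted something sour so he went to his lemon tree "
                  "and picked a lemon . john wanted something sweet so he "
                  "went to his apple tree and picked an apple .").split()

        follows = defaultdict(list)
        for prev, nxt in zip(corpus, corpus[1:]):
            follows[prev].append(nxt)

        random.seed(0)
        word, output = "john", ["john"]
        for _ in range(15):
            word = random.choice(follows[word])
            output.append(word)
        print(" ".join(output))  # locally plausible, globally incoherent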

    The overall planning is a result of the attention mechanism. Each token's hidden state is shifted in a given direction as a result of its role relative to other tokens throughout the entirety of the input. The fact that it's a lemon tree influences the fact that what's being picked is A lemon, and from that, the odds that the next word is a, not an.
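
    A bare-bones sketch of that mechanism, scaled dot-product attention, with random matrices standing in for learned weights (only the shapes and the weighted mixing matter here):

        import numpy as np

        # Scaled dot-product attention: each token's new representation is
        # a weighted mix of every token's value vector, weighted by how well
        # its query matches the other tokens' keys. Random matrices stand in
        # for learned weights; only the shapes and the mixing matter here.
        rng = np.random.default_rng(0)
        seq_len, d_model = 5, 16                  # 5 tokens, 16-dim states
        x = rng.normal(size=(seq_len, d_model))   # token hidden states

        W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v

        scores = Q @ K.T / np.sqrt(d_model)       # token-to-token affinities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax per token
        output = weights @ V   # each token shifted by its relation to all others
        print(output.shape)    # (5, 16)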

    These hidden states form a dense representation of reality. A point plotted out in a vastly multidimensional space. To compress all of reality down into a point represented by a couple hundred floats, to compress reality by countless orders of magnitude, to generalize reality. And as to "how much" you can compress... certainly that space is tiny compared to reality, but (2^(3 to 32))^(a couple hundred) still is a really damned large space. Your model's ability is much more likely to be limited by how well you transform that space than by the space itself. Which is why we don't just use larger hidden states, and why you can quantize heavily without losing that much performance.
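
    A back-of-the-envelope version of that size argument (192 dimensions is just the figure used above):

        import math

        # Number of distinct states for a hidden vector of `dim` components,
        # each quantized to `bits` bits: (2**bits) ** dim. Reported as a
        # power of ten; 192 dimensions is the figure used above.
        dim = 192
        for bits in (3, 8, 32):
            log10_states = dim * bits * math.log10(2)
            print(f"{bits:>2}-bit components: ~10^{log10_states:.0f} states")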

    • by awwshit ( 6214476 ) on Friday January 19, 2024 @12:21PM (#64172901)

      It takes imagination to create new training data. These models completely lack imagination. At best they can only mimic human thought.

      • Mimicking is of course exactly the point. But I wonder if you can rigorously define the difference between imagination and what an LLM does. From what I can tell, the main difference is the ability to introspect and logically analyze possible outputs. (It's kind of weird that logic, of all things, has become the problem with AI.)
        • Hmm. I kinda have to go back to the person I responded to, who said this:

          > Word prediction in our brain in particular has been very well studied, and indeed, is well modeled by the Transformers architecture.

          Ok, but predicting words is not the only action I take. I might predict a word and not quite *like* it, the word does not *feel* right. So, I go to a Thesaurus and I look up *alternatives* to the word that does not *feel* right. Then I *imagine* how I might use some of these alternative words in a sentence.

      • by Rei ( 128717 )

        Good LLMs beat almost all humans in creativity tests. [theguardian.com] As Ethan Mollick puts it: “We are running out of creativity tests that AIs cannot ace.”

        They are not compositors; they are logic engines.

        • Ah, but isn’t creativity a slippery concept – something that’s hard to define but that we nevertheless recognise when we see it? That hasn’t stopped psychologists from trying to measure it...

          The really illuminating aspect of the study, though, was an inference drawn from it by the researchers about the economics of it. “A professional working with ChatGPT-4,” they write, “can generate ideas at a rate of about 800 ideas per hour. At a cost of $500 per hour of human effort, a figure representing an estimate of the fully loaded cost of a skilled professional, ideas are generated at a cost of about $0.63 each. At the time we used ChatGPT-4, the API fee [application programming interface, which allows two or more computer programs to communicate with each other] for 800 ideas was about $20. For that same $500 per hour, a human working alone, without assistance from an LLM, only generates 20 ideas at a cost of roughly $25 each. For the focused idea generation task itself, a human using ChatGPT-4 is thus about 40 times more productive than a human working alone.”

          I'm feeling like that is a very odd way to value creative work, because most of those ideas are garbage and one could also assign the entire cost to the final approved idea. It is really the quality and value of the best idea that matters.

          We already know from a normal distribution that the top maybe 2% of people have the capacity to do interesting things. If we assume that the measures are any good, the studies show the LLM beating 95-99% of people. The imagination of that other 1-5% of people is what helps

    • Smells like JapeChat running on word thickness.
  • I suppose in the context of a New Yorker article, using 700 words in the "summary" actually does condense things down. But still, this is one of the longest damn "summaries" I've seen on /. in a long while. And it's not much of a summary, in the sense of a succinct encapsulation of the full article, or at least an enticing intro. Instead, this "summary" is a straight-up quotation of four whole fucking paragraphs that appear 2/3 of the way into the article. That's just lazy, and does not serve the community
  • We like artificial intelligences because of the promise of mating infallible computers with complex tasks (such as driving). But the unfortunate reality is that AI, as it is practiced today, is not infallible. It may not even be better than a human. It cannot be validated analytically. It has to be tested empirically. It is faster than a human, though, and also, not subject to fatigue, which is an improvement over the human being when it comes to driving.

    Regarding the limits of modeling, that is an inter
  • Seriously? I am certainly not part of that "we" mentioned there. The only way to get better at thinking is to do it. If you are bad at thinking, the world is going to roll right over you. So why would anybody not want to think and practice thinking? Besides, on most topics it is fun and it is valuable on all of them.

    Yes, I know. Most people would rather die than start thinking of their own volition. I just do not understand how that can be.

  • How Much of the World Is It Possible to Model?

    All of it. It seems Dan Rockmore didn't get the memo that we live in a simulation.
  • by wherrera ( 235520 ) on Friday January 19, 2024 @01:54PM (#64173243) Journal
    "The map is not the territory." —Alfred Korzybski, Science and Sanity https://en.wikipedia.org/wiki/... [wikipedia.org]
    • The quotation from Korzybski is exactly to the point. A model represents a particular abstraction from reality, where "abstraction" is a process of selective leaving out. A model may be useful for specific purposes provided that nothing has been left out that is essential for those purposes.

      Oddly perhaps (or perhaps not), clever people used to be sceptical about the use of models to predict aspects of reality. That seems to have changed when many people began to use computers that let them create elaborate

  • by VeryFluffyBunny ( 5037285 ) on Friday January 19, 2024 @02:15PM (#64173349)
    "All models are wrong, but some are useful." - George Box

    Statistical models are only predictive as long as nothing has changed.

    Statistical models tell us what happens, i.e. descriptions of surface features, rather than *how* & *why* things happen.

    When you understand this, you can understand when & how statistical models are useful.

    LLMs are a special problem because language is the social semiotic, i.e. it's a trace of a much larger more complex system; society. LLMs are modelling the trace, not the cause of the trace.
    • There was a good example of this problem in the recent /. article on bird-identifying binoculars [slashdot.org]. They don't seem particularly useful for learning, since they don't tell how they came up with the results. If it's not a completely black box, it should be possible to extract this "how", but it probably wouldn't make much sense to humans. Current AI makes a very bad replacement for teachers, as it cannot teach the actual process but only gives answers.
  • The question isn't how much you can model.

    The question is how accurate your model can be.

    You can model the whole world really badly using bad assumptions and low granularity.

    • Also, models by definition focus on certain characteristics. A model might focus on physical shapes, such as topography, or the motion of things like air masses or water, or roads, or any number of other specific things. A model that doesn't focus on something specific, isn't a model at all. And because a model has a specific focus, it by definition leaves out everything else. And that is both what makes models useful, and limited, at the same time.

  • In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Us

  • The current approaches are effective in certain domains and useless in others. The LLM paradigm is NOT the way to achieve AGIs, and despite its popularity (with the hordes) it is a flawed and misleading path forward. We know by analysis of the brain that we learn incrementally, not by mass batch training. Humans learn one thing at a time, and we do it by back-connecting to things we already know, building up knowledge trees. LLMs are rigid in that they have to absorb a ton of information per session and doin

    • That is NOT how the human brain works, and as I've said many times, we do it on 20 watts not with expensive vector processors and megawatts.

      Yes, but we also do it with neurons and not with transistors. Neurons use less than a tenth of a volt. Neurons don't need to be emulated with code, they are real things. Etc etc, obviously it's going to take more power to emulate than the real thing. That's not a show stopper unless the increase in power is literally impossible to deliver, utilize, etc.

      Humans learn one thing at a time, and we do it by back-connecting to things we already know, building up knowledge trees

      It seems to me like the human brain does both kinds of thing, and maybe some other things besides... I think the real reason our current efforts can't lead t
