Meta and Qualcomm Team Up To Run Big AI Models on Phones (cnbc.com)

Qualcomm and Meta will enable the social networking company's new large language model, Llama 2, to run on Qualcomm chips on phones and PCs starting in 2024, the companies announced today. From a report: So far, LLMs have primarily run in large server farms, on Nvidia graphics processors, due to the technology's vast needs for computational power and data, boosting Nvidia stock, which is up more than 220% this year. But the AI boom has largely missed the companies that make leading-edge processors for phones and PCs, like Qualcomm. Its stock is up about 10% so far in 2023, trailing the NASDAQ's gain of 36%. The announcement on Tuesday suggests that Qualcomm wants to position its processors as well-suited for AI but "on the edge," or on a device, instead of "in the cloud." If large language models can run on phones instead of in large data centers, it could push down the significant cost of running AI models, and could lead to better and faster voice assistants and other apps.
Comments Filter:
  • "Big AI Models?" (Score:5, Insightful)

    by aldousd666 ( 640240 ) on Tuesday July 18, 2023 @01:13PM (#63697038) Journal
    I can only assume whoever wrote "Big AI Models" is not familiar with the idea that "Large Language Model" describes the category of model, not its size.
  • by williamyf ( 227051 ) on Tuesday July 18, 2023 @01:28PM (#63697046)

    Thanks to accelerators like Apple's Neural Engine, Huawei's NPU, Qualcomm's Hexagon NPU, and all the others...

    After all, running the model is not such a huge burden (compared to training), especially with accelerators already deployed on most edge processors.

    In terms of latency, network traffic, and privacy, it's the best option there is.

    • by narcc ( 412956 )

      Just because running a large model is cheaper than training one doesn't mean that it's not still expensive! You can run a 4-bit quantization of a LLaMa model at home, but you'll want a pretty beefy machine.

      I'm all for running things locally, just keep those expectations low. If we see anything usable on mobile, it's going to be smaller, more specialized, models.
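
      For what it's worth, a minimal sketch of what "run a 4-bit quantization at home" can look like, assuming the llama-cpp-python bindings and a hypothetical local path to an already-quantized model file (both are assumptions, not anything from the article):

          from llama_cpp import Llama

          # Hypothetical path; the quantized model file must already be on disk.
          llm = Llama(model_path="./llama-2-7b-chat.q4_0.bin", n_ctx=2048)

          out = llm("Q: Can a large language model run on a phone? A:",
                    max_tokens=64, temperature=0.7)
          print(out["choices"][0]["text"])

      Even quantized to 4 bits, a 7B model file is several gigabytes, which is the "beefy machine" caveat in practice.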

      • Just because running a large model is cheaper than training one doesn't mean that it's not still expensive! You can run a 4-bit quantization of a LLaMa model at home, but you'll want a pretty beefy machine.

        I'm all for running things locally, just keep those expectations low. If we see anything usable on mobile, it's going to be smaller, more specialized, models.

        If said "beefy machine" DOES NOT HAVE neural processing accelerators (and GPUs do not count as such) then of course the machine needs to be beefy. But if said machine does have dedicated hardware for nerural processing, suddenly the machine can afford the luxury if being less beefy. That's precisely the point.

        • If said "beefy machine" DOES NOT HAVE neural processing accelerators (and GPUs do not count as such) then of course the machine needs to be beefy.

          What criteria aren't being met by current GPUs?

          But if said machine does have dedicated hardware for neural processing, suddenly the machine can afford the luxury of being less beefy. That's precisely the point.

          Even with super-effective processor instructions, which all of the major vendors either have or are rolling out, you still need the RAM and memory bandwidth to exploit these capabilities. What makes high-end GPUs useful isn't just a huge number of "tensor" units but the associated memory and massive bandwidth measured in TB/s.
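
          As a rough back-of-envelope (the numbers below are assumptions for illustration, not vendor specs): decoding one token on a memory-bound LLM streams roughly the whole model through memory once, so peak tokens/second is capped by bandwidth divided by model size.

              model_bytes = 7e9 * 0.5    # ~3.5 GB for a 4-bit-quantized 7B model
              phone_bw    = 50e9         # assumed ~50 GB/s, phone-class LPDDR5
              gpu_bw      = 1e12         # assumed ~1 TB/s, high-end GPU HBM

              print(phone_bw / model_bytes)   # ~14 tokens/s upper bound
              print(gpu_bw / model_bytes)     # ~285 tokens/s upper bound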

          • Re: (Score:3, Informative)

            Depends on what you want to do: Image classification or generative language... In any case, a ten-fold improvement is economically useful.

            For example: with a specialized processor you get about a ten-fold reduction in the amount of power needed for image classification at the edge:

            With the K210, a specialized processor optimized to process convolutional neural networks, you can do image classification with about 300 milliamps. Roughly comparable results on a Raspberry Pi take about 3 amps.
            https://wiki.sipeed. [sipeed.com]
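
            To put numbers on that ten-fold claim (assuming a 5 V supply for both boards, which is an assumption rather than a measured figure):

                supply_v = 5.0               # assumed supply voltage for both boards
                k210_w   = 0.300 * supply_v  # ~1.5 W for the K210 doing CNN inference
                pi_w     = 3.000 * supply_v  # ~15 W for roughly comparable results on a Pi
                print(pi_w / k210_w)         # ~10x less power at the edge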

            • Depends on what you want to do: Image classification or generative language... In any case, a ten-fold improvement is economically useful.

              The article and my commentary are in the context of LLMs, nothing more.

              For example: with a specialized processor you get about a ten-fold reduction in the amount of power needed for image classification at the edge

              Mythic has the nanos beat by orders of magnitude.

        • by narcc ( 412956 )

          "neural processing accelerators" What a cringe-inducing term! Just call them NPUs. It's slightly less deceptive.

          They're not all that different from GPUs, despite your insistence otherwise. They're just more specialized, being optimized for matrix multiplication and convolutions. They're also nothing new. Odds are good that your smartphone already has one.

          So why aren't we running LLaMa-65B/16 on our smartphones now? Because they're not magical! Remember that your phone and your computer both have GPUs, bu

  • New word for the day is self-fuckery.

  • by alispguru ( 72689 ) <bob.bane@ m e . c om> on Tuesday July 18, 2023 @02:41PM (#63697196) Journal

    Running stuff like this on a local device should be a win for privacy and possibly latency.

    However, Meta is involved, so surveillance and a total lack of privacy are assumed.

    • by kyoko21 ( 198413 )

      If I can somehow pack the power of a 3090 into a phone alongside an LLM and a stack that can provide accurate information, that would be a game changer. My standalone oobabooga install can load a 33B-parameter model and I can ask it all sorts of things and it can provide a response without being connected online. Granted, it hallucinates a lot, but functionally it does work for a lot of simple questions, e.g. it can summarize "To Kill a Mockingbird". These LLMs, with enough time, can be trained and tuned in such a way that it should be able to provide accurate answers, and perhaps in time know when it doesn't know an answer.

      • by narcc ( 412956 )

        These LLMs, with enough time, can be trained and tuned in such a way that it should be able to provide accurate answers

        That is unlikely.

        and perhaps in time know when it doesn't know an answer.

        That's not how these things work. They don't carefully consider the prompt and then produce an answer. That's impossible.

        See, they really do produce just one token at a time, with no internal state maintained between tokens. The token produced isn't even necessarily the one that the model predicts is the most likely. (That would produce worse results!) Every token in its vocabulary is assigned a probability, and the next token is selected randomly on that basis.

        Models like this don't operate
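
        To make that decoding step concrete, a toy sketch (made-up four-token vocabulary, nothing model-specific): the raw scores are turned into a probability distribution over the vocabulary and the next token is drawn from it, rather than always taking the most likely one.

            import numpy as np

            def sample_next_token(logits, temperature=0.8, rng=np.random.default_rng(0)):
                # Temperature-scaled softmax over the whole vocabulary.
                scaled = logits / temperature
                probs = np.exp(scaled - scaled.max())
                probs /= probs.sum()
                # Greedy decoding would be probs.argmax(); samplers draw from the distribution instead.
                return rng.choice(len(probs), p=probs)

            logits = np.array([2.0, 1.5, 0.3, -1.0])   # toy four-token vocabulary
            print(sample_next_token(logits))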

  • Team up to mine even more personal data.
  • Because smartphones use shared RAM for video, most people are more likely to be able to run an LLM on their phone than on their desktop/laptop. And local LLMs are essential to the myriad of applications that privacy unlocks, like medical uses, etc. Local LLMs also allow more nested prompts (mild AGI).

    Truly democratizing AI (ownership) is the best alignment strategy I know of. "When everyone is botting, no one is botting."
