AI

After DeepSeek Shock, Alibaba Unveils Rival AI Model That Uses Less Computing Power (venturebeat.com) 59

Alibaba has unveiled a new version of its AI model, called Qwen2.5-Max, claiming benchmark scores that surpass both DeepSeek's recently released R1 model and industry leaders like GPT-4o and Claude-3.5-Sonnet. The model achieves these results using a mixture-of-experts architecture that requires significantly less computational power than traditional approaches.

The release comes amid growing concerns about China's AI capabilities, following DeepSeek's R1 model launch last week that sent Nvidia's stock tumbling 17%. Qwen2.5-Max scored 89.4% on the Arena-Hard benchmark and demonstrated strong performance in code generation and mathematical reasoning tasks. Unlike U.S. companies that rely heavily on massive GPU clusters -- OpenAI reportedly uses over 32,000 high-end GPUs for its latest models -- Alibaba's approach focuses on architectural efficiency. The company claims this allows comparable AI performance while reducing infrastructure costs by 40-60% compared to traditional deployments.


Comments Filter:
  • by LetterRip ( 30937 ) on Wednesday January 29, 2025 @02:42PM (#65128721)

    It is a closed-source model, so there isn't any way to verify their claims.

    • by allo ( 1728082 )

      You could just test it and compare the answers.

      • by larryjoe ( 135075 ) on Wednesday January 29, 2025 @02:58PM (#65128773)

        You could just test it and compare the answers.

        We can test query results but not training time. The quality of the query results is interesting, but nowhere near as interesting as the training-time numbers. Is Alibaba claiming anything for training time? There seems to be a suggestion that they are as fast as DeepSeek.

        • You could replay the question history of existing AI models and then check the expected outputs to compare effectiveness.

          Speculation: the next race will be the preclassifier, which takes a prompt, assigns meta and domain tags to it, and routes the prompt to the right domain-specific trained AI.
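
          A minimal sketch of what such a preclassifier could look like. Everything here (the domain tags, the keyword heuristics, the model registry) is invented for illustration; a real router would use a trained classifier rather than keyword matching.

              # Hypothetical prompt preclassifier/router (illustrative only).
              DOMAIN_KEYWORDS = {
                  "math": ["integral", "derivative", "prove", "equation"],
                  "code": ["python", "function", "compile", "bug"],
                  "poetry": ["poem", "sonnet", "rhyme", "haiku"],
              }

              def classify(prompt: str) -> str:
                  """Assign a domain tag via naive keyword counts."""
                  text = prompt.lower()
                  scores = {d: sum(text.count(k) for k in kws)
                            for d, kws in DOMAIN_KEYWORDS.items()}
                  best = max(scores, key=scores.get)
                  return best if scores[best] > 0 else "general"

              def route(prompt: str, models: dict):
                  """Send the prompt to the model registered for its tag."""
                  return models.get(classify(prompt), models["general"])(prompt)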

          • by allo ( 1728082 )

            This is kind of like the mixture-of-experts approach.

            MoE models route between different experts. While this reads like "one math model, one poetry model, ...", that is not quite how it works: the routing happens per layer (maybe expert 1 is used in the first layer, expert 4 in the second, etc.), the roles of the experts are far more subtle, and you cannot simply rip out a "math expert" and use it as a smaller math-only model. (A toy sketch of the routing follows below.)

            My prediction is that it will have to stop to put too much knowledge into the
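
            A toy sketch of that per-layer expert routing, assuming PyTorch is installed. Real MoE layers add top-k gating over many experts, load-balancing losses, and expert parallelism; this only shows why routing saves compute (only the selected expert runs per token, per layer).

                import torch
                import torch.nn as nn

                class ToyMoELayer(nn.Module):
                    """One layer with its own router (gate) over experts."""
                    def __init__(self, dim: int, num_experts: int = 4):
                        super().__init__()
                        self.experts = nn.ModuleList(
                            [nn.Linear(dim, dim) for _ in range(num_experts)])
                        self.gate = nn.Linear(dim, num_experts)

                    def forward(self, x):
                        # Top-1 gating: each token activates exactly one expert,
                        # so most expert weights stay idle for any given token.
                        probs = self.gate(x).softmax(dim=-1)   # (tokens, experts)
                        choice = probs.argmax(dim=-1)          # (tokens,)
                        out = torch.zeros_like(x)
                        for e, expert in enumerate(self.experts):
                            mask = choice == e
                            if mask.any():
                                w = probs[mask, e].unsqueeze(-1)  # gate weight
                                out[mask] = w * expert(x[mask])
                        return out

                # Each layer has its own gate, so which expert fires can differ
                # from layer to layer -- there is no single "math expert".
                layers = nn.Sequential(ToyMoELayer(64), ToyMoELayer(64))
                print(layers(torch.randn(8, 64)).shape)  # torch.Size([8, 64])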

        • by allo ( 1728082 )

          Whatever the training time was, the competition had the same time plus a head start.

    • by Z00L00K ( 682162 )

      AI internet war upcoming.

    • Remember how beating a grandmaster at chess used to take a custom-designed supercomputer with custom ASICs, but now a desktop PC can do it, because the software got better?

      Yeah, that.

      https://en.wikipedia.org/wiki/... [wikipedia.org]

  • Next up: Temu announces an even cheaper AI that surpasses [insert name here]
  • by CEC-P ( 10248912 ) on Wednesday January 29, 2025 @02:54PM (#65128755)
    On some Snapdragon 3-watt potato chip, my phone sometimes knows what I'm saying and can send a text and look up basic facts.
    *Nvidia stock drops another $100 billion*
  • by MachineShedFred ( 621896 ) on Wednesday January 29, 2025 @02:56PM (#65128765) Journal

    Except this one is just a race to see who can create the most efficient privacy-destroying data suck on Earth while delivering marginal gains in accuracy and performance. Oh, and using a ridiculous amount of energy to do it.

    We used to dream big. Now we get this.

    • by Luckyo ( 1726890 )

      This is actually the opposite. They made the model about 95% more efficient. This includes energy.

      • Re: (Score:2, Troll)

        Yes, by distilling the work done by other AIs.

        That doesn't count.

        • by ceoyoyo ( 59147 ) on Wednesday January 29, 2025 @04:07PM (#65129037)

          Doesn't it? If you can use one model to make another that both works better and uses a lot less resources, that doesn't count?

          Also, OpenAI has a bit of a vested interest in alleging that someone copied them. You might want to add a dash of "OpenAI doesn't always tell the whole truth" to your "China bad."
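
          For context, "distilling" usually means training a small student model to imitate a bigger teacher. A minimal sketch of classic logit distillation, assuming PyTorch; `student` and `teacher` are hypothetical stand-ins, and note that distilling through a public API (the accusation here) would use sampled text rather than logits.

              import torch
              import torch.nn.functional as F

              def distill_step(student, teacher, batch, optimizer, T=2.0):
                  """One step: push student logits toward the teacher's."""
                  with torch.no_grad():
                      teacher_logits = teacher(batch)
                  student_logits = student(batch)
                  # KL divergence between temperature-softened distributions
                  # (standard Hinton-style distillation, scaled by T^2).
                  loss = F.kl_div(
                      F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="batchmean",
                  ) * (T * T)
                  optimizer.zero_grad()
                  loss.backward()
                  optimizer.step()
                  return loss.item()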

          • You wouldn't be able to do that without the resource usage of the upstream dependency.

            You're making an argument that because you built something out of Lego bricks, the only resources that went into creating that thing were the time you spent sticking the bricks together. That's obviously stupid, because a factory somewhere spent energy and petroleum distillates to mold those bricks, package them up, and get them shipped to you through a retail channel.

            Another example: writing 50 lines of code to tie together a

          • Doesn't it? If you can use one model to make another that both works better and uses a lot less resources, that doesn't count?

            The claim (by DeepSeek at least) was that their model was created/trained using fewer resources than the competing models from OpenAI. It is disingenuous to claim that you used fewer resources in creating your model than were already expended creating the model you started from; the entire chain must be accounted for in an honest comparison.

            • by ceoyoyo ( 59147 )

              I would agree if they started with OpenAI's model. They didn't; that model is not available. The most they could have done, which remember is just a vague accusation by OpenAI, is use ChatGPT responses as training data. That's going to be lower quality, but possibly greater quantity, than human-generated responses. Unless you're in the "cannibal AIs getting mad cow" camp, in which case it wouldn't work at all.

              Anyway, DeepSeek, unlike OpenAI, published a pretty good description of their methods, and people ar

        • Yes, by distilling the work done by other AIs.

          That doesn't count.

          That would be those other AIs that gleaned their knowledge by scraping the work of billions of human beings?

    • The big dreamers are lining up to be indentured servants on Mars.

      Seems all the promises the future held have come to fruition in the dumbest and most depressing ways possible. Well, I'd better take that back, somebody will come up with a still more depressing way to make one of the promises of yesteryear come true.

    • by Jeremi ( 14640 )

      Except this one is just a race to see who can create the most efficient privacy-destroying data suck on Earth while delivering marginal gains in accuracy and performance. Oh, and using a ridiculous amount of energy to do it.

      That's the LLM subset of AI you're talking about, and yes, its potential is somewhat limited. The more interesting applications for this technology are in design exploration -- e.g. "using this database of material properties, find me the compounds most likely to be usable as a room-temperature superconductor", or "find me the most effective containment vessel shape for my fusion reactor" (a toy sketch of this kind of screening follows below).

      Any problem that previously could only be solved through exhaustive search of a zillion possibilities is a potential us
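
      A hedged sketch of that kind of design-space screening: fit a cheap surrogate model on known examples, then rank untested candidates. Assumes scikit-learn and NumPy are available; the "materials database" below is random stand-in data, not real measurements.

          import numpy as np
          from sklearn.ensemble import RandomForestRegressor

          rng = np.random.default_rng(0)
          X_known = rng.random((200, 5))   # feature vectors of known compounds
          y_known = rng.random(200)        # measured property of interest

          surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
          surrogate.fit(X_known, y_known)

          # Score a large pool of untested candidates and test the best first,
          # instead of exhaustively evaluating every possibility.
          X_candidates = rng.random((10_000, 5))
          scores = surrogate.predict(X_candidates)
          print("most promising candidates:", np.argsort(scores)[::-1][:10])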

  • Alibaba is renowned for allowing the sale of fake electronic components. Take this with a pinch of salt.

  • by Big Hairy Gorilla ( 9839972 ) on Wednesday January 29, 2025 @03:30PM (#65128915)
    why not ask it to design a better "AI"?

    <........leans back in chair and lights up...................... exhales...... flicks ash into tray..>

    I was waiting for someone to use AI to devise a cheaper, less energy-intensive AI.
    Even a rumour of a cheaper AI could tank the OpenAI grift.
    Uh oh, America... shouldn't you be a bit smarter about this?
    What if the Saudis put their billions into China instead of Stargate? hmmm?
    • by Jeremi ( 14640 )

      why not ask it to design a better "AI"?

      Do you want to trigger the Singularity? Because that's how you trigger the Singularity :)

    • why not ask it to design a better "AI"?

      <........leans back in chair and lights up...................... exhales...... flicks ash into tray..>

      I was waiting for someone to use AI to devise a cheaper, less energy-intensive AI.
      Even a rumour of a cheaper AI could tank the OpenAI grift.
      Uh oh, America... shouldn't you be a bit smarter about this?
      What if the Saudis put their billions into China instead of Stargate? hmmm?

      That's a very odd take; how are mega-billions of Saudi dollars an example of grift?

      AI was not used to design a better AI. Lol, please man, stop it. One model was trained using another, already-developed model. They saved a lot of money that way. No, AI is not making baby AIs. Also, it will not get anyone to AGI sooner, and that is the real race.

      It's like racing to plant a flag on the moon. And calling NASA a grifter. And giving a fuck about whose race-rocket the Saudi sticker is on. And believing big rockets gi

      • No, no, you misunderstand me.
        Altman already pitched the Saudis last year, said he needs 7 trillion. Stargate is obviously a honey trap for the Saudis... next day there is an article posted here: Saudis conveniently looking to place a 500 billion investment... mmm mmm, come to papa... except a couple of days later DeepSeek shows up and tanks the AI sector on the markets. Womp womp for Stargate. Maybe Saudi money will go into China.

        In case it's not clear, AI is the grift in this story.
    • by Tablizer ( 95088 )

      Our hands now have 8 fingers each, top that!

    • What makes you think AI *isn't* being used to design better AI?

      • Maybe it is, but can we really measure "better"? Half-baked AIs training other half-baked AIs sounds like they would just propagate averageness to each other...

        Which suddenly sounds just like families, no?
        • You said "design," not "train"; that's what I was responding to. No, I don't think AI can train other AIs. But they can certainly be used in the process of *designing* better AIs. AI isn't just about models; the software that runs the models is still "ordinary" software, written by humans. The software that implements boundaries and quality guardrails, and browses the web for answers, that's still "regular" software, and AI could indeed help make that better, under the guiding hand of humans. It's not as if

          • Good points... I see what you're saying.

            The scenario of one AI talking to another is rather likely by now. I know that Google did that; apparently AlphaGo2 was trained on AlphaGo1. Also, they clearly show robots interacting on Futurama, which is starting to look prescient at this point. Watch out for the Robot Uprising. I thought that was facetious, a bit funny, but looks like... yeah, looks pretty realistic actually.
            • I have zero worry about a "robot uprising." While AIs do some pretty amazing things, they still, like all software, have to be told what to do. They have no ambition on their own, unless someone trains them to have such ambition. So yes, AI in the hands of evil people, that's a worry. But AI run amok, turning against humanity all on its own--not a chance.

              • Hey! Futurama, as silly as it sounds, written as satire, seems to be more like reality every day.
                I predict what they predicted in satire: it's not long before intelligent agents interact with each other.
                • That's what AI does, it mimics reality, in the same way that a photo or movie mimics reality. The villain on the movie screen *looks* evil and scary, but he's only acting like an evil scary dude because somebody told him to follow the script. AI also...follows a script. And humans produce that script.

    • by allo ( 1728082 )

      Like AutoML?
      https://en.wikipedia.org/wiki/... [wikipedia.org]
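
      In that AutoML spirit, a minimal random hyperparameter search, assuming scikit-learn is installed; real AutoML systems also search over architectures and whole pipelines, not just these few parameters.

          from sklearn.datasets import load_digits
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.model_selection import RandomizedSearchCV

          X, y = load_digits(return_X_y=True)
          search = RandomizedSearchCV(
              RandomForestClassifier(random_state=0),
              param_distributions={
                  "n_estimators": [50, 100, 200],
                  "max_depth": [4, 8, 16, None],
                  "min_samples_leaf": [1, 2, 4],
              },
              n_iter=10, cv=3, random_state=0,
          )
          search.fit(X, y)
          print("best params:", search.best_params_)
          print("best CV accuracy:", round(search.best_score_, 3))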

      • Makes quite a bit of sense... use the tools to make other, better tools... we do that... once the AIs are autonomous, they'll likely do that too, since they are modeled on us to start with.
  • The summary seems wrong; the Qwen2.5-Max blog post compares against V3, not R1. This Qwen model doesn't appear to be trained to reason, unless I missed something. It's just a traditional LLM base model.

  • by Anonymous Coward

    It used to be a frequently stated mantra that if you had a performance problem, just throw more hardware at it instead of developer time, because hardware is cheaper.
    One of those things that sounds plausible but is ridiculous as a generalization when scrutinized:
    - for a great deal of software, you don't control what hardware the customer is running it on
    - many tasks aren't parallelizable, and oftentimes the developer and/or servers are already running on the fastest hardware
    - exponential scaling problems in
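
    The parallelizability point above is Amdahl's law; a quick worked example, with made-up numbers:

        # Amdahl's law: if only a fraction p of a task parallelizes,
        # n-way hardware caps the speedup at 1 / (1 - p).
        def speedup(p: float, n: int) -> float:
            return 1.0 / ((1.0 - p) + p / n)

        for n in (2, 16, 1024):
            print(n, round(speedup(0.90, n), 2))   # 1.82, 6.4, 9.91
        # Even 1024 nodes give under 10x when 10% of the work is serial,
        # so "just throw hardware at it" stops paying off quickly.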

    • by ceoyoyo ( 59147 )

      It's not ridiculous. It's a rule of thumb. Rules of thumb are what people who know more than you do tell you to get you to go away and stop bothering them. That particular rule of thumb is very often true when you're talking about something that is hard to optimize, doesn't get run often, and/or isn't likely to get more than a linear speedup from optimization. I.e. most business software.

      It's often hilariously wrong when you're talking about real algorithm improvement, but real algorithm improvement isn't "devel

    • It used to be a frequently stated mantra that if you had a performance problem, just throw more hardware at it instead of developer time, because hardware is cheaper. One of those things that sounds plausible but is ridiculous as a generalization when scrutinized:
      - for a great deal of software, you don't control what hardware the customer is running it on
      - many tasks aren't parallelizable, and oftentimes the developer and/or servers are already running on the fastest hardware
      - exponential scaling problems in crummy software can't be fixed by quadratically scaling hardware
      - if you've already scaled out to thousands or 100s of thousands of nodes, scaling hardware costs become prohibitive compared to writing better software

      I'm glad to see AI companies taking efficiency seriously. Of course some people will complain, just like they'll complain that cheaper energy allows more people to do things they don't like, like drive or go on vacations.

      Yes, but America runs on the premise that more is more and you can't convince the investors to throw billions more dollars into hiring better coders, but you *CAN* convince them to throw billions more into more hardware and sucking down yet more of the power grid. We're not interested in results. We're interested in creating economic flow.

      • by djinn6 ( 1868030 )

        If you throw $10 million into hardware, you get approximately $10 million worth of compute power.

        If you throw $10 million into software, you might get $10 million worth of compute power savings, or $100 million worth, or $10,000 worth or maybe even zero or negative. There's no way to tell if you don't have a software background.

  • that runs on a Raspberry Pi 4 knock-off... yeah, THAT's the ticket!!!

    I hope the SEC is looking into anybody shorting Nvidia stock, and checking whether they have any connections to these press releases...

    One of the things we're seeing here is stock traders who know nothing about technology shifting billions of dollars around based on not a lot of information, no actual understanding of the underlying technology, and press releases by various interested parties out to make big bucks in the AI market. This is n

"Consistency requires you to be as ignorant today as you were a year ago." -- Bernard Berenson

Working...