DeepSeek Outstrips Meta and Mistral To Lead Open-Source AI Race (semianalysis.com)
DeepSeek has emerged as the leading open-source AI model developer, surpassing Meta's Llama and Mistral, after releasing its latest model V3 with breakthrough cost efficiencies, research and consultancy firm SemiAnalysis reported on Friday.
The Chinese startup, backed by hedge fund High-Flyer, reached this milestone through innovations in Multi-head Latent Attention technology, which cut inference costs by 93.3% versus standard methods. Despite offering services below cost to gain market share, its performance matches or exceeds OpenAI's GPT-4.
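For readers wondering what the Multi-head Latent Attention savings amount to in practice, here is a minimal, hedged PyTorch sketch of the underlying idea: cache a small latent vector per token instead of full per-head keys and values, and reconstruct them at attention time. The dimensions and weight names below are illustrative assumptions, not DeepSeek's actual configuration.

```python
import torch

# Illustrative dimensions (assumptions, not DeepSeek's real configuration).
d_model, n_heads, d_head, d_latent = 4096, 32, 128, 512

w_down = torch.randn(d_model, d_latent) / d_model ** 0.5            # compress hidden state
w_up_k = torch.randn(d_latent, n_heads * d_head) / d_latent ** 0.5  # reconstruct keys
w_up_v = torch.randn(d_latent, n_heads * d_head) / d_latent ** 0.5  # reconstruct values

h = torch.randn(1, d_model)  # hidden state of one new token

# Standard multi-head attention caches full K and V per token:
#   2 * n_heads * d_head = 8192 values.
# Here only the latent is cached: d_latent = 512 values (~16x smaller cache).
latent = h @ w_down

# At attention time, per-head keys and values are rebuilt from the latent.
k = (latent @ w_up_k).view(n_heads, d_head)
v = (latent @ w_up_v).view(n_heads, d_head)
print(k.shape, v.shape)  # torch.Size([32, 128]) torch.Size([32, 128])
```

The shrunken KV cache is where the inference savings come from; the actual attention arithmetic, positional-encoding handling, and the reported 93.3% figure involve details this toy omits.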
Follow the (sponsored) money. (Score:3)
Sooo..who owns SemiAnalysis these days? Do they have a fitting name for this best-of-the-best AI analysis? I'm still trying to figure out what the hell "outstrips" might imply with regards to AI performance, since my first thought was copper mining. Just curious as to who is pimping "leading" AI stats these days, and how they outstripped and outdid themselves into that claim.
If we thought dick-measuring contests were bad before, just wait until AI lubes the bullshit-stained skids.
Re: (Score:3)
Pretty good FP. I just wrote a piece that could be adapted to continue the discussion along your lines [what I called "trust" issues], but I'm sorry to say I don't have motivation to find the time to do that for today's Slashdot, so I'm just going to paste a version of my initial reaction here:
What is Deep Seek like? I'll call it DS for short. This is intended as my shared initial reaction based on two conversations. So far neither of my DS conversations has gone very far. That's largely because of "trust" issues, and the second conversation has already started explicit discussion of those matters.
First of all, DS feels quite similar to ChatGPT. That includes the imbalanced verbosity issue. In normal English conversation the turns are fairly balanced. Each side speaks for a while and then the other side responds at similar length. In contrast, in today's discussion, I started with an 8-word description of a topic and DS responded with a huge essay about "life, the universe and everything". [What? No Oxford comma in the original? Color me outraged.] That did motivate me to find the "Stop" button. It's the input button while DS is "typing". [The quotes this time are scare quotes warning of anthropomorphic thinking--but if I used them everywhere they are called for, then this would be much harder to read than it already is...]
This first session actually wound up running over 300 words from my side. One of the diverting topics was word counting. DS explicitly denied being able to count words and offered to explain how I could do it, but I noted that was tedious, and that if DS counted them it could help balance the conversation... For which suggestion DS again "sincerely thanked" me. [Faking sincerity. As always. Another recurring theme of all the GAIs I've played with.]
So far I have not introduced any substance into either of the conversations. In the first conversation, I am concerned that the data source may "offend" the operators/owners of DS. It's a big data source that almost certainly includes some politically sensitive material. The second conversation involves some ideas that might be quite valuable. Why should I let the operators/owners of DS take the money and run? [But one of my many problems is that I don't really care much about money... However if I do help solve some problems in the real world it would be nice to get some credit?]
That's about all I have time for now... Not sure if I'll continue later on this theme. Your conversational turn:
Enough DeepSeek (Score:2)
Can we get some bitcoin stories for old times' sake?
This is nothing but a method to pump up stock quickly before the bottom falls out.
Re: (Score:3)
What stock? This is just the CCP making hay, not for money but for geopolitics.
AI just got made redundant by AI (Score:4, Funny)
How ironic.
Insightful but partially paywalled article (Score:4, Interesting)
The linked SemiAnalysis article is actually quite insightful, but the summary is off the mark and not insightful. The article gives a broad perspective across companies/models and years. The author argues that DeepSeek's advances are "simply" part of the skyrocketing pace of progress that was already underway before DeepSeek. The article is a good read, although the last half is paywalled.
Not open source???? (Score:2)
Re: (Score:2)
The DeepSeek model everyone is talking about is MIT licensed.
Some of the others have a "DeepSeek" license that mostly limits what you're allowed to do with them (you may not create deepfakes, etc.)
Can we please agree on this much (Score:2)
The training costs (Score:2)
Are misrepresented; they only reported the final training run and not the costs associated with massaging the data. No one knows how much they spent. I'm glad, though, because maybe these idiot AI companies will start to actually think of ways to save energy rather than just 'throw new hardware at it'.
If you are worried about your power bill, get ready for it to double or triple; these energy guzzlers will make power scarce and drive up costs. In many data center installations, they also want the power companies to support
Re: (Score:2)
Are misrepresented; they only reported the final training run and not the costs associated with massaging the data. No one knows how much they spent. I'm glad, though, because maybe these idiot AI companies will start to actually think of ways to save energy rather than just 'throw new hardware at it'.
Pretraining costs are just one-time expenditures. Once pretraining is complete, the cost of tuning or additional training is relatively minuscule. What ultimately matters for an AI that gets used is the inference-time cost.
As inference costs continue to fall, I would expect Jevons paradox to apply with a fury for these chain-of-thought models, where capability is somewhat tied to runtime and energy consumption.
Re: (Score:1)
Re: (Score:2)
A great thing about probabilistic algorithms is that if you can get one to be right 60% of the time, then all you have to do is rinse and repeat over and over, taking a majority vote over the runs, to increase your accuracy.
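A rough Python sketch of that amplification argument, assuming independent runs of a procedure that is right 60% of the time and a clean way to take a majority vote (real LLM calls rarely guarantee either):

```python
import random

def noisy_answer(truth: bool, p_correct: float = 0.6) -> bool:
    """Return the correct answer with probability p_correct."""
    return truth if random.random() < p_correct else not truth

def majority_vote(truth: bool, runs: int) -> bool:
    """Run the noisy procedure `runs` times (use odd counts to avoid ties)."""
    votes = sum(noisy_answer(truth) for _ in range(runs))
    return votes > runs / 2

def estimate_accuracy(runs: int, trials: int = 20_000) -> float:
    """Monte Carlo estimate of how often the majority vote is correct."""
    return sum(majority_vote(True, runs) for _ in range(trials)) / trials

for runs in (1, 5, 25, 101):
    print(runs, round(estimate_accuracy(runs), 3))
# Accuracy climbs from ~0.6 toward 1.0 as the number of independent runs grows.
```

The catch, of course, is that repeated LLM runs are rarely independent and their answers are not always easy to aggregate.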
Again, what "race"? (Score:3)
The race to be stupid faster and cheaper?
Re: (Score:3)
Re: (Score:2)
Re: (Score:1)