OpenAI's Lead Over Other AI Companies Has Largely Vanished, 'State of AI' Report Finds (yahoo.com) 61

Posted by msmash on Friday October 18, 2024 @01:42PM from the closer-look dept.

An anonymous reader shares a report: Every year for the past seven, Nathan Benaich, the founder and solo general partner at the early-stage AI investment firm Air Street Capital, has produced a magisterial "State of AI" report. Benaich and his collaborators marshal an impressive array of data to provide a great snapshot of the technology's evolving capabilities, the landscape of companies developing it, a survey of how AI is being deployed, and a critical examination of the challenges still facing the field.

One of the big takeaways from this year's report, which was published late last week, is that OpenAI's lead over other AI labs has largely eroded. Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5, X's Grok 2, and even Meta's open-source Llama 3.1 405B model have equaled, or narrowly surpassed on some benchmarks, OpenAI's GPT-4o. But, on the other hand, OpenAI still retains an edge for the moment on reasoning tasks with the release of its o1 "Strawberry" model -- which Air Street's report rightly characterized as a weird mix of incredibly strong logical abilities for some tasks, and surprisingly weak ones for others.

Another big takeaway, Benaich told me, is the extent to which the cost of using a trained AI model -- an activity known as "inference" -- is falling rapidly. There are several reasons for this. One is linked to that first big takeaway: With models less differentiated from one another on capabilities and performance, companies are forced to compete on price. Another reason is that engineers for companies such as OpenAI and Anthropic -- and their hyperscaler partners Microsoft and AWS, respectively -- are discovering ways to optimize how the largest models run on big GPU clusters. The cost of outputs from OpenAI's GPT-4o today is 100 times less per token (which is about equivalent to 1.5 words) than it was for GPT-4 when that model debuted in March 2023. Google's Gemini 1.5 Pro now costs 76% less per output token than it did when that model was launched in February 2024.

They never really had one (Score:4, Insightful)

by gweihir ( 88907 ) writes: on Friday October 18, 2024 @01:49PM (#64875195)

Crappy and unfixable as their product is. All they had was more lies and something dressed up to look nice a bit earlier.
It really does not matter. All LLMs are crap and will remain crap. The approach is not suitable for anything but a slightly better search engine, but I recently found out ChatGPT is crap at that as well, because it could not actually provide reasonable sources for statements it made. Apparently referencing sources is not done using the training data, but by web search. With that, I can simply do a web search directly and save time.

- Re: (Score:3)
  
  by i kan reed ( 749298 ) writes:
  
  LLMs fantastically solved the problem of how to masquerade spam as legitimate content undetectably to search engines.
  LLMs may indeed only produce crap, but that's great news for all the shitmongers out there.
  - Re: (Score:2)
    
    by ebunga ( 95613 ) writes:
    
    And it wasn't that long ago that they were available only as the deluxe AI-based version of the article respinners used for SEO spam by the content factories behind affiliate marketing schemes.
- Re:They never really had one (Score:5, Interesting)
  
  by MightyMartian ( 840721 ) writes: on Friday October 18, 2024 @01:55PM (#64875219) Journal
  
  People that say this sort of thing never actually seem to use it for more than making fart poems. I have used ChatGPT to combine multiple dissimilar reports into a single collated report in the proper tense. It still needed some massaging, but for me to do it would have been several hours' work, and ChatGPT puked out a page of text that I just had to tweak and add some graphs to. It's terrible for a lot of problems (the code it creates is nightmarish, but it's actually not bad with SQL), but for language-based problems, providing you understand its limitations and how to give it instructions, it's definitely boosted my productivity.
  
  - Re: (Score:1)
    
    by gweihir ( 88907 ) writes:
    
    Your invalid ad hominem is just that: invalid.
      - Re: (Score:2)
        
        by dfghjk ( 711126 ) writes:
        
        "You're a known malcontent on the subject of LLMs..."
        A virtue.
        "...dismissing your "it's all crap" statements is perfectly valid."
        It would be if the argument wasn't that LLMs are useful in producing crap. That was literally the argument.
        
        - Re: (Score:2)
          
          by gweihir ( 88907 ) writes:
          
          "You're a known malcontent on the subject of LLMs..."
          A virtue.
          Thank you. I like to think of it as being in possession of a working mind.
      - Venture capital firm says other AI investments are (Score:2)
        
        by will4 ( 7250692 ) writes:
        
        Exactly how skeptical do you have to be when a venture capital firm which invests in AI startups claims that AI startups which are not the largest one are as good as the largest one?
        And how many of those second-tier, third-tier, ... AI startups have investments from that venture capital fund?
        Is this just California venture capitalists, billionaires, and the wealthy 1% who have invested in a lot of AI startups needing attention focused on those other startup investments so that the uber-wealthy can make money off of them?
  - Re:They never really had one (Score:5, Informative)
    
    by Savage-Rabbit ( 308260 ) writes: on Friday October 18, 2024 @02:20PM (#64875291)
    
    People that say this sort of thing never actually seem to use it for more than making fart poems. I have used ChatGPT to combine multiple dissimilar reports into a single collated report in the proper tense. It still needed some massaging, but for me to do it would have been several hours work, and ChatGPT puked out a page of text that I just had to tweak and add some graphs to. It's terrible for a lot of problems (the code it creates is nightmarish, but it's actually not bad with SQL), but for language-based problems, providing you understand its limitations and how to give it instructions, it's definitely boosted my productivity.
    This is true, but it is also not the hype. The hype is that by 2028 GPT-powered robots will put an end to the need for any human workers, and what’s left of humanity will subsist on Soylent Green, Corpse Starch and just enough UBI handouts to prevent us from revolting and eating our AI bro overlords raw.
    
    - Re: (Score:3)
      
      by MightyMartian ( 840721 ) writes:
      
      Well first of all, the explicit hype you mention is your own hyperbole. So straw man noted and promptly ignored.
      As to hype in general, if we were to declare invalid every product or service promoted by hype, well, fuck, there wouldn't be much left to buy or sell. My experience with the actual tool is, providing you understand its limitations (it's absolutely fucking horrible at math beyond about a grade five level), it's a pretty sophisticated analysis and correlation tool.
      - Re: (Score:2)
        
        by MightyMartian ( 840721 ) writes:
        
        Rebutting an argument with sarcasm doesn't constitute a rebuttal at all.
        
        - Re: (Score:2)
          
          by MightyMartian ( 840721 ) writes:
          
          The post wasn't framed as humor:
          "This is true, but it is also not the hype. The hype is that by 2028 GPT-powered robots will put an end to the need for any human workers, and what’s left of humanity will subsist on Soylent Green, Corpse Starch and just enough UBI handouts to prevent us from revolting and eating our AI bro overlords raw."
          I sense no sarcasm here at all. It's literally invoking a strawman argument; in other words, a dishonest representation of what LLM developers and advocates are promising.
  - Re: (Score:1, Interesting)
    
    by dfghjk ( 711126 ) writes:
    
    "...it's definitely boosted my productivity."
    Because your job is to produce things that aren't needed, and we're perhaps better off without, such as ramming together dissimilar "reports" made by others and adding "some graphs". You know, things that worthless tools can "puke out".
    - Re: (Score:2)
      
      by MightyMartian ( 840721 ) writes:
      
      You have no idea what my job is, and if you have to create a caricature of my career based solely on what you could literally invent in the space between your two ears, then I'd say that suggests a good deal about the nature and veracity of your arguments.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You gave a description of the task where it "boosted" your productivity. That description matches a traditional "bullshit job", typically done by a person who does not see the nature of their job.
- Re:They never really had one (Score:5, Interesting)
  
  by Ed_1024 ( 744566 ) writes: on Friday October 18, 2024 @02:37PM (#64875347)
  
  I think one of the more telling things is that Apple declined to invest in them. It was not because they (Apple) were short of cash, so it must have been because they just did not see anything remarkable in the product, or anything they could not do at least as well or better themselves. $6B or whatever it was buys a lot of custom silicon AI hardware and electricity...
  
  - Re: (Score:2)
    
    by Big Hairy Gorilla ( 9839972 ) writes:
    
    Keep in mind Apple is the original proprietary model in tech that everyone is trying to be now. Microsoft has branded hardware devices, for instance. Vendor lock-in and root access to your device are extremely profitable.
    
    I would likely put Apple's strategy at: We saw what they are doing at OpenAI and we'll just take the idea and develop it ourselves. Who tf wants to pay license fees?!
  - Re: (Score:2)
    
    by Xarius ( 691264 ) writes:
    
    It's much more likely they saw OpenAI's financials and declined on those grounds--they have no path to profitability.
- Re: (Score:2)
  
  by JamesTRexx ( 675890 ) writes:
  
  Seems to me the only times "AI" is truly beneficial is at strictly defined tasks with vetted data. Which proves that garbage in is garbage out, and humans produce loads of it which does not make for good general training data.
  In my experience, when it comes to programming, so far it hasn't produced better results than a decent internet search.
  As for fiction writing, it hasn't produced anything more than the usual plot ideas in a generic fashion, so it hasn't grown beyond 99% of screenwriting including the b
  - Re: (Score:2)
    
    by ebunga ( 95613 ) writes:
    
    To be able to craft the proper input you have to be an expert in the subject domain and the tool, and to make sure it's not generating garbage. To validate the results you have to do actual research to verify any citations, and also be a subject domain expert.
    The only value in genAI right now is for bullshit-heavy jobs like SEO spam and business consultants that speak moon language to begin with.
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      Yes, that seems to be the main "use": A negative one. As somebody called it, "better crap". Still crap and the last thing the world needs is more of it.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Seems to me the only times "AI" is truly beneficial is at strictly defined tasks with vetted data.
    Maybe. LLMs have a tendency to hallucinate even with good data.
- Re:They never really had one (Score:4, Insightful)
  
  by thegarbz ( 1787294 ) writes: on Friday October 18, 2024 @03:24PM (#64875473)
  
  It really does not matter. All LLMs are crap and will remain crap.
  
  False. LLMs are just a tool. The implementations as random chatbots are crap. On the flip side, they are incredibly useful when fed specific information in their training and when they provide specific references to source documents.
  You using only a bunch of shitty ones doesn't make all LLMs crap.
  
  - Re: (Score:2, Insightful)
    
    by gweihir ( 88907 ) writes:
    
    Nope. But it looks like you still do not understand LLMs at all. For example, "hallucinations" do _not_ go away with carefully selected training data.
    - Re: (Score:3)
      
      by thegarbz ( 1787294 ) writes:
      
      Nope. But it looks like you still do not understand LLMs at all. For example, "hallucinations" do _not_ go away with carefully selected training data.
      No, it is you who doesn't understand the application or what I am talking about (presumably because you think all LLMs = ChatGPT). LLMs hallucinating becomes irrelevant when you use LLMs to identify source information. They become nothing more than a natural language search engine - a very good one at that, and one where you never need to take the output of them at face value since it becomes a redirection to source material which by definition can't be a hallucination.
      Have a think a bit about what I wrote n
- Re: (Score:3)
  
  by Archibald Buttle ( 536586 ) writes:
  
  An LLM by itself is indeed inherently limited. They are essentially language prediction systems, so their base functionality is to predict what word should come next.
  Couple an LLM with RAG (Retrieval Augmented Generation), an approach for combining an external knowledge store with the LLM's ability to generate responses to statements, and you can enhance LLMs quite effectively. Research in this area has been happening over the past 4 or 5 years.
  A very promising technique that has emerged this year is Grap
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Not a promising approach. Remember CYC? They essentially failed at generating that "knowledge graph", despite being at it for a really long time. And the problem has _not_ gotten easier.
    - Re: (Score:2)
      
      by Archibald Buttle ( 536586 ) writes:
      
      the fact that the Cyc organisation "failed" to create a general knowledge graph of all human knowledge does not mean that GraphRAG is "not a promising approach". their efforts predate GraphRAG by a _long_ way, and are not related to modern efforts to augment LLMs
      i also did not say that this is "easy". indeed i explicitly said that it's "not a magic bullet", and that you need a "high quality knowledge graph" that "requires careful work to prepare and curate".
      modern systems using the GraphRAG approach genera
- Re: (Score:2)
  
  by strikethree ( 811449 ) writes:
  
  It really does not matter. All LLMs are crap and will remain crap.
  If you measure LLMs as AIs, then yes, you get to call them crap freely. If you think they are crap in general ... then you are not very smart at using tools.
The Architecture of Annoyance (Score:1)

by Pseudonymous Powers ( 4097097 ) writes:

Great. In that case, can we get a new punchgob in place of this Altman guy? I'm sick of his punchable gob.
Altman responsible (Score:5, Informative)

by Bill, Shooter of Bul ( 629286 ) writes: on Friday October 18, 2024 @01:55PM (#64875217) Journal

First of all, most of the innovation was done by Google*.
Second of all, OpenAI has had a huge brain drain.

I think Google did not release on purpose; it wanted someone else to be hit with all the bad publicity from the bad AI results. Now that that has happened, it can swoop in and collect the accolades and dollars while letting OpenAI get all the blame.

- Re: (Score:3)
  
  by fleeped ( 1945926 ) writes:
  
  it can swoop in
  
  Swoop in with what? Gemini? Have you used it? Dumb as bricks doesn't even begin to describe it's (in)capability. A couple of days ago I thought I'd ask it for ways to get Google Photos saved en-masse locally, and it suggested PICASA! What's next? CorelDRAW for painting? (which still exists apparently)
  - Re: (Score:2)
    
    by fleeped ( 1945926 ) writes:
    
    describe its (in)capability -- damn you apostrophe from hell
- Re: (Score:1)
  
  by dadaz ( 9614606 ) writes:
  
  First of all, most of the innovation was done by Google*. Second of all, OpenAI has had a huge brain drain. I think Google did not release on purpose; it wanted someone else to be hit with all the bad publicity from the bad AI results. Now that that has happened, it can swoop in and collect the accolades and dollars while letting OpenAI get all the blame.
  The Transformer architecture came from Google way before OpenAI did ChatGPT.
  True.
  
  I am not sure Google held back in releasing big LLMs; I believe it is more that Google Research failed to push models to the scale (in terms of parameters and input tokens) that OpenAI did.
  Sometimes you observe an ML model and it seems to have hit an asymptote in terms of performance metrics, while if you push past that, more gains can be achieved.
Next Step (Score:2)

by jmccue ( 834797 ) writes:

I guess the next step is that investors will finally realize AI is just an empty money pit and will not provide much value.
State of AI (Score:3)

by MpVpRb ( 1423381 ) writes: on Friday October 18, 2024 @02:30PM (#64875319)

While it's likely that future AI will be a very useful tool, and early versions like AlphaFold are already producing results, today's consumer-focused AI offerings are just crap generators that produce stuff that appears to be well written but is, in fact, crap. It's kinda like a BS artist who confidently claims expertise while spewing nonsense.

- Re: (Score:2)
  
  by stevez67 ( 2374822 ) writes:
  
  Kinda like the politicians we're saddled with
- Re: (Score:2)
  
  by Tony Isaac ( 1301187 ) writes:
  
  I suppose it depends on what you are asking GPT to do. If you're trying to get it to do your job for you, then you're right, it's crap, it can't do anything close to that. But as an assistant, tools like Copilot and ChatGPT already can save a ton of time. Most notably, these tools can quickly digest large documents and extract data from them (like Google's NotebookLM), they greatly enhance web search, and they streamline research. For example, I recently put together a presentation of software design patter
I remember reading a post (Score:5, Insightful)

by rsilvergun ( 571051 ) writes: on Friday October 18, 2024 @02:38PM (#64875349)

About a Google engineer complaining that he was an expert mathematician spending his days trying to figure out how to get people to click on advertisements.

I wonder how many thousands of hours of incredibly valuable time from highly skilled mathematicians is going to be spent figuring out how to replace workers making 10 to 12 bucks an hour so that money can be pocketed by a handful of Nepo babies and oligarchs.

I sometimes wonder what our species could accomplish if we didn't spend so much time and effort pleasing our kings and queens. I guess we call them CEOs now.

- Re: (Score:2)
  
  by ebunga ( 95613 ) writes:
  
  Wonder if that guy is still alive? Must be pretty depressing knowing the highest achievement you'll ever hit is increasing ad clicks by 0.00000002%
  - Re: (Score:2)
    
    by rsilvergun ( 571051 ) writes:
    
    The shitloads of money he got paid probably helped. But yeah, everybody has to pay the bills. Of course, you get a lot of public university educators and professors and teaching assistants who live in abject poverty so that they can work on science stuff that eventually becomes incredibly profitable products for other people to make money off of.
    
    I've known guys like that and honestly they don't care about the money they're just completely obsessed with their area of expertise. The problem is you usually
- Re: (Score:3)
  
  by Rinnon ( 1474161 ) writes:
  
  I sometimes wonder what our species could accomplish if we didn't spend so much time and effort pleasing our kings and queens. I guess we call them CEOs now.
  Following along with your analogy, I would suggest that it isn't the CEO that has replaced the monarch... it is the market itself. Which is far more terrifying, because you can't overthrow, imprison, or guillotine the market.
  - The market is just a collection of systems (Score:2, Insightful)
    
    by rsilvergun ( 571051 ) writes:
    
    It's not a real thing. It's an abstraction we use to understand the systems people put in place.
    
    At the end of the day it's the CEOs that actually control and manipulate the market. It is not nor has it ever been nor will it ever be "free". That's a lie the CEOs and other members of the ruling class tell you so that you will leave a power vacuum they can fill.
    
    It's the same reason they tell you not to organize into labor unions and the same reason they tell you both sides are bad when it comes to voti
They still have the lead (Score:3)

by Whateverthisis ( 7004192 ) writes: on Friday October 18, 2024 @03:22PM (#64875469)

I think they still have the lead in wild unrealistic statements [tomshardware.com], insane predictions [venturebeat.com], and poor financial [reuters.com] planning [cnbc.com].

not surprising as (Score:2)

by Growlley ( 6732614 ) writes:

their entire goal is making the CEO rich
Good (Score:3)

by Rujiel ( 1632063 ) writes: on Friday October 18, 2024 @06:56PM (#64875963)

I hate OpenAI.
I hate their deceptive branding as "open" despite that word having the obvious connotation for open source software, which they do not produce.
I hate that they bring on people like the former head of the NSA to develop something that will make life worse for all of us.
I hate their Worldcoin racket and its pretentious "World" rebrand, as if we needed another company that is confounding to search for and complicates searching for unrelated things.
I hate that they test their biometrics technology (specifically eye scanning) on the third world, exploiting those people's desperation for some crypto from a company that will never do anything good for them, nor will anyone who winds up with their data through purchase or a leak.
I hate their normalization of, let's face it, the expectation that your own biometrics are not really "yours" if they are easily acquired in public, and therefore you're merely checking in for some shitcoin.
And then there's Sam Altman himself, who along with others is throwing billions at an Israeli AI startup, while that country uses its AI "Lavender" to make automated decisions about which civilians should be killed.
This company needs to sink into the ocean.

SearchGPT - The FINAL Search Engine (Score:1)

by TheWho79 ( 10289219 ) writes:

Been into search engines since the World Wide Web Worm and am somewhat of a well-known SEO. SearchGPT by OpenAI is the best search engine I have ever used. I bumped Google from my default speed dial about a month ago and am not looking back. Sure, I still go back to Bing for images (they crush Google) and I go back to Google for maps and flights, but for pure search, it is rare to go to another search engine. If you don't have access to SearchGPT, I feel sorry for you, because it is the final search engine.
