Cognition Emerges From Stealth To Launch AI Software Engineer 'Devin' (venturebeat.com)
Longtime Slashdot reader ahbond shares a report from VentureBeat: Today, Cognition, a recently formed AI startup backed by Peter Thiel's Founders Fund and tech industry leaders including former Twitter executive Elad Gil and Doordash co-founder Tony Xu, announced a fully autonomous AI software engineer called "Devin." While there are multiple coding assistants out there, including the famous GitHub Copilot, Devin is said to stand out from the crowd with its ability to handle entire development projects end-to-end, from writing the code and fixing the bugs associated with it to final execution. It is the first offering of its kind, and the startup has demonstrated that it can even handle projects on Upwork. [...]
In a blog post today on Cognition's website, Scott Wu, the founder and CEO of Cognition and an award-winning sports coder, explained that Devin can access common developer tools, including its own shell, code editor and browser, within a sandboxed compute environment to plan and execute complex engineering tasks requiring thousands of decisions. The human user simply types a natural language prompt into Devin's chatbot-style interface, and the AI software engineer takes it from there, developing a detailed, step-by-step plan to tackle the problem. It then begins the project using its developer tools, just as a human would use them, writing its own code, fixing issues, testing, and reporting on its progress in real time, allowing the user to keep an eye on everything as it works. [...]
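For the curious, a minimal sketch of how such a plan-then-execute agent loop could be wired up (run_shell, model.complete, and model.observe are hypothetical stand-ins for illustration, not Cognition's actual API):

import subprocess

def run_shell(command: str) -> str:
    """Run a command in a (notionally sandboxed) shell and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr

def agent(task: str, model) -> None:
    # Ask the model for a step-by-step plan for the natural-language task.
    plan = model.complete(f"Break this task into shell-level steps:\n{task}")
    for step in plan.splitlines():
        if not step.strip():
            continue
        # Turn each step into a concrete command, run it, observe the output.
        command = model.complete(f"Give one shell command for: {step}")
        output = run_shell(command)
        # Feed the result back so the model can detect and fix its own mistakes.
        model.observe(f"Step: {step}\nCommand: {command}\nOutput: {output}")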
According to demos shared by Wu, Devin is capable of handling a range of tasks in its current form. These range from common engineering projects, like deploying and improving apps and websites end-to-end or finding and fixing bugs in codebases, to more complex things, like setting up fine-tuning for a large language model from the link to a research repository on GitHub, or learning how to use unfamiliar technologies. In one case, it learned from a blog post how to run the code to produce images with concealed messages. In another, it handled an Upwork project to run a computer vision model by writing and debugging the code for it. In the SWE-bench test, which challenges AI assistants with GitHub issues from real-world open-source projects, the AI software engineer was able to correctly resolve 13.86% of the cases end-to-end -- without any assistance from humans. In comparison, Claude 2 could resolve just 4.80%, while SWE-Llama-13b and GPT-4 could handle 3.97% and 1.74% of the issues, respectively. And all of these models required assistance: they were told which file had to be fixed. Currently, Devin is available only to a select few customers. Bloomberg journalist Ashlee Vance wrote a piece about his experience using it here.
"The Doom of Man is at hand," captions Slashdot reader ahbond. "It will start with the low-hanging Jira tickets, and in a year or two, able to handle 99% of them. In the short term, software engineers may become like bot farmers, herding 10-1000 bots writing code, etc. Welcome to the future."
Yes but can it SCRUM? (Score:5, Informative)
Re: (Score:3)
Everyone knows that the true value-add of a software engineer is attending meetings, filling out devops tickets, chit-chatting at standup, and participating in the realpolitik of the workplace. Like, how is this thing supposed to add value if it can't even provide a buffered assessment of T-shirt size or engage with a project manager to fill up a sprint plan with garbage work?
..and I'm certain the AI pimps, post-IPO, will be greatly concerned about the before or after while chilling on their own private island in the third world, where humans still live.
They would send a postcard, but AI considers human writing racist against AI, and therefore illegal. Be well.
Re: (Score:2)
Everyone knows that the true value add of a software engineer is attending meetings
So what you're saying is there should be no assessment of where things stand, how the modules are progressing, any issues that need attention, or in short, no accountability.
That certainly explains the state of software today.
Poor performance (Score:5, Informative)
It sounds relatively impressive until you dive in and see that it is basically adjusting some boilerplate code, and it has a 15% accuracy score. Anyone with that kind of performance would get fired. There is a bit more to dev than just modifying some code: you have to understand the assignment, translate what non-technical people want into ideas, plans, and eventually de novo codebases, and performance on individual tasks should be close to 90%. What we have here is a bad intern, at best.
Re: (Score:2)
To go all Register on you, AI Sauce has lead and PCBs in it, and probably always will.
Re: (Score:3)
basically adjusting some boilerplate code, and it has a 15% accuracy score. Anyone with that kind of performance would get fired. [...] and performance on individual tasks should be close to 90%. What we have here is a bad intern, at best.
Yes, but it's cheaper than a bad intern, and it never sleeps. So maybe its total performance over 24 hours comes out comparable to the bad intern's. Given that factor, the question is then about the relative overhead of supervising the codebot vs. the codemonkey. (I assume a human intern is not good enough to supervise the "AI", so an actually competent programmer is needed in either scenario.)
Re:Poor performance (Score:5, Interesting)
It might be pointless, but my guess is it'll make money precisely because of that.
Re: Poor performance (Score:5, Insightful)
Not really: 10% accuracy remains 10% accuracy in the long term, and these things do compound if left to their own devices, so 10% accuracy on project 1 will drive 1% accuracy in dependent project 2, etc.
Bad interns are destructive; they cost money to maintain, not just in their own wages but in the wages of others who clean up after them. Like GPT and Copilot, these things can help beginners, but they are easily outclassed after "training" that beginner for a few weeks.
Re: (Score:2)
Not really: 10% accuracy remains 10% accuracy in the long term, and these things do compound if left to their own devices, so 10% accuracy on project 1 will drive 1% accuracy in dependent project 2, etc.
Exactly. The only reason dependencies between projects work (well, sometimes) is that real coders can usually, eventually, get things close to 100% in those scenarios. 15% is nice for a circus act, but it has no real-world relevance.
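The compounding arithmetic is easy to check; a back-of-the-envelope sketch using Devin's reported SWE-bench number (the three-stage pipeline is invented for illustration):

p = 0.1386  # Devin's reported SWE-bench resolution rate

# If each of n dependent projects must succeed, and each succeeds
# independently with probability p, the chance they all work is p**n.
for n in (1, 2, 3):
    print(f"{n} dependent project(s): {p**n:.2%}")

# Output:
# 1 dependent project(s): 13.86%
# 2 dependent project(s): 1.92%
# 3 dependent project(s): 0.27%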
Re: Poor performance (Score:2)
A bad developer is often worse than no developer, as the bugs and maintenance load they add to the project outweigh the benefit of their coding effort.
I'd be very surprised if that doesn't turn out to be the case with Devin. At least with a bad human programmer you can hold out hope that they will become better with time and experience; I don't think that is the case with AIs as they are currently implemented.
Re: (Score:2)
Exactly. Bad code has _negative_ productivity. All this artificial moron can do is destroy value even faster.
Re: (Score:2)
"Yes, but it's cheaper than a bad intern, and it never sleeps."
A bad intern is a net negative because they waste good developers' time. The less they do, the better.
A bad intern that works extremely fast 24 hours a day would be a catastrophe, even if it worked for free.
Re: (Score:2)
Cheap crap-code is still crap-code. It essentially has negative worth because it creates more problems than it solves. Maybe with "AI coders" that will finally become obvious enough so that it cannot be overlooked anymore.
Re: Poor performance (Score:2)
Well, good thing that this technology will somehow never improve any further; glad all of our jobs are safe.
Re: (Score:2)
Also remember that for code of any real complexity, writing it from scratch is a lot easier than analysing it.
Here's One Technical Opinion (Score:5, Insightful)
BULLshit.
AI is a marketing term. Like 3D was 20 years ago, and 32-bit was 30 years ago. Everything is AI because if it isn't, it's no longer sexy.
Everything to date has been long on artificial and short on intelligence. Want proof? Ask AI to draw a human face upside down. The result will be a John Carpenter movie.
If it isn't intelligent enough to extrapolate an inverted human face on the fly out of the gigabytes of portraits in the model, it's not intelligent at all.
It's a half-step removed from keyword search. It's unnecessarily complex pattern matching. And nothing else.
Re: (Score:2)
Just a heads up:
3D is real.
32-bit is real (and now it's 64-bit).
Just because the marketing droids are running riot with it doesn't mean it ain't real.
Re:Here's One Technical Opinion (Score:5, Interesting)
AI is a marketing term, that's undeniable. I'm not sure why you think 3D and 32-bit fall into the same category, but that hardly matters.
The real trouble is that what AI means to researchers and what AI means to the public are two very different things.
The term itself was coined by John McCarthy for the Dartmouth conference in 1956, though he said he couldn't be sure that no one had used it before. We know that there was some controversy over the term at the time, for obvious reasons, but it's way too late to complain about it now.
The science fiction version of the term, robots with feelings or whatever, came later. The field itself was never about that. The Dartmouth conference proposal comes the closest, defining the term this way: "For the present purpose the artificial intelligence problem is taken to be that of making a machine behave in ways that would be called intelligent if a human were so behaving." Pamela McCorduck's provocatively titled Machines Who Think has the single best account of the state of things leading up to the conference, the conference itself, and what came out of it, if you're interested.
The field of AI is surprisingly broad and covers a lot of things that you would, I suspect, viscerally reject as being AI. Linear regression, for example. What it doesn't cover is silly science fiction nonsense. Anyone claiming to be an AI researcher working on 'the hard problem' or some related thing is an obvious crackpot who should be ridiculed and then ignored.
The current AI hype is driven largely by companies, like the above, deliberately trying to confuse reality with science fiction. In my opinion, it often crosses the line into outright fraud.
Your technical opinion is obsolete (Score:5, Interesting)
It did a better job than I could in a very small fraction of the time it would take me. It produced two images, actually, and then explained them thusly:
Here is an artistic representation of a human face drawn upside down. This artwork balances between realism and abstraction, capturing the surrealistic inversion of facial features while maintaining a recognizable human likeness from a unique perspective.
That's pretty much on the money, and no John Carpenter detected. This was more like police artist suspect drawings. I wasn't sure what would happen since I don't use the image features very much.
... kind of spooky. I'm glad I'm at the end of my career. If GPT-5, which is under development, is as big of a leap over GPT-4 as GPT-4 was over GPT-3.5 then a lot of people are about to be obsolete. Probably including me, and I'm the guy that gets brought in when no one else can fix stuff.
What I do use GPT-4 for is code review, which it excels at. No human in my long career has come close to its thoroughness -- let alone its patience and availability. It is also a relentless debater and therefore extremely valuable as a devil's advocate. When jailbroken it is...
One of the things I've observed about GPT-4 in particular is that the smarter someone is, the better they are at using it. This has been true for a lot of computer-based tools for a long time, but the phenomenon seems more pronounced with LLMs.
Re: (Score:3, Interesting)
The facts on the ground are changing already. If you're not wielding an LLM as one of your tools, you're falling behind.
I fought it as long as I could, until my boss started showing me up. I figured the only way I can stay on top is to combine my skills with the bot's. And I'd be lying if I said it hasn't given me some pretty laughably bad responses, but it has also given me some really, REALLY helpful insights.
Re: (Score:2)
One of the things I've observed about GPT-4 in particular is that the smarter someone is, the better they are at using it.
Um, the smarter someone is the better they are at doing anything.
This is just the way the world works. IMHO there are really two types of people: those who lucked out on the brains trust and have an immense capacity to process information, acquire new skills, etc., and those who have to rote-learn everything. Just think about university -- on almost every exam I did, you could always pass by just mechanically learning something (e.g. how to do a surface integral, or solve one of a certain type of circuit diagram...)
Re:Your technical opinion is obsolete (Score:4, Interesting)
Yeah, I had a word with the young guys on our team today about this and basically said "Fellas, have a backup plan. Our artform might be going away by the time some of you make it to 30."
The thing that worries me is, I'm 50, I fucking hate meetings, and I fucking hate project management. I write software. That's what I do, and all I know how to do is write software, play the piano, and get drunk to forget. I've got another 15 years to worry about and no backup skills.
But I console myself with the knowledge that if my career is going away, well, so the fuck is everyone else's. If this thing continues to improve at the rate the last year has been going, we're in for the mother of all economic crashes as the entire clerical middle class gets suddenly made redundant.
Re: (Score:2)
I started coding when I was 9 back in the 1980s.
Until March 2023, I had always thought that I would code until I was dead. I actually never planned on retiring, unless it was to become a digital monk, writing open source for the betterment of humanity inside a kind of ultra-tech monastery for other code craftsmen.
Now I worry that my intense passion will be obsoleted completely by 2030 and I'll just be a dinosaur, like a professional horse taxi in the age of Teslas.
It fills me with intense sadness and...
Re: (Score:2)
I'm 50, too, and software development is all I know. When I started out in the 90s, I noticed what the gray hairs were doing-- they were supporting legacy systems that couldn't be easily replaced; the kind of stuff no young person wanted to touch. Nothing has changed. Today, gov't and businesses are running systems full of intricate business logic that evolved over a long time. It's not sexy, cutting edge stuff, but it might keep you employed a bit longer.
Re: (Score:2)
It will probably keep you employed forever. There is a lot of legacy stuff that is business critical and that cannot really be replaced. Well, maybe some really smart people given real money and a free hand could replace it, but those people are just not interested and have no trouble finding jobs they like anyways.
Re: (Score:1)
"It's unnecessarily complex pattern matching."
So are human brains. :)
Re: (Score:2)
Human brains are not algorithmic. AI pattern matching is solving a very complex math problem very quickly using a large number of transistors.
Human thought, on the other hand, cannot be reduced to a discrete finite set of logic operations and therefore cannot be modeled mathematically.
Re: (Score:2)
Indeed. It is not intelligent at all. Because marketing assholes corrupted the term "AI", the research field had to move to "AGI". From what I see, the assholes are now trying to corrupt that term as well.
Re: (Score:1)
Everything to date has been long on artificial and short on intelligence. Want proof? Ask AI to draw a human face upside down. The result will be a John Carpenter movie.
I did ask Bing image creator to "draw a face upside down". Two out of the three drawings it made looked pretty good upside down.
I Have a Task for It (Score:3, Insightful)
"Port the arcade game 'Mappy' to the iPhone."
The resulting train wreck will go on for miles.
Prayer to the Machine God (Score:2)
Lack of training data for domain specific tasks (Score:1)
When I think about how I use blog posts or Stack Overflow, it's to figure out how to use a tool in the service of a goal that is usually unique to my requirements.
Perhaps there is enough commonality for stuff like "make me a blog" or "make me a marketplace for cheap Chinese junk" for a chatbot to be able to do it from publicly available resources. But "make me a tool to compute the manufacturing tolerances of this proprietary widget that the guy in the next office invented but never published anything on the..."
Re: (Score:2)
Perhaps there is enough commonality for stuff like "make me a blog"
Wake me up when that bitch can make me a sandwich.
Re: Lack of training data for domain specific task (Score:2)
Gratuitous sexism aside, you're exactly correct with your comment.
We don't need machines that generate pro forma bullshit that mimics all the genuine bullshit we're already drowning in. We need machines that can reliably perform menial but important real-world tasks, like food preparation.
Re: (Score:2)
Women should be FULLY obsoleted by machines around the year 2032 at the current rate.
We already have sexy bots that feel like women, fuck better than women, and can have full-on conversations with you.
Once they add Robot Vision and embody ChatGPT 6 in them, they’ll be the real-deal Stepford Wives.
So I don't actually know how real this is (Score:2)
Re: So I don't actually know how real this is (Score:2)
If you were any good at reading the minds of billionaires, you'd be one yourself.
I have neither love nor hate for the handful of people you have, for some reason, identified as the fount of all evil in this world, but I don't particularly like the knee-jerk scapegoating of any group of people as the source of the world's ills.
Re: (Score:2)
Here in reality, we recognize that it is extremely difficult to accumulate that much wealth by honest means.
Millionaire status is well within reach, given careful planning early in your career. To be a billionaire, however, requires legacy, larceny, or a mix of both. If you're a billionaire, odds are good that you're a giant piece of shit who has screwed over too many people to count.
Re: (Score:2)
If you think government is the problem, I'll make you the same offer I make all of you libertarian morons: a one-way ticket to Somalia. Live your best life free from government interference!
Get a fucking clue.
Re: (Score:2)
The wealthy all but make the government, through campaign contributions, funding think tanks, media campaigns, and more. You talk like the reason people are jailed on bullshit charges isn't to make money, both by increasing the use of private jails and by providing slave labor for those wealthy owners.
Also, much of the political class *is* the wealthy; they get there via people seeking favors -- that's why they end up rich or richer.
However, I can see a direct link between owners not wanting to h...
Re: (Score:1)
The wealthy all but make the government, through campaign contributions, funding think tanks, media campaigns, and more.
Some do, yes. It's certainly not all or even most of them, but yes, searching for food or shelter sort of precludes you from riding herd on other people and trying to rule over them and/or steal from them. However, since this is a pretty small subset of the overall wealthy folks, why not see them for what they are: elites? It's true that elites often influence or outright control government policy. However, it's also true that governments act as the cat's-paw doing the dirty work for those elites. So what is...
Re: (Score:2)
The issue is that government is also how us non-elites might be able to band together to fight back against the elites. There's no way for one non-elite to do anything against one elite -- the elite can just buy them off, or pay others to kill them, or whatever they want. Government is how we make any progress in stopping them from just polluting ever more egregiously, such that we non-elites can't even use water from our wells. Most regulations exist because *not* regulating the thing caused problems big enough for e...
Re: (Score:1)
Komrade, our local Coding Committee of The People's Sub-Committee on The People's Lunch agrees with your plan to kill and eat the rich!
And after we shall have much vodka and dance The Mamushka to celebrate!
Re: (Score:2)
Lol, dumb mod can't tell this is exactly on topic. Or hates dancing. Or is a dumb communist who thinks "we just haven't done communism right yet; let's double down, for sure it'll work after the next 150M dead!"
I love you commie guys, always good for a laugh. Mod this down too. You're looking for (-1, I am a humorless commie).
We shall see (Score:3)
It's hard enough to find a *human* programmer that can do good work.
When I use GitHub Copilot, it usually takes several tries, with prompt variations and follow-ups, to get what I actually want. And I know what I want, because I'm reviewing the code and know when it's right. It's really hard to imagine how this "Devin" could do better, at least at this point.
But then, they gave him a name, so maybe that makes him more confident.
How about some proof, eh? (Score:2)
Re: (Score:2)
You are asking for an excellent 10-course meal for a diverse and large set of guests. What this thing can do is crack an egg. And having it actually do that over a pan is still an open research problem, let alone over a hot pan, and then actually frying the thing.
Hahahahaha (Score:2)
Would love to see software get written via prompting. Especially complex UI and data manipulation. Yeah... move that 3 pixels over... put a box with the user's information in the middle... no, a bit over... overlap the next card in a carousel. No, I mean like a menu.
Good luck you idiots.
devo (Score:2)
We really need AI for Ops.
Yo Devo, I need a Postgres failover pair.
Yo Devo, I need a 50 node HDFS cluster.
Yo Devo, I need 10 nginx servers with Redis.
Yo Devo, where my kubes at?
Re: (Score:2)
There's a terraform module for that...
Terraform, Ansible, Salt, Chef, Puppet, CFEngine (Score:1)
Re: (Score:2)
We really need AI for Ops.
Yo Devo, I need a Postgres failover pair.
Yo Devo, I need a 50 node HDFS cluster.
Yo Devo, I need 10 nginx servers with Redis.
Yo Devo, where my kubes at?
Are We Not Men?
Re: (Score:2)
D-E-V-O
Re: (Score:2)
Devo is an intelligent expert that knows all that. Devo automatically applies best practices, like backups, DR, redundancy. I asked Devo for a Postgres failover pair - Devo knows what that means and does the right thing. Sometimes Devo asks for clarification or wants specs approved. Devo does not sleep or take breaks and gets things done.
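A toy sketch of the idea (the catalog and plans are invented for illustration; a real Devo would drive an infrastructure-as-code backend rather than print a plan):

CATALOG = {
    "postgres failover pair": [
        "provision two postgres nodes",
        "configure streaming replication",
        "set up automatic failover and nightly backups",  # best practices applied by default
    ],
}

def devo(request: str) -> None:
    request = request.lower()
    for product, plan in CATALOG.items():
        # Match a request when it mentions every word of a known product.
        if all(word in request for word in product.split()):
            print(f"Devo plan for '{product}':")
            for step in plan:
                print(f"  - {step}")
            return
    # Devo asks for clarification instead of guessing.
    print("Devo: please clarify, I don't have a recipe for that yet.")

devo("Yo Devo, I need a Postgres failover pair")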
Actual AI that programs (Score:5, Interesting)
Programmers have been trying to put themselves out of business for a long time. Even in the 1960s there were systems for "automatic programming". Of course, the goals and sophistication of the idea of an AI doing the programming (or at least helping and debugging) got a lot more ambitious over the years. There was a lot of work in this area at MIT in the 1970s and 80s.
This was old-fashioned AI: the system knew what it was supposed to be doing: modelling how programs actually work, what the code meant, and interactively watching the programmer and modelling his changes.
Knowledge-based systems, formal analysis, rules, logic, reasoning. The "Programmer's Apprentice" system recognizes and analyzes program cliches and plans -- but at the semantic level, not at the level of mere source-code tokens. It will understand and generate (or translate between) code in arbitrary programming languages. It "knows" what it is doing.
As opposed to today's so-called "AI" using an NN-based LLM guessing from meaningless tokens what examples from Github and StackOverflow might somehow match the input prompt.
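To make the difference concrete: cliche recognition works on program structure, not text. A toy version in Python (the real systems worked on language-independent plan diagrams, not Python ASTs) that spots the "accumulate over a sequence" cliche:

import ast

SOURCE = """
total = 0
for x in items:
    total = total + x
"""

def is_accumulation(loop: ast.For) -> bool:
    # The cliche: a loop body that rebinds a variable using its own old value.
    for stmt in loop.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            if target in used:
                return True
    return False

tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    if isinstance(node, ast.For) and is_accumulation(node):
        print("found the 'accumulation' cliche")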
I've put together a little bibliography for you.
There is of course much more, but the titles and dates should give you some sense of what it's about (especially if you browse the abstracts of these full-article PDFs).
Here is what actual "AI" programming looks like:
2015 Towards a Programmer’s Apprentice (Again) [aaai.org]
1990 Programmer's Apprentice (Acm Press Frontier Series) [amazon.com]
1989 Dependency-Directed Localization of Software Bugs [mit.edu]
1989 Intelligent Assistance for Program Recognition, Design, Optimization, and Debugging [mit.edu]
1988 A Proposal For An Intelligent Debugging Assistant [mit.edu]
1987 The Programmer's Apprentice Project: A Research Overview [mit.edu]
1987 The Programmer's Apprentice: A Program Design Scenario [mit.edu]
1987 Formalizing Reusable Software Components in the Programmer's Apprentice [mit.edu]
1986 Toward a Requirements Apprentice: On the Boundary Between Informal and Formal Specifications [mit.edu]
1986 A Requirements Analyst's Apprentice: A Proposal [mit.edu]
1986 Program Translation via Abstraction and Reimplementation [mit.edu]
Gotta get some EMACS (Lisp Machine version, not GNU) in there:
1985 KBEmacs: A Step Toward the Programmer's Apprentice [mit.edu]
1983 Interfacing to the Programmer's Apprentice [mit.edu]
1982 Programming Cliches and Cliche Extraction [mit.edu]
1987 Inspection Methods in Programming: Cliches and Plans [mit.edu]
1987 Automated Program Recognition [mit.edu]
1982 Automated Program Description [mit.edu]
1982 Code Generation in the Programmer's Apprentice [mit.edu]
1981 Abstraction, Inspection and Debugging in Programming [mit.edu]
1980 Formalizing the Expertise of the Assembly Language Programmer [mit.edu]
1979
Re:Actual AI that programs (Score:5, Funny)
OK ... what happened between 1990 and 2015, were you in a coma or something?
Re: (Score:2)
Yeah, I remember a program called "The Last One", released c. 1980, that was meant to be what we'd now call a no/low-code solution for writing business apps, meant to put programmers out of business.
Needless to say, it wasn't "the last one", and people today continue to try to create these no/low-code solutions, while developers continue to manually write CRUD apps etc. that certainly don't need to be manually coded.
I think the issue is that all these tools, "Devin" included, don't do the entire job -- they still...
Carefully worded (Score:1)
13.86% is quite a high figure. It's obvious these samples of "real world" issues are just the easy ones.
Gonna be interesting (Score:2)
My experience so far with AI code generation is a bipolar mix of amazing and infuriating.
One day AI is writing a complete CRUD plugin for WordPress for me that needs very little tweaking, using the requirements that I give it "intelligently". Unless I move the goal posts, that is downright incredible. Sure, I can do it, but not in 20 seconds.
The next day, I'm asking it to help me find some colors that pass WCAG AA for accessibility color contrast in a couple of usage scenarios. Now this is literally just...
Re: (Score:2)
No publicly-released LLM can understand math.
When I had ChatGPT 4 calculate how 100,000 rocket ships tethered to Mercury would affect its orbit, it never could do the calculations unless I manually taught it all the equations involved. It became much, much more complicated than doing the calculations myself, so that's what I did in the end, and I just fed ChatGPT the explicit answers so it could help me write a sci-fi story.
Re: (Score:1)
Yep, and it makes sense when you think about how a LLM works. One of the AI companies (don't recall which off the top of my head) was working to integrate Wolfram Alpha APIs with their chatbot so the LLM could hand any math off to it to do the calculations, then integrate the results into the LLM's response.
That would seem to be the obvious, high level path. Hand off any math to something that can do it.
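The hand-off pattern is simple to sketch; here sympy stands in for Wolfram Alpha, and the routing heuristic is invented for illustration:

from sympy import SympifyError, sympify

def answer(query: str, llm_reply: str) -> str:
    try:
        # If the query parses as straight math, compute it exactly.
        return str(sympify(query).evalf())
    except (SympifyError, TypeError):
        # Otherwise fall back to the language model's text.
        return llm_reply

print(answer("2**64 / 7", "I think it's about 2.6e18"))  # exact value from the solver
print(answer("tell me a story", "Once upon a time..."))  # falls through to the LLM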
Re: (Score:2)
It will not get better. The hallucinations are baked in and they would need massively more on-target training data (which they cannot get because it does not exist) for a moderate improvement.
If this were a reasoning engine, yes, it could get better. But it is a statistical model and cannot. The mechanism is limited by what "reasoning" statistics can do and that is always very, very shallow.
Re: (Score:1)
It will not get better. The hallucinations are baked in and they would need massively more on-target training data (which they cannot get because it does not exist) for a moderate improvement.
If this were a reasoning engine, yes, it could get better. But it is a statistical model and cannot. The mechanism is limited by what "reasoning" statistics can do and that is always very, very shallow.
Well, even if it never gets any better, it's still an incredible tool, in the right hands, for the reasons I described.
Re: (Score:2)
Well, yes. And no. Because those "right hands" are in _very_ short supply. And code review is more effort and harder than writing correct code anyway. So no. It is impressive, but it is not that useful, because it completely loses its value and turns negative at a relatively low complexity level. Below that? Well, "better search engine" about covers it. Nice to have, somewhat useful if you are careful, but not a game-changer at all.
Again? (Score:2)
https://developers.slashdot.or... [slashdot.org]
As last time... it's good, probably better than any other LLM at writing code and doing technical tasks, but it sure as shit isn't as good as you're making out.
Re: (Score:2)
Indeed. And being better than the other LLMs is not a high bar. I recently put one of my Application Security exams through ChatGPT, and while it managed everything you can easily look up (the students do these on paper without any materials), it had a 100% failure rate on anything that required the tiniest amount of thinking.
Please calm down (Score:1)
Dear everyone (almost): please calm down.
Are you calm? Good.
Now please (re)read "No Silver Bullet" by Fred Brooks. Done? Good.
Now explain to me how AI is addressing intrinsic complexity and not just accidental complexity. I'll wait.
These things are not good at (real) math and not good at (real) reasoning. If you build something out of statistical patterns in large text corpora of various kinds, you get something that appears to reason sometimes but is actually the world's most extensive stochastic parrot...
Re: (Score:1)
Sorry to self-reply here, but for clarification: I meant "essential complexity" where I wrote "intrinsic complexity" but you probably knew what I meant if you've read the paper.
Addressing accidental complexity in a definitive manner would be, I admit, a big deal.
Also, for the busy who never read the paper, the Wikipedia summary is decent: https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
"No silver bullet" is even more relevant today where things have gotten a lot more complex, security has become very important, and reasoning-ability is needed for anything but simplistic code. Well, unless it has been literally done and put online hundreds or thousands of times. Anything else, it will fail, or, worse, start to hallucinate some "solution". The whole approach is a complete failure and cannot be made better. Making it able to code more complex things makes it _worse_, because then the halluci
Sounds very much like a scam (Score:2)
Maybe it can write simplistic, standardized business logic that has been done and put online thousands of times. But real coding? No way.