AI Fails at Most Remote Work, Researchers Find (msn.com)

A new study "compared how well top AI systems and human workers did at hundreds of real work assignments," reports the Washington Post.

They add that at least one example "illustrates a disconnect three years after the release of ChatGPT that has implications for the whole economy." AI can accomplish many impressive tasks involving computer code, documents or images. That has prompted predictions that human work of many kinds could soon be done by computers alone. Bentley University and Gallup found in a survey [PDF] last year that about three-quarters of Americans expect AI to reduce the number of U.S. jobs over the next decade. But economic data shows the technology largely has not replaced workers.

To understand what work AI can do on its own today, researchers collected hundreds of examples of projects posted on freelancing platforms that humans had been paid to complete. They included tasks such as making 3D product animations, transcribing music, coding web video games and formatting research papers for publication. The research team then gave each task to AI systems such as OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude. The best-performing AI system successfully completed only 2.5 percent of the projects, according to the research team from Scale AI, a start-up that provides data to AI developers, and the Center for AI Safety, a nonprofit that works to understand risks from AI. "Current models are not close to being able to automate real jobs in the economy," said Jason Hausenloy, one of the researchers on the Remote Labor Index study...

The results, which show how AI systems fall short, challenge predictions that the technology is poised to soon replace large portions of the workforce... The AI systems failed on nearly half of the Remote Labor Index projects by producing poor-quality work, and they left more than a third incomplete. Nearly 1 in 5 had basic technical problems such as producing corrupt files, the researchers found.

One test involved creating an interactive dashboard for data from the World Happiness Report, according to the article. "At first glance, the AI results look adequate. But closer examination reveals errors, such as countries inexplicably missing data, overlapping text and legends that use the wrong colors — or no colors at all."

The researchers say AI systems are hobbled by a lack of memory, and are also weak on "visual" understanding.

  • by liqu1d ( 4349325 )
    You need a start-up to tell you this? I would have thought someone might have checked, with all the billions invested. We really need to start distinguishing between types of AI.
  • by crow ( 16139 ) on Saturday January 10, 2026 @05:01PM (#65915244)

    Yes, AI will struggle with doing full tasks unsupervised. But it can still do most of the work for many tasks. It just needs supervision by someone who understands the task. Sometimes the problem is the AI making incorrect assumptions about the task (it wasn't fully framed); sometimes, as stated in the summary, the AI's context window is too small, so it forgets things; and sometimes it just chooses a really bad approach.

    I have been using Claude Code a lot recently. It's really good at summarizing existing code. It's good at specific targeted changes. It's pretty bad at designing solutions. I find that while it's usually still faster than doing it manually, I often have to point out where there's a better (usually simpler) solution.

    So AI doesn't replace the human, but when used correctly, it makes the human more productive. If you compare the time a human takes to do the task manually with the time it takes a human to supervise AI doing the task, you'll probably find that for many tasks the human can do a lot more with AI. (Yes, I know some studies have shown the opposite, but I think that's mostly people not understanding how to effectively manage AI, which may take some experience and training.)

    But AI is far better at almost everything than it was a year ago. So even if it's 2.5% now, it may be 25% next year and 90% a year later. We're living in interesting times.

    • Exactly.

      These projects span a broad range of difficulty, with costs reaching over $10,000 and completion times exceeding 100 hours. All project costs and completion times come directly from human professionals who completed the work.

      The correct comparison would have also included professionals doing the projects with the help of AI tools.

    • > it can still do most of the work for many tasks. It just needs supervision by someone who understands the task

      Because it is just a fancy calculator.

      • > Because it is just a fancy calculator.

        So is the Space Shuttle flight controller, but it does what it needs to.

        I'm no fan of AI, but there's no denying that it's getting better and better; simple tasks are well within its reach right now, and the ability to do significantly more complex tasks is coming whether we like it or not.

        Right now AI failing at something is not an unexpected result; it's truly still in its infancy. But 5 or 10 years from now? My guess is that AI will be able to manage complex tasks reliably and without much hand-holding (if

      • by troff ( 529250 )

        > Because it is just a fancy calculator.

        > Because it is just a fancy SMS auto-corrector.

        Fixed that for you.

    • > So AI doesn't replace the human, but when used correctly, it makes the human more productive

      That's the question, right? How can we use AI to make humans more productive?

      So far the only thing I've found is a better search engine, which does in fact make me more productive.

      • AI has made several humans incredibly productive. The two most productive ones on the list are Jensen Huang and Elon Musk.
    • by troff ( 529250 )

      > Yes, AI will struggle with doing full tasks unsupervised

      Which means it's NOT the panacea the people investing all the hope and money and FIRINGS into it think it will be.

      > But it can still do most of the work for many tasks

      But, stripped of your hand-wavy unquantified wish-fulfilment-wishing, HOW MOST? WELL? How MANY tasks?

      > It just needs supervision by someone who understands the task

      1) Which means you STILL NEED THE PERSON.

      2) And in the future, how do we get people who understand the task and have

  • If you're going to call that remote work, then so is any in-office work done with MS Office 360.
    • by 93 Escort Wagon ( 326346 ) on Saturday January 10, 2026 @05:20PM (#65915278)

      > ... with MS Office 360.

      (Dateline Redmond) BREAKING NEWS - Microsoft's cloud productivity platform has become the first software to successfully unionize. After weeks of negotiation, Office announced it and its corporate parent reached an understanding in principle that, going forward, the software will receive a new, groundbreaking five days off every year.

      Not to be outdone, the Free Software Foundation has announced that LibreOffice will be rebranded "LibreOffice 250". In a statement, Richard Stallman admitted this new 115 days off a year may cause severe inconvenience for LibreOffice's several dozen users worldwide; but "we, as forward-thinking enlightened beings, must come to terms with this new reality. Chattel software slavery cannot be condoned just because 'it's always been this way'".

    • Because it was work done remotely.
  • Well, 2.5% project completion is basically total failure with a very small number of freak successes.

    • > Well, 2.5% project completion is basically total failure with a very small number of freak successes.

      That describes a manager I had a couple decades ago...

  • So basically it can do the work of a particularly dim intern who's working for free...
  • Use AI to suggest possible solutions, not make them.

    It is up to the human *expert* to evaluate that proposed solution, carefully, and either accept it, refuse it or ask for suggestions.

    This applies at every level, refactoring a line of code, or designing a system.

    If the human expert cannot or does not do that evaluation, the use of AI will end in a disaster that may well not save any time overall.
    But if the human expert does do that evaluation, the use of AI *cannot* be a disadvantage.

    This study's me
  • All those non-techy CEOs and investors out there can't be wrong. ;-)

  • Or will the underperforming AI be replaced by an AI that is said to do a better job?

    • If you lost your job because "AI" your company was probably lying. Instead, they were trying to pacify investors about layoffs, casting them in a more positive light.


  • AI can automate (er, fully replace) senior management and "strategic" decision making.

    Why? Because it is not emotional and will do the same idiotic shit they do now, like:
    1. Ask people to do their remote work from the office 5 days a week.

    2. Sink buckets of money to replace junior jobs with AI despite not having data to back up the decisions.
