AI Technology

ChatGPT Generates Fake Data Set To Support Scientific Hypothesis (nature.com)

Researchers have used the technology behind the AI chatbot ChatGPT to create a fake clinical-trial data set to support an unverified scientific claim. From a report: In a paper published in JAMA Ophthalmology on 9 November, the authors used GPT-4 -- the latest version of the large language model on which ChatGPT runs -- paired with Advanced Data Analysis (ADA), a model that incorporates the programming language Python and can perform statistical analysis and create data visualizations. The AI-generated data compared the outcomes of two surgical procedures and indicated -- wrongly -- that one treatment is better than the other.

"Our aim was to highlight that, in a few minutes, you can create a data set that is not supported by real original data, and it is also opposite or in the other direction compared to the evidence that are available," says study co-author Giuseppe Giannaccare, an eye surgeon at the University of Cagliari in Italy. The ability of AI to fabricate convincing data adds to concern among researchers and journal editors about research integrity. "It was one thing that generative AI could be used to generate texts that would not be detectable using plagiarism software, but the capacity to create fake but realistic data sets is a next level of worry," says Elisabeth Bik, a microbiologist and independent research-integrity consultant in San Francisco, California. "It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments."

  • Comment removed based on user account deletion
    • by Darinbob ( 1142669 ) on Thursday November 23, 2023 @01:36PM (#64027211)

      AI will change the world, but it's just a catalyst that accelerates the chemical reaction of existing idiocy and ignorance in society.

      • by dargaud ( 518470 )
        Yes. Even when you use it seriously, it will replace real science in many cases. For instance, in my former field (climatology/weather forecasting), you don't even need to know PV=nRT or any physics anymore. You just feed all your archive data to the AI (all the temperature, pressure, humidity, etc. measurements), give it the weather an hour / day / week later, and ask it to reproduce that. The forecasts are better than with hugely complex science models, and much faster too. But of course nobody understands them. That's a
    • Lying is a sign of intelligence.
      • by narcc ( 412956 ) on Thursday November 23, 2023 @08:10PM (#64027835) Journal

        It's not lying. That would imply understanding, something well beyond the reach of models like this. The problem the article highlights, however, is absolutely real, but hardly something new:

        the capacity to create fake but realistic data sets is a next level of worry

        The problem with this statement is that generating data that is statistically similar to other data is exactly what these things are designed to do! From n-grams to transformers, that's the whole game.
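narcc's point that generating data statistically similar to other data is exactly what these models are built for can be illustrated with the simplest member of the family he names, an n-gram model. The following is a minimal word-level bigram sketch in Python; the toy corpus and the function names are illustrative assumptions, not anything from the article. It records which word followed which in a training text, then samples new text using only those observed transitions. The output looks locally plausible, yet nothing in the procedure checks whether it is true.

```python
import random
from collections import defaultdict


def train_bigrams(words):
    """Map each word to the list of words that followed it in the training text."""
    table = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        table[current].append(nxt)
    return table


def generate(table, start, length, seed=0):
    """Sample a word sequence by repeatedly picking an observed follower of the last word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])  # .get avoids inserting new keys into the defaultdict
        if not followers:  # dead end: this word never had a successor in the corpus
            break
        out.append(rng.choice(followers))
    return out


# Hypothetical toy corpus; every generated pair of words appeared somewhere in it.
corpus = "the drug works the drug is safe the trial is small".split()
table = train_bigrams(corpus)
print(" ".join(generate(table, "the", 6)))
```

Transformers replace the lookup table with a learned network conditioned on far more context, but the training objective, predicting a plausible next token, is the same in spirit.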

  • Turing (Score:4, Interesting)

    by grasshoppa ( 657393 ) on Thursday November 23, 2023 @12:13PM (#64027061) Homepage

    So it's indistinguishable from "real" scientists.

    Turing test passed?

    • by HiThere ( 15173 )

      No. In the Turing Test the questioner got to pick the questions they asked. Including the topics.

    • by rsilvergun ( 571051 ) on Thursday November 23, 2023 @01:05PM (#64027149)
      in peer review. It's annoying that political pundits will use them, but generally if you poke around for even a little peer review action on studies like that, you'll find it and find out when they're garbage.

      The real headache is when you have a legit study that's being taken out of context. Those are harder to spot.
      • in peer review. It's annoying that political pundits will use them, but generally if you poke around for even a little peer review action on studies like that, you'll find it and find out when they're garbage.

        The real headache is when you have a legit study that's being taken out of context. Those are harder to spot.

        I believe the fraudulent Egyptian study on treating COVID with Ivermectin [bbc.com] went through peer review.

        Peer review is about looking at methods and analysis, it's not really well equipped to figure out if someone is simply fabricating data.

        That's why this is worrying, because it makes creation of fake datasets easier, meaning more unethical researchers are bound to do it. And the researchers who are going to do this are the ones in less developed countries like Egypt where it's even harder for outsiders to verif

        • So some stuff is going to slip through but when other researchers go to use the data they're going to quickly figure out that the data doesn't work.

          Science is about reproducibility. If I write a bad paper that gets peer reviewed, somehow slips through the process, and then sits doing nothing for decades on end, that's pretty harmless. If my bad paper gets picked up and used, it's going to get a lot more attention, and the peer review process is going to catch the mistakes made the first time.

          Thi
          • So some stuff is going to slip through but when other researchers go to use the data they're going to quickly figure out that the data doesn't work.

            How? The data will work, it just doesn't represent reality.

            Science is about reproducibility. If I write a bad paper that gets peer reviewed, somehow slips through the process, and then sits doing nothing for decades on end, that's pretty harmless. If my bad paper gets picked up and used, it's going to get a lot more attention, and the peer review process is going to catch the mistakes made the first time.

            The problem is that reproducibility is hard, especially with fields that study people (epidemiology, medical, psychology). Did you fail to reproduce because you did the experiment wrong? Because of another variable you changed? Because the population is different? etc, etc. That's why people do reviews.

            This is basically how we know that study was bogus. It's one of those cases where ultimately the system works. It's not like religious dogma where mistakes and distortions can stand for hundreds if not thousands of years

            The system working in that case is a bit of a stretch.

            We caught the study because the fraud was lazy and had obvious red flags like partially duplica

            • The problem is that reproducibility is hard

              Yes, that's why they normally go to highly trained scientists to reproduce it. If it cannot be reproduced, then it is taken with a grain of salt until it is. Findings that cannot be reproduced are not really useful anyway.

      • Kind of like how the latest vaccine was tested on just 8 mice? If we're going to worry about fraud studies, should we be more worried they aren't running studies anymore and instead just giving everyone experimental medicine? Aren't 3 stages of trials usually needed? Maybe someone can explain to me why the covid vaccines don't have to follow any safety rules.
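The Ivermectin subthread notes that the fraudulent study was caught because of obvious red flags in its data (the comment is cut off mid-word). As an assumption about what one such screening check can look like, here is a minimal Python sketch that flags records whose measurements repeat exactly once identifier fields are ignored. The field names (`patient_id`, `age`, and so on) are hypothetical, and a match is only a reason to look closer, not proof of fabrication.

```python
from collections import Counter


def duplicated_records(rows, ignore=("patient_id",)):
    """Return measurement tuples that occur more than once, ignoring identifier fields.

    Real patients rarely share every recorded value exactly, so repeated tuples
    are a red flag worth investigating (hypothetical check, not a fraud verdict).
    """
    # Build an order-independent key from every non-identifier field.
    keys = [
        tuple(sorted((k, v) for k, v in row.items() if k not in ignore))
        for row in rows
    ]
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical records: patients 1 and 2 differ only in their ID.
rows = [
    {"patient_id": 1, "age": 54, "sex": "F", "days_to_recovery": 7},
    {"patient_id": 2, "age": 54, "sex": "F", "days_to_recovery": 7},
    {"patient_id": 3, "age": 61, "sex": "M", "days_to_recovery": 9},
]
print(duplicated_records(rows))
```

A check like this catches lazy copy-and-paste fabrication; data sampled fresh by a generator, such as the GPT-4 plus ADA setup in the article, need not contain exact duplicates, which is part of why commenters see it as a harder problem.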
  • OMG (Score:4, Insightful)

    by backslashdot ( 95548 ) on Thursday November 23, 2023 @12:22PM (#64027073)

    They really have achieved human-like behavior.

  • by zendarva ( 8340223 ) on Thursday November 23, 2023 @12:23PM (#64027077)
    It's a machine for producing responses that look like answers. It has no capacity or intent to be *right*. Why do people keep forgetting this?
    • There was a time when things that could not easily be explained were thought to be magic. The same thing is happening by calling pattern matching algorithms 'artificial intelligence'.
      • by ffkom ( 3519199 )
        While I share your thought on how prematurely things are being called "artificial intelligence", we also have to admit that there is no good definition of "natural intelligence", and no real understanding of how natural intelligence emerges from a blob of connected nerve cells. We cannot rule out that, maybe, those connected nerve cells ultimately do not do much more than sophisticated pattern matching and extrapolation.
    • Because they never knew it and if they did then knowing it gets in the way of falsely generated profits.

    • Seems like I have to remind someone once a week that ChatGPT is a world class bullshitter.
    • Yes. I like to say, it is a language model not a meaning model.
  • by turp182 ( 1020263 ) on Thursday November 23, 2023 @12:23PM (#64027079) Journal

    Generating sample or test data is a powerful use case for AI.

    Dump a DDL script into a conversation and have it generate insert statements. Bam!

    Generate statistical data sets based loosely on equations (for different definitions of loose). Bam!

    What is being pointed out is a feature, certainly not a bug...

    Oh, I can do, and many times have done, just this with Excel. However, Excel is probably more dangerous than AI at this point...

    • Heh, I was thinking the same exact thing as I read this. There's actually some real-world demand for accurately-structured filler data for various types of server load testing for planning resource provisioning. It would be great at that!
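turp182's use case, filler data shaped "loosely" by an equation and emitted as INSERT statements, does not strictly need a language model. The following Python sketch is a deterministic alternative under stated assumptions: a hypothetical `measurements(id, x, y)` table and a made-up linear relationship y = 2x + 1 with Gaussian noise. It is useful for load testing precisely because it looks plausible, which is also why the same technique is worthless as evidence.

```python
import random


def synthetic_rows(n, slope=2.0, intercept=1.0, noise_sd=0.5, seed=42):
    """Yield (id, x, y) rows where y = slope * x + intercept plus Gaussian noise.

    The relationship and parameter defaults are hypothetical filler values.
    """
    rng = random.Random(seed)  # fixed seed so the filler data is reproducible
    for i in range(1, n + 1):
        x = rng.uniform(0.0, 10.0)
        y = slope * x + intercept + rng.gauss(0.0, noise_sd)
        yield i, round(x, 3), round(y, 3)


def to_insert_statements(table, columns, rows):
    """Render numeric rows as SQL INSERT statements (no string quoting is handled)."""
    cols = ", ".join(columns)
    return [
        f"INSERT INTO {table} ({cols}) VALUES ({', '.join(map(str, row))});"
        for row in rows
    ]


for stmt in to_insert_statements("measurements", ["id", "x", "y"], synthetic_rows(3)):
    print(stmt)
```

Only numeric columns are handled here; text columns would need proper escaping or, better, parameterized inserts through a database driver.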

  • by Tyr07 ( 8900565 ) on Thursday November 23, 2023 @12:29PM (#64027101)

    People have long been able to produce things like this; fake data, reports, etc. have been around for a long, long time.
    They used many tools to generate this information. Under scrutiny the data fails verification (for example, checking whether a person really exists), but people have made convincing fake data for a long, long time. Like all data sets, AI-generated or not, you still have to validate it.

    What's happened here is that people are being introduced to a technology and are shocked: 'Tech can do this?! WOW, hey guys! Do you realize what this can do?'

    Hence my beer reference.
    'Whoa guys, have you tried this thing called beer? I started using it, but you know, the next day there are some serious side effects, watch out.'
    The rest of us have an eyebrow raised and are like, 'uh huh...'

    • by HiThere ( 15173 )

      I think the expectation is that data generated in this way would be even harder to detect as fake data. This is both plausible and troubling.

      • by Tyr07 ( 8900565 )

        I think the expectation is that data generated in this way would be even harder to detect as fake data. This is both plausible and troubling.

        Only if you take things at face value, which is a common problem the media tries to exploit. Anyone worth their salt who uses data from other studies attempts to verify the data. Competent researchers do not take unverified study data as accurate. My previous position stands: the main change is that lazy people now know they can generate fake data, and that's about it. Making fake data more accessible doesn't take anything away from competent researchers, although it might make it more obnoxious to sift th

    • Ironically or not, this is a major reason science isn't blind to the 'who'. Variations on just writing down numbers, or selectively keeping numbers have been around for as long as the scientific method.

  • Today's AI is! (Score:4, Insightful)

    by oldgraybeard ( 2939809 ) on Thursday November 23, 2023 @12:30PM (#64027109)
    Really big on "Artificial" with no "Intelligence", which makes it the perfect tool to create misinformation and propaganda. Garbage in, garbage out!
  • Pretty much the only thing it is good at as far as I can see.

  • by MpVpRb ( 1423381 ) on Thursday November 23, 2023 @01:40PM (#64027213)

    Scientists are judged by the number of papers they produce, not their quality.
    When you require publication in order to remain employed, clever people may seek workarounds if they have nothing real to publish.

  • Okay. "Robot: commit $TYPE fraud for me!" is a rather underwhelming bit of functionality.

    At best, it replaces "Grad student/postdoc: commit fraud for me!" as a de facto phrase in the lexicon of a non-negligible portion of modern American academics.

    Somewhat of an inferior substitute, actually. While faster, the machine, having no sense of morality or even a memory beyond a single user session, will happily execute the fraud, but unlike the grad student, will not internalize the lesson that the fraud *is* th

  • The title implies that ChatGPT just spat out fake data that supported its conclusions. It did not. You have to lead it to where you want to go; it takes agency.

  • Why can't backlinks to the source documents be maintained, allowing a user to click on any selection and say "show your work"?

  • Has there ever been a white person who got rich and old and didn't grow out of touch and distant from the rest of the world? The founders couldn't care less. They're off enjoying their multi billions and doing god knows what. And I can't even blame them, I'd probably do the same.

  • That's a childish method, and you risk being caught. Much better is to use real data from low-statistics experiments. If you play with the data enough (honest analysis, no cheating), or if you repeat your low-statistics experiments often enough, eventually you will find something unusual. Now you can make a nice story out of it. Some reviewers might be picky, but often enough you will get through peer review. When your controversial paper is published, there will be plenty of theoreticians arguing why you measured
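The "repeat low-statistics experiments until something unusual turns up" strategy can be quantified with a small simulation. This Python sketch assumes two groups of 10 drawn from the same unit-variance normal distribution, so any "effect" is pure noise, and applies a z-test with a 1.96 threshold (roughly p < 0.05, two-sided). About 5% of such experiments look significant, and the chance that at least one of k independent attempts does is 1 - 0.95^k. Group size, threshold, and seed are illustrative assumptions.

```python
import math
import random
import statistics


def null_experiment(rng, n=10):
    """Compare two groups drawn from the SAME N(0, 1) distribution.

    Returns the z-statistic for the difference in means, assuming known unit
    variance, so the difference of means has variance 2 / n.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(2.0 / n)


def false_positive_rate(trials=10_000, n=10, threshold=1.96, seed=1):
    """Fraction of null experiments whose |z| exceeds the threshold."""
    rng = random.Random(seed)
    hits = sum(abs(null_experiment(rng, n)) > threshold for _ in range(trials))
    return hits / trials


def chance_of_at_least_one_hit(k, alpha=0.05):
    """Probability that at least one of k independent null experiments looks significant."""
    return 1.0 - (1.0 - alpha) ** k


print(false_positive_rate())           # close to 0.05
print(chance_of_at_least_one_hit(20))  # about 0.64
```

With 20 attempts the odds of at least one spurious "hit" are roughly 64%, so a researcher who reports only that hit can present a "significant" finding without fabricating a single number.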
  • amaaaaaaaaaazing, color me impressed. give whoever is in charge of said software this week (I vote clippy) a cookie

  • Having interacted with ChatGPT often over the last six months or so, I have stopped using it precisely because of that: it often lies, and tries to hide that fact. That makes it worse than useless.
  • More indication that ChatGPT is indistinguishable from human researchers.
