ChatGPT Generates Fake Data Set To Support Scientific Hypothesis (nature.com)
Researchers have used the technology behind the AI chatbot ChatGPT to create a fake clinical-trial data set to support an unverified scientific claim. From a report: In a paper published in JAMA Ophthalmology on 9 November, the authors used GPT-4 -- the latest version of the large language model on which ChatGPT runs -- paired with Advanced Data Analysis (ADA), a model that incorporates the programming language Python and can perform statistical analysis and create data visualizations. The AI-generated data compared the outcomes of two surgical procedures and indicated -- wrongly -- that one treatment is better than the other.
"Our aim was to highlight that, in a few minutes, you can create a data set that is not supported by real original data, and it is also opposite or in the other direction compared to the evidence that are available," says study co-author Giuseppe Giannaccare, an eye surgeon at the University of Cagliari in Italy. The ability of AI to fabricate convincing data adds to concern among researchers and journal editors about research integrity. "It was one thing that generative AI could be used to generate texts that would not be detectable using plagiarism software, but the capacity to create fake but realistic data sets is a next level of worry," says Elisabeth Bik, a microbiologist and independent research-integrity consultant in San Francisco, California. "It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments."
Re:AI is gonna change the world (Score:5, Insightful)
AI will change the world, but it's just a catalyst that accelerates the chemical reaction of existing idiocy and ignorance in society.
Re:AI is gonna change the world (Score:4, Insightful)
It's not lying. That would imply understanding, something well beyond the reach of models like this. The problem the article highlights, however, is absolutely real, but hardly something new:
the capacity to create fake but realistic data sets is a next level of worry
The problem with this statement is that generating data that is statistically similar to other data is exactly what these things are designed to do! From n-grams to transformers, that's the whole game.
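To make the point concrete, here is a minimal sketch of the basic move, using only standard-library Python and hypothetical numbers throughout: fit summary statistics to a small "real" sample, then draw a synthetic data set that mimics them. No language model is needed for the core trick; the model just automates and dresses it up.

```python
import random
import statistics

# A small "real" sample (made-up numbers, for illustration only).
real = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]

# Fit simple summary statistics to the sample.
mu = statistics.mean(real)
sigma = statistics.stdev(real)

# Draw a synthetic sample that mimics those statistics:
# plausible-looking, but backed by no real measurements.
random.seed(0)
fake = [random.gauss(mu, sigma) for _ in range(100)]

print(round(statistics.mean(fake), 2))  # close to mu, by construction
```

The synthetic sample is statistically plausible by construction, which is exactly why fabricated data can be hard to spot from the numbers alone.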
Turing (Score:4, Interesting)
So it's indistinguishable from "real" scientists.
Turing test passed?
Re: (Score:3)
No. In the Turing Test the questioner got to pick the questions they asked. Including the topics.
Meh, fake datasets stick out like sore thumbs (Score:5, Interesting)
The real headache is when you have a legit study that's being taken out of context. Those are harder to spot.
Re: (Score:2)
Fake data sets generally get caught in peer review. It's annoying that political pundits will cite them, but if you poke around for even a little peer-review activity on studies like that, you'll find it, and find out when they're garbage.
The real headache is when you have a legit study that's being taken out of context. Those are harder to spot.
I believe the fraudulent Egyptian study on treating COVID with Ivermectin [bbc.com] went through peer review.
Peer review is about looking at methods and analysis, it's not really well equipped to figure out if someone is simply fabricating data.
That's why this is worrying, because it makes creation of fake datasets easier, meaning more unethical researchers are bound to do it. And the researchers who are going to do this are the ones in less developed countries like Egypt, where it's even harder for outsiders to verify.
There's a nearly unlimited amount of papers (Score:1)
Science is about reproducibility. If I write a bad paper, it gets peer reviewed, and if it somehow slips through the process but sits doing nothing for decades on end, that's pretty harmless. If my bad paper gets picked up and used, then it's going to get a lot more attention, and the peer review process is going to catch the mistakes it missed the first time.
Re: (Score:2)
So some stuff is going to slip through but when other researchers go to use the data they're going to quickly figure out that the data doesn't work.
How? The data will "work"; it just doesn't represent reality.
Science is about reproducibility. If I write a bad paper, it gets peer reviewed, and if it somehow slips through the process but sits doing nothing for decades on end, that's pretty harmless. If my bad paper gets picked up and used, then it's going to get a lot more attention, and the peer review process is going to catch the mistakes it missed the first time.
The problem is that reproducibility is hard, especially with fields that study people (epidemiology, medical, psychology). Did you fail to reproduce because you did the experiment wrong? Because of another variable you changed? Because the population is different? etc, etc. That's why people do reviews.
This is basically how we know that study was bogus. It's one of those cases where ultimately the system works. It's not like religious dogma, where mistakes and distortions can stand for hundreds if not thousands of years.
The system working in that case is a bit of a stretch.
We caught the study because the fraud was lazy and had obvious red flags like partially duplicated data.
Re: (Score:2)
The problem is that reproducibility is hard
Yes, that's why they normally go to highly trained scientists to reproduce it. If it cannot be reproduced, then it is taken with a grain of salt until it is. Findings that cannot be reproduced are not really useful anyway.
Te5t3d 0n eight m1c3 (Score:1)
OMG (Score:4, Insightful)
They really have achieved human-like behavior.
ChatGPT isn't an answer machine. (Score:5, Insightful)
Re: (Score:2)
Because they never knew it, and if they did, knowing it would get in the way of falsely generated profits.
This is actually a great use for AI (Score:5, Informative)
Generating sample or test data is a powerful use case for AI.
Dump a DDL script into a conversation and have it generate insert statements. Bam!
Generate statistical data sets based loosely on equations (for varying definitions of "loosely"). Bam!
What is being pointed out is a feature, certainly not a bug...
Oh, I can and have many times done just this with Excel. However Excel is probably more dangerous than AI at this point...
Re: (Score:1)
Heh, I was thinking the exact same thing as I read this. There's actually some real-world demand for accurately structured filler data for various types of server load testing, when planning resource provisioning. It would be great at that!
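A minimal sketch of that use case, with a made-up table schema and only Python's standard library: generating structured filler rows for a load-testing fixture. The column names and generators here are purely hypothetical.

```python
import csv
import io
import random

random.seed(42)

# Hypothetical schema for a load-testing fixture: (column name, row generator).
COLUMNS = [
    ("patient_id", lambda i: f"P{i:05d}"),
    ("age", lambda i: random.randint(18, 90)),
    ("treatment", lambda i: random.choice(["A", "B"])),
    ("score", lambda i: round(random.uniform(0.0, 100.0), 1)),
]

def make_rows(n):
    """Generate n rows of structured filler data."""
    return [{name: gen(i) for name, gen in COLUMNS} for i in range(n)]

# Write as CSV, e.g. for bulk-loading into a test database.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=[c[0] for c in COLUMNS])
writer.writeheader()
writer.writerows(make_rows(1000))
print(len(buf.getvalue().splitlines()))  # header + 1000 rows
```

Same mechanics as the fraud case, which is the point: whether this is a legitimate test fixture or fake clinical data depends entirely on how it's labeled and used.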
I remember my first beer too. (Score:3)
People have been able to fabricate things like this forever; fake data, reports, etc. have been around for a long, long time.
They used many tools to generate this information. Under scrutiny the data fails verification (like checking whether that person really exists), but people have made convincing fake data for a long, long time. Like all data sets, AI-generated or not, you still have to validate it.
What's happened here is that there's a technology they've just been introduced to, and they're shocked: "Tech can do this?! WOW, hey guys! Do you realize what this can do?"
Hence my beer reference.
"Whoa guys, have you tried this thing called beer? I started using it. You know, the next day though, there are some serious side effects. Watch out."
The rest of us have an eyebrow raised and are like, "uh huh..."
Re: (Score:2)
I think the expectation is that data generated in this way would be even harder to detect as fake data. This is both plausible and troubling.
Re: (Score:3)
I think the expectation is that data generated in this way would be even harder to detect as fake data. This is both plausible and troubling.
Only if you were taking things at face value, which is a common problem the media tries to exploit. Anyone worth their salt who uses data from other studies attempts to verify it. Competent researchers do not take unverified study data as accurate. My previous position stands: it's now easier for lazy people to generate fake data, that's about it. Making it more accessible to generate doesn't take anything away from competent researchers, although it might make it more obnoxious to sift through.
Re: (Score:1)
Ironically or not, this is a major reason science isn't blind to the "who". Variations on just writing down numbers, or selectively keeping them, have been around for as long as the scientific method.
Today's AI is! (Score:4, Insightful)
This thing is a good liar (Score:1)
Pretty much the only thing it is good at as far as I can see.
This exposes a serious problem (Score:3)
Scientists are judged by the number of papers they produce, not their quality.
When you require publication in order to remain employed, clever people may seek workarounds when they have nothing real to publish.
Automating the scientific enterprise (Score:1)
Okay. "Robot: commit $TYPE fraud for me!" is a rather underwhelming bit of functionality.
At best, it replaces "Grad student/postdoc: commit fraud for me!" as a de facto entry in the lexicon of a non-negligible portion of modern American academics.
Somewhat of an inferior substitute, actually. While faster, the machine, having no sense of morality or even a memory beyond a single user session, will happily execute the fraud, but unlike the grad student, will not internalize the lesson that the fraud *is* th
Implication is wrong. (Score:2)
The title implies that ChatGPT just spat out fake data that supported its conclusions. It did not. You have to lead it where you want to go; it takes agency.
How hard can this be? (Score:2)
Why can't backlinks to the source documents be maintained, allowing a user to click on any selection and say "show your work"?
you can't even trust the science (Score:2)
Rich White People (Score:2)
Has there ever been a white person who got rich and old and didn't grow out of touch and distant from the rest of the world? The founders couldn't care less. They're off enjoying their multi billions and doing god knows what. And I can't even blame them, I'd probably do the same.
That is not how you do it (Score:1)
What? an engine made to create data made some? wow (Score:2)
amaaaaaaaaaazing, color me impressed. give whoever is in charge of said software this week (I vote clippy) a cookie
ChatGPT lies way too often (Score:2)
more indication (Score:2)
More indication that ChatGPT is indistinguishable from human researchers.