GPT-Fabricated Scientific Papers Found on Google Scholar by Misinformation Researchers (harvard.edu)
Harvard's school of public policy publishes the Misinformation Review, a venue for peer-reviewed scholarly articles promising "reliable, unbiased research on the prevalence, diffusion, and impact of misinformation worldwide."
This week it reported that "Academic journals, archives, and repositories are seeing an increasing number of questionable research papers clearly produced using generative AI." They are often created with widely available, general-purpose AI applications, most likely ChatGPT, and mimic scientific writing. Google Scholar easily locates and lists these questionable papers alongside reputable, quality-controlled research. Our analysis of a selection of questionable GPT-fabricated scientific papers found in Google Scholar shows that many are about applied, often controversial topics susceptible to disinformation: the environment, health, and computing.
The resulting enhanced potential for malicious manipulation of society's evidence base, particularly in politically divisive domains, is a growing concern... [T]he abundance of fabricated "studies" seeping into all areas of the research infrastructure threatens to overwhelm the scholarly communication system and jeopardize the integrity of the scientific record. A second risk lies in the increased possibility that convincingly scientific-looking content was in fact deceitfully created with AI tools and is also optimized to be retrieved by publicly available academic search engines, particularly Google Scholar. However small, this possibility and awareness of it risks undermining the basis for trust in scientific knowledge and poses serious societal risks.
"Our analysis shows that questionable and potentially manipulative GPT-fabricated papers permeate the research infrastructure and are likely to become a widespread phenomenon..." the article points out.
"Google Scholar's central position in the publicly accessible scholarly communication infrastructure, as well as its lack of standards, transparency, and accountability in terms of inclusion criteria, has potentially serious implications for public trust in science. This is likely to exacerbate the already-known potential to exploit Google Scholar for evidence hacking..."
Academic future (Score:5, Interesting)
Shouldn't there be a GoFundMe or bug bounty to detect these papers and force retractions?
And then determine whether GPT- and AI-written work is grounds for expelling a researcher from a research university's staff.
Re: (Score:1)
No, because most of these papers are just written by students who are forced to write them to graduate and leave the trash fire that is modern academia forever. They're the minimum necessary effort to generate this necessary evil.
No one will ever cite them, because people who actually do research professionally understand the formulaic nature of such papers (as they're written under the guidance of a professor who, understanding the issue, will make them adhere to a specific formula).
If you want to change this,
Re: (Score:2)
Going through the motions is how you learn, for example, to write. That extends to writing for a particular purpose. You don't learn that if you ChatGPT it, so sanctioning the students who do that and lie about it seems like the correct course of action.
Removing the incentive to publish these papers is also a good idea, but we can hold 2 good ideas in our head simultaneously, or at least I can.
Re: (Score:2)
"Going through the motions is how you learn, for example, to write. That extends to writing for a particular purpose."
Not when the purpose is unrelated to writing, as in this case. It doesn't matter how the information gets into an accepted format and tone for its purpose; feeding the real information into ChatGPT and having it do the drudge work of writing/formatting is perfectly valid. Writing the paper is not an end in itself; it is just a means for sharing the information.
Re: (Score:2)
You share information by communicating it clearly and effectively. Writing the paper is not separate from that goal, it's integral to it. ChatGPT might be able to do it for you, or it might not. You won't even be able to tell if you've never written yourself.
If a critical part of your job feels like "drudge work", either suck it up, or find a new line of work. Most likely option #2, because if you're considering using ChatGPT, you are clearly lacking in the rigor and meticulousness that science demands.
Re: (Score:2)
"You share information by communicating it clearly and effectively. Writing the paper is not separate from that goal, it's integral to it."
I disagree. Having a well written paper is integral to that goal, at least as long as papers are the medium which we are using. There are more paths from having the information to having a well written paper than 'writing the paper.'
"You won't even be able to tell if you've never written yourself."
Nonsense. You'll be able to tell by reading it.
"If a critical part of your
Re: (Score:2)
"You won't even be able to tell if you've never written yourself."
Nonsense. You'll be able to tell by reading it.
Have you ever heard of the Dunning-Kruger effect?
Re: (Score:2)
Yes. But I am surprised you'd mention it, since the hypothetical "author" who utilized ChatGPT and proofread the result is at less risk of bias than the one who wrote the paper themselves. That is a strong example of philosophical charity and intellectual integrity. I applaud that.
In turn, I have to admit there is merit to your argument that having a certain amount of experience writing papers helps with reading and interpreting them as well. It is always beneficial when communicating to be able to step into
Re: (Score:2)
Arithmetic is a great analogy. But unlike arithmetic, I would say most university students have not developed communication skills well enough that they can skip practice.
We already have navigation and typing skills (and much else) on the decline due to smart phones. It certainly seems that most people are OK watching their general competence degrade, or fail to develop. But there are certain fields - science, medicine - where that kind of slippage isn't acceptable. And yet I do think it's happening. I don'
Re: (Score:1)
Absolutely. And a handful of those that have to write those may actually find that they're good at it and keep doing it until they get good. At which point they will start getting cited.
The problem is that writing papers is mostly chatGPT automatable shit that is exceedingly boring, repetitive and frankly mindless. The actual meat of the paper, the novel thing, the one that is the subject is a tiny percentage of the workload.
And automating the mundane so people can focus on the novel is one of the best thin
Re: (Score:2)
No, because most of these papers are just written by students who are forced to have those written to graduate and leave the trash fire that is modern academia forever.
Then the students should have their degrees rescinded. It's the same thing you would do if you found out that a student published made-up data, or plagiarized large parts of their thesis, etc.
Except in this case, an additional step is needed: After rescinding the student's degree, you should open a formal investigation to find out why the thesis committee approved their work. (If a non-expert could determine that it was written by machine, why couldn't a committee of experts figure out the same thing?)
I do
Re: (Score:1)
This is the "computers should not be allowed...", "typewriters should not be allowed", etc. narrative.
Reality rejects this claim on a fundamental level, because those who don't use a new technological breakthrough to automate processes that can be automated get left behind, within a few generations, by those who do.
Re: (Score:2)
That's a very strange analogy. I've never in my life heard anyone suggest that "computers should be banned from academia" (or that "typewriters" should be banned, FFS).
There are certain basic skills which have always been a requirement for working in academia, and one of them is the ability to convey information through language. You don't have to be Shakespeare-- you don't even need to have impeccable grammar and spelling. You *do*, however, have to be able to communicate in a way that is clear and under
Re: (Score:1)
That is because you know little to nothing about history. Penmanship was key to academia for a long time, and when typewriters came there was massive opposition to their adoption: they were much faster and more efficient, but they also caused a loss of penmanship skill. Notably, the argument was exactly the same. That penmanship required careful consideration of writing, and therefore adopting fast typewriters would eliminate the need for that sort of thinking.
Same opposition was for computers and word processing s
Re: (Score:2)
Citation needed. I don't think "penmanship" has been considered a "key" skill in academia for a very, very long time-- not since the invention of the printing press. It would have been considered a useful "secretarial" skill, sort of like being good at typing. But at the end of the day, your academic career and reputation did not depend on the quality of your handwriting. If your handwriting wasn't so good-- well, that's what you had "scriveners" for. Remember Bartleby?
I wasn't around for the adoption
Re: (Score:1)
>for a very, very long time
While bitching about LLMs taking over education, about a year after a large percentage of students openly tell you that they're using them for composition. Ok grandpa. Keep screaming at the cloud. Just like your predecessors did.
>I wasn't around for the adoption of typewriters
Your personal anecdotes are irrelevant. Get out of your bubble and read. Authors in the 1800s bitching in essays and contemporary literature about typewriters and how important it is to write by hand for co
Re: (Score:2)
Again... citations needed, Oh Historically-Informed One. For any piece of technology, there is probably *someone* who disliked it and complained about it. I'm sure in the days of the medieval scriptorium, there were older monks who didn't like the new-fashioned dip pens. But I have read a great deal of 19th-century literature-- it happens to have been my major in college-- and I don't think the literature of the time was awash in essays about how typewriters interfered with "correct thought processes". M
Re: (Score:1)
>If you can't see that there is a fundamental difference between a technology
That is your foundational point; it is, however, also an irrelevant one, because the only difference that matters is "does this technology make this task easier to perform in a functional way?" The rest, included under that massive umbrella of extra claims and points that you're selling, is a distraction.
And for every technology that does, there are always holdouts that believe the old way is the correct way, and new way comprom
Re: (Score:2)
Entirely untrue. The people dealing in fabricated papers are professionals. You can't just submit a generated paper to a journal, not even one published by MDPI, Frontiers or IEEE, and expect to have it published. You need to have friendly peer reviewers, i.e. a network of other crooks, preferably ones with credible credentials. And of course, these people will want something in return, perhaps citations to their own rubbish papers as much as money. And citations get you promoted, or a new job.
There are ple
Re: (Score:1)
At no point was this discussion about "fabricated papers". Also, "fabricated papers" are easy to push among professionals in many fields, as you note above. My personal favourite is the curious case of Boghossian, Lindsay and Pluckrose, which demonstrated just how bad modern academia is when it comes to fabricated papers.
But this discussion has nothing to do with that. We're talking about very real papers, on very real subjects, published in very real journals, peer reviewed by very real peers. Because publis
Re: (Score:2)
The title of the story is "GPT-Fabricated Scientific Papers Found on Google Scholar by Misinformation Researchers", you blithering imbecile. It's the starting point of the discussion.
Re: (Score:1)
That would be because titles are clickbait bullshit you blithering imbecile. As it is assumed that in god's year of 2024, only blithering imbeciles would expect titles to not be clickbait bullshit.
And we generally discuss the content of the story, not the clickbait title.
2030 suggestion (Score:2)
Suggestion:
The social science community should raise funds by 2030 to take significant, high-citation-count papers, rerun the experiments, and republish the updated results. This is really needed on the social sciences side, because many high-citation-count papers, cited directly or indirectly, are from one to two generations ago, and what would have been the finding 20 years ago is less likely to be the finding today.
Why is this needed?
The second generation of researchers to examine and recreate a well cited paper would hav
Re: (Score:2)
Alternative suggestion, we just toss out all social science which isn't entirely built upon physical sciences and empirical observation. The rest we call social pseudoscientology so it is appropriately categorized.
Re: (Score:2)
The problem this ignores is that it is perfectly valid to use GPT or other AI to assist in writing a paper. "Attached is my result, here is my conclusion, here are a few key points in the data to support it.... write that out in a long winded overly pretentious and wordy fashion typical of academic level egos" "okay, that is pretty good but fix this error, adjust that, also here is another point I want incorporated"... etc, etc, etc.
Re: (Score:1)
Correct. And there's nothing new about this. The overwhelming majority of papers are that mandatory crap certain students need to produce to graduate, and while a few take them seriously, most view them as a necessary step to graduation, to be done as quickly as possible. So they can get a real job.
That is why you don't just cite papers from google scholar. You read their contents before you do it.
If anything, ChatGPT likely increases the quality of such papers.
Re: (Score:2)
"That is why you don't just cite papers from google scholar. You read their contents before you do it."
One would hope everyone is reading everything they cite in any case. Okay, so in school that is a 'nudge nudge, wink wink' but for real/published work...
As for using ChatGPT, it is just a tool. There is nothing wrong with say writing an outline of your paper and then feeding it parameters to write a first pass section by section, or to polish your own first draft. The purpose of these papers is to convey d
Re: (Score:1)
Devil's word rings true yet again, doesn't it?
Re: (Score:1)
Spoiler alert - the misinformation researchers used GPT for their paper.
Sokal Cubed?
Re: Further proof (Score:4, Interesting)
If I read the summary correctly, no one actually reviewed the papers. I think what they are reporting is Google Scholar hacking. Essentially, Google Scholar does not pull papers only from publishers; it also pulls papers from the web.
So you can literally write an IEEE-formatted lorem ipsum, put it in a place where Google crawls, and appear on Google Scholar. (OK, that exact trick doesn't actually work, but things not much smarter than that DO work.)
We have seen people demonstrate the limits of Google Scholar by fabricating authors with an h-index of 1000 by generating nonsense papers that cite each other.
So in those cases, Google Scholar is the only thing that actually "read" the paper. It is a failure of Google Scholar more than of the scientific community.
Now it probably also happens in low-tier venues (and probably even in high-tier venues, at a lower rate). But one needs to remember that something isn't good science because it was formatted in LaTeX, or even because it appeared in a known journal or conference. It is good science because it has been independently reproduced by entities you trust.
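The h-index gaming described above is easy to illustrate: the metric counts only citations, so a ring of fabricated papers citing each other inflates it mechanically. Here is a minimal sketch of the standard h-index definition with hypothetical citation counts (not data from the study):

```python
def h_index(citations):
    """h-index: the largest h such that the author has h papers
    with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still clears the bar
        else:
            break
    return h

# A plausible legitimate record:
print(h_index([10, 8, 5, 4, 3]))  # 4

# A citation ring: 100 fabricated papers, each cited by the other 99.
print(h_index([99] * 100))  # 99
```

Nothing in the metric distinguishes the second case from a genuinely prolific author, which is why a crawler that never reads the papers can be gamed this way.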
The ABC's of Alphabet (Score:2)
The Google dilemma continues. How will they cope with bad actors, technology, and harm reduction, while earning profit primarily from advertising and marketing data revenues?
Re: (Score:2)
If by "user," you mean "advertisers," they already do. If you mean those who do searches on their search engine, you're confusing "users" with "product."
invented data (Score:2)
If the paper is written by ChatGPT, and the paper is actually correct and accurate, then I don't see a problem with that. ChatGPT doesn't write very well, but many papers are not written very well.
Re:invented data (Score:4, Interesting)
The big problem with Science today is the proliferation of papers. It doesn't matter if it is accurate and correct: if it isn't original or novel then it still contributes to the information pollution just as much as if it was inaccurate or downright fantasy.
What is needed in Science is *fewer* papers, of higher quality, that leave sufficiently large gaps that are trivial to bridge by talented researchers. That is by definition not something a tool like ChatGPT, which only interpolates existing knowledge and makes up the rest, can help with.
Now if you say ChatGPT can help improve the English grammar of the paper, then I will say it doesn't matter, a sufficiently talented researcher can bridge that gap, and in so doing will be forced to think more deeply about the subject matter anyway.
Re: (Score:2)
I think the proliferation of papers has more to do with ever more niche areas of research, as an ever-increasing number of authors strive for originality. And because these niches appeal to vanishingly smaller audiences, it's easier to sneak in some ChatGPT nonsense.
Re: (Score:2)
This is largely about social science. You can slip in ChatGPT nonsense anywhere you like in these fields because they are grounded in speculation, shoddy math, and popular opinions rather than physical reality.
Re: (Score:2)
"The big problem with Science today is the proliferation of papers."
The big problem with science today is that it is infiltrated with social pseudoscience and the rise of government regimes defining a concept like 'misinformation' so they can block information they disagree with.
Without manipulation there should be as many papers as there are things to report, no more or less.
"That is by definition not something a tool like ChatGPT, which only interpolates existing knowledge and makes up the rest, can help
Re: (Score:2)
The big problem with Science today is the proliferation of papers. It doesn't matter if it is accurate and correct: if it isn't original or novel then it still contributes to the information pollution just as much as if it was inaccurate or downright fantasy.
What is needed in Science is *fewer* papers, of higher quality, that leave sufficiently large gaps that are trivial to bridge by talented researchers. That is by definition not something a tool like ChatGPT, which only interpolates existing knowledge and makes up the rest, can help with.
Now if you say ChatGPT can help improve the English grammar of the paper, then I will say it doesn't matter, a sufficiently talented researcher can bridge that gap, and in so doing will be forced to think more deeply about the subject matter anyway.
I feel like that's more a problem from the outsider perspective. Sure those minor results don't lead to a breakthrough, but those incremental steps add up to help create the bigger breakthroughs.
Re: (Score:2)
With how often ChatGPT is wrong about commonly known things I certainly wouldn't trust it to be right about some new, novel, and extremely esoteric research.
Re: (Score:2)
The Marxists are perfectly happy to program children and youth. Why do you think they want to make you scared of shutting down the DoE? Having 50 or more large sets of opinions dominate the educations of children, who can then grow to debate and discuss them, beats forcing everyone to unify on their nationalized canon.
Re: (Score:2)
Ironically that is the reason for the pseudoscientific social science garbage you defend getting a toehold and corrupting science in the first place.
Re: (Score:2)
lol That's a good one. Of course you mean ideas only discredited by garbage pseudoscience, in fields which aren't grounded in observations of physical reality in the first place. It's hilarious for someone defending those sorts of ideas to criticize ANYTHING as garbage science.
Look at the kind of ridiculous and convoluted frameworks you've had to invent. Needing a decade of brainwashing to convince people to agree with your rationalizations, layer-by-layer, doesn't mean you are educated and enlightened, whi
What is the correlation with publication quality? (Score:2)
To gauge if this is a serious problem, we need to know if the top conferences and journals suffer from accepting these AI-generated papers. We should keep in mind that even before the advent of AI-generated papers, we already had to deal with paper mills that tried to game stats such as h-index or paper counts per school or country. Most of these problems arose from non-top level conferences and journals.
If AI-generated (and the implication is that such papers are low quality) papers infect arxiv, is that a
Re: (Score:2)
And this being America, you'd be shot on sight, on site, for trespassing with a firearm?
Google's reputation precedes them (Score:2)
That they're now getting caught for promoting misinformation & disinformation in Google Scholar comes as no surprise. Also, given the p
Re: (Score:2)
Yup, and remember when they suppressed COVID-19-related information under government pressure from the organization which funded the gain-of-function research that likely led to the pandemic, and was definitely trying to cover up that possibility?
Re: (Score:2)
I first became aware that YouTube was censoring information related to COVID when a biohacking group I followed produced a vaccine, documenting their work throughout the process for full transparency, and was then shut down by YouTube.
it increases the noise level. (Score:2)
The problem is that LLMs make it easier to convert vague ideas into papers. This allows the volume to be increased at little cost, with no increase in information content. I.e., it increases the noise level.
so who published them? (Score:4, Insightful)
They were posted so someone must know, yes?
And I presume a paper published in the world has to have an attributed author?
So are these people being identified, blocked, and banned from science publishing...forever?
If a "scientist" publishes a gpt-authored paper, they should be hounded out of the field.
Are they?
Re: so who published them? (Score:2)
I would imagine so.
In computing, publishing is mostly done by the ACM and IEEE. Cases of plagiarism and data fabrication are reported to the publisher. They maintain a list (and I believe share it between them) to ban authors caught in unethical authorship.
When you run a journal or a conference that they sponsor, they share the list of banned authors.
If a conference receives a paper from one of them, you just forward it to the publisher, who handles it.
I have never served on the ethics panel, so I am not sure what the precise sta
Re: (Score:2)
I just used this standard for random stories from the press over the last 4 yrs and it was EXTREMELY accurate. Virtually every White House press release corroborated by a legacy media story has a non-promoted retraction 3-6 months later, which is subsequently buried. Talking heads and "fact checks" from the networks then proceed as if the retraction/correction never occurred and continue to reference opinions consistent with it as falsehoods and misinformation.
misinformation (Score:2)
I don't know what "misinformation" is. But, it does seem highly likely that as the bulk of scientific publications are in English, and the bulk of the world does not speak or rite gud inglish, that they would use LLM software to help them write and/or translate their papers.
Re: (Score:2)
I doubt that the bulk of scientific papers are in English, but the bulk that are covered by Google probably are. So you've identified one strain of the problem. There are others.
Re: (Score:2)
Indeed; since English is the most universal language, it would make far more sense not only for the bulk of papers to be written in it, but to stop using other languages as the primary language in academics globally. This should reduce translation errors and miscommunications drastically, as well as vastly expanding the pool of readily consumed and shared science across the board for future generations.
Re: (Score:2)
I think China might have a few objections to that. Whether something "makes sense" as a choice depends on what your goals are, and China might be just as happy if a lot of their developments didn't rapidly leak outside the country. (Rapidly is a key word here. I'm not talking about explicit secrecy, but just a barrier that slows diffusion.)
Re: (Score:2)
From what I understand, the primary reason the Chinese state promotes maintaining a China-specific language is to help them manage propaganda and information outside of science, so I'm sure it would be the same within it. I could certainly see slowing down the ingestion of outside discoveries/information that conflict with their state propaganda as a priority for the state, as well as having more opportunity to contain embarrassing errors and fake research.
Still, that does seem like something of a compromise on the la
Re: misinformation (Score:2)
That is indeed happening. I have been reviewing papers from $notenglishspeakingcountry recently which were of way lower quality than what they used to write. I am talking about a top-tier research institution that usually writes very good papers. The last 3 I reviewed from them were almost unreadable. My guess is that they pushed the writing to an AI translator rather than writing it themselves.
In all 3 cases I had to recommend rejection, because the poor language made the paper impossible to understand.