The Telltale Words That Could Identify Generative AI Text
Researchers from the University of Tübingen and Northwestern University report that at least 10% of scientific abstracts in 2024 were processed using large language models. Analyzing 14 million PubMed abstracts from 2010-2024, the team identified an unprecedented surge in certain "style words" following LLMs' widespread adoption in late 2022.
Words like "delves" and "showcasing" saw a 25-fold and 9-fold increase respectively in 2024 abstracts compared to pre-LLM trends. Common terms such as "potential" and "findings" also spiked in usage. The researchers drew parallels to studies measuring COVID-19's impact through excess deaths, applying a similar methodology to detect "excess word usage" in scientific writing.
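A minimal sketch of that excess-usage idea, by analogy with excess-deaths estimates (the counts, years, and the linear-baseline assumption below are invented for illustration, not taken from the study):

```python
# Sketch of "excess word usage": project a word's pre-LLM frequency
# trend forward and compare the observed post-LLM frequency against
# that counterfactual baseline. All numbers here are illustrative.

def expected_frequency(years, freqs, target_year):
    """Least-squares linear extrapolation of the baseline trend."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(freqs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, freqs))
             / sum((x - mean_x) ** 2 for x in years))
    return mean_y + slope * (target_year - mean_x)

# Hypothetical per-abstract frequency of "delves" before LLMs...
baseline_years = [2017, 2018, 2019, 2020, 2021]
baseline_freqs = [0.00010, 0.00011, 0.00011, 0.00012, 0.00012]
observed_2024 = 0.00300  # ...and a hypothetical 2024 value.

expected = expected_frequency(baseline_years, baseline_freqs, 2024)
print(f"expected {expected:.5f}, observed {observed_2024:.5f}, "
      f"fold change {observed_2024 / expected:.1f}x")
```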
maybe (Score:4, Interesting)
On the other hand, there are more non English speakers writing papers, and they might be funneling them through LLMs to clean up the English.
Re: (Score:3)
Re: (Score:2)
Happened with Stack Exchange, back when that was relevant. Words like "performant", and questions awkwardly phrased as "what are some X" to get around bans on asking for recommendations and lists.
The specific language mentioned in TFA sounds like how UK newspapers often phrase things. For example, if the Daily Heil buys some paparazzi long lens photos of some minor celebrity in a bikini on their private property, they will sanitize it by describing them as "showcasing" their body to make it sound like the
Re: (Score:1)
Re: (Score:2)
The reason these AI responses might use these words a lot is that REAL people use these words a lot.
Not sure this is a good test. I can foresee a lot of false positives on this one.
Re: (Score:3)
No one here proposed this as a "test". The summary even points out that it is an "increase", not a unique trait. That said, given your posting history, a lot of us have a long-standing belief that you're in fact a poorly coded AI beta, so it's no surprise you use the word "delves" a lot :-P
Re: (Score:2)
Hmm....I guess if you looked at ALL my posting history...you'd see I pre-date AI as we know it today.
Re: (Score:2)
Re: (Score:2)
I am not a native English speaker, so if I were to write such a paper I would start by looking at some (human- or AI-generated) to learn how they are written. If I saw an excess of those specific words, I would probably assume that is the fashion and follow it.
We need... (Score:1)
...cryptographically secure methods of positively identifying text generated by LLMs
Using them is fine
Lying about using them is not
Re: (Score:2)
When the mechanic works on your car, do the tools need to be certified? Certainly the mechanic has tools to make the job faster/easier.
Re: (Score:2)
Re: (Score:2)
It's about the tools, not the mechanic.
Re: We need... (Score:2)
I think there's a fine line where AI is no longer a mere tool or reference but is providing the actual product, whether that be a drawing, a report reflecting actual original effortful analysis and critical thinking, etc. Defining that line won't be easy, though.
Re:We need... (Score:5, Funny)
"...cryptographically secure methods of positively identifying text generated by LLMs"
That can only be produced with the voluntary cooperation of the LLM. In the end, it can only prove that something was produced by a certain LLM. It cannot prove that something was *not* produced by an LLM. You've re-invented the evil bit.
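One real line of work here is statistical watermarking, which illustrates exactly that asymmetry: the generating LLM must cooperate by biasing its token choices toward a keyed "green list", and anyone holding the key can then test for that bias. A toy sketch, with a made-up key and a simplified hash-based vocabulary split (not a production scheme):

```python
import hashlib

SECRET_KEY = b"demo-key"  # shared between the cooperating LLM and the verifier

def is_green(prev_token: str, token: str) -> bool:
    # Keyed pseudo-random split of the vocabulary: about half of all
    # tokens count as "green" in any given context.
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode())
    return digest.digest()[0] % 2 == 0

def green_fraction(tokens):
    # Fraction of tokens that land in the green list for their context.
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# A cooperating LLM would bias generation toward green tokens, pushing
# this fraction well above 0.5; ordinary text sits near 0.5. Note the
# asymmetry: a high score is evidence of the watermark, but a low score
# proves nothing -- an LLM that doesn't cooperate leaves no signal.
text = "the quick brown fox jumps over the lazy dog".split()
print(f"green fraction: {green_fraction(text):.2f}")
```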
No (Score:2)
Would be nice to have an LLM-text-confidence metric as a prior to more elaborate and human-driven bullshit detectors though...
Re:No (Score:4, Insightful)
If they are seen in LLM text, that means they were statistically significant enough to begin with ("common", as the story says!). I'd use all four in a paper. Can we actually judge the content rather than the textual sugar-coating? Yes, it's harder, but absolutely necessary. Would be nice to have an LLM-text-confidence metric as a prior to more elaborate and human-driven bullshit detectors though...
So... two things.
One: this study found a 2,500% increase in the use of "delves" within abstracts. That you personally may or may not use that word - or any other one - isn't significant. What is significant is the statistical deviation from a few years ago before LLMs were in widespread use.
Two: that a word is "common" and thus likely used by LLMs ignores context. Those words are common, yes, and the LLMs are trained on text that has influenced their output to include them. What's important to consider is that they are common marketing words, not common within the clinical, precise language used for scientific papers. What this study is revealing is that these LLMs are context-unaware, generating output that is linguistically appropriate outside of scientific papers. Put another way, LLMs trained on the general internet are likely to include profanity. An LLM asked to generate a large number of children's novels stands a decent chance of eventually including some profanity because it doesn't understand context. That the word "delve" is apparently on the rise, presumably in place of words like "investigate", "test", and "explore", is exactly that.
Re: (Score:2)
found a 2,500% increase in the use of "delves" within abstracts.
Probably they trained the LLM on a corpus of Dwarven engineering papers.
Re: (Score:2)
The study showed "delves" increased from "almost never" to "incredibly rarely".
Re: (Score:2)
The study showed "delves" increased from "almost never" to "incredibly rarely".
You made me curious, so I went to the actual pre-print study to find the actual raw numbers. What I found instead was distressing.
"We hope that future work will meticulously delve into tracking LLM usage more accurately and assess which policy changes are crucial to tackle the intricate challenges posed by the rise of LLMs in scientific publishing."
Now I'm starting to think this is a troll study.
Re: (Score:2)
There was also a previous paper about the use of "delve" increasing, no doubt included in this paper about "delve" increasing.
Re: (Score:2)
> One: this study found a 2,500% increase in the use of "delves" within abstracts.
They keep using that word. I do not think it means what they think it means.
Witch hunting, again (Score:2)
As a consumer of scientific research, if the output is sound, I don't care if the text was written or polished by AI. The science needs to be there, e.g., experiments have to stand up, or appear to, from the evidence presented. And I don't think you have to declare your use of tools either. Who, pre-LLMs, has done a grammar check and spell check before publishing and felt the need to declare that they used MS Word? Or, do you think major papers are not ghost-written by tech writers (if your institution
Re:Witch hunting, again (Score:5, Interesting)
Re: (Score:2)
if the output is sound
That statement there is doing a lot of heavy lifting. No one is concerned about using AI to "polish your English". The fact of the matter is that much of what is generated by AI is hallucinated rubbish, and I would question whether the "polished" version still has the same intent and meaning as the original paper, without even considering whether the use of AI is just another indication of shortcuts being taken by the scientific community.
Writing papers isn't hard, but it does need to be incredibly precise and purposeful
Potential For False Findings (Score:3)
I don't want to delve into this too much, but my findings suggest that the potential outcome of pinning AI on anyone using these words may not be appropriate.
Notably, within our short study of 1 person, the insights lead us to believe that this is not going to be particularly helpful.
--
I am simply without motivation. - Joel H. Hildebrand
Re: (Score:2)
Marketing words (Score:2)
They trained the LLMs on marketing bullshit, so of course it's going to use marketing bullshit in its generated bullshit.
Futility of such analysis (Score:2)
Well... (Score:2)
Actually, one of the problems with AI-generated stuff is that all forensic methods used to detect AI can be easily defeated once they are published. Efforts to close Pandora's box seem to be useless, at least for now.
Concerned (Score:2)
I'm concerned about the findings of this article that delves into examples showcasing the potential misuses of AI.
Gaming the stats (Score:2)
Any time these studies come out, pointing out the patterns seen in AI papers, the AI models will be adjusted to reduce those words to "normal." And so the arms race begins.
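And the adjustment is nearly trivial at decode time: penalize the flagged words before sampling. A sketch under assumed conditions (the marker list and penalty value are illustrative; real serving APIs expose comparable knobs such as per-token logit biases):

```python
import math, random

MARKER_WORDS = {"delves", "showcasing", "pivotal"}  # illustrative list
PENALTY = -10.0  # a large negative logit bias effectively bans a word

def sample_next(logits):
    """Softmax-sample the next word after penalizing flagged markers."""
    biased = {w: l + (PENALTY if w in MARKER_WORDS else 0.0)
              for w, l in logits.items()}
    total = sum(math.exp(l) for l in biased.values())
    r = random.random() * total
    for word, l in biased.items():
        r -= math.exp(l)
        if r <= 0:
            return word
    return word  # fallback for floating-point rounding

# Hypothetical next-token distribution: "delves" would otherwise dominate.
print(sample_next({"delves": 2.0, "examines": 1.5, "explores": 1.4}))
```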
Maybe ... (Score:2)
Why would LLM generate more "delves"? (Score:2)
Don't LLMs pick the words, and sequences of words, based on their frequency in the training material? Should the test for LLM-generated text be that it is remarkably average in word choice?
Re: (Score:2)
But the training material isn't just scientific papers.
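A toy example of that point, with invented corpora and an invented 90/10 mix: sampling from a blended word distribution makes the output follow the mix, not the scientific register alone.

```python
import random
from collections import Counter

# Toy corpora: an LLM's training mix is mostly general text, so its
# word statistics blend registers rather than matching any one domain.
general_web = ("delve into this amazing topic and showcase the potential "
               "of these insights").split()
scientific  = ("we investigate the effect and report the measured results "
               "of the experiment").split()

def unigram(corpus):
    counts = Counter(corpus)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Blend: 90% general text, 10% scientific (weights are illustrative).
p_gen, p_sci = unigram(general_web), unigram(scientific)
vocab = set(p_gen) | set(p_sci)
blend = {w: 0.9 * p_gen.get(w, 0.0) + 0.1 * p_sci.get(w, 0.0) for w in vocab}

# Sampling from the blend makes "delve"-style words likelier than they
# ever were in the scientific corpus alone.
words, weights = zip(*blend.items())
print(random.choices(words, weights=weights, k=12))
```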
Obvious project (Score:2)
Whim (Score:2)
"Potential" and "findings"? I thought the giveaway word was "whimsical".
detecting AI is futile (Score:2)
Tomorrow the LLMs will be tuned to avoid using "delves" and whatever else gets flagged.
How about today? (Score:1)