The Telltale Words That Could Identify Generative AI Text
Researchers from the University of Tübingen and Northwestern University report that at least 10% of scientific abstracts in 2024 were processed using large language models. Analyzing 14 million PubMed abstracts from 2010-2024, the team identified an unprecedented surge in certain "style words" following LLMs' widespread adoption in late 2022.
Words like "delves" and "showcasing" saw a 25-fold and 9-fold increase respectively in 2024 abstracts compared to pre-LLM trends. Common terms such as "potential" and "findings" also spiked in usage. The researchers drew parallels to studies measuring COVID-19's impact through excess deaths, applying a similar methodology to detect "excess word usage" in scientific writing.
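A minimal sketch of that excess-usage idea, by analogy with excess-deaths estimates (the counts, years, and the linear-baseline assumption below are invented for illustration, not taken from the study):

```python
# Sketch of "excess word usage": project a word's pre-LLM frequency
# trend forward and compare the observed post-LLM frequency against
# that counterfactual baseline. All numbers here are illustrative.

def expected_frequency(years, freqs, target_year):
    """Least-squares linear extrapolation of the baseline trend."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(freqs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, freqs))
             / sum((x - mean_x) ** 2 for x in years))
    return mean_y + slope * (target_year - mean_x)

# Hypothetical per-abstract frequency of "delves" before LLMs...
baseline_years = [2017, 2018, 2019, 2020, 2021]
baseline_freqs = [0.00010, 0.00011, 0.00011, 0.00012, 0.00012]
observed_2024 = 0.00300  # ...and a hypothetical 2024 value.

expected = expected_frequency(baseline_years, baseline_freqs, 2024)
print(f"expected {expected:.5f}, observed {observed_2024:.5f}, "
      f"fold change {observed_2024 / expected:.1f}x")
```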
maybe (Score:4, Interesting)
On the other hand, there are more non English speakers writing papers, and they might be funneling them through LLMs to clean up the English.
Re: (Score:3)
Re: (Score:2)
Happened with Stack Exchange, back when that was relevant. Words like "performant", and questions awkwardly phrased as "what are some X" to get around bans on asking for recommendations and lists.
The specific language mentioned in TFA sounds like how UK newspapers often phrase things. For example, if the Daily Heil buys some paparazzi long lens photos of some minor celebrity in a bikini on their private property, they will sanitize it by describing them as "showcasing" their body to make it sound like the
Re: (Score:1)
Re: (Score:2)
The reason these AI responses might use these words a lot is that REAL people use these words a lot.
Not sure this is a good test. I can foresee a lot of false positives on this one.
Re: (Score:3)
No one here proposed this as a "test". The summary even points out that it is an "increase", not a unique trait. That said, given your posting history, a lot of us have a long-standing belief that you're in fact a poorly coded AI beta, so it's no surprise you use the word "delves" a lot :-P
Re: (Score:2)
Hmm....I guess if you looked at ALL my posting history...you'd see I pre-date AI as we know it today.
Re: (Score:2)
Re: (Score:2)
I am not a native English speaker, so if I were to write such a paper I would start by looking at some (human- or AI-generated) to learn how they are written. If I saw an excess of those specific words, I would probably assume that is the fashion and follow it.
We need... (Score:1)
...cryptographically secure methods of positively identifying text generated by LLMs
Using them is fine
Lying about using them is not
Re: (Score:2)
When the mechanic works on your car, do the tools need to be certified? Certainly the mechanic has tools to make the job faster/easier.
Re: (Score:2)
Re: (Score:2)
It's about the tools, not the mechanic.
Re: We need... (Score:2)
I think there's a fine line where AI is no longer a mere tool or reference but is providing the actual product, whether that be a drawing, a report reflecting actual original effortful analysis and critical thinking, etc. Defining that line won't be easy, though.
Re:We need... (Score:5, Funny)
"...cryptographically secure methods of positively identifying text generated by LLMs"
That can only be produced with the voluntary cooperation of the LLM. In the end, it can only prove that something was produced by a certain LLM. It cannot prove that something was *not* produced by an LLM. You've re-invented the evil bit.
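One real line of work here is statistical watermarking, which illustrates exactly that asymmetry: the generating LLM must cooperate by biasing its token choices toward a keyed "green list", and anyone holding the key can then test for that bias. A toy sketch, with a made-up key and a simplified hash-based vocabulary split (not a production scheme):

```python
import hashlib

SECRET_KEY = b"demo-key"  # shared between the cooperating LLM and the verifier

def is_green(prev_token: str, token: str) -> bool:
    # Keyed pseudo-random split of the vocabulary: about half of all
    # tokens count as "green" in any given context.
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode())
    return digest.digest()[0] % 2 == 0

def green_fraction(tokens):
    # Fraction of tokens that land in the green list for their context.
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# A cooperating LLM would bias generation toward green tokens, pushing
# this fraction well above 0.5; ordinary text sits near 0.5. Note the
# asymmetry: a high score is evidence of the watermark, but a low score
# proves nothing -- an LLM that doesn't cooperate leaves no signal.
text = "the quick brown fox jumps over the lazy dog".split()
print(f"green fraction: {green_fraction(text):.2f}")
```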
No (Score:2)
Would be nice to have an LLM-text-confidence metric as a prior to more elaborate and human-driven bullshit detectors though...
Re:No (Score:4, Insightful)
If they are seen in LLM text, that means they were statistically significant enough to begin with ("common", as the story says!). I'd use all four in a paper. Can we actually judge the content rather than the textual sugar-coating? Yes, it's harder, but absolutely necessary. Would be nice to have an LLM-text-confidence metric as a prior to more elaborate and human-driven bullshit detectors though...
So... two things.
One: this study found a 2,500% increase in the use of "delves" within abstracts. That you personally may or may not use that word - or any other one - isn't significant. What is significant is the statistical deviation from a few years ago before LLMs were in widespread use.
Two: that a word is "common" and thus likely used by LLMs ignores context. Those words are common, yes, and the LLMs are trained on text that has influenced their output to include them. What's important to consider is that they are common marketing words, not common within the clinical, precise language used for scientific papers. What this study is revealing is that these LLMs are context-unaware, generating output that is linguistically appropriate outside of scientific papers. Put another way, LLMs trained on the general internet are likely to include profanity. An LLM asked to generate a large number of children's novels stands a decent chance of eventually including some profanity because it doesn't understand context. That the word "delve" is apparently on the rise, presumably in place of words like "investigate", "test", and "explore", is exactly that.
Re: (Score:2)
found a 2,500% increase in the use of "delves" within abstracts.
Probably they trained the LLM on a corpus of Dwarven engineering papers.
Re: (Score:2)
The study showed "delves" increased from "almost never" to "incredibly rarely".
Re: (Score:2)
The study showed "delves" increased from "almost never" to "incredibly rarely".
You made me curious, so I went to the actual pre-print study to find the actual raw numbers. What I found instead was distressing.
"We hope that future work will meticulously delve into tracking LLM usage more accurately and assess which policy changes are crucial to tackle the intricate challenges posed by the rise of LLMs in scientific publishing."
Now I'm starting to think this is a troll study.
Re: (Score:2)
There was also a previous paper about the use of "delve" increasing, no doubt included in this paper about "delve" increasing.
Re: (Score:2)
> One: this study found a 2,500% increase in the use of "delves" within abstracts.
They keep using that word. I do not think it means what they think it means.
Witch hunting, again (Score:2)
As a consumer of scientific research, if the output is sound, I don't care if the text was written or polished by AI. The science needs to be there, e.g., experiments have to stand up, or appear to, from the evidence presented. And I don't think you have to declare your use of tools either. Who, pre-LLMs, has done a grammar check and spell check before publishing and felt the need to declare that they used MS Word? Or, do you think major papers are not ghost-written by tech writers (if your institution
Re:Witch hunting, again (Score:5, Interesting)
Re: (Score:2)
if the output is sound
That statement there is doing a lot of heavy lifting. No one is concerned about using AI to "polish your English". The fact of the matter is that much of what is generated by AI is hallucinated rubbish, and I would question whether the "polished" version still has the same intent and meaning as the original paper, without even considering whether the use of AI is just another indication of shortcuts being taken by the scientific community.
Writing papers isn't hard, but it does need to be incredibly precise and purposeful
Potential For False Findings (Score:3)
I don't want to delve into this too much, but my findings suggest that the potential outcome of pinning AI on anyone using these words may not be appropriate.
Notably, within our short study of 1 person, the insights lead us to believe that this is not going to be particularly helpful.
--
I am simply without motivation. - Joel H. Hildebrand
Re: (Score:2)
Marketing words (Score:2)
They trained the LLMs on marketing bullshit, so of course it's going to use marketing bullshit in its generated bullshit.
Futility of such analysis (Score:2)
Well... (Score:2)
Actually, one of the problems with AI-generated stuff is that all forensic methods used to detect AI can be easily defeated once they are published. Efforts to close Pandora's box seem to be useless, at least for now.
Concerned (Score:2)
I'm concerned about the findings of this article that delves into examples showcasing the potential misuses of AI.
Gaming the stats (Score:2)
Any time these studies come out, pointing out the patterns seen in AI papers, the AI models will be adjusted to reduce those words to "normal." And so the arms race begins.
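And the adjustment is nearly trivial at decode time: penalize the flagged words before sampling. A sketch under assumed conditions (the marker list and penalty value are illustrative; real serving APIs expose comparable knobs such as per-token logit biases):

```python
import math, random

MARKER_WORDS = {"delves", "showcasing", "pivotal"}  # illustrative list
PENALTY = -10.0  # a large negative logit bias effectively bans a word

def sample_next(logits):
    """Softmax-sample the next word after penalizing flagged markers."""
    biased = {w: l + (PENALTY if w in MARKER_WORDS else 0.0)
              for w, l in logits.items()}
    total = sum(math.exp(l) for l in biased.values())
    r = random.random() * total
    for word, l in biased.items():
        r -= math.exp(l)
        if r <= 0:
            return word
    return word  # fallback for floating-point rounding

# Hypothetical next-token distribution: "delves" would otherwise dominate.
print(sample_next({"delves": 2.0, "examines": 1.5, "explores": 1.4}))
```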
Maybe ... (Score:2)
Why would LLM generate more "delves"? (Score:2)
Don't LLMs pick the words, and sequences of words, based on their frequency in the training material? Should the test for LLM-generated text be that it is remarkably average in word choice?
Re: (Score:2)
But the training material isn't just scientific papers.
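A toy example of that point, with invented corpora and an invented 90/10 mix: sampling from a blended word distribution makes the output follow the mix, not the scientific register alone.

```python
import random
from collections import Counter

# Toy corpora: an LLM's training mix is mostly general text, so its
# word statistics blend registers rather than matching any one domain.
general_web = ("delve into this amazing topic and showcase the potential "
               "of these insights").split()
scientific  = ("we investigate the effect and report the measured results "
               "of the experiment").split()

def unigram(corpus):
    counts = Counter(corpus)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Blend: 90% general text, 10% scientific (weights are illustrative).
p_gen, p_sci = unigram(general_web), unigram(scientific)
vocab = set(p_gen) | set(p_sci)
blend = {w: 0.9 * p_gen.get(w, 0.0) + 0.1 * p_sci.get(w, 0.0) for w in vocab}

# Sampling from the blend makes "delve"-style words likelier than they
# ever were in the scientific corpus alone.
words, weights = zip(*blend.items())
print(random.choices(words, weights=weights, k=12))
```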
Obvious project (Score:2)
Whim (Score:2)
"Potential" and "findings"? I thought the giveaway word was "whimsical".
detecting AI is futile (Score:2)
Tomorrow the LLMs will be tuned to avoid using "delves" and whatever else gets flagged.
How about today? (Score:1)