The Editors Protecting Wikipedia from AI Hoaxes (404media.co)
A group of Wikipedia editors have formed WikiProject AI Cleanup, "a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia." From a report: The group's goal is to protect one of the world's largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals. "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. "Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques."
In many cases, WikiProject AI Cleanup finds AI-generated content on Wikipedia with the same methods others have used to find AI-generated content in scientific journals and Google Books, namely by searching for phrases commonly used by ChatGPT. One egregious example is this Wikipedia article about the Chester Mental Health Center, which in November of 2023 included the phrase "As of my last knowledge update in January 2022," referring to the last time the large language model was updated.
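As a rough illustration, the phrase-search approach described above can be sketched in a few lines. The phrase list and flag_article helper below are assumptions for illustration, not WikiProject AI Cleanup's actual tooling.

```python
# Rough sketch of catchphrase-based screening, assuming a hand-maintained phrase
# list; illustrative only, not the WikiProject's actual tooling.
import re

AI_CATCHPHRASES = [
    r"as of my last knowledge update",
    r"as an ai language model",
    r"i cannot fulfill (this|that) request",
]
PATTERN = re.compile("|".join(AI_CATCHPHRASES), re.IGNORECASE)

def flag_article(text: str) -> list[str]:
    """Return any stock LLM phrases found in an article's text."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(flag_article(
    "As of my last knowledge update in January 2022, the facility remained open."
))
# -> ['As of my last knowledge update']
```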
Plausibly (Score:1)
One could use their change-sets to fuel an AI that spots AIs.
Or for the more nefariously minded, train an LLM that rewrites LLM trash to sound less like LLM trash.
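A hedged sketch of the first idea, assuming a small labeled sample could be pulled from the cleanup project's change-sets (text removed as AI slop versus text that survived review); the in-line examples and names below are made up for illustration.

```python
# Toy detector trained on a made-up labeled sample; a real attempt would pull
# labeled revisions from the Wikipedia API instead of these hard-coded examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

removed_as_ai = [
    "As of my last knowledge update in January 2022, the center remains notable.",
    "In conclusion, it is important to note that the topic is multifaceted.",
]
kept_human = [
    "The building opened in 1891 and was expanded twice before 1950.",
    "Local newspapers reported the closure in March 1978.",
]

texts = removed_as_ai + kept_human
labels = [1] * len(removed_as_ai) + [0] * len(kept_human)  # 1 = flagged as AI

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Probability that a new edit looks like the removed (AI) class.
print(detector.predict_proba(["It is important to note that the center is multifaceted."])[0, 1])
```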
Re: (Score:2)
"Or for the more nefariously minded, train an LLM that rewrites LLM trash to sound less like LLM trash."
That is the same idea as "Why don't we train an AI to detect good AI images and throw away all the bad ones?"
If you could do this, the generator AIs would already have it integrated (in fact you could integrate such a tool into the training so you do not even need to generate the bad images afterward).
An LLM that can rewrite LLM trash to sound better can be used as an alternative to the LLMs that write trash.
Re: (Score:1)
That's called a generative adversarial network and it's the foundation for a lot of the modern generative tech.
But also I do mean something more specific. "Less detectable to people trying to do detection" at the cost of "Less like real writing" is a thing a lot of people (spammers) want from LLMs.
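For anyone unfamiliar with the term mentioned above, here is a toy sketch of the adversarial setup (generator versus discriminator) on 1-D data, assuming PyTorch; it is illustrative only, not how any production text generator is trained.

```python
# Minimal GAN on 1-D Gaussian data: the discriminator learns to flag generated
# samples, and the generator is trained against it until its output is hard to
# distinguish from the "real" distribution. Illustrative toy code only.
import torch
import torch.nn as nn

torch.manual_seed(0)

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.5 + 4.0   # "real" data ~ N(4, 1.5)
    fake = generator(torch.randn(64, 8))    # generated data

    # Discriminator step: real -> 1, fake -> 0.
    d_opt.zero_grad()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator call fakes real.
    g_opt.zero_grad()
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()

# The generated mean should drift toward the real mean (~4.0).
print(generator(torch.randn(1024, 8)).mean().item())
```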
Re: (Score:2)
> That's called a generative adversarial network
That's exactly the point. If you have the classifier, you use it to train the generator, so you don't need the classifier afterward.
> But also I do mean something more specific. "Less detectable to people trying to do detection" at the cost of "Less like real writing" is a thing a lot of people (spammers) want from LLMs.
Less detectable is more or less the same goal as having text that reads fluently. The things you can use to detect LLM content are exactly
Re: (Score:1)
It's not, because if you look, the project in TFA uses metrics, keywords, and other triggers to flag its targets, not just "is it gibberish/wrong," which is a standard all of Wikipedia nominally tries to enforce on all edits, AI or not.
Re: (Score:2)
Let me get this straight... the same secretive, incestuous cabal of Fucked Up Antisemitic admins that block people for weeks-on-end and remove talkpage access without any review... are supposedly "protecting wikipedia from AI"?
Yeah, I call bullshit.
Re: (Score:2)
Remarkably close to this, Wikipedia was promoting a tool [mozilla.org] based on an LLM that allows users to add statements from new sources, although in this case the caveat is that the user has to find a citable source first.
Not AI (Score:5, Insightful)
I wish people would stop referring to large language models as "Artificial Intelligence."
It is sophisticated pattern-matching software. It doesn't in any way "know" what the text it produces means; it just makes text that looks like the patterns of text that are made by humans who do.
Re: (Score:3)
^^^^THIS^^^^
The misnamed "AI" output is a mashup of whatever else mentions the same salient terms. It has not to me shown any evidence of being thought out, It shows no evidence of original content or analysis of concept. It ingests everything, including dross, and doesn't seem to be able to tell the difference in an analytical way..
There do seem to be those who believe in the "turtles all the way down" concept of LLMs that analyze other LLM outputs, and perform useful functions like editing for clarity, or
Re: (Score:3)
The LLMs analyzing output from other LLMs tend to go off the rails with far more "hallucination" effects. Garbage in, garbage out. Even output that is intended to closely resemble natural language has enough garbage in it to screw up the learning. You take a mimeograph of a mimeograph of a mimeograph all the way down and you end up with something nasty.
(for those who don't know what mimeographs are, they're a messy organic solution allowing documents to reproduce. Also get off my lawn!)
However GPT4 ha
Re: (Score:2)
Everyone understands what is meant. We're not all on the spectrum like you.
Hey! Us guys on the spectrum invented, like, transistors & stuff! iPhones! Chia pets! If it weren't for us, there wouldn't even be any large language models.
This again? (Score:2)
Stop. It's not going to change and you don't look smart by saying this.
You're not the arbiter of this. You can't even properly define "Intelligence".
Stop polluting Slashdot with this utterly uninteresting and unending pointless 'insight'.
Re: (Score:2)
And people should stop saying "crypto" when they mean blockchain-based currencies.
Don't fight it, we already lost. Neural networks are now AI, because the term is catchier than all alternatives. Try to get people to say ML ...
Re:Not AI (Score:4, Insightful)
Scary fact: That's how we operate as well. You and I don't know what we are going to say 5 words from now: it just flows out of a complicated statistical model in our brains. Sometimes we say things, then realize they are wrong only after we hear ourselves say it. And sometimes we still don't know.
Decades ago, a program that played Chess was considered "Artificial Intelligence." Then we moved the goal posts to "Well, they can't beat a grand master." (Most of us can't!) Then we felt safe behind the Turing Test: Phew! We humans could still look down upon AI as not *really* intelligent. Today's AIs are based on neural networks, similar to how biological brains operate. They pass the Turing Test with flying colors, yet we still refuse to say they are "Artificial Intelligence." What test must they pass next?
This really sounds like the No True Scotsman [wikipedia.org] fallacy, or moving the goal posts. Science fiction is rife with characters like in Star Trek: TNG who claimed that Commander Data wasn't really alive or wasn't really intelligent. No matter how smart these systems get, they are never worthy of the label "Artificial Intelligence."
For all I know, there is some being looking at our conversation on a monitor telling his colleague "Nice program, but it *still* isn't artificial intelligence."
Re:Not AI (Score:4, Interesting)
"What test must they pass next?"
Critical Thinking Test.
When "AI" can actually QUESTION the data fed to it, and thus the commands it has been given. Otherwise, it is merely the extension of a human's desires, providing us with what we programmed it to do. We humans possess critical thinking, so a human equivalent should as well.
Re: (Score:2)
When "AI" can actually QUESTION the data fed to it, and thus the commands it has been given.
Today's LLMs are perfectly capable of questioning things. You can go and do that right now. I have a colleague who is developing an LLM to debug hardware problems and it regularly asks follow-up questions. Most current LLMs are trained to respond imperatively not interrogatively so you don't see it often. I argue that you have set the bar too low. Questioning things isn't enough since even Eliza did that.
Otherwise, it is merely the extension of a human's desires, providing us with what we programmed it to do.
Ahh now THAT is an interesting question. Can you give an AI agency, and if so, what would it do wi
Re: (Score:2)
So if I were to ask you to perform a calculation in your head, would you just go "well, 284 + 790, that clearly is a number, what feels most right to me is 1000"?
I suspect many people would do exactly that. Are they not intelligent?
Or would you contemplate the problem a bit and do the actual addition, finding out what you're going to answer, before answering?
Are you saying that the ability to speak to oneself is what makes something intelligent? In that case, we have had AI for 30 years. Are you suggesting that people who cannot think silently to themselves are not intelligent?
Re: (Score:2)
For all I know, there is some being looking at our conversation on a monitor telling his colleague "Nice program, but it *still* isn't artificial intelligence."
...and the colleague replies, "I'm not sure that it's even intelligence."
Re: (Score:2)
Except that "artificial intelligence" has been the term since the 1950s, despite there being no intelligence. It is a research field that investigates and tries to produce something resembling intelligence without explicitly programming it all: machine learning, or machine adaptation.
Re: Not AI (Score:3)
Artificial intelligence is a term that's been around for a long time. It's not going away anytime soon, but the goal posts keep moving. The A* traversal algorithm was considered artificially intelligent. The threshold for artificial intelligence used to be a program that could be the world champion at chess. At one time artificial intelligence was synonymous with computer vision. If you could make software that recognized and identified objects, you had artificial intelligence.
Re: (Score:3)
I wish people would stop referring to large language models as "Artificial Intelligence."
Artificial Intelligence is a field of scientific endeavor. Like Mathematics.
LLMs are a part of artificial intelligence. Like addition is a part of mathematics.
It should not be confused with the entire field... but it is a part of it. Therefore the term is applicable -even if wildly misleading as commonly used.
Re: (Score:3)
The term you're looking for is "strong AI" or "artificial general intelligence" (AGI). As others have pointed out, the term "artificial intelligence" has always referred to all forms of research into automated decision-making. Perhaps Hollywood has misled plebs on this point by bombarding the public with portrayals of human-level thinking by machines, but bringing that viewpoint into a community like Slashdot is only ever going to get you shouted down.
Re: (Score:2)
"Hacking" used to imply bad actors shenanigans.
"You can use file sharing but you're not allowed to upload" is a wrong use of upload, there's only downloading.
"Piracy" for unlicenced copying. (Which is legal in many cases.)
These are the ones that come to mind, ruined mostly by crappy journalism, and we've lost the war. I just quietly chant this, whenever I read "AI": Artificial intelligence has as much to do with real intelligence as artificial flowers have to do with real flower
They will lose this fight (Score:1)
It is easier to destroy than to create, and with AI generating the poison content there is no long-term scenario in which legitimate content wins out that doesn't involve forcing contributors to register with verified government ID and paying admins to police them.
Re: (Score:2)
Current LLM content doesn't include valid sources, so it can be removed immediately per Wikipedia's guidelines. https://en.wikipedia.org/wiki/... [wikipedia.org]
And pages can already be set to show the latest content validated by contributors with more experience. Maybe that will become more common. https://en.wikipedia.org/wiki/... [wikipedia.org]
Re:They will lose this fight (Score:5, Insightful)
An LLM cannot give valid sources. But systems involving an LLM can. See for example Perplexity, which integrates a web search with an LLM so it can source its text. This may or may not result in useful text, but it provides valid sources to check whether the text is correct.
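As a hedged sketch of that search-plus-LLM pattern (retrieve first, then answer only from the retrieved sources and cite them): the search_web and ask_llm callables below are hypothetical placeholders, not Perplexity's actual API.

```python
# Sketch of a retrieval-backed answer pipeline; search_web() and ask_llm() are
# hypothetical stand-ins for a real search API and LLM endpoint.
from typing import Callable

def sourced_answer(question: str,
                   search_web: Callable[[str], list[dict]],
                   ask_llm: Callable[[str], str]) -> str:
    docs = search_web(question)[:5]  # each doc: {"url": ..., "snippet": ...}
    context = "\n".join(f"[{i + 1}] {d['url']}: {d['snippet']}" for i, d in enumerate(docs))
    prompt = (
        "Answer the question using ONLY the numbered sources below and cite them "
        "as [n] after each claim. If the sources do not cover it, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

# Stub demo with canned, made-up data, just to show the flow end to end.
print(sourced_answer(
    "When did the example facility open?",
    search_web=lambda q: [{"url": "https://example.org/facility", "snippet": "The facility opened in 1970."}],
    ask_llm=lambda prompt: "It opened in 1970 [1].",
))
```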
Re: (Score:2)
Generally, those using chat AI outputs do so to summarize writing that is already there. So one could write up a bad Wikipedia article, with valid citations, then have the LLM "clean it up." Especially useful for non-native speakers of the language.
Re: They will lose this fight (Score:2)
Obviously, I have an opinion: Wikipedia sucks because of its crusading editors, and I could care less
buy wikipedia? (Score:2)
I don't know the solution, but Wikipedia seems to have become an indispensable asset to the free world. Could an international consortium buy Wikipedia with a clear and un-editable mission statement? Something like the E.U., but hopefully with more countries, including the U.S. Not a government specifically, but a body that governments support and participate in?
I can't imagine the consequences if Wikipedia just disappeared or became completely biased. They might be hard to quantify, but I have no doubt they would be
Re: (Score:2, Informative)
Ah, like Conservapedia, chock full of falsehoods that present a deliberately biased view. Starts with the assumption that existing wikipedia is liberally biased (or biased in other ways opposed to Conservapedia authors), and existing print encyclopedias are also highly biased, therefore balance the scales by biasing in the opposite way. The result is a wiki encyclopedia firmly in favor of one, and only one, style of creationism forming a sizeable bulk of the content, along with conspiracy theories, diatri
Re: (Score:3)
Wikipedia is a project of Wikimedia Foundation. And Wikimedia Foundation isn't a corporation, it's an American 501(c)(3) nonprofit organization. https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:3)
Ah, but "international" means un-American :-) Some people will assume it must be biased. Effectively there's no way to avoid bias or the appearance of bias. The best you can do is be open and clear with everything, and Wikipedia mostly does that already.
Re: (Score:3)
While American democracy is paid for by big donors who give money in exchange for favorable laws and decisions, Wikipedia is written by people like you and me who don't care about the big donors.
Re: (Score:2)
The Wikimedia Foundation reached its goal in 2021 of collecting a $100 million endowment, at which point it can be self-funding. From the 2023 financial report: +$10 million in investment profit, +$14 million in donations, -$5 million in operational costs. So it's right now operating at a profit and would still be able to sustain Wikipedia even if donations disappeared. They made it; they're now long-term independent of donors. https://wikimediaendowment.org... [wikimediaendowment.org]
Re: (Score:2)
Re: (Score:2)
Wikimedia Foundation isn't a corporation, it's an American 501(c)(3) nonprofit organization.
A 501(c)(3) is by definition a nonprofit corporation. They are funded by contributions, which reminds me I need to send them some money.
I am not sure it matters whether content is generated by AI or some other process. The issue is whether an entry is factually accurate, truthful, and not spun to someone's interest. It's pretty obvious that is not the case for a lot of entries now. Because Wikipedia is the first go-to for many of us, there are lots of people trying to influence the content.
There was recently a
Re: (Score:2)
I can't imagine the consequences if Wikipedia just disappeared or become completely biased
Already is completely biased, and has been that way for years now.
Wikipedia has a lot of bot generated content (Score:3)
Re: Like AI hoaxes are the problem. (Score:2, Informative)
You don't know what leftists are.
Hint: they aren't fascists by definition. If they are fascists then they aren't leftists, and vice versa.
Re: Like AI hoaxes are the problem. (Score:2)
Words have meanings and those meanings matter. If this makes you sad, by all means, cry more.
Re: Like AI hoaxes are the problem. (Score:2)
Self-identifying leftists behaving strikingly like fascists, then.
Anyone who tried to seriously contribute to Wikipedia knows that there is some truth behind the unhinged comment of the AC above.
When some group of editors has decided to do something to a page, there's nothing you can do: there are so many rules that they will always find one to throw at their opponent, yet they ignore any rule thrown at them, even when they clearly violate it.
Others have decided that a page is their page and will simply revert
Re: (Score:2)
Self-identifying leftists behaving strikingly like fascists, then.
Get some more words in your vocabulary. Try "authoritarian".
Re: Like AI hoaxes are the problem. (Score:2)
Sure, authoritarian. That doesn't make my argument any less true.
Re: Like AI hoaxes are the problem. (Score:2)
And when they are called on their bullshit too much and it attracts too much attention, they will simply lock the page so they can push their bullshit unhindered, like the idea that Yasuke was a samurai, or renaming the article about "Muslim grooming gangs in the UK" to "moral panic about grooming gangs in the UK".
Don't tell me that's not biased.
Re: (Score:3)
The construct leftist-authoritarian-corporate-state is inherently meaningless. Socialists are somewhat OK with corporations, but outside of conservative ravings socialism is not the same thing as far-left or communist government. For example, Volkswagen is partly owned by the German state of Lower Saxony. The Norwegian government has a hugely successful sovereign wealth fund that invests in the private sector.
Although fascism and communism are both inheren
What's 'unnatural' writing these days (Score:2)
No typos, no spelling errors, grammatically correct, multi-syllable words, very suspicious. :-)
Re: (Score:3)
This is actually a great way of detecting copyright violations on Wikipedia. I've been suspicious a few times: perfect text appearing in an article on a trail somewhere, or twenty episodes of some kid's show getting plot summaries. Most of the time a quick search shows it's been copied and pasted from some commercial site. For the plot summaries there's also a style of writing that makes you know it's from a TV company website: "There's trouble in the house when Danny brings back a xylophone."
Re: (Score:2)
"There's trouble in the house when Danny brings back a xylophone."
I had a neighbor who played the xylophone and it brought trouble to ALL the houses around. :-)