An Entire Wikipedia That's 100% AI Hallucinations (github.com) 55
"Every link leads to an entry that does not exist yet," explains the GitHub page for a Wikipedia-like site called Halupedia. "Until you click it, at which point an LLM pretends it has always existed and writes it for you, in the deadpan register of a 19th-century scholarly press..."
Every article is invented on demand. The footnotes are also lies... The hardest problem with an infinite, on-demand encyclopedia is internal contradiction... When the LLM writes an article, it is required to add a context="..." attribute on every <a> it inserts, summarising the future article it is linking to (e.g. context="19th-century clerk who formalized footnote drift, Pellbrick's mentor")... When that target article is later requested for the first time, the worker loads the accumulated hints and injects them into the system prompt as "PRIOR REFERENCES — these are CANON". The LLM is instructed that the encyclopedia is hallucinated and absurd, but it must not contradict itself.
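The hint mechanism described above can be sketched roughly as follows. This is a minimal illustration in Python, not Halupedia's actual code: the in-memory `hint_store`, the `LinkHintParser` class, the function names, and the surrounding prompt wording are all assumptions made for the example. Only the `context` attribute and the "PRIOR REFERENCES" header come from the project's description.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkHintParser(HTMLParser):
    """Collects (href, context) pairs from <a> tags in generated article HTML."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href, context = attr_map.get("href"), attr_map.get("context")
        if href and context:
            self.hints.append((href, context))


# Hypothetical in-memory store: target slug -> hints accumulated so far.
hint_store = defaultdict(list)


def record_hints(article_html):
    """Store the context hint of every outgoing link under its target slug."""
    parser = LinkHintParser()
    parser.feed(article_html)
    for href, context in parser.hints:
        slug = href.strip("/")
        if context not in hint_store[slug]:
            hint_store[slug].append(context)


def build_system_prompt(slug):
    """Assemble a system prompt for an article requested for the first time."""
    lines = [
        "You are writing an entry for a hallucinated, absurd encyclopedia.",
        "It must not contradict itself.",
    ]
    hints = hint_store.get(slug, [])
    if hints:
        # Header text quoted from the project description.
        lines.append("PRIOR REFERENCES — these are CANON:")
        lines.extend(f"- {hint}" for hint in hints)
    return "\n".join(lines)


# Example: one generated article links to a not-yet-written entry.
record_hints(
    '<p>See <a href="/pellbrick-mentor" context="19th-century clerk '
    'who formalized footnote drift, Pellbrick\'s mentor">his mentor</a>.</p>'
)
prompt = build_system_prompt("pellbrick-mentor")
print(prompt)
```

Because hints only accumulate from articles that have already been written, a scheme like this can keep a new entry consistent with earlier references to it, but it cannot retroactively fix an older article that a later entry contradicts.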
Fast Company reports that Halupedia was created by software developer Bartlomiej Strama, who confessed in a Reddit comment that the site came about after a drunk night with a friend. "In the week since launch, he says Halupedia has amassed more than 150,000 users." Beyond indulging in silly alternate histories, what's the point of using Halupedia? Strama hinted at one larger purpose in a reply to a donor on his Buy Me a Coffee page: "Your contribution towards polluting LLM training data will surely benefit society!" he wrote.
The site is licensed as free software under the GPL-3.0 license.
Thanks to long-time Slashdot reader schwit1 for sharing the news.
Cool (Score:4, Interesting)
Re: (Score:2)
Also wonder how many other examples of this there are that aren't being advertised.
This is some A grade poison, but also kind of an obvious thing to do.
Re: (Score:1)
Re: (Score:2)
Is it possible that actually there might be problems that you've never heard of that aren't just that someone is upset about copyright?
Re: (Score:1)
So you want to do political discourse... but you only understand the dumb jokes in South Park and don't get that the creators are mocking you? Ok.
Re: (Score:2)
I wonder how long before the poisoning kills someone or causes a minor disaster. Imagine birthing a god but making sure its mind is poisoned because your worry is IP rights.
I suppose it depends on how you define poison. If it is only the representation of facts as we know them, then satire sites like The Onion, or Babylon Bee will be equally guilty.
We hear of AI causing someone to kill another person or themselves. I'm pretty certain that they'd find an excuse regardless. This is just another form of the Helter Skelter defense. A song about a conical sliding board, and lyrics that are nonsense, but not quite, hard as hell, and used by Manson for other, nefarious reasons.
Re: Cool (Score:2)
Re: Cool (Score:2)
Re: (Score:3)
Wonder how long before it's being used to train them...
Uh, train who/what exactly? The Hallucinator behind the curtain here seems to be doing just fine imaginating its way into existence based on TFS. What more training is needed? Like it really needs the Hunter S. Thompson module with the DMT plugin and liquid cocaine cooling.
As far as the meatsack smoothbrains "training" themselves off this drivel, probably good for stock prices that social media has some competition. It's been rather Tik or Tok for choices lately. With crippling effect.
Re: (Score:2)
Uh, train who/what exactly?
Other AI models. AI model scrapes web, web is full of AI hallucinations, AI outputs more hallucinations based on false data, rinse and repeat.
Re: Cool (Score:1)
Re:Isn't that what Wikipedia already is? (Score:5, Interesting)
maybe you can start your own wiki and pack it full of assumptions that the parts of the bible you're familiar with must be true.
you could include all kinds of rationalizations about why physics doesn't really work, evolution is a liberal conspiracy and heliocentrism is a moral crime.
That would be a massive duplication of effort, given that what you're advocating already exists in fundamentalist church texts and sermons, and in Bible-belt school curricula. Just train an LLM on that shit, then pass the popcorn please!
Re: (Score:1)
maybe you can start your own wiki and pack it full of assumptions that the parts of the bible you're familiar with must be true.
you could include all kinds of rationalizations about why physics doesn't really work, evolution is a liberal conspiracy and heliocentrism is a moral crime.
That already exists and it's hilarious: https://www.conservapedia.com/... [conservapedia.com]
Unfortunately he hosts it on a VIC-20 so the Internet Archive is a better bet.
Re:Isn't that what Wikipedia already is? (Score:5, Interesting)
You must be using Wikipedia to research some seriously edgelord stuff. Does that even exist?
For the most part, I have found Wikipedia to be quite accurate. The community-curation appears to work.
Re: (Score:2)
Re: (Score:3)
..Politics, history, etc. not so much.
History is told from the victors' perspective often enough to warp accuracy, and politics runs on bullshit. Always has.
In other words, par for the course. We shouldn't expect much better.
Re: Isn't that what Wikipedia already is? (Score:1)
It is especially bad on Wikipedia, where a tiny number of anonymous super admins control the narrative, and they pass their views off like they have no bias. I think every Wikipedia admin should have to declare their name and political affiliation; it would be rather enlightening.
Re: (Score:2)
Re: Isn't that what Wikipedia already is? (Score:4, Informative)
History is excellent. WTF are you talking about? I regularly read scholarly and popular history books as well as reference Wikipedia. Wikipedia is an excellent source for history.
and 99% (Score:1)
I'm sticking to the human generated wikipedia .... (Score:3, Funny)
Kind of redundant (Score:1)
If you want AI generated nonsense all you have to do is subscribe to IETF announce.
Doesn't work (Score:2)
This is a cute toy but it falls apart because it fails its central premise:
The LLM is instructed that the encyclopedia is hallucinated and absurd, but it must not contradict itself.
It does, though. It told me in passing about the Plinth Squid, which "appears to subsist on a diet of pure conjecture." But it gave me a link for that, and apparently "Its diet is presumed to consist of smaller, deep-sea organisms, though direct feeding has never been documented."
Re: (Score:2)
Re: (Score:2)
Unfortunately it's just too sane. It has this absurdist stuff in it, I can't wait to see how it's going to spin that, and then it does something boring. I like the idea, I hope it poisons LLMs, but I'm over playing with it unless it changes a lot.
Re: (Score:2)
I'm not sure if the concept of "generate it on the fly" is optimal for getting the poison into LLM training data. Spiders like the googlebot are pretty good at checking consistency of page data for inclusion into their index. If the spider suspects that the page served to regular users is different from what it sees, it can lead to SEO countermeasures.
Probably best to generate the fake wiki pages on a weekly rotation.
Re: (Score:3)
The LLM is instructed that the encyclopedia is hallucinated and absurd, but it must not contradict itself.
It does, though. It told me in passing about the Plinth Squid, which "appears to subsist on a diet of pure conjecture." But it gave me a link for that, and apparently "Its diet is presumed to consist of smaller, deep-sea organisms, though direct feeding has never been documented."
Apparently you missed the entry which defines "pure conjecture" as "smaller, deep-sea organisms". The author probably forgot to add that link to the article you read.
Re: (Score:2)
smaller, deep-sea organisms ... named conjecture.
Great sentient textile conspiracy uncovered! (Score:2)
Spill the invisible beans. (Score:3)
While the sentient textiles were supposedly simply a theory advanced in the early 20th century ( https://halupedia.com/sentient... [halupedia.com] ), in fact the trade in sentient textiles was so prevalent by the 8th century that the need to regulate it was one of the main reasons that the Ancient Europian Confederacy ( https://halupedia.com/ancient-... [halupedia.com] ) came to be. Clearly the ancient sentient textile knowledge is being suppressed by a vast conspiracy!!!
Sentient, eh?
* holds knife up *
(Me) "Alright sweater-meat. Tell me the secret to invisibility cloaks, or the little black dress here gets the cutting room floor.."
Doesn't differ that much from Reddit (Score:1)
with their moderator dictators allowing only information that suits their own narrow minded worldview.
How many zeroes? Consider rounding? (Score:2)
Re: (Score:2)
How do you know that's not just one significant digit, which would mean >95%? Or maybe it only has one-bit resolution and it's anything over 50%.
The nearly game (Score:2)
character encoding issues? really? (Score:2)
BartÃ...omiej = BartÅomiej although I'm not sure my comment will survive the recoding. It's Bartlomiej with a small / across the "l", https://en.wikipedia.org/wiki/Å this one.
Re: character encoding issues? really? (Score:2)
Gee, even the Wikipedia page doesn't survive Slashdot's ancient commenting. That's pathetic. https://en.wikipedia.org/w/index.php?title=L_with_stroke this one.
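The garbling in this thread looks like the usual result of UTF-8 text being re-read as a single-byte encoding. The sketch below reproduces that effect in Python. It assumes Latin-1 as the wrong encoding purely for illustration; which encodings Slashdot's pipeline actually uses is not stated anywhere in the thread.

```python
# Assumption: the text was stored as UTF-8 but re-read as a single-byte encoding.
original = "Bartłomiej"

# "ł" (U+0142) is two bytes in UTF-8: 0xC5 0x82.
utf8_bytes = original.encode("utf-8")

# Misreading those bytes as Latin-1 turns each byte into its own character:
# 0xC5 becomes "Å" and 0x82 becomes an invisible control character.
garbled = utf8_bytes.decode("latin-1")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

One round of this misreading gives a string that displays as "BartÅomiej", matching the second form quoted above. The "Ã" in the first form would be consistent with the same mistake being applied twice, though that is a guess.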
Fun question. (Score:2)
Contribution to society? (Score:2)
"Your contribution towards polluting LLM training data will surely benefit society!"
No, it won't. Yes, there are ethical and copyright issues regarding LLM training data, but this is sabotage of something that is also quite useful to society.
We need to have a conversation about the ethical and social implications of AI. This ain't it. This is just edgelord shit hurling.
"Tlön, Uqbar, Orbis Tertius" (Score:1)