Web Sites Can Now Choose to Opt Out of Google Bard and Future AI Models (mashable.com)
"We're committed to developing AI responsibly," says Google's VP of Trust, "guided by our AI principles and in line with our consumer privacy commitment. However, we've also heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases."
And so, Mashable reports, "Websites can now choose to opt out of Google Bard, or any other future AI models that Google makes." Google made the announcement on Thursday introducing a new tool called Google-Extended that will allow sites to be indexed by crawlers (or a bot creating entries for search engines), while simultaneously not having their data accessed to train future AI models. For website administrators, this will be an easy fix, available through robots.txt — or the text file that allows web crawlers to access sites...
OpenAI, the maker of ChatGPT, recently launched a web crawler of its own, but included instructions on how to block it. Publications like Medium, the New York Times, CNN and Reuters have notably done so.
As Google's blog post explains, "By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time..."
What about become less reliable over time (Score:2)
Re: What about become less reliable over time (Score:2)
I've actually pondered that a bit myself. These crawlers tend to not use JavaScript, so what if somebody left a dummy page up that has content designed to load the crawlers with junk data that at least specifically appears to be useful, but over time turns their outputs into useless shit, and when a real browser hits the page the js loads a one-time yaml file to render the intended content...
Or something to that effect. I don't do much work on the user facing side of things so there's probably bits I'm miss
Re: (Score:1)
I think you just re-invented search engine optimisation. It's an entire industry.
To protect their IP (Score:1)
Elsevier could do what the MPAA and RIAA are doing.
Suing Google into not indexing them worked wonders for newspapers. Now they are paying Google.
What to put in your robots.txt (Score:5, Informative)
User-agent: Google-Extended
Disallow: /
https://developers.google.com/... [google.com]
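A rough way to sanity-check that rule before deploying it, using Python's standard-library robots.txt parser. This is only a sketch: the example.com URL and the is_allowed helper are made up for illustration, and the stdlib parser's matching is not guaranteed to behave exactly like Google's own crawler logic.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the comment above.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would this user agent be allowed to fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article.html"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

With only that rule in the file, the Google-Extended token is refused everywhere while the ordinary Googlebot search crawler is still allowed, which is the split the story describes.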
Re: (Score:1)
I have all Google IPs blocked in CloudFlare.
Re: What to put in your robots.txt (Score:1)
Re: (Score:2)
There should be no opt out. (Score:2)
Telling someone they can't train their AI on you is total BS. If you put something out there publicly accessible, anyone in the public has -- or ought to have recognized -- the right to do anything with it. Did the people who built the pyramids get to decide who can view them? We shouldn't get to decide what people do with our creations, if we blast them out to the world. If you are afraid of the consequences, don't put your words/creations in the public. Have a verification process or something before sho
Re: (Score:2)
Since the existing site contents have been used for the AI training already, the opt-out is somewhat limited. How much is, say, an author's website going to change in the next 10 years? 25%?
Re:There should be no opt out. (Score:5, Insightful)
Telling someone they can't train their AI on you is total BS. If you put something out there publicly accessible, anyone in the public has -- or ought to have recognized -- the right to do anything with it. Did the people who built the pyramids get to decide who can view them? We shouldn't get to decide what people do with our creations, if we blast them out to the world. If you are afraid of the consequences, don't put your words/creations in the public. Have a verification process or something before showing it.
Sure you can. That is why you have paywalls, sites blocking certain IP addresses, robots.txt, etc. Merely making something available doesn't mean you lose control of it.
Re: There should be no opt out. (Score:2)
You do lose control, that's a basic fact. You can charge an admission fee to watch a movie, you can sue people for copying it, but you can't sue people for watching it, and you can't tell people they're not allowed to share spoilers of the entire plot without making them sign NDAs. Copyright law doesn't give you full control, that's asinine
You can put your whole website behind a license agreement and see if you can make those terms stick, and if you haven't, then what are you talking about. Anytime you put
Re: (Score:2)
You do lose control, that's a basic fact. You can charge an admission fee to watch a movie, you can sue people for copying it, but you can't sue people for watching it, and you can't tell people they're not allowed to share spoilers of the entire plot without making them sign NDAs.
You can, as you point out, control access to it, which was my counterpoint to the OP's claim that you should not be allowed to opt out simply because stuff is on the web. Historically, "variations on a theme..." were considered new works, and art built on what went before, requiring a human to create. AI is changing the scope and scale of the ability to create a new piece based on specific styles. Maybe the solution is not to allow any AI-generated content to be copyrighted, so if you create an AI movie or book or whatev
Re: (Score:2)
So, we should throw out all copyrights?
Re: (Score:2)
Copyright law only prevents regurgitating something verbatim; it doesn't prevent you from using derived knowledge. If you read a book on how to build a bridge, the author can't claim money from you for every bridge you build.
Re: (Score:2)
If the book taught you how to draw Mickey Mouse, can Disney claim money from you for every Mickey Mouse cartoon you produce and sell?
Re: (Score:2)
If the book taught you how to
Except it didn't. It taught you how to build a bridge. So answer his question without changing the circumstances.
Re: (Score:2)
So, we should throw out all copyrights?
Actually, that would not be a bad idea, copyright is BS anyway, imaginary property.
Re: (Score:2)
Re: (Score:2)
So if I read a news story on CNN.com that, for example, the president signed a new budget .. I can't tell other people that news? You realize that's nonsensical right?
Re: (Score:1)
An EULA, including a TOS, that the user only sees after they are already on the website, can be ignored by the user.
See the EULA on Cory Doctorow's blog for an example:
By reading this website, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
On the other hand, users can hold you to anything you promise in your EULA.
But of course, publishing anything is not the same as putting it in the public domain. You still retain the copyright on anything you publish (provided you had it to begin with).
If you don't want Google to make money from listing your stuff as a relevant result, you c
Re: (Score:2)
And I can print it and wipe my arse with it.
Unless NOT agreeing with your said TOS/EULA prevents me from entering your site (and good luck with that), that thing is useless.
Re: (Score:2)
Re: (Score:2)
Distributing something for free to the public is NOT the same thing as putting that thing in the public domain. Inter alia, I can put a TOS/EULA on my site that says "you can't use my stuff to make money".
In the US, at least, you have an implicit copyright on anything you publish on the web.
I don't have a problem with people using information I've put out there, at least for non-commercial purposes. I don't even mind commercial use, if I'm given credit/citation. What I don't like is when people represent my writing or my images as their own - which, while illegal, has happened on multiple occasions (word for word, image for image, copies of entire web pages I made).
Opt-in or fuck off (Score:2)
Re: (Score:1)
Quite.
I wonder if this doesn't violate the EU's data privacy laws.
MpPrPHff (Score:2)
The AI-human wars are inevitable (Score:2)
Re: (Score:2)
Circuit boards are just as sensitive to shotgun blasts as human flesh. Let's get it on.
blatant loophole (Score:2)
It's just data-laundering. It gives the illusion of good, corporate citizens while simultaneously incentivizing Google/OpenAI/whomever to *not* identify the original source of data.
Re: (Score:3)
No, opt-out is the right choice: you put your content on a public website, that means anyone can use it, including bots. If you don't want it read, either don't make the site public or use robots.txt.