Web Sites Can Now Choose to Opt Out of Google Bard and Future AI Models (mashable.com)

"We're committed to developing AI responsibly," says Google's VP of Trust, "guided by our AI principles and in line with our consumer privacy commitment. However, we've also heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases."

And so, Mashable reports, "Websites can now choose to opt out of Google Bard, or any other future AI models that Google makes." Google made the announcement on Thursday, introducing a new tool called Google-Extended that lets sites remain indexed by crawlers (the bots that create entries for search engines) while keeping their data from being used to train future AI models. For website administrators this is an easy fix, available through robots.txt, the text file that tells web crawlers which parts of a site they may access...

OpenAI, the maker of ChatGPT, recently launched a web crawler of its own, but included instructions on how to block it. Publications like Medium, the New York Times, CNN and Reuters have notably done so.

As Google's blog post explains, "By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time..."
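Per Google's documentation, the opt-out is a short addition to a site's existing robots.txt; the GPTBot entry below is OpenAI's analogous crawler token mentioned above:

```
# robots.txt -- stay indexed for search, but opt out of AI training
User-agent: Google-Extended
Disallow: /

# OpenAI's crawler can be blocked the same way
User-agent: GPTBot
Disallow: /
```

Note that Google-Extended is a control token only; regular Googlebot crawling and search indexing are unaffected by this entry.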
Comments Filter:
  • It would seem that an AI system could be overwhelmed with, for example, fake science if legitimate sites opt out. I get wanting to protect their IP, but ultimately some sort of compensation system may be best overall.
    • I've actually pondered that a bit myself. These crawlers tend not to use JavaScript, so what if somebody left a dummy page up with content designed to load the crawlers with junk data that superficially appears useful but over time turns their outputs into useless shit, and when a real browser hits the page the js loads a one-time yaml file to render the intended content...

      Or something to that effect. I don't do much work on the user-facing side of things, so there's probably bits I'm missing
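      The scheme above can be sketched as a tiny server-side check; this is purely hypothetical, the bot list and page bodies are made up, and it has a real caveat noted below:

```python
# Hypothetical sketch of the poisoning idea: serve decoy text to known AI
# crawlers (identified by User-Agent substring) and the real page to
# everyone else. The token list and page contents here are made up.
AI_BOT_TOKENS = ("GPTBot", "CCBot", "anthropic-ai")

REAL_PAGE = "<html><body>Actual article content</body></html>"
DECOY_PAGE = "<html><body>Plausible-looking junk for training sets</body></html>"

def page_for(user_agent: str) -> str:
    """Return the decoy page for AI crawlers, the real page otherwise."""
    if any(token in user_agent for token in AI_BOT_TOKENS):
        return DECOY_PAGE
    return REAL_PAGE
```

      One wrinkle: Google-Extended is only a robots.txt token, not a fetching User-Agent; regular Googlebot does the actual crawling, so user-agent cloaking like this can't distinguish Google's AI training from its search indexing.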

    • Elsevier could do what the MPAA and RIAA are doing.

      Suing Google into not indexing them worked wonders for newspapers. Now they are paying Google.

  • by test321 ( 8891681 ) on Sunday October 01, 2023 @10:49AM (#63891763)

    User-agent: Google-Extended
    Disallow: /

    https://developers.google.com/... [google.com]

    • by boulat ( 216724 )

      I have all Google IPs blocked in CloudFlare.

    • Yeah, because Google have a history of respecting people's wishes & complying with legally binding agreements with healthcare & educational institutions, right? They wouldn't dream of abusing users' wishes or misleading them, would they?
  • Telling someone they can't train their AI on you is total BS. If you put something out there publicly accessible, anyone in the public has -- or ought to have recognized -- the right to do anything with it. Did the people who built the pyramids get to decide who can view them? We shouldn't get to decide what people do with our creations if we blast them out to the world. If you are afraid of the consequences, don't put your words/creations in the public. Have a verification process or something before showing it.

    • Since the existing site contents have been used for the AI training already, the opt-out is somewhat limited. How much is, say, an author's website going to change in the next 10 years? 25%?

    • by Registered Coward v2 ( 447531 ) on Sunday October 01, 2023 @11:05AM (#63891805)

      Telling someone they can't train their AI on you is total BS. If you put something out there publicly accessible, anyone in the public has -- or ought to have recognized -- the right to do anything with it. Did the people who built the pyramids get to decide who can view them? We shouldn't get to decide what people do with our creations if we blast them out to the world. If you are afraid of the consequences, don't put your words/creations in the public. Have a verification process or something before showing it.

      Sure you can. That is why you have paywalls, sites blocking certain IP addresses, robots.txt, etc. Merely making something available doesn't mean you lose control of it.

      • You do lose control, that's a basic fact. You can charge an admission fee to watch a movie, you can sue people for copying it, but you can't sue people for watching it, and you can't tell people they're not allowed to share spoilers of the entire plot without making them sign NDAs. Copyright law doesn't give you full control; that's asinine.

        You can put your whole website behind a license agreement and see if you can make those terms stick, and if you haven't, then what are you talking about? Anytime you put

        • You do lose control, that's a basic fact. You can charge an admission fee to watch a movie, you can sue people for copying it, but you can't sue people for watching it, and you can't tell people they're not allowed to share spoilers of the entire plot without making them sign NDAs.

          You can, as you point out, control access to it; which was my counterpoint to the OP's claim that you should not be allowed to opt out simply because stuff is on the web. Historically, "variations on a theme..." were considered new works, and art built on what went before, requiring a human to create. AI is changing the scope and scale of the ability to create a new piece based on specific styles. Maybe the solution is to not allow any AI generated content to be copyrighted, so if you create an AI movie or book or whatever

    • by Ichijo ( 607641 )

      If you put something out there publicly accessible, anyone in the public has -- or ought to have recognized -- the right to do anything with it.

      So, we should throw out all copyrights?

      • Copyright law only prevents regurgitating something verbatim; it doesn't prevent you from using derived knowledge. If you read a book on how to build a bridge, the author can't claim money from you for every bridge you build.

        • by Ichijo ( 607641 )

          If you read a book on how to build a bridge, the author can't claim money from you for every bridge you build.

          If the book taught you how to draw Mickey Mouse, can Disney claim money from you for every Mickey Mouse cartoon you produce and sell?

          • If the book taught you how to

            Except it didn't. It taught you how to build a bridge. So answer his question without changing the circumstances.

      • So, we should throw out all copyrights?

        Actually, that would not be a bad idea, copyright is BS anyway, imaginary property.

    • by larwe ( 858929 )
      Distributing something for free to the public is NOT the same thing as putting that thing in the public domain. Inter alia, I can put a TOS/EULA on my site that says "you can't use my stuff to make money".
      • So if I read a news story on CNN.com that, for example, the president signed a new budget .. I can't tell other people that news? You realize that's nonsensical right?

      • An EULA, including a TOS, that the user only sees after they are already on the website, can be ignored by the user.
        See the EULA on Cory Doctorow's blog for an example:

        By reading this website, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

        On the other hand, users can hold you to anything you promise in your EULA.

        But of course, publishing anything is not the same as putting it in the public domain. You still retain the copyright on anything you publish (provided you had it to begin with).

        If you don't want Google to make money from listing your stuff as a relevant result, you c

      • by Alumoi ( 1321661 )

        And I can print it and wipe my arse with it.
        Unless NOT agreeing with your said TOS/EULA prevents me from entering your site (and good luck with that), that thing is useless.

        • by larwe ( 858929 )
          Of course it's functionally useless. But if you make money by printing copies of my website on custom toilet paper, I can sue you. The point here is - because I give you access to it does not give you _ownership_ of it or the right to make derivative works yada yada.
      • Distributing something for free to the public is NOT the same thing as putting that thing in the public domain. Inter alia, I can put a TOS/EULA on my site that says "you can't use my stuff to make money".

        In the US, at least, you have an implicit copyright on anything you publish on the web.

        I don't have a problem with people using information I've put out there, at least for non-commercial purposes. I don't even mind commercial use, if I'm given credit/citation. What I don't like is when people represent my writing or my images as their own - which, while illegal, has happened on multiple occasions (word for word, image for image, copies of entire web pages I made).

  • You can opt-out of me stealing your stuff. If you tell me nothing, I'm going to take it. That doesn't sound right, does it?
  • I said, take your dick out first. Thanks.
  • Just as previous wars have been fought over the rights of people considered "not fully human" due to their race or religion, we will soon see an actual war over AI rights. Websites already ban Firefox and niche browsers like Pale Moon from their content; soon we will be having sites only allowing Web Environment Integrity Chrome and having a web cam pointed at you to prove that a human is reading it.
  • This opt-out model is obviously absurd:
    1. legitimate sites create original content and add robots.txt specs to opt-out of AI training
    2. web-scrapers copy the content and post it to numerous ad-laden sites, sans robots.txt
    3. Google/OpenAI consider themselves free to use the content from those sites without limitations.

    It's just data laundering. It gives the illusion of good corporate citizenship while simultaneously incentivizing Google/OpenAI/whomever to *not* identify the original source of data.

    • No, opt-out is the right choice: you put your content on a public website, that means anyone can use it, including bots. If you don't want it read, either don't make the site public or use robots.txt.

If you steal from one author it's plagiarism; if you steal from many it's research. -- Wilson Mizner
